Build an AI Agent From Scratch in Python, No Framework

Key takeaways

A real agent is about 150 lines of plain Python with the OpenAI SDK: the agent loop, two typed tools, conversation memory, and bounding logic, no LangChain or CrewAI.
The agent loop has four moves: send the messages, check whether the model wants a tool, run the tool, and append the result. Repeat until the model returns a final answer or the step budget runs out.
You describe the tools; the model picks the order. Given a user-lookup tool and a knowledge-base tool, it sequences them on its own without you hardcoding the steps.
The cheapest memory is the conversation history itself: persist the same messages list between turns. It holds up until the conversation gets long, at which point you add summarization or a vector store.
Two small guardrails move a toy agent toward production: a max-steps limit plus repeat-call detection to break infinite loops, and a try-except that feeds a bad tool call back to the model as feedback instead of crashing.


Most “build an AI agent” tutorials hand you a framework, and the framework hands you a finished agent. You learn the framework. You do not learn how an agent works.

This post does the opposite. We build a real agent in about 150 lines of plain Python, using nothing but the OpenAI SDK. By the end you will have the agent loop, two typed tools, conversation memory, and the bounding logic that keeps it from running forever. You will be able to swap the provider, swap the tools, or move the loop into any framework later, because you will know what the framework was doing.

We will follow the running example from Understanding Agentic AI Systems: an internal IT helpdesk assistant that grows from a chatbot into a real agent.

What we are building

A small helpdesk assistant that answers questions like “I’m user U-123 and my laptop will not connect to wifi.” To do that well, it has to:

Look up the user’s record (to know their device, location, recent tickets).
Search a tiny knowledge base for the right troubleshooting steps.
Combine the two into a coherent reply.

A chatbot cannot do this because the answer is not in one place. An agent can, because it picks the next action based on what it has seen so far.

We will build the agent in six steps, each adding one capability.

Setup

Install the SDK:

pip install openai

Set the API key in your environment:

export OPENAI_API_KEY="sk-..."

All the code below lives in a single file, agent.py. We will grow it step by step.

Step 1: A chatbot, no tools

Start with the simplest possible thing: ask the model once, get an answer.

from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"

def chat(user_message: str) -> str:
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You help users with IT support."},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

print(chat("I am user U-123 and my laptop will not connect to wifi."))

The model will respond with generic troubleshooting because it has no access to U-123’s record and no access to the company’s knowledge base. It is friendly but useless. That is the chatbot ceiling.

Step 2: Define one tool

A tool is a Python function the model is allowed to call. We give the model the function’s name, description, and parameter schema. The model decides when to call it.

KNOWLEDGE_BASE = {
    "wifi": "Restart the wifi adapter: settings, network, disable, re-enable. If still failing, forget the network and rejoin.",
    "vpn": "Reinstall the corporate VPN client from the internal portal. Restart laptop after install.",
    "printer": "Ensure printer queue is unblocked. Cancel stuck jobs, then re-add the printer from the network printer list.",
}

def search_kb(query: str) -> str:
    """Search the internal knowledge base."""
    query = query.lower()
    for keyword, entry in KNOWLEDGE_BASE.items():
        if keyword in query:
            return entry
    return "No matching knowledge base entry."

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_kb",
            "description": "Search the internal IT knowledge base for a troubleshooting article.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "What the user is having trouble with, in a few words.",
                    },
                },
                "required": ["query"],
            },
        },
    },
]

TOOL_FUNCTIONS = {"search_kb": search_kb}

The tool is now defined, but the model cannot use it yet: nothing passes TOOLS to the API, and nothing runs the function when the model asks for it. That is the next step.
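
Because the schema and the Python function are written by hand, they can drift apart (a renamed parameter, a tool that was never registered). One way to catch that early is a small startup check. The sketch below is only an illustration: check_tools is a hypothetical helper, not part of the agent, and the block repeats a trimmed TOOLS and TOOL_FUNCTIONS so it runs on its own.

```python
import inspect

def search_kb(query: str) -> str:
    """Stand-in for the search_kb tool defined above."""
    return "No matching knowledge base entry."

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_kb",
            "description": "Search the internal IT knowledge base.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]
TOOL_FUNCTIONS = {"search_kb": search_kb}

def check_tools(tools: list[dict], functions: dict) -> list[str]:
    """Return a list of mismatches between tool schemas and Python functions.

    Hypothetical helper: an empty list means every schema has a registered
    function whose parameter names match the schema's properties.
    """
    problems = []
    for tool in tools:
        spec = tool["function"]
        name = spec["name"]
        func = functions.get(name)
        if func is None:
            problems.append(f"{name}: no Python function registered")
            continue
        schema_params = set(spec["parameters"].get("properties", {}))
        func_params = set(inspect.signature(func).parameters)
        if schema_params != func_params:
            problems.append(
                f"{name}: schema has {sorted(schema_params)}, "
                f"function takes {sorted(func_params)}"
            )
    return problems

print(check_tools(TOOLS, TOOL_FUNCTIONS))  # an empty list means no mismatches
```

Running this once at import time is cheap and turns a confusing runtime failure into an obvious message.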

Step 3: The agent loop

This is the heart of the post. The loop has four moves: send the messages to the model, check whether it wants to call a tool, run the tool, and append the result. Then repeat until the model has a final answer.

import json

def run_agent(user_message: str, max_steps: int = 8) -> str:
    messages = [
        {"role": "system", "content": "You help users with IT support. Use tools when needed."},
        {"role": "user", "content": user_message},
    ]

    for step in range(max_steps):
        response = client.chat.completions.create(
            model=MODEL,
            messages=messages,
            tools=TOOLS,
        )
        message = response.choices[0].message
        messages.append(message)

        if not message.tool_calls:
            return message.content  # Final answer, no more tool calls.

        for call in message.tool_calls:
            name = call.function.name
            args = json.loads(call.function.arguments)
            result = TOOL_FUNCTIONS[name](**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result,
            })

    return "Step budget exceeded without a final answer."

That is the agent loop. Read it twice. Everything else in this post is a refinement of this function of under thirty lines.

The loop terminates two ways: the model returns a message with no tool calls (the final answer), or we hit the step budget. Without that budget, a confused model could loop forever; we will tighten this later.

Try it:

print(run_agent("I am user U-123 and my laptop will not connect to wifi."))

The model will call search_kb("wifi"), get the wifi troubleshooting steps back, and write a focused reply. It is no longer a chatbot. It picked an action, took it, observed the result, and continued.
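
To watch the pick, act, observe cycle without an API key, the same four moves can be run against a scripted stand-in for the model. This is a sketch under stated assumptions: fake_model and run_agent_offline are hypothetical names, and the tool-call dicts use a simplified shape (name and arguments at the top level) rather than the SDK's objects.

```python
import json

def search_kb(query: str) -> str:
    """Stand-in tool with one canned article."""
    return "Restart the wifi adapter." if "wifi" in query.lower() else "No match."

TOOL_FUNCTIONS = {"search_kb": search_kb}

def fake_model(messages: list[dict]) -> dict:
    """Hypothetical stand-in for the API call: asks for one tool, then answers."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {
            "role": "assistant",
            "content": None,
            "tool_calls": [
                {"id": "call_1", "name": "search_kb",
                 "arguments": json.dumps({"query": "wifi"})},
            ],
        }
    return {
        "role": "assistant",
        "content": f"Try this: {tool_results[-1]['content']}",
        "tool_calls": [],
    }

def run_agent_offline(user_message: str, max_steps: int = 8) -> tuple[str, list[dict]]:
    """Same four moves as run_agent, against fake_model; returns answer and trace."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):
        message = fake_model(messages)            # 1. send the messages
        messages.append(message)
        if not message["tool_calls"]:             # 2. no tool wanted: final answer
            return message["content"], messages
        for call in message["tool_calls"]:
            args = json.loads(call["arguments"])
            result = TOOL_FUNCTIONS[call["name"]](**args)  # 3. run the tool
            messages.append({                     # 4. append the result
                "role": "tool",
                "tool_call_id": call["id"],
                "content": result,
            })
    return "Step budget exceeded without a final answer.", messages

answer, trace = run_agent_offline("My laptop will not connect to wifi.")
for m in trace:
    print(m["role"], "->", m.get("content") or m.get("tool_calls"))
```

The printed trace shows four messages in order: user, assistant (requesting the tool), tool, assistant (final answer). That sequence is what the real loop builds up in messages.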

Step 4: A second tool

The user mentioned “U-123” but we did nothing with that. Add a user lookup tool.

USERS = {
    "U-123": {"name": "Alex Chen", "laptop": "MacBook Pro M3", "location": "London"},
    "U-456": {"name": "Sam Patel", "laptop": "ThinkPad X1", "location": "New York"},
}

def lookup_user(user_id: str) -> str:
    """Look up a user record by their employee ID."""
    user = USERS.get(user_id)
    if not user:
        return f"No user found with id {user_id}."
    return json.dumps(user)

TOOLS.append({
    "type": "function",
    "function": {
        "name": "lookup_user",
        "description": "Look up an employee by their user ID (format: U-NNN).",
        "parameters": {
            "type": "object",
            "properties": {
                "user_id": {
                    "type": "string",
                    "description": "The user ID, e.g. U-123.",
                },
            },
            "required": ["user_id"],
        },
    },
})
TOOL_FUNCTIONS["lookup_user"] = lookup_user

Run the agent again. With two tools and the same user message, the model will typically call lookup_user("U-123") first, then search_kb("wifi connection"), and synthesize a personalized reply that knows the user is on a MacBook in London. The order is not hardcoded by us. The model picked it.

That is the part that surprises people the first time. We never told the model to look up the user before searching the KB. We described both tools. It figured out the sequence on its own. That is the agent loop earning its name.

Step 5: Memory

Right now each call to run_agent starts fresh. For a real assistant, you want to keep the conversation going. The simplest memory is the same messages list we already use, persisted between turns.

class Agent:
    def __init__(self):
        self.messages = [
            {"role": "system", "content": "You help users with IT support. Use tools when needed."},
        ]

    def chat(self, user_message: str, max_steps: int = 8) -> str:
        self.messages.append({"role": "user", "content": user_message})

        for step in range(max_steps):
            response = client.chat.completions.create(
                model=MODEL,
                messages=self.messages,
                tools=TOOLS,
            )
            message = response.choices[0].message
            self.messages.append(message)

            if not message.tool_calls:
                return message.content

            for call in message.tool_calls:
                name = call.function.name
                args = json.loads(call.function.arguments)
                result = TOOL_FUNCTIONS[name](**args)
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": result,
                })

        return "Step budget exceeded."

Now you can have a follow-up conversation:

agent = Agent()
print(agent.chat("I am user U-123 and my laptop will not connect to wifi."))
print(agent.chat("What about my VPN, is that part of the same fix?"))

The second message inherits the context of the first, including the tool results. This is the cheapest possible memory: full conversation history in the context window. It works until the conversation gets long, at which point you need summarization or a vector store for older turns. We will not build that here; the interview questions post covers what production memory looks like (short-term + long-term + explicit writes).
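
Short of summarization, the simplest stopgap is to drop the oldest turns. The sketch below is one possible approach, not something the agent in this post does: trim_history, role_of, and keep_turns are hypothetical names. It cuts only at user messages so that an assistant tool call is never separated from its tool results, and it discards older turns outright rather than summarizing them.

```python
def role_of(message) -> str:
    """Messages in the list are either dicts or SDK objects with a .role attribute."""
    return message["role"] if isinstance(message, dict) else message.role

def trim_history(messages: list, keep_turns: int = 3) -> list:
    """Keep the system message(s) plus the last keep_turns user turns.

    A "turn" here starts at a user message and runs until the next one, so
    each kept turn still contains its tool calls and tool results together.
    """
    if keep_turns < 1:
        raise ValueError("keep_turns must be at least 1")
    system = [m for m in messages if role_of(m) == "system"]
    rest = [m for m in messages if role_of(m) != "system"]
    user_positions = [i for i, m in enumerate(rest) if role_of(m) == "user"]
    if len(user_positions) <= keep_turns:
        return messages  # nothing to trim yet
    cut = user_positions[-keep_turns]
    return system + rest[cut:]

history = [
    {"role": "system", "content": "You help users with IT support."},
    {"role": "user", "content": "My wifi is down."},
    {"role": "tool", "tool_call_id": "call_1", "content": "Restart the wifi adapter."},
    {"role": "assistant", "content": "Restart the adapter."},
    {"role": "user", "content": "What about my VPN?"},
    {"role": "assistant", "content": "Reinstall the VPN client."},
]
trimmed = trim_history(history, keep_turns=1)
print([role_of(m) for m in trimmed])
```

In the Agent class this could run at the top of chat, for example self.messages = trim_history(self.messages). The trade-off is that anything said in a dropped turn is gone, which is why longer-lived assistants move to summaries or an external store.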

Step 6: Bounding

The agent can already do useful work, but it can also misbehave. Two failure modes are common enough to defend against in this tutorial: looping (the agent calls the same tool with the same arguments forever) and overspending (the agent burns through its step budget on a confused task).

Step budget is already in place via max_steps. Add repeat-call detection:

class Agent:
    def __init__(self):
        self.messages = [
            {"role": "system", "content": "You help users with IT support. Use tools when needed."},
        ]
        self._recent_calls: list[tuple[str, str]] = []

    def _repeat_detected(self, name: str, args_json: str) -> bool:
        signature = (name, args_json)
        recent = self._recent_calls[-3:]
        self._recent_calls.append(signature)
        return recent.count(signature) >= 2

    def chat(self, user_message: str, max_steps: int = 8) -> str:
        self.messages.append({"role": "user", "content": user_message})

        for step in range(max_steps):
            response = client.chat.completions.create(
                model=MODEL,
                messages=self.messages,
                tools=TOOLS,
            )
            message = response.choices[0].message
            self.messages.append(message)

            if not message.tool_calls:
                return message.content

            for call in message.tool_calls:
                name = call.function.name
                args_json = call.function.arguments
                if self._repeat_detected(name, args_json):
                    self.messages.append({
                        "role": "tool",
                        "tool_call_id": call.id,
                        "content": "Repeated call detected. Stop calling this tool and try a different approach or return a final answer.",
                    })
                    continue
                try:
                    args = json.loads(args_json)
                    result = TOOL_FUNCTIONS[name](**args)
                except Exception as e:
                    result = f"Tool call failed: {e}. Check the arguments and try a different approach."
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": str(result),
                })

        return "Step budget exceeded."

Two changes. First, _repeat_detected looks at the last three tool calls and feeds the model a stop-and-rethink message if the same call is about to fire a third time. Second, the tool call itself is wrapped in a try-except so that malformed arguments, an unknown tool name, or an error inside the tool becomes feedback to the model rather than a crash. These are the kind of small, undramatic guardrails that turn a toy agent into something you would actually deploy.
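
The repeat check compares the raw argument string, so two calls that differ only in spacing or key order count as different. A possible refinement, shown here as a standalone sketch, is to normalize the arguments before comparing. RepeatGuard, its window and limit parameters, and reset are hypothetical names introduced for illustration; the counting logic mirrors _repeat_detected.

```python
import json

class RepeatGuard:
    """Hypothetical standalone version of _repeat_detected with normalized arguments."""

    def __init__(self, window: int = 3, limit: int = 2):
        self.window = window  # how many previous calls to look at
        self.limit = limit    # how many earlier matches trigger the guard
        self._recent: list[tuple[str, str]] = []

    @staticmethod
    def _canonical(args_json: str) -> str:
        """Re-serialize arguments so key order and spacing do not matter."""
        try:
            return json.dumps(json.loads(args_json), sort_keys=True)
        except json.JSONDecodeError:
            return args_json  # malformed JSON: fall back to the raw string

    def seen_too_often(self, name: str, args_json: str) -> bool:
        signature = (name, self._canonical(args_json))
        recent = self._recent[-self.window:]
        self._recent.append(signature)
        return recent.count(signature) >= self.limit

    def reset(self) -> None:
        """Forget earlier calls, for example at the start of a new user turn."""
        self._recent.clear()

guard = RepeatGuard()
print(guard.seen_too_often("search_kb", '{"query": "wifi"}'))  # False: first call
print(guard.seen_too_often("search_kb", '{"query":"wifi"}'))   # False: second, spacing differs
print(guard.seen_too_often("search_kb", '{"query": "wifi"}'))  # True: third identical call
```

Calling reset at the start of each chat turn is a design choice worth considering: without it, a call that is legitimate in a new turn can be flagged because of calls made in the previous one.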

The complete agent

Here is the whole thing in one file, around 150 lines:

import json
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"

# ---------- Tools ----------

KNOWLEDGE_BASE = {
    "wifi": "Restart the wifi adapter: settings, network, disable, re-enable. If still failing, forget the network and rejoin.",
    "vpn": "Reinstall the corporate VPN client from the internal portal. Restart laptop after install.",
    "printer": "Ensure printer queue is unblocked. Cancel stuck jobs, then re-add the printer from the network printer list.",
}

USERS = {
    "U-123": {"name": "Alex Chen", "laptop": "MacBook Pro M3", "location": "London"},
    "U-456": {"name": "Sam Patel", "laptop": "ThinkPad X1", "location": "New York"},
}

def search_kb(query: str) -> str:
    """Search the internal knowledge base."""
    query = query.lower()
    for keyword, entry in KNOWLEDGE_BASE.items():
        if keyword in query:
            return entry
    return "No matching knowledge base entry."

def lookup_user(user_id: str) -> str:
    """Look up an employee record by their user ID."""
    user = USERS.get(user_id)
    if not user:
        return f"No user found with id {user_id}."
    return json.dumps(user)

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_kb",
            "description": "Search the internal IT knowledge base.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "lookup_user",
            "description": "Look up an employee by user ID (format: U-NNN).",
            "parameters": {
                "type": "object",
                "properties": {"user_id": {"type": "string"}},
                "required": ["user_id"],
            },
        },
    },
]

TOOL_FUNCTIONS = {"search_kb": search_kb, "lookup_user": lookup_user}

# ---------- Agent ----------

SYSTEM = "You help users with IT support. Use tools when needed."

class Agent:
    def __init__(self):
        self.messages = [{"role": "system", "content": SYSTEM}]
        self._recent_calls: list[tuple[str, str]] = []

    def _repeat_detected(self, name: str, args_json: str) -> bool:
        signature = (name, args_json)
        recent = self._recent_calls[-3:]
        self._recent_calls.append(signature)
        return recent.count(signature) >= 2

    def chat(self, user_message: str, max_steps: int = 8) -> str:
        self.messages.append({"role": "user", "content": user_message})

        for step in range(max_steps):
            response = client.chat.completions.create(
                model=MODEL,
                messages=self.messages,
                tools=TOOLS,
            )
            message = response.choices[0].message
            self.messages.append(message)

            if not message.tool_calls:
                return message.content

            for call in message.tool_calls:
                name = call.function.name
                args_json = call.function.arguments
                if self._repeat_detected(name, args_json):
                    self.messages.append({
                        "role": "tool",
                        "tool_call_id": call.id,
                        "content": "Repeated call detected. Try a different approach or return a final answer.",
                    })
                    continue
                try:
                    args = json.loads(args_json)
                    result = TOOL_FUNCTIONS[name](**args)
                except Exception as e:
                    result = f"Tool call failed: {e}."
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": str(result),
                })

        return "Step budget exceeded."

# ---------- Demo ----------

if __name__ == "__main__":
    agent = Agent()
    print(agent.chat("I am user U-123 and my laptop will not connect to wifi."))

That is a working agent. It picks tools, runs them, reads the results, decides what to do next, keeps memory across turns, and refuses to loop forever. The core of what an “agent framework” gives you now fits in one file.

What we did not do, deliberately

This agent is intentionally small. Here is what a production system would add, and why we left each out of this post.

Long-term memory. We kept everything in the context window. A real system uses an external store (vector or structured) so the agent can recall past sessions without burning context tokens. The pattern is the same: writes happen through tools, reads happen through tools.
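
To make "writes happen through tools, reads happen through tools" concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the remember and recall tools, the MEMORY dict (a stand-in for a real database or vector store), and the MEMORY_TOOLS schemas, which follow the same format as TOOLS.

```python
import json

# Hypothetical in-process store; a real system would use a database or vector store.
MEMORY: dict[str, dict[str, str]] = {}

def remember(user_id: str, key: str, value: str) -> str:
    """Write path: the model calls this tool to save a fact about a user."""
    MEMORY.setdefault(user_id, {})[key] = value
    return f"Saved {key} for {user_id}."

def recall(user_id: str) -> str:
    """Read path: the model calls this tool to load what was saved earlier."""
    facts = MEMORY.get(user_id)
    if not facts:
        return f"Nothing stored for {user_id}."
    return json.dumps(facts)

def _string_params(*names: str) -> dict:
    """Build a parameter schema where every argument is a required string."""
    return {
        "type": "object",
        "properties": {n: {"type": "string"} for n in names},
        "required": list(names),
    }

MEMORY_TOOLS = [
    {"type": "function", "function": {
        "name": "remember",
        "description": "Save a fact about a user for future sessions.",
        "parameters": _string_params("user_id", "key", "value"),
    }},
    {"type": "function", "function": {
        "name": "recall",
        "description": "Load facts saved about a user in earlier sessions.",
        "parameters": _string_params("user_id"),
    }},
]

print(remember("U-123", "last_issue", "wifi"))
print(recall("U-123"))
```

Registering these alongside the other tools would need no change to the loop, which is the point of routing memory through tools.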

Streaming responses. We block until the final answer. In production, you would stream tokens to the user so the experience feels responsive even on long chains. The SDK supports it; we skipped it to keep the loop readable.

Cost and latency monitoring. We track nothing. Real systems log token counts, costs, latencies, and tool error rates per session, and alert when any of them drift.
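
As a rough idea of the minimum worth recording, the sketch below accumulates token counts and latency per session. SessionStats and its field names are hypothetical. The only API detail it relies on is that a non-streaming chat completion response carries a usage object with prompt_tokens and completion_tokens; a SimpleNamespace stands in for it here so the example runs offline.

```python
import time
from dataclasses import dataclass, field
from types import SimpleNamespace

@dataclass
class SessionStats:
    """Hypothetical per-session counters; names are illustrative."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    model_calls: int = 0
    tool_errors: int = 0
    latencies: list[float] = field(default_factory=list)

    def record_call(self, usage, seconds: float) -> None:
        """Record one model call; usage is the response.usage object."""
        self.model_calls += 1
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens
        self.latencies.append(seconds)

    def summary(self) -> dict:
        avg = sum(self.latencies) / len(self.latencies) if self.latencies else 0.0
        return {
            "model_calls": self.model_calls,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
            "avg_latency_s": round(avg, 3),
            "tool_errors": self.tool_errors,
        }

stats = SessionStats()
start = time.perf_counter()
fake_usage = SimpleNamespace(prompt_tokens=120, completion_tokens=30)  # stands in for response.usage
stats.record_call(fake_usage, time.perf_counter() - start)
print(stats.summary())
```

In the agent loop, the call to client.chat.completions.create would sit between the two perf_counter readings, and the except branch around the tool call would increment tool_errors.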

Multiple models. We use one model for everything. Real systems route between a small cheap model for tool-selection and a larger model for synthesis, with explicit fallback rules.

Parallel tool calls. The OpenAI API can return several tool calls in one response. Our loop handles them, but runs them one after another. For two-tool problems that does not matter; for ten-tool problems it would.
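
If the tools are I/O-bound (network lookups, database queries), one way to run the calls from a single response concurrently is a thread pool. The sketch below assumes the calls are independent of each other. run_calls_in_parallel is a hypothetical helper, and the call dicts use a simplified shape (id, name, arguments) rather than the SDK's objects.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor

def run_calls_in_parallel(calls: list[dict], functions: dict, max_workers: int = 4) -> list[dict]:
    """Run independent tool calls concurrently; return tool messages in input order."""
    def run_one(call: dict) -> dict:
        try:
            args = json.loads(call["arguments"])
            result = functions[call["name"]](**args)
        except Exception as e:
            # Same idea as the serial loop: failures become feedback, not crashes.
            result = f"Tool call failed: {e}."
        return {"role": "tool", "tool_call_id": call["id"], "content": str(result)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, calls))  # map preserves the order of calls

def slow_lookup(user_id: str) -> str:
    time.sleep(0.1)  # pretend this is a network call
    return f"record for {user_id}"

def slow_search(query: str) -> str:
    time.sleep(0.1)  # pretend this is a network call
    return f"article about {query}"

FUNCTIONS = {"lookup_user": slow_lookup, "search_kb": slow_search}
calls = [
    {"id": "call_1", "name": "lookup_user", "arguments": '{"user_id": "U-123"}'},
    {"id": "call_2", "name": "search_kb", "arguments": '{"query": "wifi"}'},
]
for tool_message in run_calls_in_parallel(calls, FUNCTIONS):
    print(tool_message)
```

Each result keeps its tool_call_id, so the messages can be appended to the history exactly as in the serial loop. Threads help only when the tools spend their time waiting; CPU-heavy tools would need a different approach.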

Prompt registry. Our system prompt is hardcoded. In production, prompts live in a versioned store so you can roll them out and roll them back independent of code.

All of these are covered in the interview questions post and in depth in Designing Enterprise Agentic AI Systems. The point of this tutorial is to make sure the foundation under all of them is solid first.

When you actually need a framework

You do not need a framework to build agents. You may want one when:

You are running many agents concurrently and want shared observability (LangGraph, OpenAI’s Agents SDK).
You want a stateful workflow with persistence built in (LangGraph, Temporal-style state machines).
You want pre-built memory backends, vector store integrations, or a prompt registry without writing them.
Your team has more than two people working on the agent and you want a shared vocabulary for nodes and edges.

You probably do not need one when:

You are learning. Frameworks hide the very thing you want to learn.
The agent is small and embedded in a larger application. The 150 lines above are smaller than the framework’s import surface.
You want full control over the loop because you are doing something that cuts against the framework’s opinions.

A useful test: write the agent loop yourself first. Once it works, evaluate whether a framework would save you more code than it costs in lock-in.

Where to take this next

The beginner book takes this same example and grows it chapter by chapter into a secured, evaluated, multi-step agent. Each chapter adds one capability with the same do-it-yourself approach as this post.

For the production patterns this tutorial deliberately skipped (memory tiers, evaluation, security, cost), the interview questions post is the densest single resource, and Designing Enterprise Agentic AI Systems is the long form.

If you want to see how this same loop ends up looking when one developer uses an agent to write the code, the agentic coding mental model and the GitHub Copilot practical guide cover that side.

Build the 150-line version once. After that, every framework on the market will read like a paraphrase of code you already understand.


Frequently asked

Quick answers

Do I need GPT-4 or can I use a cheaper model? A small model is fine for an agent that calls two or three tools. The example here uses gpt-4o-mini and works well. The model decides which tool to call; that decision does not need a frontier reasoning model. The main reason to reach for a bigger model is reliability on tricky multi-step tasks where the planner has to keep the goal in mind across more than a handful of steps.
Will this work with Claude or other providers? Yes. The pattern is identical. Claude calls its tool feature "tool use" instead of "function calling," and the message shape is slightly different (tool calls and tool results travel as content blocks inside messages rather than in a separate tool_calls field and a tool role), but the loop is the same: prompt, decide, run a tool, append the result, loop. If you want to swap providers later, isolate the call behind a small adapter function.
Why not use LangChain or LangGraph for this? For learning, frameworks hide the thing you are trying to learn. Once you have written an agent loop yourself, a framework saves typing and gives you observability and a few opinionated patterns for free. Before that, frameworks make it harder to know where a bug is or what your agent is actually doing.
How do I handle errors when the model returns malformed function calls? Validate the arguments against the tool schema. On failure, send the validation error back to the model as the tool result, with a note that the call failed because of the schema. The model gets to retry with the feedback. This single pattern handles most of the structured-output failures you will see in production.
How do I stop the agent from looping forever? A max-steps limit plus repeat-call detection covers nearly all of it. Track every tool call with its arguments, and if the same call is about to run a third time with the same arguments, return an error to the model that tells it to change course. The example in this post includes both.
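
The agent in this post relies on a try-except rather than explicit validation, so here is a hedged sketch of what the schema check from the malformed-calls answer might look like. validate_args and JSON_TYPES are hypothetical names, and the check covers only a small subset of JSON Schema (required keys, unknown keys, basic types); a dedicated validation library would be the fuller option.

```python
import json
from typing import Optional

# Maps JSON Schema type names to Python types (a deliberately small subset).
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "object": dict, "array": list}

def validate_args(args_json: str, schema: dict) -> tuple[Optional[dict], Optional[str]]:
    """Return (args, None) if the arguments fit the schema, else (None, error text)."""
    try:
        args = json.loads(args_json)
    except json.JSONDecodeError as e:
        return None, f"Arguments are not valid JSON: {e.msg}."
    if not isinstance(args, dict):
        return None, "Arguments must be a JSON object."
    properties = schema.get("properties", {})
    missing = [k for k in schema.get("required", []) if k not in args]
    if missing:
        return None, f"Missing required argument(s): {', '.join(missing)}."
    unknown = [k for k in args if k not in properties]
    if unknown:
        return None, f"Unknown argument(s): {', '.join(unknown)}."
    for key, value in args.items():
        type_name = properties[key].get("type")
        expected = JSON_TYPES.get(type_name)
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_mismatch = isinstance(value, bool) and expected is not bool
        if expected and (bool_mismatch or not isinstance(value, expected)):
            return None, f"Argument '{key}' must be of type {type_name}."
    return args, None

LOOKUP_SCHEMA = {
    "type": "object",
    "properties": {"user_id": {"type": "string"}},
    "required": ["user_id"],
}

print(validate_args('{"user_id": "U-123"}', LOOKUP_SCHEMA))
print(validate_args('{"id": "U-123"}', LOOKUP_SCHEMA))
print(validate_args('{"user_id": 123}', LOOKUP_SCHEMA))
```

In the loop, the error string would become the content of the tool message, so the model sees exactly which argument to fix before it retries.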
