Is the agent loop the same as ReAct?

ReAct is one specific shape of the agent loop. The four moves (perceive, decide, act, observe) describe every agent loop. ReAct fits that pattern by interleaving the decide and act phases tightly. Plan-and-Execute fits it by doing one big decide up front and then a series of acts and observes. Both are agent loops; ReAct is one way to wire them.

Does the agent loop need a language model at every step?

No. The loop runs the model on perceive and decide. The act phase runs your code, not the model. The observe phase reads a tool result back into context for the next perceive. A common confusion is to think "every step is an LLM call." In practice, one loop iteration is usually one LLM call plus one tool call plus a small amount of bookkeeping.

What stops an agent loop from running forever?

Three controls. A step limit (hard ceiling on iterations). A cost limit (stop if token spend crosses a threshold). And repeat-call detection (stop if the same action fires with the same arguments more than twice). Production systems use all three. A loop that runs without any of them will, occasionally, run forever.

Where does memory fit in the loop?

Memory lives outside the loop and gets read or written through tools. Short-term memory is the conversation context itself. Long-term memory (a vector store, a database, a key-value store) is accessed via tools the agent can call during decide or observe. Keeping memory writes explicit (as tool calls, not as silent side effects) is what makes the loop debuggable.

Is the agent loop a new idea?

No. The perceive-decide-act-observe shape is decades old in robotics and control systems. What is new in 2025 to 2026 is that a language model can play the decide role on open-ended tasks, and that the act phase can be a tool call against software systems rather than a motor command. The loop itself is older than AI agents.

The Agent Loop, Explained: Perceive, Decide, Act, Observe

Key takeaways

An agent loop is four moves: perceive (read context), decide (the model picks the next action), act (your code runs the tool), and observe (append the result). Everything else about agents is built around this cycle.
The model only does the heavy lifting on decide. The act phase runs your code, not the model: the model emits a request to run a tool, and your executor runs it.
If it is not in the context, the agent does not know it. Perceive is the full input the model sees: the goal, the system prompt, the tool list, any memory, and every previous action and observation in the session.
A chatbot does only perceive and decide, then stops. That is the chatbot ceiling: it cannot take a step and then decide again with new information, which is exactly what the loop adds.
Most agent bugs are a failure of one phase: looping, context bloat, tool misuse, stale results, or cost runaway. Diagnosing an agent is almost always answering which phase broke.

Share LinkedIn · X · Email ·

An agent loop has four moves: perceive, decide, act, observe. The model reads its context, picks the next action, the action runs, the result comes back, and the loop runs again. Everything else about agents (tools, memory, planning, bounding) is built around this four-step cycle.

This post is the conceptual companion to Build an AI Agent From Scratch in Python, which implements the loop in working code. If you have read What is agentic AI?, this post is the next step.

The four moves

Each move is one job. Read them in order; the order matters.

Perceive. The agent reads its context. Context here means the full input the model will see on this turn: the user’s goal, the system prompt, the list of tools available, any memory the agent has access to, and the history of every previous action and observation in this session. Nothing else is “perceived.” If it is not in the context, the agent does not know about it.

Decide. The model takes the perceived context and produces the next action. The action is either a tool call (with structured arguments) or a final answer. The decision is made in one inference pass. This is the only step in the loop where the language model does the heavy lifting.

Act. Your code runs the action. If the model emitted a tool call, an executor looks the tool up in the registry, validates the arguments, runs the function, and captures the result. The model does not run the tool itself; it only emits the request to run it. This is a deliberate design.

Observe. The result of the action is appended to the context. The next perceive will see it. If the act phase produced an error (a malformed argument, a timeout, a tool that returned a string the model did not expect), the observation includes the error, and the next decide step gets to react to it.

Then the loop goes back to perceive, with the appended observation as new context. The cycle continues until the model emits a final answer or a stop condition fires.

The diagram

The agent loop. Goal enters at Perceive; answer leaves once Decide emits a final answer instead of an action.

Four boxes, four arrows, one cycle. That is the entire model.

A worked example

User goal: “the printer in conference room B is jammed.”

Turn 1.

Perceive: context contains the system prompt, the tool registry (lookup_room, search_kb, create_ticket), and the user’s message.
Decide: the model emits search_kb(query="printer jam").
Act: the executor runs the function, returns the troubleshooting article.
Observe: the result is appended to context.

Turn 2.

Perceive: context now includes the troubleshooting article.
Decide: the model sees that the article requires the printer’s model number, and emits lookup_room(room="Conference B").
Act: the executor returns {printer: "HP LaserJet 4250"}.
Observe: appended.

Turn 3.

Perceive: context now contains both the article and the printer model.
Decide: the model has enough information and emits a final answer with the resolution steps tailored to the HP LaserJet.
The loop stops because the decision was “final answer,” not “tool call.”

Three turns, three perceives, three decides, two acts (the final turn does not call a tool), two observes. The agent solved the task by composing two pieces of information that no single tool returned on its own.

What goes wrong without the loop

A plain chatbot does perceive and decide. It reads the user’s message and produces a reply. It cannot act. It cannot observe. It stops after one decide. That is why a chatbot cannot triage a printer jam: it can describe what to do, but it cannot look up the printer model, cannot fetch the right troubleshooting article, cannot create a ticket. The loop is what gives the system the ability to take a step and then make the next decision with new information.

This is the chatbot ceiling. You can prompt-engineer a chatbot to look like an agent for one or two specific tasks, but the moment the task requires the model to read something it did not have at turn one, the chatbot is stuck. The agent loop is what gets past that ceiling.

What goes wrong inside the loop

The loop is forgiving in shape and unforgiving in execution. The common failure modes:

Looping. The agent calls the same tool with the same arguments over and over. Usually because the tool returned something the model did not interpret, so it tries again, hoping for a different answer. Fix: a step limit and repeat-call detection that breaks the cycle and tells the model to try a different approach.

Context bloat. Every observation appends to the context. On a long session, the context fills up, the model loses sight of the earlier turns, and earlier system-prompt instructions silently drop. Fix: summarize old turns into a compressed memory, or use a tool to “remember” facts to an external store instead of letting them sit in the context.

Tool misuse. The model emits a call with wrong argument types or hallucinated parameters. Fix: strict argument schemas, validation at the executor, and a retry pattern where the validation error is sent back to the model as the observation so it can correct itself.

Stale results. A tool returns cached data the model does not know is stale, and the agent reasons against the wrong picture. Fix: explicit freshness in tool responses (a timestamp, a version) so the model can decide whether to trust the answer.

Cost runaway. A confused agent burns through tokens trying to recover. Fix: a hard cost limit that stops the loop and escalates to a human.

Every one of these is a failure of one specific phase of the loop, not of the whole concept. Diagnosing agent bugs is almost always answering “which phase broke.”

Where the loop ends

A loop that never ends is not an agent; it is a meltdown. The loop ends in one of three ways.

The model emits a final answer instead of a tool call. The decide phase produces a synthesis, the cycle stops, the answer goes to the user.

A stop condition fires. Step limit reached, cost limit reached, repeat detected, time budget exhausted, irreversible action requires human approval. The system returns control to a human or to a fallback path.

An error is unrecoverable. A tool the agent depends on is down and the agent has no alternative. The system fails gracefully, logs the trace, and tells the user what could not be done.

The pattern in all three: the loop has an explicit exit. It is not the model’s job to know when to stop on its own. The surrounding software bounds the loop. Without bounding, you do not have an agent, you have a token-spending process.

Why this is the only mental model you need

Every agent framework, every agent pattern, every paper on agentic AI, is a refinement of these four moves. ReAct is the loop with reasoning interleaved into each decide step. Plan-and-Execute is the loop with one big decide up front and a sequence of acts and observes. Multi-agent systems are multiple loops talking to each other through tools. Memory is a tool the loop reads from and writes to.

If you can hold the four moves in your head and reason about each, you can read any agentic-AI paper, evaluate any framework, and design any agent. The vocabulary changes; the loop does not.

Where to take this next

Build an AI Agent From Scratch in Python implements this loop in about 150 lines, in plain Python with the OpenAI SDK. Same four moves, real code.

Understanding Agentic AI Systems walks the loop one chapter at a time, building a single project (an internal IT helpdesk assistant) from a chatbot into a real agent. Same four moves, longer treatment, more worked examples.

For how this loop scales into production architecture (cost, latency, evaluation, security), the interview questions post is the densest single resource, and Designing Enterprise Agentic AI Systems is the long form.

For where the loop fits inside the broader 2026 shift in software engineering, the agentic coding mental model post sets the picture.

Share this post LinkedIn · X · Email ·

Frequently asked

Quick answers

Is the agent loop the same as ReAct?: ReAct is one specific shape of the agent loop. The four moves (perceive, decide, act, observe) describe every agent loop. ReAct fits that pattern by interleaving the decide and act phases tightly. Plan-and-Execute fits it by doing one big decide up front and then a series of acts and observes. Both are agent loops; ReAct is one way to wire them.
Does the agent loop need a language model at every step?: No. The loop runs the model on perceive and decide. The act phase runs your code, not the model. The observe phase reads a tool result back into context for the next perceive. A common confusion is to think "every step is an LLM call." In practice, one loop iteration is usually one LLM call plus one tool call plus a small amount of bookkeeping.
What stops an agent loop from running forever?: Three controls. A step limit (hard ceiling on iterations). A cost limit (stop if token spend crosses a threshold). And repeat-call detection (stop if the same action fires with the same arguments more than twice). Production systems use all three. A loop that runs without any of them will, occasionally, run forever.
Where does memory fit in the loop?: Memory lives outside the loop and gets read or written through tools. Short-term memory is the conversation context itself. Long-term memory (a vector store, a database, a key-value store) is accessed via tools the agent can call during decide or observe. Keeping memory writes explicit (as tool calls, not as silent side effects) is what makes the loop debuggable.
Is the agent loop a new idea?: No. The perceive-decide-act-observe shape is decades old in robotics and control systems. What is new in 2025 to 2026 is that a language model can play the decide role on open-ended tasks, and that the act phase can be a tool call against software systems rather than a motor command. The loop itself is older than AI agents.

End · 7 min read ← All posts