A Building Agentic AI

Blog / Beginner / When to Use an Agent Framework vs Build It Yourself (LangGraph, CrewAI, and More)

When to Use an Agent Framework vs Build It Yourself (LangGraph, CrewAI, and More)

The honest answer is not "always" or "never." Build the raw loop yourself until you hit a specific named pain, then adopt a framework to solve that pain, not on principle. A decision diagram, a map of the frameworks worth knowing (LangGraph, CrewAI, AG2, LlamaIndex, OpenAI Agents SDK, Pydantic AI, plus the AWS, Microsoft, and Google platforms) and what each is good at, and the signals that tell you which side of the line you are on.

Muhammad Arbab

Muhammad Arbab · 14 years shipping AI

· 12 min read · Beginner

Share LinkedIn · X · Email ·

The honest answer to “framework or build it yourself” is not “always” and it is not “never.” It is this: build the raw loop yourself until you hit a specific, nameable pain, then adopt a framework to solve that pain, and not before. A framework adopted on principle, or out of fear that hand-rolled code looks unserious, trades code you understand for abstractions you do not. A framework adopted to solve a real coordination, state, or scale problem you have already felt is one of the best decisions you can make.

And to be clear about the point of building it yourself: the raw loop is not a detour you throw away once you adopt a framework. It is where you isolate the assets you actually own, your tools, your prompts, your decision logic, so that reaching for a framework later is plugging that core into it rather than marrying your logic to it. Build first to understand and to own, not to avoid frameworks forever.

This post gives you the decision as a diagram, a map of the frameworks worth knowing and what each is genuinely good at, and the concrete signals that tell you which side of the line you are standing on. It is the capstone of the build-from-scratch cluster: if you have not yet built an agent from scratch in Python, wired up tool calling from first principles, or given your agent memory, read those first. This is the post where the abstractions those pieces deliberately avoided start to mean something. The framework chapter of Understanding Agentic AI Systems walks the same decision with a single running example.

The decision in one diagram

The whole rule fits on one page. You start from a working raw loop, you ask one question, and the answer routes you.

Working raw loop ~100 lines, you own it Can you name a specific pain the loop can't carry? Keep building it yourself readable beats abstract Adopt a framework for THAT pain, not on principle NO YES Pains that count: durable state · multi-agent handoffs · parallel fan-out · human checkpoints · tracing at scale · team velocity
The build-then-wrap decision. One question, two routes, but the same core underneath: you start from a loop you own, and a nameable pain decides whether a framework wraps it.

The trap is answering “yes” to the question before you have actually felt the pain. “We might need multi-agent coordination later” is not a pain. “Our three agents keep clobbering each other’s state and we have spent two days debugging it” is. Build until the pain is concrete, then name it, then adopt the smallest thing that solves the named pain.

What you are actually choosing between

Before the signals, it helps to know the terrain. “Framework” is not one thing. There are four kinds of choice in front of you, and they are not mutually exclusive.

  • Raw loop. You own the control flow. Model call, tool dispatch, append result, repeat. Maximum clarity, maximum responsibility.
  • Graph / state-machine frameworks. You describe the agent as nodes and edges with explicit, durable state. Strong when control flow is complex and a run must survive a crash.
  • Role-based orchestration. You describe a team of agents with roles that hand work between each other. Strong when the natural decomposition is several specialists, not one loop.
  • Managed cloud runtimes. You write the agent in some SDK and let a cloud platform handle autoscaling, persistence, identity, and observability. This is a deployment choice that increasingly sits underneath any of the above.

Here is the map of the names worth knowing in 2026, and what each is genuinely good at. Strengths only, because the point is to know where each one shines, not to rank them.

INDEPENDENT FRAMEWORKS LangGraph Durable, stateful graphs. Resumable runs + human-in-the-loop. CrewAI Role-based agent crews. Fast to assemble a team of specialists. AG2 Conversational multi-agent. Community successor to AutoGen. LlamaIndex Data- and RAG-centric agents. Retrieval-first over your own corpus. OpenAI Agents SDK Lightweight handoffs + guardrails. The production heir to Swarm. Pydantic AI Type-safe, structured-output-first. Pythonic and minimal. CLOUD-NATIVE AGENT PLATFORMS AWS Strands Agents SDK + Bedrock AgentCore. Open SDK, managed, framework-agnostic runtime. Microsoft Agent Framework (1.0). Unifies Semantic Kernel + AutoGen. .NET and Python, enterprise-grade. Google Agent Development Kit (ADK) + Agent Engine. Code-first, multi-model, multi-language. Strengths only. The point is to know where each one shines, not to rank them.
The agent framework landscape in mid-2026. Independent frameworks on top, the three hyperscaler platforms below.

A word on why the map already looks different from a year ago, because that is the most important thing it teaches. In early 2026 Microsoft folded AutoGen and Semantic Kernel into a single Agent Framework and moved both originals into maintenance mode. AG2 is the community fork that carries the open-source AutoGen lineage forward. OpenAI’s experimental Swarm became the production Agents SDK. The lesson is not which name won. It is that any specific framework you anchor on today may be renamed, merged, or deprecated within a year, which is exactly why the portable assets you want to protect are your loop, your tools, and your prompts, not your choice of framework.

The signals that a framework earns its cost

These are the nameable pains. When you hit one of them for real, the framework stops being overhead and starts paying for itself.

1. Durable, resumable state. Your agent runs for minutes or hours, and a crash, deploy, or timeout in the middle should not start it over from zero. Hand-rolling checkpointing, serialization, and resume-from-step is real systems work. This is LangGraph’s home turf, and the managed runtimes (Bedrock AgentCore, Agent Engine) sell exactly this.

2. Multi-agent handoffs. The work genuinely decomposes into specialists that pass results to each other: a planner, a researcher, a writer, a reviewer. Coordinating that by hand means inventing a message protocol, turn-taking, and shared state. Role-based frameworks (CrewAI, AG2) and the orchestration layers in the cloud platforms exist for this.

3. Parallel fan-out and join. You need to run ten tool calls or ten sub-agents at once and merge the results. Doing this safely in a raw loop means managing concurrency, partial failure, and result aggregation yourself. Graph frameworks model fan-out and join as first-class edges.

4. Human-in-the-loop checkpoints. A run needs to pause, surface a decision to a person, and resume with their input, sometimes hours later. That requires durable state plus an interrupt-and-resume mechanism. Building it from scratch is a project; LangGraph and the managed platforms ship it.

5. Observability and tracing at scale. Once you are running thousands of agent invocations a day, “read the logs” stops working. You need spans, replay, token accounting, and per-step latency across many runs. Frameworks and their attached platforms bring tracing you would otherwise build from nothing.

6. Team velocity and shared vocabulary. Five engineers all hand-rolling slightly different loops is its own cost. A framework gives a shared mental model, shared docs, and a hiring pool that already knows the abstractions. This signal is real, but it is the one most often used to justify a framework before any of the others are true. Be honest about whether it is the actual reason.

The signals that say keep building it yourself

The other half of the decision matters just as much, because the default in 2026 is to over-adopt.

  • Three or four tools and one loop. If your agent is a single perceive-decide-act-observe cycle over a handful of tools, a framework adds indirection you will fight every time you debug. The raw loop is the right answer and stays the right answer.
  • You still need to understand your own failures. Early on, the most valuable thing is being able to read every line of why the agent did what it did. Frameworks move that logic behind abstractions. Until you have internalized the loop, that opacity costs you more than the convenience saves.
  • Latency-sensitive paths. Every abstraction layer is overhead. If you are fighting for milliseconds, a thin hand-written loop you can profile beats a general-purpose framework you cannot.
  • Lock-in and abstraction debt. The more your business logic is written in a framework rather than called by one, the more expensive it is to leave, and the landscape above is proof that leaving is sometimes forced on you. The cheapest insurance is to keep tools and prompts as plain functions and data.

A note on the cloud platforms

The AWS, Microsoft, and Google entries are a slightly different decision from the open-source frameworks, and it is worth being precise about it. They are mostly selling you the runtime, not the agent. Bedrock AgentCore, Agent Engine, and Microsoft Foundry give you the managed parts most teams do not want to build: autoscaling, session and memory persistence, identity and auth, observability, and long-running execution.

The important 2026 detail is that these runtimes are increasingly framework-agnostic. Bedrock AgentCore will run an agent written in Strands, LangGraph, CrewAI, or your own loop. So the realistic question is usually not “framework or cloud platform.” It is two questions stacked: which framework (or raw loop) do I write the agent in, and which managed runtime, if any, do I deploy it on. If you are already deep in one cloud, the gravity of “the platform our data and identity already live in” is a legitimate input. Just keep the agent itself portable so that gravity is a choice and not a cage.

If you adopt one, keep the loop legible

When you do cross the line and adopt a framework, the goal is to wrap the loop, not surrender it. You should still be able to point at where the model decides and where your code acts. A few rules that keep a framework an asset instead of a fog:

  • Tools stay plain functions. A tool is a function with a schema. It should run, and be testable, with the framework removed. If your tools only make sense inside the framework, you have written business logic in a place you cannot easily leave.
  • Prompts stay version-controlled text. Not strings buried in framework config. They are the most important thing in the system; treat them as first-class, reviewed assets.
  • You can still trace a single turn. If you cannot follow one request through perceive, decide, act, and observe, the framework has hidden the one thing you most need to debug. That is a smell, not a feature.
  • Adopt the smallest piece that solves the named pain. Need durable state? Use the persistence layer; you do not have to buy the entire orchestration philosophy on day one.

The takeaway

There is no framework that is correct in the abstract, and there is no virtue in hand-rolling everything forever. The discipline is the same one this whole cluster has been building toward: understand the loop well enough that you can name exactly what a framework would do for you before you reach for it. Build the raw version first, both because it is the only way to feel the pains honestly and because it is where you isolate the tools, prompts, and logic you keep no matter which framework you eventually land on. That work is not throwaway. Then, when a pain is concrete and named, adopt the smallest thing that solves it, keep your loop and tools and prompts portable, and let the framework be a layer you could remove rather than a foundation you cannot.

The senior position is unglamorous: most agents need no framework for a surprisingly long time, the ones that do need it for one or two specific reasons you can say out loud, and the engineer who can name those reasons is worth more than the one who can recite framework feature lists. Know the map, know the pains, and let the pain pick the tool.

Share this post LinkedIn · X · Email ·

Frequently asked

Quick answers

Should I use a framework or build my agent from scratch?
Build the raw loop yourself first, then adopt a framework only when you can name the specific pain it solves. The raw version is about 100 lines and teaches you the control flow you will otherwise be debugging blind. Reach for a framework when you hit durable resumable state, multi-agent handoffs, parallel fan-out, human-in-the-loop checkpoints, or tracing at production scale. Adopting one before you have felt a real pain trades code you understand for abstractions you do not.
Is LangGraph or CrewAI better?
They solve different problems. LangGraph models an agent as an explicit graph with durable, resumable state and first-class human-in-the-loop, so it is strong when your control flow is complex and a run must survive a restart. CrewAI models a team of role-playing agents that hand work between each other, so it is strong when the natural decomposition is "a researcher, a writer, and an editor" rather than a single loop. Pick by the shape of your problem, not by popularity.
Do I need an agent framework at all?
No. A single-loop agent with three or four tools needs no framework at all, and forcing one in adds indirection you will fight every time you debug. The raw loop (model call, tool dispatch, append result, repeat) is the right starting point for most agents and stays the right answer for many of them. Frameworks earn their place when coordination, state, or scale grow past what a readable loop can carry.
What do the cloud agent platforms (AWS, Microsoft, Google) give me over an open-source framework?
Mostly the runtime, not the agent. AWS Bedrock AgentCore, Google Agent Engine, and Microsoft Foundry give you the managed parts most teams do not want to build: autoscaling, session and memory persistence, identity, observability, and long-running execution. They are increasingly framework-agnostic (Bedrock AgentCore runs Strands, LangGraph, CrewAI, and others), so the realistic 2026 choice is often "which open-source framework to write the agent in" plus "which managed runtime to deploy it on," not one or the other.
Will a framework lock me in?
Some lock-in is real and some is avoidable. The avoidable kind comes from letting framework abstractions leak into your business logic so that your prompts, tools, and control flow only make sense inside that framework. The durable kind comes from a managed cloud runtime that owns your state and identity layer. Keep your tools and prompts as plain functions and data the framework calls, rather than logic written in the framework, and most of the switching cost stays low.
The framework landscape keeps changing. How do I avoid betting on the wrong one?
Do not over-anchor on a specific framework, because the map moves fast. In early 2026 Microsoft folded AutoGen and Semantic Kernel into a single Agent Framework and put both originals in maintenance mode, AG2 became the community fork carrying the open-source AutoGen lineage, and OpenAI shipped Swarm as the production Agents SDK. The hedge is to keep the agent loop, the tools, and the prompts as portable assets you own, and treat the framework as a replaceable layer around them.
End · 12 min read ← All posts

Keep reading

Related posts

Beginner ·

Giving Your Agent Memory: A Minimal Implementation

"Memory" is one word for four different problems your agent has. The conversation buffer, summarization, episodic recall, semantic retrieval, and key-value preferences, each built from scratch in raw Python with no framework, plus the decision guide for which one you actually need.

Beginner ·

Tool Calling From First Principles (Before You Touch LangChain)

Function calling, demystified. The under-the-hood mental model of how a model "calls a tool," a 40-line runnable example with no framework, the four things that go wrong in production, and when reaching for a framework actually helps.