What to Automate First: A Leader's Framework for Agentic AI in 2026

The question that stalls most agentic AI programs is not "can we build an agent." It is "which process first." A leader's framework for choosing the first project: four scoring criteria, one hard gate (blast radius), a 2x2 you can draw on a whiteboard, and a worked example scoring three real processes.

Muhammad Arbab · 14 years shipping AI

The question that stalls most agentic AI programs is not “can we build an agent.” By 2026 the answer to that is almost always yes. The question that actually decides whether a program succeeds is “which process first,” and most organizations answer it badly. They pick the project that demos well in front of the board, not the one that teaches the organization how to ship.

Here is the honest answer, and it surprises most leaders: your first agent should not be the customer-facing showpiece. It should be the highest-volume internal task where a mistake is cheap, reversible, and easy to catch. Project one is not where you capture the most value. It is where you buy the two things every later project depends on: organizational trust that the technology works, and an evaluation baseline you can measure everything else against.

This piece gives you a framework for choosing: four criteria to score candidate processes on, one hard gate that can veto a high-scoring project, a 2x2 you can draw on a whiteboard, and a worked example scoring three real processes head to head. It sits one level up from the engineering question of when to build an agent yourself versus reach for a framework; this is the portfolio decision a leader makes before any of that. If you are still firming up what counts as an agent in the first place, start with What Counts as Agentic AI and What Does Not. The selection logic here is drawn out at length, with the enterprise controls, in Designing Enterprise Agentic AI Systems.

Why most teams pick the wrong first project

Three forces pull organizations toward the wrong first agent, and all three feel reasonable in the moment.

The first is demo gravity. A vendor or an internal team shows a slick customer-facing agent, the room is impressed, and that becomes the project. But “impressive in a demo” and “safe to learn on” are nearly opposite properties. The demo is chosen to dazzle; the first project should be chosen to de-risk.

The second is highest-value-first thinking. It feels disciplined to start with the project with the biggest ROI. The problem is that the highest-value processes are usually also the highest-stakes ones, the places where an early-stage mistake is most expensive. You do not learn to drive in heavy traffic.

The third is the saga instinct: automating an entire end-to-end workflow at once because automating one step “doesn’t feel worth it.” This multiplies the failure modes and makes it impossible to tell which part broke when something does. Start with a step, not a saga.

The framework below is the antidote to all three.

The four criteria

Score each candidate process from 1 (poor first project) to 5 (ideal first project) on four axes. These measure how much you will learn and how safely you will learn it.

| Criterion | The question you are asking | Why it matters |
| --- | --- | --- |
| Volume / frequency | How often does this task run? Daily and many times, or rarely? | High volume means more value per point of reliability, and more data to evaluate against. |
| Value per run | How much time or cost does automating one instance save? | Sets the ceiling on payoff, but it is the least important of the four for a first project. |
| Feedback availability | Can you tell, soon and clearly, whether the agent did it well? | Without a ground truth or a fast human check, you cannot build the eval harness everything needs. |
| Scope boundedness | Are the inputs, the goal, and “done” well-defined, or open-ended? | Bounded tasks are testable and predictable. Open-ended ones hide failure modes you find in prod. |

Notice that value per run is on the list but flagged as least important for the first project. That is deliberate, and it is the part leaders resist most. On project one you are not optimizing for return. You are optimizing for a clean, measurable, low-stakes win that proves the capability and produces an eval baseline. The return comes later, on the projects this one earns you.
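
The scoring scheme above can be captured in a few lines. This is a minimal sketch, not a prescribed formula: the class name `CriteriaScores` and the `WEIGHTS` values are assumptions made for illustration. The article says only that value per run matters least on project one; the choice of 0.5 is a placeholder for that idea.

```python
from dataclasses import dataclass

# Hypothetical weights. The only grounded fact is that value per run is the
# least important criterion for a first project; the numbers are assumptions.
WEIGHTS = {"volume": 1.0, "value": 0.5, "feedback": 1.0, "boundedness": 1.0}


@dataclass(frozen=True)
class CriteriaScores:
    """Scores from 1 (poor first project) to 5 (ideal first project)."""

    volume: int
    value: int
    feedback: int
    boundedness: int

    def __post_init__(self):
        # Reject anything outside the 1-5 scale used in the article.
        for name in WEIGHTS:
            score = getattr(self, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {score}")

    def weighted_total(self) -> float:
        """Sum of the four scores, with value per run down-weighted."""
        return sum(WEIGHTS[name] * getattr(self, name) for name in WEIGHTS)
```

Down-weighting value rather than dropping it keeps the criterion visible in the discussion while preventing it from dominating the choice.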

The gate: blast radius

The four criteria produce a score. Blast radius is not part of that score. It is a separate gate that can veto a high-scoring project outright.

Blast radius is what happens when the agent is wrong: how much damage one bad action causes, and how hard it is to undo. A task that drafts an internal summary for a human to read has a small blast radius; the human catches the error and nothing happened. A task that autonomously moves money, sends a message to a customer, or changes a record in a regulated system has a large one; the error has already landed before anyone reviews it, and reversing it ranges from awkward to impossible.

The rule is simple and strict: you do not learn on a system where the failures are expensive. A process can score 5 on every criterion and still be the wrong first project if its blast radius is high. Those projects are not off the table forever. They are off the table until you have an eval harness, a track record, and the human-in-the-loop controls to constrain them, all of which your low-blast-radius first project is designed to produce.
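
One way to express the gate in code is as a check that never looks at the score. The sketch below assumes a two-level `BlastRadius` enum and three boolean flags for the prerequisites named above; the names are hypothetical, and a real program would likely need a finer-grained risk assessment.

```python
from enum import Enum


class BlastRadius(Enum):
    LOW = "low"    # errors are caught by a human before they land
    HIGH = "high"  # errors land before review and are hard to undo


def passes_gate(
    blast_radius: BlastRadius,
    has_eval_harness: bool = False,
    has_track_record: bool = False,
    has_human_in_loop: bool = False,
) -> bool:
    """Return True if a process may be attempted now.

    Low blast radius always passes. High blast radius passes only once
    all three prerequisites exist.
    """
    if blast_radius is BlastRadius.LOW:
        return True
    return has_eval_harness and has_track_record and has_human_in_loop
```

The four criteria scores do not appear in the signature at all, which mirrors the point that the gate is separate from the score and can veto any total.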

The 2x2

Plot your candidates on two axes: volume on the vertical, blast radius on the horizontal. The quadrant tells you what to do.

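[Figure: 2x2 with volume / frequency on the vertical axis (high at top, low at bottom) and blast radius if wrong on the horizontal axis (low at left, high at right). Top-left (gold): START HERE, high volume, mistakes are cheap. Top-right: do it, but keep a human in the loop; earn this one later. Bottom-left: fine, but low payoff; not worth being first. Bottom-right: avoid; not a first project.]
Choose your first agentic project from the gold quadrant: high volume, low blast radius.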

The gold quadrant is where you start. High volume gives you enough repetitions to measure reliability and enough value to matter. Low blast radius means the inevitable early mistakes are caught and cheap. The top-right quadrant (high volume, high blast radius) is where the big prizes usually live, which is why teams reach for it first; the discipline is to get there second, with a human in the loop and an eval harness you built on the gold quadrant. The bottom two quadrants are not where a program should start: low volume means too little to learn from, and the bottom-right combines low learning with high risk.
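
The quadrant logic is small enough to write down as a lookup. This is an illustrative sketch: the `is_high` helper and its threshold of 4 on the 1-5 scale are assumptions, since the article does not say where "high volume" begins.

```python
def is_high(score: int, threshold: int = 4) -> bool:
    """Treat a 1-5 score at or above the (assumed) threshold as 'high'."""
    return score >= threshold


def quadrant(high_volume: bool, high_blast_radius: bool) -> str:
    """Map a candidate's position on the 2x2 to the recommended action."""
    if high_volume and not high_blast_radius:
        return "start here"
    if high_volume and high_blast_radius:
        return "earn this one later, with a human in the loop"
    if not high_blast_radius:
        return "fine, but low payoff; not worth being first"
    return "avoid; not a first project"
```

For example, `quadrant(is_high(5), high_blast_radius=False)` lands in the gold quadrant, while a low-volume task that moves money lands in "avoid".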

A worked example

Three processes a mid-size company might consider. Score them on the four criteria, then apply the gate.

| Process | Volume | Value | Feedback | Bounded | Blast radius (gate) | Verdict |
| --- | --- | --- | --- | --- | --- | --- |
| Draft first-pass replies to inbound support tickets | 5 | 3 | 4 | 4 | Low (human sends) | Start here |
| Reconcile vendor invoices against POs, flag mismatches | 4 | 4 | 5 | 5 | Low (flags only) | Strong second candidate |
| Autonomously issue customer refunds | 3 | 5 | 3 | 3 | High (moves money) | Defer or constrain with controls |

The refund agent has the highest value per run, and that is exactly the trap. It moves money irreversibly, so its blast radius is high; one confident mistake is a real loss and a compliance question. It fails the gate regardless of its value score. The support-reply drafter, by contrast, is unglamorous: it just writes a first draft a human edits and sends. But it runs constantly, the human review gives you immediate feedback (did they keep the draft or rewrite it?), and a bad draft costs nothing because it never reaches the customer unreviewed. That is the project that teaches your organization how to build, measure, and trust an agent. Invoice reconciliation is the natural second move: still low blast radius because it only flags for a human, but with crisp ground truth that makes the eval harness almost write itself.
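
The worked example can be reproduced with the scores from the table. The sketch below is one possible reading, not a formula the article states: it applies the gate first, then orders the surviving candidates by volume, with the unweighted total as a tie-breaker. That ordering rule is an assumption chosen because it matches the verdicts above. On raw totals alone (16, 18, and 14) invoice reconciliation would come first, so treat the scores as input to judgment rather than an automatic ranking.

```python
from typing import List, NamedTuple


class Process(NamedTuple):
    name: str
    volume: int
    value: int
    feedback: int
    bounded: int
    high_blast_radius: bool

    @property
    def total(self) -> int:
        """Unweighted sum of the four criteria scores."""
        return self.volume + self.value + self.feedback + self.bounded


# Scores copied from the worked-example table.
CANDIDATES = [
    Process("support reply drafts", 5, 3, 4, 4, high_blast_radius=False),
    Process("invoice reconciliation", 4, 4, 5, 5, high_blast_radius=False),
    Process("autonomous refunds", 3, 5, 3, 3, high_blast_radius=True),
]


def first_project_order(candidates: List[Process]) -> List[Process]:
    """Gate first, then rank by volume (assumed rule), total as tie-breaker."""
    eligible = [p for p in candidates if not p.high_blast_radius]
    return sorted(eligible, key=lambda p: (p.volume, p.total), reverse=True)


def deferred(candidates: List[Process]) -> List[Process]:
    """Candidates vetoed by the blast-radius gate, whatever their scores."""
    return [p for p in candidates if p.high_blast_radius]
```

Calling `first_project_order(CANDIDATES)` returns the support drafter followed by invoice reconciliation, and `deferred(CANDIDATES)` returns only the refund agent.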

Sequence: a step, not a saga, and trust before value

Two sequencing rules follow from all of this.

Automate a step, not a saga. The instinct to automate the whole end-to-end process first is the most expensive mistake in the list, because it stacks every failure mode into one system you cannot debug. Take the highest-volume single step, automate that, prove it, then extend to the next step. Breadth is earned, not assumed.

Buy trust before you chase value. The sequence across projects is not “biggest ROI first.” It is: a low-stakes high-volume win to prove the capability and build the eval harness, then a second low-blast-radius project to confirm it was not luck, and only then the high-value, high-blast-radius projects, now approached with controls and a measurement baseline you actually have. Organizations that invert this, that start with the refund agent because the ROI looked best, are the ones whose agentic programs stall after one bad incident and a loss of executive confidence.
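
The "step, not a saga" rule can be sketched as a function that proposes at most one new step at a time. Everything here is hypothetical: the step names and weekly run counts are invented for illustration, and extending to the next-highest-volume step after the first one is proven is an assumption, since the article specifies only where to start.

```python
from __future__ import annotations

from typing import Optional


def next_step_to_automate(
    weekly_runs: dict[str, int],
    proven: set[str],
    in_progress: Optional[str] = None,
) -> Optional[str]:
    """Pick the next single step to automate, or None if nothing should start.

    weekly_runs maps each workflow step to how often it runs.
    proven holds steps already automated, measured, and trusted.
    in_progress is a step that is automated but not yet proven.
    """
    if in_progress is not None:
        # Prove the current step before widening scope.
        return None
    remaining = {s: n for s, n in weekly_runs.items() if s not in proven}
    if not remaining:
        return None
    # Highest-volume unautomated step goes next.
    return max(remaining, key=remaining.get)


# Hypothetical workflow with made-up volumes.
RUNS = {"triage ticket": 1200, "draft reply": 900, "update CRM record": 300}
```

With nothing proven, the function proposes "triage ticket"; while that step is still in progress it proposes nothing, which is the point of the rule.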

The takeaway

The first agentic project is a learning investment disguised as a delivery. Its job is to convert a vague organizational hope (“AI can help us”) into two concrete assets: a team that has shipped and measured a real agent, and an evaluation harness that makes the next project safer. Pick it accordingly. Score candidates on volume, value, feedback, and boundedness; veto anything with a high blast radius no matter how it scores; start in the gold quadrant; automate a step before a saga; and let the boring, high-volume, low-stakes task be the one that earns you the right to attempt the exciting one.

The leaders who get agentic AI into production are rarely the ones who picked the most ambitious first project. They are the ones who picked the most learnable one, and used it to buy their way to the ambitious ones with evidence in hand.

Frequently asked

Quick answers

What should my first agentic AI project be?
The highest-volume task where a mistake is cheap, reversible, and easy to catch, not the flashiest customer-facing one. Your first project should be chosen to buy two things: organizational trust that the technology works, and an evaluation baseline you can measure later projects against. A high-volume internal task where a human reviews the output before anything takes effect is close to ideal. Maximum business value comes from project three or four, once you have proven you can ship and measure one.
Should I start with a customer-facing agent?
Usually no. Customer-facing agents are the most visible and the most tempting, which is exactly why they are the wrong first project. They combine high blast radius (a wrong answer reaches a customer immediately and is hard to take back) with the highest scrutiny, so a normal early-stage stumble becomes a reputational event. Prove the capability on an internal, reviewable task first, then move outward once you have an eval harness and a track record.
How do I know if a process suits an agent rather than simple automation or RPA?
If the task is fully deterministic with fixed rules and stable inputs, traditional automation or RPA is cheaper and more reliable than an agent. Agents earn their place when the task needs judgment over messy or unstructured inputs, when the steps vary case to case, or when it requires pulling together tools and context dynamically. A good test: if you could write the rules out completely in a flowchart, you probably do not need an agent. If a competent new hire would need judgment and a few tools, you might.
What is blast radius and why does it veto high-value projects?
Blast radius is what happens when the agent is wrong: how much damage one bad action causes and how hard it is to reverse. It is a gate, not a score, because it can override everything else. A task can be high-volume, high-value, and well-understood, and still be a bad first project if a single error moves money, sends an irreversible message, or breaks a regulated process. You do not learn on a system where the failures are expensive. Pick a first project where being wrong is cheap, then earn your way up to the high-blast-radius ones with controls in place.
How many processes should I automate at once?
One, and a single step of it rather than the whole workflow. The most common early mistake is trying to automate an entire end-to-end process (the "saga") on the first attempt, which multiplies the failure modes and makes it impossible to tell which part broke. Automate the highest-volume step first, prove it, then extend. Breadth comes after you have one working, measured, trusted automation, not before.
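
The agent-versus-RPA test from the answers above can be written as a rough decision helper. This is a sketch of the heuristic only: the function name, parameter names, and the "unclear" fallback are assumptions, and real processes rarely reduce to five booleans.

```python
def suggest_approach(
    rules_fit_in_flowchart: bool,
    inputs_are_stable: bool,
    needs_judgment: bool = False,
    steps_vary_by_case: bool = False,
    needs_dynamic_tools: bool = False,
) -> str:
    """Rough triage between traditional automation and an agent."""
    # Fully deterministic work with stable inputs does not need an agent.
    if rules_fit_in_flowchart and inputs_are_stable:
        return "traditional automation or RPA"
    # Any one of these signals suggests an agent might earn its place.
    if needs_judgment or steps_vary_by_case or needs_dynamic_tools:
        return "agent candidate"
    return "unclear: examine the task more closely"
```

The deterministic check comes first on purpose: if the rules can be written out completely, the cheaper and more reliable option wins before any agent signal is considered.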