When to Use an Agent and When to Use a Pipeline


Agentic loops cost more, fail strangely and resist debugging. The honest test for whether your problem needs an agent or just a deterministic pipeline.

  • By Orzed Team
  • 6 min read
Key takeaways
  • A pipeline is the right answer when you can write down the steps in order before runtime.
  • An agent is the right answer when the right next step depends on what the previous step returned.
  • Cost ratio: an agent typically costs 4 to 10 times more than the equivalent pipeline.
  • Debugging an agent is expensive. Debugging a pipeline is normal engineering.

A team we worked with had built an agent system for a job that, after we audited it, turned out to be a six-step pipeline. The agent took 11 to 28 seconds per request, cost roughly 2 cents per call, and produced inconsistent outputs the QA team had to spot-check. The equivalent pipeline (an LLM call for classification, three deterministic API calls, an LLM call for the final summary) took under 3 seconds, cost 0.4 cents, and produced outputs the QA team stopped reviewing after a week because they were boringly consistent.

The team had reached for an agent because “agentic” sounded sophisticated. The work was not agent-shaped. This piece is about how to know which shape your problem is and how to make the right call before you write the orchestration loop.

What an agent actually does

An agent is a loop. The loop is roughly:

  1. Look at the current state and the goal
  2. Decide what to do next (which tool to call, what to ask the user, when to stop)
  3. Execute the chosen action
  4. Update state with the result
  5. Go to step 1, or terminate

The agent is the LLM that runs step 2. Everything else is plumbing. The agent is justified when step 2 genuinely requires a model: when the right next action cannot be determined by deterministic rules over the current state.
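The loop above can be sketched in a few lines. This is a minimal illustration, not a production implementation: `decide` stands in for the LLM call in step 2, `tools` is a plain dict of callables, and all names are hypothetical. Everything outside `decide` is the plumbing.

```python
# Minimal sketch of the observe-decide-act loop. `decide` is the only
# place a model is involved (step 2); everything else is deterministic.

def run_agent(goal, tools, decide, max_iterations=10):
    """Run the loop until `decide` says stop or the budget runs out."""
    state = {"goal": goal, "history": []}
    for _ in range(max_iterations):          # hard cap: the loop must terminate
        action = decide(state)               # step 2: the model decides
        if action["name"] == "stop":
            return state                     # step 5: terminate
        result = tools[action["name"]](**action.get("args", {}))  # step 3: execute
        state["history"].append((action["name"], result))         # step 4: update state
    return state                             # iteration budget exhausted

# A deterministic stand-in for `decide`, useful for testing the plumbing
# without a model in the loop:
def scripted_decide(state):
    if not state["history"]:
        return {"name": "lookup", "args": {"key": "x"}}
    return {"name": "stop"}

final = run_agent("demo", {"lookup": lambda key: f"value-for-{key}"}, scripted_decide)
```

Note that the hard iteration cap is not optional decoration; without it, a model that never emits "stop" runs forever.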

A pipeline is a fixed sequence of steps. Step 1 always runs first, then step 2, then step 3. Some steps may include LLM calls (summarise this, classify that), but the sequence and routing are decided by the engineer at design time, not by a model at runtime.

The crucial difference: a pipeline’s behaviour is enumerable. You can list the possible execution paths. An agent’s behaviour is not enumerable; it depends on what the model decides at each step.

The four-question test

Walk these in order. The first “yes” picks the architecture.

1. Can the steps be listed in advance? If you can write down “first do A, then B, then C”, you have a pipeline. The fact that some of the steps use LLM calls does not make it an agent. A pipeline with three LLM steps is still a pipeline.

2. Does the next step depend on intermediate results in a way that cannot be enumerated? If the answer is “if the result has property X, do Y, else do Z”, that is still a pipeline (with a branch). If the answer is “I need a model to decide what to do based on whatever the last step returned, and the space of possible decisions is large”, you have an agent.

3. Is the input space bounded? If users can ask anything within a known domain (refunds, shipping inquiries, product questions), you have a routable problem. Route to the right pipeline. If users can ask anything within an open domain (research, exploration, debugging an arbitrary system), you have an agent problem.
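Routing a bounded domain can be this small. A sketch under assumed names: `classify` stands in for a single LLM classification call with a bounded output, and the three intents and pipeline stubs are illustrative, not a real API.

```python
# One LLM call classifies the message into a known intent; after that,
# everything is a deterministic pipeline. Intents and stubs are illustrative.

PIPELINES = {
    "refund":   lambda msg: f"refund pipeline handles: {msg}",
    "shipping": lambda msg: f"shipping pipeline handles: {msg}",
    "product":  lambda msg: f"product pipeline handles: {msg}",
}

def route(message, classify):
    intent = classify(message)                  # single LLM call, bounded output
    if intent not in PIPELINES:
        return "fallback: hand off to a human"  # unknown intent fails safe
    return PIPELINES[intent](message)

reply = route("Where is my parcel?", lambda m: "shipping")
```

The fallback branch matters: a router that silently picks the nearest pipeline for an out-of-domain message is how bounded-domain systems produce confidently wrong answers.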

4. Is the cost or latency budget tight enough that an agent is impractical? Agents are 4 to 10 times more expensive per problem solved than the equivalent pipeline, mostly because of the orchestration overhead (multiple model calls per request, larger context windows, retries on tool failures). If the budget cannot tolerate that ratio, the answer is pipeline regardless of what the problem looks like.

Question | Pipeline | Agent
Steps listable in advance? | Yes | No
Next step depends on enumerable property? | Yes | No
Input space bounded? | Yes | No
Cost / latency budget allows 4-10x overhead? | Either | Required

What pipelines win at

Determinism. Same input, same output. The QA team stops spot-checking after the first week because there are no surprises.

Cost. A pipeline call is a fixed number of operations. An agent call is variable, often expensive.

Latency. A pipeline's latency is the sum of its steps, which is known in advance. An agent runs for as long as the loop takes, which is unpredictable.

Debugging. When a pipeline fails, the failing step is identifiable. When an agent fails, you replay the loop and read its reasoning, then form a hypothesis, then try a fix that may not work.

Observability. Pipeline metrics are step-level: how long did each step take, how often does each step fail. Agent metrics are turn-level and tend to be a tangled mess unless designed deliberately.
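Step-level metrics fall out of the pipeline shape almost for free. A sketch, with illustrative names: wrap each step so that call counts, failures and cumulative duration accumulate per step name.

```python
import time

# Sketch of step-level observability: a decorator that records per-step
# call counts, failures and total duration. Names are illustrative.

METRICS = {}

def instrumented(name):
    def wrap(step):
        def run(*args, **kwargs):
            m = METRICS.setdefault(name, {"calls": 0, "failures": 0, "seconds": 0.0})
            m["calls"] += 1
            start = time.perf_counter()
            try:
                return step(*args, **kwargs)
            except Exception:
                m["failures"] += 1           # count the failure, then re-raise
                raise
            finally:
                m["seconds"] += time.perf_counter() - start
        return run
    return wrap

@instrumented("lookup_order")
def lookup_order(order_id):
    return {"id": order_id, "status": "shipped"}

lookup_order("4421")
```

The equivalent for an agent requires attributing each turn to a reason for the turn, which is why agent observability has to be designed rather than bolted on.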

Modifiability. Pipelines change predictably; you edit a step. Agents change unpredictably; you tweak a prompt and the entire loop’s behaviour shifts.

What agents win at

Open-ended exploration. The user wants to investigate an unfamiliar codebase. A pipeline cannot enumerate the steps because the steps depend on what the codebase contains.

Adaptive workflows. The right next action depends on what was found in the previous step, and the space of possible findings is too large to enumerate.

Multi-tool reasoning. The agent might need to call a calculator, then a search tool, then a code interpreter, in an order that depends on the question. A pipeline that can do all of this would have to encode every plausible ordering.

Conversational depth. The user wants to have a back-and-forth that does not fit a script. A pipeline can handle “answer this question”; an agent can handle “let’s work through this together”.

These are real use cases. They are also rarer than the marketing for agentic systems suggests. Most production AI features are pipeline-shaped.

A worked example

Customer asks: “Can I get a refund for order #4421?”

Pipeline approach:

  1. Parse the order ID from the message (regex, no LLM needed)
  2. Look up the order (deterministic API call)
  3. Check refund eligibility (deterministic logic over order state)
  4. If eligible: process refund (deterministic API call), generate confirmation message (LLM)
  5. If ineligible: generate explanation message (LLM), suggest alternatives (LLM)

Total: 4 to 5 deterministic operations, 1 to 2 LLM calls. Latency: 1 to 2 seconds. Cost: under 0.5 cents. Behaviour: predictable.
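The pipeline above fits in one short function. A sketch under stated assumptions: the order store and the two LLM calls are stubbed, the 30-day eligibility rule is invented for illustration, and every function name is hypothetical.

```python
import re

# Sketch of the five-step refund pipeline. Only generate_message would
# be an LLM call in production; everything else is deterministic.

ORDERS = {"4421": {"total": 30.0, "days_since_delivery": 5}}  # stub order store

def generate_message(prompt):               # stand-in for an LLM call
    return f"[LLM] {prompt}"

def handle_refund_request(message):
    match = re.search(r"#(\d+)", message)   # step 1: parse order ID, no LLM
    if not match:
        return generate_message("ask the customer for their order number")
    order = ORDERS.get(match.group(1))      # step 2: deterministic lookup
    if order is None:
        return generate_message("explain that the order was not found")
    eligible = order["days_since_delivery"] <= 30   # step 3: deterministic rule
    if eligible:                            # step 4: refund + confirmation
        return generate_message(f"confirm a refund of {order['total']}")
    return generate_message("explain ineligibility and suggest alternatives")  # step 5

reply = handle_refund_request("Can I get a refund for order #4421?")
```

Every branch is visible in the source, which is exactly the enumerability the article is arguing for.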

Agent approach:

  1. Agent gets the user message and a tool list (look_up_order, check_eligibility, process_refund, send_message)
  2. Agent decides to call look_up_order. Waits for result.
  3. Agent decides to call check_eligibility. Waits for result.
  4. Agent decides whether to process_refund or explain ineligibility.
  5. Agent decides what message to send.

Total: 5 LLM calls (one per loop iteration), plus tool calls. Latency: 6 to 12 seconds. Cost: 2 to 4 cents. Behaviour: usually right, occasionally surprising.

The pipeline wins decisively. The user does not benefit from the agent’s flexibility because there is no flexibility needed; the workflow is enumerable.

When the same example flips

Same domain, different question: “I had a problem with this order. Can you figure out what went wrong and fix it?”

Now the workflow is not enumerable. Was it a payment issue, a shipping issue, a product defect, a fraud flag, an integration failure with a partner system? Each requires a different sequence of investigation steps. The agent earns its place: it can call diagnostic tools in the order that makes sense based on what each one returns.

The lesson: the same domain can have pipeline-shaped workflows and agent-shaped workflows. Pick per workflow, not per product.

What we install on engagements

When we walk into an AI integration audit, the first question we ask about each LLM-driven feature is “can you write down the steps in advance?”. For roughly two-thirds of the features we audit, the answer is yes; the team had built an agent because it sounded modern. We refactor those to pipelines. The cost drops, the latency drops, the support team stops complaining about inconsistency.

For the remaining third, the agent shape is justified. We then make sure the agent has the structural properties it needs: typed handoffs to other agents, deliberate memory architecture, observability on every loop iteration, a hard cap on iterations, and a verifier on outputs that affect state.
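Two of those structural properties, the hard iteration cap and the verifier on state-changing outputs, compose naturally. A sketch with hypothetical names: `propose` stands in for the agent's decision, and the eligibility numbers are invented for illustration.

```python
# Sketch: a deterministic verifier gates every state-changing action, and
# a hard cap bounds how many times the agent may try. Names illustrative.

class IterationBudgetExceeded(Exception):
    pass

def verify_refund(action, order):
    """Deterministic check that runs before any refund executes."""
    return action["amount"] <= order["total"] and order["days_since_delivery"] <= 30

def run_with_guardrails(propose, order, max_iterations=5):
    for _ in range(max_iterations):
        action = propose(order)              # the agent's proposed action
        if verify_refund(action, order):
            return action                    # verified: safe to execute
        # rejected: a real loop would feed the rejection back to the agent
    raise IterationBudgetExceeded("agent did not produce a verifiable action")

order = {"total": 30.0, "days_since_delivery": 5}
approved = run_with_guardrails(lambda o: {"amount": 25.0}, order)
```

The point of the verifier is that it is not a model: the action that mutates state is approved by code you can enumerate, even when the agent that proposed it cannot be.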

The right architecture is not the most modern one. It is the one that meets the requirement at the lowest cost in money, latency and operational pain. For most LLM features in production, that is a pipeline.

Frequently asked

What if I'm not sure whether my problem is agent-shaped?

Default to pipeline. Build the deterministic version first, identify the specific places where the deterministic path fails (the inputs that the pipeline cannot route), and consider an agent only for those residual cases.

Are LLMs only useful in agentic systems?

No. An LLM in a pipeline is a powerful tool: it can summarise, classify, extract or generate at a specific step. The LLM is not less useful in a pipeline; it just has a defined role instead of an open-ended one.

When does an agent actually beat a pipeline?

When the problem space cannot be enumerated. Open-ended research, debugging an unknown system, navigating an unfamiliar API. If you can list the possible inputs and the right action for each, a pipeline wins.