The Orzed Routing Layer: how models hand off work

Orzed Models & Agents · routing, orchestration

The part of the Orzed stack that decides which model handles each task. Cost, quality and latency optimisation across Horizon, Meridian and Pulse tiers.

  • By Orzed Team
  • 7 min read
Key takeaways
  • A Pulse based classifier reads every incoming task and proposes a tier; a deterministic policy layer applies hard constraints (security, escalation, customer overrides).
  • Routing optimises across cost, quality and latency, not for any single one. The triangle is the constraint, not a preference.
  • Project Credits stay stable for the customer; the routing decisions affect Operational Credit usage on our side.
  • Customers can override the route. A customer can ask the Console to use Horizon for a task Pulse classified as routine, or to call out to a third party frontier model entirely.
  • Routing decisions are logged and visible in the engagement audit trail. Nothing about which model touched what is hidden from the customer.

The Routing Layer is the part of the Orzed stack that almost no customer asks about and yet the part that decides almost every cost and latency outcome in their engagement. It is the policy engine that takes an incoming task and decides which model handles it. Get it right and the platform serves a whole engagement at the lowest defensible cost without a quality regression. Get it wrong and either the bill becomes unpredictable or the work quality starts to wobble.

This piece explains how the Routing Layer is built, what it optimises for, and why customers can see and override its decisions.

The shape of the problem

A delivery pipeline is a stream of tasks. Most of them are small (a classification, a structured output, a validation, a quick code refactor). Some are medium (a feature implementation, a content draft, a documentation pass). A few are large (a planning artifact, an architecture trade off, a senior review of a finished workstream).

If we send everything to Horizon, we burn the budget on tasks Pulse could have done in 200 milliseconds. If we send everything to Pulse, the planning artifacts get produced by a model that cannot reason at the depth they require. The naive answer is to ask each customer to pick a model per task, which is unworkable: nobody wants to make a routing decision per call.

The Routing Layer is the alternative. A small classifier reads every task, predicts the right tier, and the platform routes accordingly. The customer does not see the routing as friction, only as a result.

Architecture

The Routing Layer is built in two parts: a learned classifier and a deterministic policy overlay.

The classifier. A Pulse based model trained on roughly 30,000 historical routing decisions (task to tier assignments) made by senior engineers across our delivery memory. The training signal is “given this task description and context, which tier was the right one to handle it”. The classifier outputs a tier prediction (Pulse, Meridian, Horizon) plus a confidence score. Average inference time is in the 50 to 80 millisecond range, which puts the routing decision well under the threshold where it becomes a meaningful contributor to end to end latency.

The policy overlay. A deterministic ruleset that runs after the classifier and can override its prediction. The overlay handles cases the classifier cannot reliably learn:

  • Hard constraints. A task tagged as security sensitive must escalate by at least one tier. A task in an engagement under a customer’s “frontier only” override goes to a third party model, not to the Orzed stack at all.
  • Escalation triggers. If Pulse handles a task and its output fails the validation gate, the platform retries on Meridian rather than reflexively retrying on Pulse.
  • Cost guardrails. If the engagement’s Operational Credit budget for the day is approaching its band, the overlay biases routing toward smaller tiers for tasks whose quality is least sensitive to tier choice.
  • Customer overrides. Per task and per engagement pins (covered below) take precedence over the classifier’s prediction.

The order is deliberate. The classifier proposes; the policy overlay disposes.

What the layer optimises for

Three variables, treated as a triangle rather than a preference.

Cost. Per task Operational Credit usage is measured and aggregated. The Routing Layer is the largest single lever on the platform’s compute bill. We have measured engagements where moving the routing thresholds by ten percentage points changed Operational Credit usage by 25 to 35 percent without measurably changing deliverable quality.

Quality. Quality is measured by senior reviewer accept rates on Execution lane outputs and by retrospective review of Planning artifacts. The Routing Layer’s quality target is a band: deliverables produced under the Layer’s routing decisions should not drift more than a few percentage points below the quality bar set by always escalating to the highest tier. We accept a measured trade because the cost saving is real and the quality drop is in the noise.

Latency. Tasks that surface in interactive Console flows have a hard latency budget. The Routing Layer respects it: an interactive task never escalates to Horizon mid stream because that would blow the budget. Async tasks (planning, batch generation) have looser latency budgets and the Layer routes for quality more aggressively there.

The triangle is the constraint. The Routing Layer is not optimising for cheapest routing or fastest routing or best routing in isolation; it is finding the routing that satisfies all three within their bands.
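One way to make the triangle concrete: a candidate route either satisfies all three bands or it is inadmissible, and only among admissible routes does cost act as a tie break. A sketch, with invented numbers for the bands and per tier figures:

```python
def within_bands(cost, quality, latency_ms,
                 cost_band, quality_floor, latency_budget_ms):
    """A route is admissible only if it satisfies all three constraints.
    There is no scalar score trading one variable off against another."""
    return (cost <= cost_band
            and quality >= quality_floor
            and latency_ms <= latency_budget_ms)


def pick_route(candidates, cost_band, quality_floor, latency_budget_ms):
    """Among admissible routes, prefer the cheapest (a tie break, not the objective)."""
    admissible = [c for c in candidates
                  if within_bands(c["cost"], c["quality"], c["latency_ms"],
                                  cost_band, quality_floor, latency_budget_ms)]
    return min(admissible, key=lambda c: c["cost"]) if admissible else None


# Illustrative per tier figures, not measured numbers.
candidates = [
    {"tier": "pulse",    "cost": 1,  "quality": 0.80, "latency_ms": 200},
    {"tier": "meridian", "cost": 5,  "quality": 0.92, "latency_ms": 900},
    {"tier": "horizon",  "cost": 20, "quality": 0.97, "latency_ms": 4000},
]
best = pick_route(candidates, cost_band=10, quality_floor=0.90,
                  latency_budget_ms=2000)  # selects the Meridian candidate
```

Tightening the latency budget for interactive flows or raising the quality floor for planning tasks changes which tiers are admissible at all, which is exactly how the interactive/async asymmetry described above falls out.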

How it shows up in an engagement

A simplified trace, drawn from a typical engagement.

  1. Customer submits a brief through the Console.
  2. Pulse classifies it as an intake task and runs the Intake Agent.
  3. The Intake Report goes to the Technical Review Team. Their decision triggers a planning task, which the classifier routes to Horizon (this is a planning class call; Horizon is the right tier).
  4. The Planning Recommendation is reviewed and approved. The Approved Baseline becomes the binding plan.
  5. Execution begins. The Routing Layer sees a stream of tasks, most of which it routes to Meridian (code generation, content drafts, refactoring). A handful of architecture decisions during Execution escalate to Horizon for senior review. A long tail of validation and classification calls go to Pulse.
  6. Each Execution lane output enters the QA Agent’s review gate, which runs on Pulse (combined with deterministic tools).
  7. When QA flags an issue or a senior reviewer requests a deeper look, the relevant artifact escalates to Meridian or Horizon for revision.
  8. Release readiness is reviewed by a senior engineer with Horizon assisted Senior Review Notes.

The customer sees the deliverables and the milestones. They can also see the routing trace if they want to. The trace is part of the engagement audit log and lists every model invocation, which tier handled it, and the Operational Credit cost. We expose this on principle: the customer pays in Project Credits, but the underlying system is transparent.
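A routing trace entry might look something like the following. The field names and credit figures here are illustrative, not the actual Console schema:

```python
import json
from datetime import datetime, timezone


def trace_entry(task_id, tier, operational_credits, override=None):
    """One line of the engagement audit log (hypothetical field names)."""
    return {
        "task_id": task_id,
        "tier": tier,
        "operational_credits": operational_credits,
        "override": override,  # e.g. "per_task_pin", or None if the classifier routed
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


log = [
    trace_entry("t-001", "pulse", 0.02),
    trace_entry("t-002", "horizon", 1.40, override="per_task_pin"),
]
print(json.dumps(log, indent=2))
```

The essential properties are in the source text: every invocation appears, the tier is named, the Operational Credit cost is attached, and overrides are distinguishable from classifier decisions.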

The Project Credit and Operational Credit split

This is the part that distinguishes Orzed’s billing approach from per token pricing.

Project Credits are what the customer pays. They are denominated in deliverables. A planning artifact costs N Project Credits; a finished feature costs M Project Credits. The price is set during scoping, against the Approved Baseline, and does not move based on what models the platform chooses to invoke.

Operational Credits are the internal accounting unit. They track per call inference cost across all three tiers. The Routing Layer’s optimisation target is to keep Operational Credit usage in a band that lets us deliver Project Credits sustainably. Customers do not see Operational Credits on their bill; they see them only in the routing trace if they ask.

This split is the reason the Routing Layer exists at all. If we passed token costs through to the customer, every routing decision would become a customer facing decision and the platform would lose the freedom to make per call optimisations. By absorbing inference cost into a fixed deliverable price, the platform becomes responsible for getting routing right, and the customer is insulated from the daily fluctuations of inference cost.
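The insulation can be stated in a few lines: the invoice is a function of deliverables only, while inference cost accrues to a separate internal counter that routing decisions move. Class, method, and price names are hypothetical:

```python
class Engagement:
    """Customer pays fixed Project Credits per deliverable; per call
    inference cost accrues to Operational Credits on our side."""

    def __init__(self, deliverable_prices):
        # Prices set during scoping, against the Approved Baseline.
        self.deliverable_prices = deliverable_prices
        self.operational_credits_used = 0.0

    def record_inference(self, cost):
        # Routing choices move this number; the customer bill never sees it.
        self.operational_credits_used += cost

    def invoice(self, delivered):
        # The bill is a function of deliverables only.
        return sum(self.deliverable_prices[d] for d in delivered)


eng = Engagement({"planning_artifact": 40, "feature": 120})
eng.record_inference(1.40)   # Horizon call during planning
eng.record_inference(0.02)   # Pulse call during validation
bill = eng.invoice(["planning_artifact", "feature"])  # 160, regardless of routing
```

However the Routing Layer splits the calls between tiers, `invoice` returns the same number; only `operational_credits_used` changes.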

Customer overrides

Three override surfaces in the Console.

Per task pin. Right click a task in the Console, choose a model. The Routing Layer respects the pin and logs it in the audit trail.

Per engagement default. Set a default tier or default frontier model for the engagement. The Routing Layer treats it as the floor.

Frontier model passthrough. For customers who want their work to run on a specific third party frontier model (Claude, GPT, Gemini), the Console exposes a passthrough surface. The Orzed agents continue to manage orchestration; the inference happens at the third party. Operational Credit accounting reflects the passthrough cost.

The override surfaces exist because we built Orzed as an open system. We provide our own model stack because it lets us guarantee cost and quality bands that we cannot guarantee on third party inference; we also provide third party access because the customer should never be locked into our models.
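Putting the three surfaces together, one plausible resolution order is sketched below. The source states that pins take precedence over the classifier and that the per engagement default acts as a floor; the relative precedence of a per task pin versus a frontier passthrough is our assumption for the example:

```python
TIERS = ["pulse", "meridian", "horizon"]


def resolve_tier(classifier_tier, engagement_floor=None, task_pin=None,
                 frontier_passthrough=None):
    """Resolve the final route from the classifier's prediction plus the
    three Console override surfaces (precedence order is illustrative)."""
    if task_pin:
        # Per task pin wins outright and is logged in the audit trail.
        return task_pin
    if frontier_passthrough:
        # Inference leaves the Orzed stack; orchestration stays with our agents.
        return f"passthrough:{frontier_passthrough}"
    if engagement_floor:
        # The engagement default is a floor, never a ceiling.
        idx = max(TIERS.index(classifier_tier), TIERS.index(engagement_floor))
        return TIERS[idx]
    return classifier_tier
```

Note the floor semantics: a Meridian default lifts a Pulse prediction to Meridian but leaves a Horizon prediction untouched.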

Specifications

  • Classifier model: Orzed Pulse, fine tuned on 30K historical routing decisions
  • Average classification latency: 50ms to 80ms
  • Override surfaces: per task pin, per engagement default, frontier passthrough
  • Audit: full routing trace per engagement, surfaced in Console
  • Cost lever: largest single lever on platform Operational Credit usage
  • Customer surface: Project Credits stable, Operational Credits visible on request

The next two write ups in this index cover the Intake Agent and the Planning Agent, the two named agents most customers interact with at the start of an engagement. Both are routed automatically; the routing is what makes them feel invisible.

Frequently asked questions

Does the customer pay more if the Routing Layer escalates to Horizon?

No. Project Credits are deliverable based and quoted upfront. Operational Credits, which track per call inference cost, are an internal accounting unit that does not flow through to the customer. Routing decisions can move the Operational Credit cost on our side, but the customer's bill is set by the deliverable, not by the model.

Can I force a specific model for my engagement?

Yes. The Console has per task and per engagement model overrides. A customer can pin Horizon for the planning step, request third party frontier models for specific deliverables, or default to Meridian across the board. The Routing Layer respects pins and surfaces them in the audit trail.

How does the Routing Layer decide when to call a third party model?

When the customer has explicitly opted into a third party model for a task class, when the task class is one we know our stack underperforms on (rare languages, niche compliance domains), or when the customer's preference profile says so. The platform never silently substitutes; routing to a third party is logged and visible.
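As a sketch, the three conditions reduce to a disjunction; every name here is illustrative rather than a real configuration surface:

```python
def use_third_party(task_class, customer_opt_ins, known_weak_classes,
                    preference_profile):
    """True if any of the three stated conditions holds: explicit opt-in,
    a task class our stack underperforms on, or a preference profile entry."""
    return (task_class in customer_opt_ins
            or task_class in known_weak_classes
            or preference_profile.get(task_class) == "third_party")
```

Whichever branch fires, the resulting third party call is logged in the routing trace like any other invocation; there is no silent substitution path.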