Introducing Orzed Pulse, the always-on lightweight model


Orzed Pulse is the lightweight tier of the Orzed stack: distilled from Horizon and sized for QA gates, classification, routing and sub-second validation.

  • By Orzed Team
  • 5 min read
Key takeaways
  • 3B to 7B parameter dense model, distilled from Orzed Horizon to retain reasoning quality at a much smaller footprint.
  • Median latency under 200 milliseconds on production workloads, suitable for inline gates and high frequency calls.
  • Powers the QA Agent first pass, the Intake Agent's brief comprehension, the Routing Layer's classifier, and dozens of other inline checks.
  • Runs comfortably on edge inference hardware, including the per region nodes that serve the Console at low latency.
  • Not a deep reasoner. Pulse delegates anything past its complexity ceiling to Meridian or Horizon through the Routing Layer.

Orzed Pulse is the smallest model in the Orzed stack and, by a comfortable margin, the most frequently invoked. If Horizon thinks about engagements and Meridian executes them, Pulse is the model that runs everywhere else: every QA gate, every classification call, every routing decision, every inline validation. It is the platform’s default answer to the question “we need a fast, cheap, smart-enough call here.”

This is the model card.

Architecture

Pulse is a dense decoder-only transformer in the 3 billion to 7 billion parameter range. The architecture is unremarkable on purpose; the goal was a small, well-behaved model that could be deployed at the edge of the Console infrastructure and serve high-frequency inline calls without dragging on the rest of the system. Architectural novelty would have made deployment harder and inference less predictable.

The context window is short by current standards, in the 16K to 32K token range. Pulse is not for long context tasks. The Routing Layer escalates anything that needs more.

The model lives close to the Console. We deploy Pulse to per region inference nodes so the round trip to the model is dominated by inference time rather than network transit. For interactive Console surfaces (the message classifier, the Intake first pass on a freshly submitted brief) this matters; a 50 millisecond network round trip plus 150 milliseconds of inference is a usable inline call, while routing through a central inference cluster would push it past the threshold where the user feels the latency.
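The latency arithmetic above can be sketched as a simple budget check. A minimal illustration; the 300 millisecond inline threshold and the central cluster transit figure are assumptions for illustration, not measured Orzed numbers.

```python
# Illustrative inline-latency budget. The 300 ms threshold and the 200 ms
# central-cluster transit are assumptions; the edge figures come from the text.
INLINE_THRESHOLD_MS = 300

def round_trip_ms(network_transit_ms: float, inference_ms: float) -> float:
    """Latency as the Console sees it: network transit plus inference."""
    return network_transit_ms + inference_ms

edge = round_trip_ms(50, 150)      # per region node
central = round_trip_ms(200, 150)  # hypothetical central cluster

print(f"edge: {edge} ms, inline ok: {edge <= INLINE_THRESHOLD_MS}")
print(f"central: {central} ms, inline ok: {central <= INLINE_THRESHOLD_MS}")
```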

Training approach

Pulse is distilled. The base is open source (the Llama, Mistral and Phi small model lineage informs the architecture choice), but the fine tune is structured as a two stage distillation from Orzed Horizon.

Stage one: wide distribution distillation. A broad task distribution is run through Horizon, capturing input and output pairs across classification, structured extraction, semantic comparison, routing and validation tasks. The dataset is large (in the millions of pairs) and deliberately diverse. Pulse is trained to reproduce Horizon’s outputs on this distribution.

Stage two: targeted task distillation. For each named use case (QA first pass, Intake comprehension, Routing classifier, output validation), we collect a focused dataset of representative inputs, run them through Horizon and our senior reviewers, and produce a labelled task specific dataset. Pulse is fine tuned again on the labelled set, with the loss weighted toward the cases where Horizon and the reviewers agreed.
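The agreement weighting in stage two can be sketched roughly as follows. The record shape and the 2.0/0.5 weights are illustrative assumptions, not Orzed's actual training configuration.

```python
from dataclasses import dataclass

@dataclass
class LabelledExample:
    text: str
    horizon_label: str    # Horizon's output on this input
    reviewer_label: str   # senior reviewer's label

# Assumed weights: emphasise agreement, down-weight disagreement rather than drop it.
AGREE_WEIGHT = 2.0
DISAGREE_WEIGHT = 0.5

def loss_weight(ex: LabelledExample) -> float:
    """Per-example weight for the stage two fine tune loss."""
    return AGREE_WEIGHT if ex.horizon_label == ex.reviewer_label else DISAGREE_WEIGHT

batch = [
    LabelledExample("brief with a clear gap", "reject", "reject"),
    LabelledExample("ambiguous brief", "accept", "reject"),
]
weights = [loss_weight(ex) for ex in batch]
print(weights)  # [2.0, 0.5]
```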

The combination is what gives Pulse usable quality at its size. Distillation alone produces a model that approximates Horizon poorly. Distillation plus targeted task tuning produces a model that approximates Horizon well within its scoped use cases, and gracefully fails outside them (which the Routing Layer detects and escalates).

We hold out evaluation sets per task. Numbers below are bands.

Where Pulse sits

Pulse runs in many places. The most visible ones in the Console are these.

QA Agent first pass. Every Execution lane deliverable that flows into the Review and Validation Layer hits Pulse first, paired with deterministic test results and static analysis output. Pulse produces the semantic component of the QA evidence pack. The throughput requirement here is high; at any point in time the platform is processing a queue of deliverables across many engagements.

Intake Agent comprehension. When a customer submits a brief through the Console, the Intake Agent (Pulse based) reads it within seconds and produces a structured Intake Report: gaps, ambiguities, scope risks, missing context. The Technical Review Team reads the Intake Report alongside the original brief.

Routing Layer classifier. Every task entering the Console is classified by Pulse before it reaches a model. The classification (planning, execution, validation, routing, escalation) drives the model selection. On roughly 60 to 70 percent of the tasks it sees, Pulse classifies itself out of the loop, escalating the task to Meridian or Horizon.
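The routing decision can be sketched in a few lines. The label set comes from the text; the confidence threshold and the label-to-tier mapping are illustrative assumptions.

```python
# Assumed mapping from classification label to model tier; the real Routing
# Layer's policy is not public.
TIER_FOR_LABEL = {
    "planning": "Horizon",
    "escalation": "Horizon",
    "execution": "Meridian",
    "routing": "Meridian",
    "validation": "Pulse",   # inline checks stay on Pulse
}

def route(label: str, confidence: float, threshold: float = 0.8) -> str:
    """Pick a model tier; low-confidence classifications escalate to the top."""
    if confidence < threshold:
        return "Horizon"
    return TIER_FOR_LABEL.get(label, "Meridian")

print(route("validation", 0.95))  # Pulse
print(route("execution", 0.9))    # Meridian
print(route("planning", 0.5))     # Horizon (low confidence)
```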

Output validation. When Meridian or Horizon produces a structured output (JSON, YAML, a typed payload), Pulse runs a quick validation pass that catches schema violations, hallucinated fields, malformed nesting. Failures are routed back to the producer with the validation error attached.
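A minimal stand-in for that validation pass: check a structured output against an expected schema and report missing fields, type violations and hallucinated fields. The schema shape and error strings are illustrative assumptions.

```python
def validate(payload: dict, schema: dict) -> list[str]:
    """Return a list of validation errors; empty means the payload passes."""
    errors = []
    for field, expected_type in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"wrong type for {field}: expected {expected_type.__name__}")
    for field in payload:
        if field not in schema:
            errors.append(f"unexpected field: {field}")  # hallucinated field
    return errors

schema = {"task_id": str, "verdict": str, "score": float}
bad = {"task_id": "t-1", "verdict": "pass", "confidence": 0.9}
print(validate(bad, schema))
# ['missing field: score', 'unexpected field: confidence']
```

On failure, the error list is what gets attached when the output is routed back to the producer.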

There are dozens of smaller surfaces. The pattern is the same: Pulse is the inline call that makes a fast, cheap classification or validation, and lets the platform decide what to do next.

Performance bands

Median latency on production workloads is comfortably under 200 milliseconds. The 95th percentile sits around 350 milliseconds. This is the band that makes Pulse usable as an inline gate; the user experience inside the Console does not feel laggy because the model is in the loop.

On the QA first pass evaluation set (the labelled regression and clean code set described in the QA Agent changelog), Pulse alone reaches roughly 78 to 84 percent of the senior human reviewer’s accept and reject decisions. When composited with the deterministic test runner and the static analysis signals, the combined accuracy reaches the 92 to 95 percent band described in that piece. Pulse alone is not good enough; Pulse as part of a composite is good enough to ship.
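The composite gate can be sketched as a small decision function: deterministic signals veto outright, Pulse supplies the semantic verdict. The escalation path on semantic disagreement is an illustrative assumption about how the composite resolves.

```python
def qa_first_pass(tests_passed: bool, static_clean: bool, pulse_verdict: str) -> str:
    """Combine deterministic signals with Pulse's semantic verdict."""
    if not tests_passed or not static_clean:
        return "reject"       # deterministic signals are authoritative
    if pulse_verdict == "accept":
        return "accept"
    return "escalate"         # assumed: semantic disagreement goes up a tier

print(qa_first_pass(True, True, "accept"))   # accept
print(qa_first_pass(False, True, "accept"))  # reject
print(qa_first_pass(True, True, "reject"))   # escalate
```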

On the Intake comprehension set, Pulse correctly identifies the major gap or risk in a brief in roughly 80 to 88 percent of cases. The misses are concentrated in briefs that lean heavily on industry jargon Pulse has not seen in training; we feed back these cases into the Intake fine tune on a regular cadence.

On the Routing classifier set, Pulse routes correctly (in agreement with senior engineer judgement on the right tier for the task) in 92 to 96 percent of cases. The misclassifications are cheap to recover from; if Pulse routes a task to Meridian and Meridian determines it needs Horizon, the escalation is a one shot redirect with no rework.

Limits

Three limits.

Reasoning depth. Past two or three hops of chained reasoning, Pulse degrades sharply. The Routing Layer is the safety net; tasks that need depth do not land on Pulse.

Long context. Pulse’s context window is short. Anything longer escalates.

Open ended generation. Pulse can produce structured output well. It is not for free-form, long-form writing; that goes to Meridian or Horizon depending on the depth required.

Specifications

  • Architecture: dense decoder-only transformer, 3B to 7B parameters
  • Base: open source small model lineage (Llama, Mistral, Phi), distilled by Orzed from Horizon
  • Context window: 16K to 32K tokens
  • Median latency: under 200 ms
  • 95th percentile latency: roughly 350 ms
  • Throughput target: millions of calls per day per inference cluster
  • Target use cases: QA first pass, Intake comprehension, Routing classification, output validation, inline gates
  • Console surface: background, surfaced in evidence packs and routing logs

The Routing Layer write up explains how Pulse, Meridian and Horizon work together, and how a single customer task flows across the three tiers without the customer ever needing to think about which model is doing what.

Frequently asked questions

If Pulse is small, how does it perform like a much larger model?

Distillation. Pulse is trained against the outputs of Orzed Horizon on a wide task distribution, which transfers a meaningful fraction of Horizon's reasoning quality at a much smaller parameter count. Distillation does not produce parity, only a usable fraction; the trade-off is acceptable for the high frequency tasks Pulse runs.

What runs on Pulse in a normal day?

The QA Agent's first pass, the Intake Agent's initial scan of every brief, the Routing Layer's classification of incoming tasks, the validation step on every structured output, the categorisation of incoming Console messages, and a long tail of small classification and verification calls. Together they account for the majority of inference calls in the Orzed platform by count, though not by compute.

Where does Pulse fail?

Anything that requires more than two or three hops of chained reasoning. Pulse is good at single shot evaluation, classification and structured output. It is not good at planning, deep code synthesis or multi turn conversation. The Routing Layer detects these cases and escalates.