Introducing Orzed Meridian, the workhorse production model
- Dense architecture in the 13B to 22B parameter range, fine tuned on a code corpus weighted toward Python, TypeScript, Go, Rust and Java.
- Tool use and function calling alignment is a separate training stage, not a prompt level afterthought.
- Powers most Execution lane work: code generation, content drafts, refactoring, structured output, internal documentation.
- Median latency in the 800ms to 1.2 second range, suitable for interactive Console flows.
- Operational sweet spot is the body of an engagement, where the right answer matters but the planning depth of Horizon is overkill.
If Orzed Horizon is the model that thinks about the engagement, Orzed Meridian is the model that does the engagement. It is the workhorse tier: the model that writes most of the code, drafts most of the content, produces most of the structured output that flows through the Console on a normal day.
This is the model card. It explains the architecture, the training, the tasks Meridian is sized for and the latency and quality bands you should expect when it shows up in your engagement.
Architecture
Meridian is a dense decoder only transformer in the 13 billion to 22 billion parameter range. We deliberately stayed dense at this size class. Dense models are simpler to deploy, simpler to debug and, on the workloads we care about (code, structured output, function calling), match or beat comparably budgeted mixture of experts models in our internal evaluations. We pay the MoE complexity cost only at the Horizon tier where reasoning depth justifies it.
The base architecture borrows from the Llama and Mistral lineage of open source dense decoders. Context window is in the 64K to 128K token range; longer context routes to Horizon rather than asking Meridian to operate outside its sweet spot.
Tool use and function calling are first class. The model is trained to emit structured tool call payloads as part of its decoding loop, not as a post hoc parsing pass over free form text. This matters in production because tool call malformation is one of the largest sources of agent flakiness; getting it right at the model level removes a class of failures the orchestration layer would otherwise have to clean up.
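To make the decoding level guarantee concrete, here is a minimal sketch of the kind of check the decoding loop enforces: a tool call payload validated against a declared schema before it ever reaches the orchestration layer. The tool names, payload shape, and schema format are illustrative assumptions, not Meridian's actual contract format.

```python
import json

# Hypothetical tool schemas: tool names and required parameter types
# here are illustrative, not the real Orzed contract format.
TOOL_SCHEMAS = {
    "read_file": {"path": str},
    "run_tests": {"suite": str, "timeout_s": int},
}

def validate_tool_call(raw: str):
    """Return (ok, reason) for a decoded tool call payload."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    schema = TOOL_SCHEMAS.get(call.get("name"))
    if schema is None:
        return False, f"unknown tool: {call.get('name')!r}"
    args = call.get("arguments", {})
    for param, typ in schema.items():
        if param not in args:
            return False, f"missing parameter: {param}"
        if not isinstance(args[param], typ):
            return False, f"wrong type for {param}"
    return True, "ok"
```

A model trained to emit payloads that pass this kind of check at decode time never produces the malformed call in the first place, which is the point: the orchestration layer is validating, not repairing.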
Training approach
The base is open source. The fine tune is multi stage and weighted heavily toward the work Meridian actually does in production.
Stage one: code corpus. A multi language code corpus filtered for permissively licensed sources, with weighting in favour of the languages we use most (Python, TypeScript, Go, Rust, Java). The filter excludes machine generated code, which would otherwise be prevalent enough to skew the model’s style toward the patterns of older code generation tools.
Stage two: technical writing corpus. Documentation, technical blog posts (limited to peer reviewed, openly licensed sources), RFCs, design documents. This is the dataset that gives Meridian the ability to draft a coherent technical explanation, not just a syntactically valid code snippet.
Stage three: Orzed internal workflow corpus. Roughly 12,000 paired examples of Execution lane prompts and accepted outputs, drawn from completed engagements. The pairs are filtered for quality (only outputs that passed senior review) and aligned to the structured output formats the Console uses.
Stage four: tool use alignment. A separate fine tuning pass on tool call traces. We collected tool call sequences from a year of agent operation, labelled them for correctness (call shape, parameter validity, semantic correctness of the call given the user’s intent) and trained against the labelled set. This is the stage that makes Meridian usable as a backbone for production agents rather than a prompt and pray code generator.
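The three correctness axes above can be sketched as a label record, with a filter that keeps only traces that are clean on every axis. This is an illustrative sketch under assumed field names, not the real labelling schema.

```python
from dataclasses import dataclass

# Illustrative label record for a single tool call in a trace. The three
# boolean axes mirror the ones named above (call shape, parameter
# validity, semantic correctness); field names are assumptions.
@dataclass
class CallLabel:
    call_shape_ok: bool
    params_valid: bool
    semantically_correct: bool

def trace_is_training_positive(labels: list[CallLabel]) -> bool:
    """A trace counts as a positive example only when every call in it
    passes all three correctness checks."""
    return all(
        lbl.call_shape_ok and lbl.params_valid and lbl.semantically_correct
        for lbl in labels
    )
```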
We hold out two evaluation sets. The first is a labelled production task set: 600 real Execution lane tasks with known good outputs. The second is a tool use stress set: 200 task chains where the right path requires four to seven tool calls in sequence. Numbers below are measured on both.
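The headline quality number on the production task set is a simple acceptance rate. A minimal sketch, assuming reviewer verdicts arrive as strings (the verdict labels here are illustrative):

```python
def acceptance_rate(verdicts: list[str]) -> float:
    """Fraction of held out tasks whose output a senior reviewer accepted
    with no rework or with light rework. Verdict labels are assumptions,
    not the real review taxonomy."""
    accepted = {"accept", "accept_light_rework"}
    return sum(v in accepted for v in verdicts) / len(verdicts)
```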
Where Meridian sits
Meridian is the default model in the Execution lane. When a deliverable needs to be produced (code, content, draft document, structured output for downstream automation), the Routing Layer almost always lands on Meridian unless there is a specific reason to escalate to Horizon or drop to Pulse. That default coverage is where the platform’s compute budget mostly goes.
Specific surfaces where Meridian shows up in the Console: the code generation stream during active development, the content draft pipeline, the structured output bridges that produce JSON or YAML for downstream automation, the documentation generator that runs over completed code. The interactive Console assistant (the one a user can actually chat with) also runs on Meridian, with explicit escalation to Horizon when the user asks a planning class question.
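The default-with-escalation behaviour described above can be sketched as a routing function. The lane names, thresholds, and model identifiers are illustrative stand-ins; the real decision logic lives in the Routing Layer write up.

```python
# Hypothetical sketch of the Execution lane default. The 80K context
# threshold and three-hop planning ceiling come from the limits named
# in this card; everything else is an assumption for illustration.
def route(task_lane: str, context_tokens: int, planning_hops: int) -> str:
    if context_tokens > 80_000:      # long context degrades Meridian
        return "horizon"
    if planning_hops > 3:            # past Meridian's reasoning ceiling
        return "horizon"
    if task_lane == "execution":
        return "meridian"            # the default workhorse
    return "horizon"
```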
Performance bands
On the production task set, Meridian’s outputs reach acceptable quality (defined as outputs the senior reviewer would accept with no rework, or with light rework) in the 78 to 86 percent range. The variance across this band reflects task class; structured output and refactoring sit at the top, novel feature design sits at the bottom. For the bottom slice the Routing Layer often escalates to Horizon for the design step and then comes back to Meridian for the implementation.
On the tool use stress set, Meridian completes the full chain correctly in roughly 70 to 78 percent of cases. The most common failure is a chain that succeeds for the first several calls and then derails on a tool with ambiguous return shape, which is a problem the orchestration layer addresses with structured tool contracts (covered in the Routing Layer write up).
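The structured tool contract fix for ambiguous return shapes can be sketched as a return-side check: the chain continues only when a tool's result matches its declared contract, so a shape mismatch surfaces immediately instead of silently derailing the chain several calls later. The contract format and tool name below are assumptions for illustration.

```python
# Minimal sketch of a return-side tool contract, assuming a contract is
# a mapping of field name -> expected type. The real contract format is
# covered in the Routing Layer write up.
def conforms(result: dict, contract: dict) -> bool:
    return (
        set(result) == set(contract)
        and all(isinstance(result[k], t) for k, t in contract.items())
    )

SEARCH_CONTRACT = {"matches": list, "truncated": bool}  # hypothetical tool

def next_step(result: dict) -> list:
    """Continue the chain only when the return shape matches the contract;
    an ambiguous shape is raised loudly instead of propagated."""
    if not conforms(result, SEARCH_CONTRACT):
        raise ValueError("tool return violates contract; aborting chain")
    return result["matches"]
```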
Latency on a typical Execution lane workload (a few thousand tokens of context, a few hundred to a few thousand tokens of output) sits in the 800 millisecond to 1.2 second range at the median. Streaming starts faster, so the perceived latency in the Console is in the 200 to 400 millisecond range to first token.
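The distinction between median latency and perceived latency is just where you stop the clock: perceived latency ends at the first streamed token. A minimal sketch, using a fake token stream in place of a real Console completion:

```python
import time

def time_to_first_token(stream):
    """Seconds from request to first streamed token, i.e. the perceived
    latency in the Console. `stream` is any iterator of tokens."""
    start = time.monotonic()
    first = next(stream)
    return first, time.monotonic() - start

def fake_stream(delay_s: float = 0.01):
    # Stand-in for a streaming completion; the generator body (and the
    # simulated model delay) only runs when the first token is pulled.
    time.sleep(delay_s)
    yield "def"
    yield " handler"
```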
Limits
Three limits worth naming.
Long context degradation. Past about 80K tokens of context, Meridian’s quality starts to slip. The platform routes long context tasks to Horizon, which is sized for it.
Domain depth. Meridian is broad. For deep work in a narrow domain (specialised compliance, hard real time systems, exotic language stacks) the platform will route to either a fine tuned variant or, for customer choice, a third party frontier model. Meridian is the workhorse, not the specialist.
Reasoning ceiling. Genuine multi step planning work runs into Meridian’s reasoning ceiling around three to four hops of structured chain of thought. Past that the model starts to lose track of state. The pattern is to chain Meridian calls under a Horizon plan rather than ask Meridian to plan deeply on its own.
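The chaining pattern above can be sketched as a small driver: a Horizon plan broken into short steps, each executed as an independent Meridian call, so no single call has to hold deep plan state. Function names and the step format are illustrative assumptions.

```python
# Hypothetical orchestration sketch: Horizon produces `plan_steps`,
# Meridian executes each step with only the previous step's output as
# carried context, keeping every call well under its reasoning ceiling.
def run_plan(plan_steps, meridian_call):
    output = None
    for step in plan_steps:
        output = meridian_call(step, context=output)
    return output
```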
Specifications
| Attribute | Value |
|---|---|
| Architecture | Dense decoder only transformer, 13B to 22B parameters |
| Base | Open source dense lineage (Llama, Mistral family), Orzed fine tuned |
| Context window | 64K to 128K tokens |
| Median latency | 800ms to 1.2s (production workload) |
| Streaming latency to first token | 200ms to 400ms |
| Target use cases | Code generation, content drafts, structured output, tool use, interactive Console flows |
| Console surface | Execution lane outputs, interactive assistant, content drafts |
Meridian is the model the platform spends the most compute on by a long margin, because it is the model doing the most work. The Routing Layer write up explains how the platform decides when to escalate to Horizon, when to drop to Pulse, and when to call out to a third party model entirely.
Questions teams ask
Why a dense model instead of MoE at this size?
At 13B to 22B parameters dense models match or beat comparably budgeted MoE models on the workloads we care about (code, structured output, tool use), and the deployment story is simpler. We use MoE at the Horizon tier where the depth justifies the routing overhead; here it does not.
Which programming languages is Meridian strongest in?
Python, TypeScript, Go, Rust and Java carry the bulk of the corpus weight, in roughly that order. C# and Kotlin sit in a second band. Less common stacks (Elixir, Haskell, Clojure) are present, but for serious work in those the platform will route to a third party frontier model through the Console.
Does Meridian have a long context window?
Meridian's context window sits in the 64K to 128K token range, sized for production work that fits in a working set (a feature, a refactor, a drafting task with its references). For whole engagement context the platform routes to Horizon rather than stretching Meridian past its sweet spot.