Deployment is the easy half. Monitoring, drift detection, inference optimisation, bias audit, privacy engineering and explainability are the layer that turns a working demo into an accountable production system. We operate all four pillars under one contract.
FOUR PILLARS
Deployment is one of four disciplines, not the discipline. An engagement scoped to a single pillar is welcome; most end up covering three once the first production incident makes the gaps visible.
Deployment: Getting weights out of a training notebook and behind a service contract the rest of the company can call.
Monitoring: What a model did yesterday says nothing about what it will do tomorrow. We write the signals that catch regression early.
Inference optimisation: Inference cost is where most AI budgets quietly die. Quantisation, compilation and edge deployment where they save money or latency.
Governance: A production model is a regulated artefact. Bias audit, privacy, explainability and audit trail are design, not add-ons.
MLOps inherits whatever the data layer hands it. Drift detection, lineage and privacy controls start in the pipeline, not at the serving endpoint.
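As an illustration of the kind of drift signal the monitoring pillar emits, here is a minimal population-stability-index (PSI) check. This is a sketch, not our production monitor: the binning scheme, window sizes and the 0.2 alert threshold are common rules of thumb, chosen per engagement.

```python
import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference window and live traffic.

    Bin edges come from the reference distribution; live values are clipped
    into its range so every observation lands in a bin, and a small epsilon
    keeps empty bins from producing infinities.
    """
    edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    clipped = np.clip(live, edges[0], edges[-1])
    live_frac = np.histogram(clipped, edges)[0] / len(live)
    eps = 1e-6
    ref_frac = np.clip(ref_frac, eps, None)
    live_frac = np.clip(live_frac, eps, None)
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.0, 1.0, 10_000)  # score distribution at training time
drifted = rng.normal(0.8, 1.0, 10_000)       # live traffic after the world moved
assert psi(train_scores, train_scores[:5000]) < 0.1  # same distribution: quiet
assert psi(train_scores, drifted) > 0.2              # shifted: open a ticket
```

The same check runs per feature and per model-output window; which windows and which threshold constitute "drift" is exactly the kind of number the SLO contract pins down.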
MATURITY LADDER
We use the Google MLOps maturity frame as a diagnostic. Most organisations we meet are between stage 01 and stage 02. The move that matters is 02 → 03, where the pipeline stops being a person and starts being code.
Stage 00: One engineer, one notebook, one model. Works for proof-of-concept; breaks the moment a second model or a second engineer arrives.
Stage 01: Training pipelines are reproducible, data and code versioned, metrics tracked. First experiments can be replayed; retraining is still manual.
Stage 02: Scheduled and triggered retraining on monitored drift. Candidate models go through staged evaluation before promotion. Human approval on the release gate.
Stage 03: Shadow deploy, canary, autoscale, automatic rollback on SLO breach. Release gates documented, executed by the pipeline, countersigned by a human.
Stage 04: The system reshapes itself within policy: chooses variants, rebalances cost, schedules retraining, opens tickets on its own drift. Humans approve the policy, not the steps.
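The stage-02 trigger described above reduces to a small policy function: retrain when the drift monitor trips or when the weights age out, with promotion still gated by a human. A minimal sketch, with illustrative thresholds:

```python
def should_retrain(drift_score: float, days_since_last: int,
                   drift_threshold: float = 0.2, max_age_days: int = 30) -> bool:
    """Stage-02-style trigger: retrain on monitored drift or on schedule.

    The thresholds are placeholders; in practice they come out of the
    SLO contract, not a default argument.
    """
    return drift_score > drift_threshold or days_since_last >= max_age_days

assert should_retrain(0.05, 7) is False   # quiet model, fresh weights
assert should_retrain(0.31, 7) is True    # drift tripped the monitor
assert should_retrain(0.05, 45) is True   # scheduled refresh
```

The move to stage 03 is what happens after this returns True: the candidate flows through staged evaluation and canary automatically, and a human countersigns rather than executes.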
FOUR PITFALLS
None of these are cutting-edge problems. They are basic operational hygiene that is almost always deferred because the first demo worked. When they break, they break the business case.
Treating deployment as the finish line. A model that passes eval on the training data ships, then quietly decays in production. Without drift detection and a retraining path, the clock starts the moment the release lands.
Cheap to train, expensive to serve. We set an inference-cost ceiling in stage 01 of the model build so quantisation, distillation and architecture choices are made before the training budget is spent.
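To make the ceiling-before-training point concrete, the arithmetic is back-of-envelope. A sketch with placeholder numbers (token counts and prices are illustrative, not quotes):

```python
def cost_per_1k_requests(tokens_per_request: int,
                         price_per_1k_tokens: float,
                         cache_hit_rate: float = 0.0) -> float:
    """Serving cost per thousand requests; cached hits assumed free."""
    billable_tokens = 1_000 * tokens_per_request * (1.0 - cache_hit_rate)
    return billable_tokens / 1_000 * price_per_1k_tokens

CEILING = 0.50  # illustrative ceiling, dollars per 1k requests, set before training
lean = cost_per_1k_requests(tokens_per_request=200, price_per_1k_tokens=0.002)
bloated = cost_per_1k_requests(tokens_per_request=600, price_per_1k_tokens=0.002)
assert lean <= CEILING     # $0.40 per 1k requests: inside the envelope
assert bloated > CEILING   # $1.20 per 1k requests: the budget dies quietly
```

Quantisation, distillation and prompt or architecture trimming all act on the inputs to this function; setting the ceiling first is what forces those choices to happen before the training budget is spent.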
Fairness metrics pass at launch, regress six months later as the input distribution shifts. Audits belong on the continuous-eval dashboard, not in a launch memo.
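A dashboard fairness check can be as small as recomputing a gap metric per traffic window. A sketch using demographic parity difference, one of several common fairness metrics; the group counts and the 0.05 limit are synthetic:

```python
def demographic_parity_gap(positives_a: int, total_a: int,
                           positives_b: int, total_b: int) -> float:
    """Absolute gap in positive-outcome rate between two groups."""
    return abs(positives_a / total_a - positives_b / total_b)

GAP_LIMIT = 0.05  # illustrative threshold, re-evaluated every window, not once at launch
launch = demographic_parity_gap(480, 1000, 452, 1000)   # gap 0.028: passes
month_6 = demographic_parity_gap(480, 1000, 395, 1000)  # gap 0.085: regressed
assert launch < GAP_LIMIT
assert month_6 > GAP_LIMIT
```

The point of the pitfall is the second line: the launch memo saw the first number, and only a continuous-eval dashboard sees the second.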
If the system cannot explain a single decision when the customer asks, the product is regulated debt. Explainability is a runtime feature, not a research artifact.
SLO CONTRACT
Numbers on a slide are not a contract. The table below is the shape every MLOps engagement ends with: latency, availability, rollback and on-call SLOs in writing, tied to a tier that matches the product's risk profile.
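As an illustration of what "in writing" means operationally, here is a minimal sketch of the clause check an automatic rollback gate might run against a canary window. The tier numbers are illustrative, not our defaults:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slo:
    p99_latency_ms: float   # latency ceiling for the product tier
    max_error_rate: float   # error budget over the canary window

@dataclass
class CanaryWindow:
    p99_latency_ms: float
    error_rate: float

def breaches(slo: Slo, window: CanaryWindow) -> list[str]:
    """Return the SLO clauses the canary window violated."""
    out = []
    if window.p99_latency_ms > slo.p99_latency_ms:
        out.append("latency")
    if window.error_rate > slo.max_error_rate:
        out.append("errors")
    return out

slo = Slo(p99_latency_ms=250, max_error_rate=0.001)
healthy = CanaryWindow(p99_latency_ms=180, error_rate=0.0004)
sick = CanaryWindow(p99_latency_ms=310, error_rate=0.0030)
assert breaches(slo, healthy) == []                  # promote
assert breaches(slo, sick) == ["latency", "errors"]  # automatic rollback
```

A signed SLO contract is this structure with real numbers in it, plus the rollback and on-call clauses that decide what happens when the list comes back non-empty.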
TOOLKIT
Stack picks are driven by latency, cost, compliance and team fluency, never by preferred-vendor contracts. These are the tools we run most often; substitutions happen per engagement.
ADJACENT DISCIPLINES
Every production AI surface leans on its neighbours. The following disciplines run alongside on most engagements.
Training pipelines, evaluation suites and the model registry that feeds the MLOps layer.
Foundation: Lineage, residency, privacy and the online feature layer that real-time inference depends on.
Scale: Distributed training and quantised inference, the surface that most often breaks the cost envelope.
Umbrella: The full discipline map. MLOps is the sixth pillar, the one that turns a working model into a trusted system.
Bring the model, the traffic shape and the compliance envelope. We come back with an SLO contract, deployment plan, monitoring spec and governance map inside ten working days. Numbers you can sign, not aspirations.