Practical writing
No content theatre
AI systems, delivery decisions, product operations and the mistakes worth writing down. Written by the senior team between engagements, not by a content calendar.
Orzed Intake Agent: Brief Comprehension at the Front Door
The Intake Agent reads every customer brief that enters the Orzed Console. Built on Pulse, it produces a structured Intake Report for the Technical Review Team.
Prompt Injection Defence Beyond Input Filtering
Input filtering alone is not a defence against prompt injection. The layered architecture that keeps an LLM-driven system from being walked off the rails.
When to Use an Agent and When to Use a Pipeline
Agentic loops cost more, fail strangely and resist debugging. The honest test for whether your problem needs an agent or just a deterministic pipeline.
Context Window Economics: The Hidden Bill on RAG
Stuffing context costs money on every call, even for tokens the model ignores. The discipline that keeps RAG context relevant, ranked and compressed.
Evaluating LLMs Without a Research Team
A working evaluation gate that a small engineering team can build in a week, with the assertions, scoring and failure modes that make it production-credible.
EU AI Act Mapping for Engineering Teams
What the EU AI Act actually requires of an engineering team: the four risk tiers, the documentation burden, and the timeline that already started in 2025.
Prompt Registry: YAML File, Database Table, or Service?
Three working shapes for a production prompt registry, with the trade-offs that decide which one fits a team of three, thirty, or three hundred.
System Prompts That Survive Three Model Upgrades
A system prompt tuned to one model's quirks breaks on the next model. The structural patterns that decouple intent from model-specific tuning.
Prompt Caching: Where It Pays Back and Where It Does Not
Provider-side prompt caching cuts cached input cost by up to 90 percent. The pattern that earns the discount and the configurations that waste it.
LLM Output Streaming: The Edge Cases That Bite
Token streaming is the default for chat UIs and the source of subtle bugs. Partial JSON, truncated outputs, retries and the patterns that handle them.
When RAG Actually Helps and When It Hides Bad Retrieval
Retrieval-augmented generation looks like an answer to every grounding problem. The honest test for whether you need RAG, fine-tuning or a cleaner data source.
Vector Store Sizing: The Cost Truth Nobody Tells You
What a million vectors actually costs across Pinecone, Qdrant, Weaviate and pgvector, and the configuration choices that move the bill by 5x.