Feature Store Architecture: The Three Shapes
- Training-serving skew is the most common ML production bug and the one feature stores solve directly.
- A 200-line library on top of your existing warehouse beats a heavyweight feature store for small teams.
- Feast-class self-hosted is the right middle ground when you have multiple ML teams sharing features.
- Managed feature stores earn their keep when feature volume and serving latency matter more than infrastructure cost.
A team we worked with had been scaling their ML platform for three years. They had nine production models, each maintained by a different team. We discovered, in week one of the audit, that “user_lifetime_value” was defined six different ways across the nine models. Three definitions used a 30-day window, two used 90, one used “since signup”. Two excluded refunds; the others did not. Each team had reinvented the same feature with their own variation.
The downstream effect: when finance produced an LTV report and ML produced an LTV-predicted-customer-cohort report, the numbers never reconciled. Engineering had been chasing this for months without realising the root cause was that “user lifetime value” did not mean the same thing in any two systems.
A feature store solves this. It is also frequently over-engineered, picked by reading vendor whitepapers rather than measuring need. This piece is about the three shapes that work in production and the criteria that pick the right one.
What a feature store actually does
Two jobs:
Training-serving consistency. A feature computed in offline training and the same feature computed in online serving must produce identical values for the same entity. Without a feature store, the two computations live in different code (the training pipeline and the serving service) and drift. The model performs well in evaluation and badly in production.
Feature reuse. When team A defines “30-day rolling order count” and team B needs the same feature, B reads the canonical definition instead of re-implementing it. Bug fixes propagate; data definitions converge.
Anything else a vendor advertises (online materialisation, point-in-time joins, monitoring) is implementation detail in service of these two jobs. Pick the implementation that fits your team’s actual needs.
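The drift described under training-serving consistency is easy to see in miniature. The sketch below uses hypothetical in-memory order data and two independently written implementations of the same 30-day LTV feature; the serving copy forgot the refund filter, so the model scores against numbers it never saw in training:

```python
from datetime import datetime, timedelta

# Hypothetical order history for one user: (created_at, amount_cents, status).
ORDERS = [
    (datetime(2024, 5, 5), 5000, "completed"),
    (datetime(2024, 5, 10), 3000, "refunded"),
    (datetime(2024, 5, 20), 2000, "completed"),
]

def ltv_30d_training(as_of: datetime) -> float:
    """Training pipeline's version: excludes refunds."""
    return sum(
        cents for created, cents, status in ORDERS
        if as_of - timedelta(days=30) <= created < as_of
        and status != "refunded"
    ) / 100.0

def ltv_30d_serving(as_of: datetime) -> float:
    """Serving service's re-implementation: forgot the refund filter."""
    return sum(
        cents for created, cents, status in ORDERS
        if as_of - timedelta(days=30) <= created < as_of
    ) / 100.0

as_of = datetime(2024, 6, 1)
print(ltv_30d_training(as_of))  # 70.0 — what the model was trained on
print(ltv_30d_serving(as_of))   # 100.0 — what the model sees in production
```

The model evaluates well offline and degrades in production, and nothing in either codebase looks wrong in isolation. That is the bug a single shared definition eliminates.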
Shape 1: thin library on top of the warehouse
For a small team (one to three ML engineers, one to three production models), a feature store can be a 200-line library that exposes feature definitions as functions over the warehouse.
```python
# Sketch of a thin feature library
@feature("user_lifetime_value_30d", entity="user_id", ttl=hours(6))
def user_lifetime_value_30d(user_id: int, as_of: datetime) -> float:
    return warehouse.query("""
        SELECT COALESCE(SUM(amount_cents), 0) / 100.0
        FROM orders
        WHERE user_id = %s
          AND created_at >= %s - INTERVAL '30 days'
          AND created_at < %s
          AND status NOT IN ('refunded', 'cancelled')
    """, user_id, as_of, as_of)
```
Training calls the function over a labelled dataset. Online serving calls the same function with as_of=now(). The function is the source of truth.
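Concretely, the two call sites look like this. The feature function is stubbed so the sketch is self-contained; the point is the `as_of` discipline, not the stub's return value:

```python
from datetime import datetime, timezone

# Stand-in for the decorated feature function; a real version queries the warehouse.
def user_lifetime_value_30d(user_id: int, as_of: datetime) -> float:
    return 42.0  # stubbed for illustration

# Training: replay the feature as of each label's timestamp — never "now".
labelled = [
    (101, datetime(2024, 3, 1, tzinfo=timezone.utc), 1),
    (102, datetime(2024, 3, 5, tzinfo=timezone.utc), 0),
]
X = [[user_lifetime_value_30d(uid, label_ts)] for uid, label_ts, _ in labelled]
y = [label for _, _, label in labelled]

# Serving: the same function, evaluated at request time.
def predict_features(user_id: int) -> list[float]:
    return [user_lifetime_value_30d(user_id, datetime.now(timezone.utc))]
```

Because both paths go through one function, a bug fix or definition change lands in training and serving simultaneously.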
The benefits:
- One implementation per feature, used in both contexts. No drift.
- New features are added by writing a function. No registry to maintain.
- The library is in source control. Changes are reviewable.
The limits:
- No precomputed materialisation. Each call hits the warehouse.
- Latency is whatever the warehouse query takes (usually 50 to 500 ms).
- No automatic monitoring of feature distributions.
This shape works when feature compute is cheap and serving latency is forgiving. We use it on engagements where the team is small and the model count is single digits.
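The `@feature` decorator is the heart of the 200-line library. A minimal sketch of the registry behind it might look like this; all names are illustrative, not a real library, and the feature body is stubbed rather than querying a warehouse:

```python
from datetime import datetime, timedelta
from typing import Callable

# Hypothetical registry core: each name maps to exactly one canonical
# implementation, shared by training and serving.
REGISTRY: dict[str, Callable] = {}

def hours(n: int) -> timedelta:
    return timedelta(hours=n)

def feature(name: str, entity: str, ttl: timedelta):
    def register(fn: Callable) -> Callable:
        fn.feature_name, fn.entity, fn.ttl = name, entity, ttl
        REGISTRY[name] = fn
        return fn
    return register

@feature("user_order_count_7d", entity="user_id", ttl=hours(1))
def user_order_count_7d(user_id: int, as_of: datetime) -> int:
    return 3  # real version would query the warehouse

# Both training and serving resolve features by name from the registry.
fn = REGISTRY["user_order_count_7d"]
print(fn(101, datetime(2024, 6, 1)))  # 3
```

The registry doubles as the feature catalogue: listing `REGISTRY` tells you every feature that exists, its entity, and its freshness contract.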
Shape 2: self-hosted Feast-class system
For a medium team (5 to 20 ML engineers, multiple production models, shared entities), a self-hosted feature store like Feast adds the pieces shape 1 lacks: registered feature definitions, materialisation to a low-latency online store (Redis, DynamoDB), and point-in-time-correct training-data generation.
The architecture:
- Feature definitions registered in code (YAML or Python decorators), versioned in source control.
- Offline store: the warehouse (Snowflake, BigQuery, Redshift). Used for training data generation.
- Online store: a low-latency store (Redis, DynamoDB, ScyllaDB). Used for online serving.
- Materialisation: a scheduled job that computes features from the offline store and writes the latest values to the online store.
```yaml
# Feast-style feature definition (sketch)
feature_view:
  name: user_lifetime_value_30d
  entities: [user]
  ttl: 6h
  features:
    - { name: ltv_30d, dtype: float }
  source: "SELECT user_id, ltv_30d FROM analytics.user_features_30d"
  online: true
  offline: true
```
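The materialisation job itself is conceptually simple: run the feature view's `source` query against the offline store and upsert the latest value per entity into the online store. A sketch, with a dict standing in for Redis or DynamoDB:

```python
from datetime import datetime, timezone

# Hypothetical online store; a real deployment would use Redis/DynamoDB.
online_store: dict[str, dict] = {}

def materialise(rows: list[dict], feature_view: str) -> int:
    """Upsert the latest feature values, keyed by feature view + entity."""
    written = 0
    for row in rows:
        key = f"{feature_view}:{row['user_id']}"
        online_store[key] = {
            "ltv_30d": row["ltv_30d"],
            "materialised_at": datetime.now(timezone.utc).isoformat(),
        }
        written += 1
    return written

# Rows as returned by the offline-store query in the feature view's source.
offline_rows = [
    {"user_id": 101, "ltv_30d": 70.0},
    {"user_id": 102, "ltv_30d": 12.5},
]
materialise(offline_rows, "user_lifetime_value_30d")
print(online_store["user_lifetime_value_30d:101"]["ltv_30d"])  # 70.0
```

Serving then becomes a key-value lookup instead of a warehouse query, which is where the single-digit-millisecond latency comes from.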
The benefits over shape 1:
- Online serving in single-digit milliseconds via the online store.
- Point-in-time correctness on training data (the right value of the feature as of when the prediction would have been made).
- Feature registry that all teams reference; “user_lifetime_value_30d” means the same thing everywhere.
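Point-in-time correctness is worth making concrete, since it is the subtlest of the three. For each training example, the join must pick the feature value that existed at the label's timestamp, never a later one. A minimal sketch with hypothetical data:

```python
from datetime import datetime
from typing import Optional

# entity -> [(computed_at, value)], sorted ascending by computed_at.
feature_history = {
    101: [(datetime(2024, 1, 1), 10.0),
          (datetime(2024, 2, 1), 25.0),
          (datetime(2024, 3, 1), 40.0)],
}

def value_as_of(entity: int, ts: datetime) -> Optional[float]:
    """Latest feature value computed at or before ts — no future leakage."""
    candidates = [v for t, v in feature_history.get(entity, []) if t <= ts]
    return candidates[-1] if candidates else None

# Label observed mid-February: the correct value is 25.0 (from 1 Feb),
# not 40.0, which did not exist yet and would leak the future.
print(value_as_of(101, datetime(2024, 2, 15)))  # 25.0
```

Getting this wrong inflates offline metrics (the model trains on information from the future) and is one of the hardest evaluation bugs to spot by eye.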
The cost:
- Operational overhead. Someone has to maintain the materialisation pipelines, the online store, the registry.
- Engineering investment of 4 to 12 weeks for the first install, plus ongoing maintenance.
This shape works when multiple teams need the same features and online serving latency matters. We install Feast (or its equivalents like Hopsworks, Featureform) on engagements that have outgrown shape 1 but are not at the scale or budget for managed.
Shape 3: managed feature store
For larger teams or higher serving demands, managed feature stores (Tecton, Vertex AI Feature Store, Databricks Feature Store) absorb the operational layer. The team writes feature definitions; the platform handles materialisation, online serving, monitoring, point-in-time correctness, lineage.
The benefits:
- No platform team needed for the feature store itself.
- Built-in monitoring, lineage, drift detection.
- Low-latency online serving at scale, typically single-digit milliseconds.
- Audit trails and access control suitable for regulated environments.
The cost:
- SaaS contract, scaling with feature count and serving QPS. Real money: 5-figure to 6-figure annual contracts at non-trivial scale.
- Vendor lock-in. Feature definitions are tool-specific.
- Less control over the underlying infrastructure.
This shape works when the engineering cost of maintaining a self-hosted feature store exceeds the SaaS contract. Roughly: when you have a dedicated ML platform team of 3 or more engineers spending more than 30% of their time on the feature store, switch to managed.
The decision matrix
| Profile | Models | Engineers | Online serving QPS | Recommended shape |
|---|---|---|---|---|
| Startup, single team | 1 to 3 | 1 to 3 | low | Library on warehouse |
| Multi-team, growing | 4 to 12 | 5 to 20 | medium | Feast-class self-hosted |
| Enterprise / scale | 12+ | 20+ | high | Managed |
| Regulated / audit-heavy | any | any | any | Managed (or hardened self-hosted) |
The wrong move is jumping straight to shape 3 because it sounds robust. A startup with two models and a managed feature store contract is paying enterprise prices for a problem they do not have.
The other wrong move is staying in shape 1 too long. A team with eight models and shared entities that pretends a thin library is enough has the original problem (drift, divergence, irreconcilable definitions) and is simply refusing to acknowledge it.
What about AI-era features?
Two newer shapes of feature that classical feature stores do not handle well:
Embeddings. Vector representations from a pretrained model. Feature stores can store them as float arrays, but querying them requires a vector store, not a feature store. In practice, these live in a separate vector database that the feature store either does not know about or treats as an opaque blob.
LLM-derived attributes. “Customer sentiment based on last support thread” computed by an LLM. Cost-per-compute is much higher than for a SQL feature; the materialisation strategy changes. Most feature stores can integrate but the operational shape (rate-limited compute, eval-gated quality) needs deliberate design.
For now, treat these as adjuncts rather than first-class members of the feature catalogue. The classical feature store stays focused on tabular features; the embedding store and the LLM-attribute pipeline are separate concerns.
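For the LLM-attribute pipeline specifically, the "rate-limited compute" shape can be sketched as a cached, budget-capped batch job. Everything here is hypothetical: the LLM call is stubbed, and caching on a hash of the input text stands in for whatever change-detection your pipeline uses:

```python
import hashlib

# Cache keyed on the input text's hash: unchanged threads are never re-scored.
cache: dict[str, str] = {}

def llm_sentiment(thread_text: str) -> str:
    """Stub for the real (expensive, rate-limited) LLM call."""
    return "negative" if "refund" in thread_text.lower() else "neutral"

def materialise_sentiment(threads: dict[int, str], max_calls: int) -> dict[int, str]:
    out, calls = {}, 0
    for user_id, text in threads.items():
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in cache:
            if calls >= max_calls:
                continue  # over budget: leave stale/missing, retry next run
            cache[key] = llm_sentiment(text)
            calls += 1
        out[user_id] = cache[key]
    return out

print(materialise_sentiment({1: "Where is my refund?", 2: "Thanks!"}, max_calls=10))
# {1: 'negative', 2: 'neutral'}
```

The cost structure inverts the classical case: the bottleneck is the compute call, not storage, so the cache-and-cap loop is the design centre rather than an optimisation.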
What we install on engagements
For a team starting fresh: shape 1 with a clear migration path to shape 2 if and when the model count grows past three.
For a team already at five-plus models with no feature store: shape 2 self-hosted, with a 4-to-8 week install timeline.
For a team at enterprise scale or in a regulated context: shape 3 managed, with the contract negotiation as a separate workstream.
In all cases: the feature definitions themselves are the artifact that survives platform migrations. Spend the time getting the definitions right; the storage tier can change later.
The teams that get this right have models that mean what their authors think they mean. The teams that skip it produce models that pass evaluation and fail in production for reasons that take weeks to diagnose. The work is not glamorous; the consequences of skipping it are.
Questions teams ask
Do I need a feature store at all?
If you have one model and one data scientist, no. If you have three or more models that share underlying entities (users, products, accounts), yes: without one, feature definitions diverge across models and bug fixes become per-model rather than per-feature.
Is online feature serving always required?
No. Many models serve fine in offline batch (recommendations refreshed hourly, fraud signals scored nightly). Online serving adds infrastructure cost; require it only when latency genuinely demands it.
How do feature stores work with LLMs?
LLMs do not consume tabular features in the same way. A feature store is for classical ML. Some teams use a feature store to serve metadata to LLM-driven systems (user tier, account state, recent activity), which is a valid use, just not what feature stores were designed for.