Streaming Data Pipelines: The Back-Pressure Trap


Streaming pipelines work until the consumer falls behind the producer. Patterns that handle back-pressure without dropping data or stalling the source.

  • By Orzed Team
  • 6 min read
Key takeaways
  • Consumer lag is the metric that matters. Track it per partition, alert on sustained growth.
  • Drop policies require explicit decisions about which data is OK to lose.
  • Sampling preserves statistical fidelity while reducing volume.
  • Slow-source patterns work when the source is controllable; not all sources are.

A team had a Kafka-based pipeline ingesting 80,000 events per second from their application into their analytics warehouse. The consumer was a Spark Streaming job that processed the events into denormalized tables. For six months everything worked. Then traffic grew 30 percent. The consumer fell behind. Kafka lag started climbing. By the time someone noticed, lag was at 14 hours; the analytics team was looking at data from “earlier today” while the dashboards still labelled it as live.

The fix was not to scale up Kafka. The fix was to figure out why the consumer could not keep up, and to install policies for what to do when the producer outpaces the consumer in the future. The answer involved a mix of consumer-side parallelism, dropping low-value events, and aggressive late-arrival handling.

This piece is about the patterns that prevent the back-pressure trap and the trade-offs each one involves.

What back-pressure actually is

In a streaming pipeline, the producer (your application emitting events) and the consumer (your analytics job reading them) operate independently. Between them sits a queue (Kafka topic, Kinesis stream, RabbitMQ exchange) that buffers events until the consumer reads them.

When the consumer reads at the same rate the producer writes, the queue stays small. When the consumer reads slower, the queue grows. This growth is back-pressure: the buffer absorbing the rate mismatch.

Back-pressure is fine in short bursts. The buffer exists for exactly this. Sustained back-pressure is a problem: the buffer eventually fills, and at that point one of three things happens depending on configuration:

  1. The producer stalls (waiting for buffer space)
  2. The buffer drops events (losing data)
  3. The buffer overflows to disk (preserving data at the cost of latency)

Each of these is a real choice. None of them is the default that “just works”.
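The first two outcomes can be seen in miniature with a bounded in-process queue. This is a sketch, not a broker configuration; real systems expose the same choices through settings such as producer timeouts and topic retention:

```python
import queue

buf = queue.Queue(maxsize=2)   # a tiny bounded buffer
buf.put("event-1")
buf.put("event-2")             # buffer is now full

# Outcome 1: the producer stalls. A blocking put() waits for space;
# with a timeout, the stall surfaces as an exception instead.
try:
    buf.put("event-3", timeout=0.05)
    stalled = False
except queue.Full:
    stalled = True

# Outcome 2: the buffer drops. put_nowait() refuses the event and the
# producer discards it, losing data.
try:
    buf.put_nowait("event-3")
    dropped = False
except queue.Full:
    dropped = True

print(stalled, dropped)   # True True
```

Outcome 3 (overflow to disk) is what log-based queues like Kafka do by design, which is why the failure mode there is lag rather than immediate loss.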

The metric that matters: consumer lag

Consumer lag is the difference between the latest event in the queue and the latest event the consumer has processed. Measured in records, time, or bytes depending on the system.

Per-partition lag matters more than aggregate. A pipeline with 16 partitions where one partition is 4 hours behind and the others are caught up looks fine on aggregate metrics; the dashboard for the affected customer is silently stale.

| Lag level | What it means |
| --- | --- |
| Low (under SLA target) | Pipeline healthy |
| Growing for under 5 minutes | Brief burst, usually recovers |
| Growing for over 30 minutes | Real back-pressure, alert |
| Growing for over 2 hours | Investigate now, recovery may not happen on its own |
| At the buffer size | Imminent data loss or producer stall |

Set the alert thresholds based on the consumer’s SLA, not on a generic recipe.
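A minimal per-partition lag check might look like the following. The offset numbers are made up for illustration; in practice the end offsets and committed offsets come from the broker, for example via `kafka-consumer-groups.sh` or a client's admin API:

```python
def partition_lag(end_offsets, committed_offsets):
    """Per-partition lag in records: latest offset minus committed offset."""
    return {p: end_offsets[p] - committed_offsets.get(p, 0)
            for p in end_offsets}

def worst_partition(lag):
    """The partition to look at first: aggregate lag hides stragglers."""
    return max(lag, key=lag.get)

# Hypothetical snapshot: partition 7 is far behind while the rest are fine.
end = {0: 1_000_500, 7: 1_000_500, 12: 1_000_500}
committed = {0: 1_000_400, 7: 400_000, 12: 1_000_450}

lag = partition_lag(end, committed)
print(worst_partition(lag), lag[7])   # 7 600500
```

Averaged across the three partitions this snapshot looks tolerable; per partition, it is a page.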

Pattern 1: drop the low-value events

Not all events are equally valuable. A pipeline might carry: user clicks, system metrics, audit logs, debug events. Under back-pressure, dropping debug events to keep audit logs flowing is often the right trade.

Two implementation approaches:

Producer-side priority. The producer tags events with a priority. The pipeline routes high-priority events to one topic, low-priority to another. Under load, the consumer falls behind on the low-priority topic without affecting the high-priority one.

Consumer-side sampling. The consumer reads everything but processes only a sample of the lower-priority events. The skipped events sit in the topic until retention expires and are then gone; the consumer keeps up by doing less work per event.

The first preserves the dropped events for later processing. The second discards them permanently but is simpler to operate.

The discipline either way: explicit decisions about which events are OK to lose. “We’ll figure it out under load” is not a policy.
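Producer-side priority routing can be as small as a lookup from event class to topic. The topic names and priority tiers below are hypothetical, but the shape is the point: the drop policy is written down ahead of time, not improvised under load:

```python
# Hypothetical drop policy: audit logs must never be dropped;
# debug events are the first to go under back-pressure.
TOPIC_BY_PRIORITY = {
    "audit": "events.critical",
    "click": "events.standard",
    "metric": "events.standard",
    "debug": "events.droppable",
}

def route(event_type):
    # Unknown event types default to the droppable tier, which forces
    # teams to classify anything they care about explicitly.
    return TOPIC_BY_PRIORITY.get(event_type, "events.droppable")

print(route("audit"), route("trace"))   # events.critical events.droppable
```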

Pattern 2: sample the events

Statistical use cases (analytics, monitoring aggregates) often do not need every event. A 10 percent sample of 80,000 events per second is 8,000 events per second of statistical signal that the consumer can keep up with. The aggregate metrics computed from the sample are accurate within known confidence intervals.

The trick is sampling deterministically. If you sample randomly, dashboards become non-reproducible. If you sample by event hash, the same event is always sampled or always dropped, and the sample is a stable subset.

# Deterministic sampling by stable hash. Python's built-in hash() is
# salted per process (PYTHONHASHSEED), so use hashlib for a sample
# that is stable across runs and across machines.
import hashlib

def should_sample(event, sample_rate):
    digest = hashlib.md5(str(event.id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100 < sample_rate * 100

This pattern is wrong for use cases that need every event (audit logs, financial transactions, fraud detection). It is right for use cases that just need an accurate aggregate.

Pattern 3: slow the source

If the producer is something you control (your own application, a controllable upstream system), you can apply back-pressure all the way to the source: slow down the producer until the consumer catches up.

Reactive Streams, Project Reactor, RxJava, and many queue clients support this. A Kafka producer blocks in send() once its buffer.memory fills (for up to max.block.ms); AMQP consumers bound outstanding work with acknowledgements and prefetch limits. Either way, the producer waits for buffer credit before emitting more events.

The trade-off: a slow consumer becomes a slow producer becomes a slow user-facing system. The back-pressure is visible all the way back. This is sometimes correct (better to slow down the system than lose data) and sometimes wrong (better to drop events than serve a 5-second-slower web page).
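The "wait for buffer credit" idea reduces to a counter the consumer increments and the producer decrements, which is the Reactive Streams request(n) contract in miniature. A sketch, not a client library:

```python
import threading

class CreditGate:
    """Consumer grants credits; producer blocks at zero, propagating
    back-pressure all the way to the source."""
    def __init__(self):
        self._credits = 0
        self._cv = threading.Condition()

    def request(self, n):
        # Called by the consumer when it has capacity for n more events.
        with self._cv:
            self._credits += n
            self._cv.notify_all()

    def acquire(self):
        # Called by the producer before each emit; blocks when out of credit.
        with self._cv:
            while self._credits == 0:
                self._cv.wait()
            self._credits -= 1

gate = CreditGate()
gate.request(2)   # consumer: "send me two"
gate.acquire()    # producer emits event 1
gate.acquire()    # producer emits event 2; a third acquire would block
```

The blocking acquire is exactly the "slow producer becomes slow user-facing system" trade-off made concrete: whatever calls it inherits the consumer's pace.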

The pattern works only when the source is controllable. External sources (third-party webhooks, public APIs) cannot be slowed; the consumer has to handle whatever rate they arrive at.

Patterns that often do not work

“Just scale the consumer.” A common first response. Sometimes works (the consumer was under-provisioned). Often masks the real issue (the consumer’s per-event work is too expensive, or the partitioning is uneven, or there’s a downstream bottleneck the consumer is waiting on). Scaling without diagnosing tends to push the problem one layer downstream.

“Increase the buffer size.” Buys time but does not fix the imbalance. A bigger buffer takes longer to fill, but if the consumer is permanently slower than the producer, the buffer fills eventually. Bigger buffer means more lag at the moment of overflow.

“Switch to a faster queue technology.” Kafka, Pulsar, Kinesis, RabbitMQ all handle high-throughput. The queue technology is rarely the bottleneck; the consumer is. Switching technologies takes weeks of work and usually does not solve the original problem.

What partitioning does and does not solve

Partitioning the topic across multiple workers is the standard scaling pattern. With N partitions, N consumer instances can read in parallel.

Partitioning works well when:

  • The events have a natural partition key (user_id, account_id) and ordering only matters within that key.
  • The consumer’s per-event work is independent (no shared state across events).
  • The downstream sink can handle parallel writes (most warehouses, document stores can; some can’t).

Partitioning does not help when:

  • All events need to be processed in strict global order (ledger systems).
  • The consumer’s per-event work is bottlenecked on a single shared resource (one database, one external API with rate limits).
  • The partition key produces hot partitions (one user_id with 10x the volume of others).

The wrong move is partitioning blindly because “more parallelism is better”. The right move is partitioning where it actually solves the bottleneck.
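Hot partitions can be checked for before committing to a key: hash a sample of keys into N buckets and compare the heaviest bucket to the even-split share. A sketch using a stable hash; the dominant user_id in the sample is hypothetical:

```python
import hashlib
from collections import Counter

def partition_of(key, n_partitions):
    # Stable hash, so the check mirrors what a hash partitioner would do.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions

def skew_ratio(keys, n_partitions):
    """Heaviest partition's traffic versus the even-split share.
    Near 1.0 is balanced; near n_partitions means one partition
    is taking almost everything."""
    counts = Counter(partition_of(k, n_partitions) for k in keys)
    mean = len(keys) / n_partitions
    return max(counts.values()) / mean

# Hypothetical sample: one user_id dominates the event stream.
keys = ["user-42"] * 900 + [f"user-{i}" for i in range(100)]
print(round(skew_ratio(keys, 16), 1))
```

A skew check like this on a day of production keys costs minutes and prevents the "we partitioned and it didn't help" surprise.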

What we install on engagements

For a team running streaming pipelines:

  1. Measure consumer lag per partition with alerts on sustained growth.
  2. Document the SLA per consumer (max acceptable lag, action on breach).
  3. Define the back-pressure policy per pipeline: drop, sample, or slow.
  4. Validate partitioning matches the use case (no hot partitions, downstream can absorb).
  5. Plan for traffic spikes with explicit auto-scaling on consumer instances or with documented manual response.

Total: typically two to four engineer-weeks per pipeline.

The teams that get streaming right have predictable consumer lag and explicit policies for what happens under load. The teams that do not end up with invisible data loss and dashboards that lie quietly until somebody notices. The work is not glamorous; the consequences of skipping it are real.

Frequently asked questions

What's an acceptable consumer lag?

Depends on the use case. For real-time dashboards, under 30 seconds. For analytics ETL, under 30 minutes. For audit logs, under a few hours. Set the SLA explicitly per consumer; without it, lag drifts upward unnoticed.

Should we always partition for parallelism?

Not always. Partitioning helps when the consumer can scale horizontally; it hurts when ordering matters across partitions. Pick by what the downstream needs, not by Kafka folklore.

When should I move to event sourcing?

When the audit trail of state changes is more valuable than the current state itself, and when you have the team to maintain the additional complexity. Event sourcing is not a streaming-pipeline solution; it is a data-modelling choice with streaming consequences.