Redis vs Memcached for Feature Flag Caching

This how-to is part of Distributed Caching for Flag Evaluations.

When p99 evaluation latency exceeds 50 ms during an incremental rollout push, the bottleneck rarely lives in the flag rule logic itself. Concurrent fetch storms and write contention on the cache layer dominate the failure profile. The choice of cache backend — Redis or Memcached — determines how gracefully your system handles the burst of simultaneous reads and writes that every rollout percentage change triggers. This guide walks through diagnosing which failure mode is active, stopping it with request coalescing, migrating contended writes to atomic Lua scripts, and wiring real-time pub/sub invalidation so stale entries never linger past a rollout change.

Prerequisites

Before following the steps below, confirm each item:

Redis vs Memcached comparison matrix for feature flag caching

Feature                                   | Redis               | Memcached
Atomic writes                             | ✓                   | -
Pub/sub invalidation                      | ✓                   | -
Persistence (AOF/RDB)                     | ✓                   | -
Native payload structures (JSON, hashes)  | ✓                   | -
Best fit                                  | Rollouts, real-time | Small, infrequent flags
Redis supports atomic writes, pub/sub invalidation, and persistence; Memcached fits only simple workloads with small payloads and infrequent updates.

Step 1 — Identify the failure mode: stampede or atomicity?

Before choosing a backend, instrument the cache layer to determine which failure mode is actually present. Two distinct problems surface during rollouts: a stampede (burst of cache misses hitting the control plane simultaneously when a cached entry expires) and write contention (multiple writers racing to update the same cache key). They require different fixes, and conflating them leads to over-engineering.

Wire an evaluation telemetry hook in your server-side SDK to surface miss rates per flag key:

// TypeScript SDK telemetry hook — emit a metric on every cache miss
sdk.on('evaluation', (ctx, result) => {
  if (result.cacheStatus === 'MISS') {
    metrics.increment('flag.cache.miss', {
      flagKey: ctx.flagKey,              // e.g. api.catalog.new-search-ranking
      rolloutPhase: ctx.rolloutPercent,
    });
  }
});
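Your metrics pipeline may not break misses down by flag key. In that case, the same signal can be computed in-process. The block below is a minimal sketch in Python. It assumes a hypothetical `MissTracker` that the evaluation hook (or its equivalent in your SDK) calls on every miss. The 10-second window is an illustrative default, not a value from this guide.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple


class MissTracker:
    """Counts cache misses per flag key over a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window_seconds
        self.clock = clock  # injectable so the window can be tested deterministically
        self._misses: Dict[str, Deque[float]] = defaultdict(deque)

    def record_miss(self, flag_key: str) -> None:
        # Called from the evaluation hook whenever cacheStatus is MISS
        self._misses[flag_key].append(self.clock())

    def _prune(self, q: Deque[float], now: float) -> None:
        # Drop timestamps that have fallen out of the window
        while q and now - q[0] > self.window:
            q.popleft()

    def top(self, n: int = 5) -> List[Tuple[str, int]]:
        """Return up to n (flag key, miss count) pairs, most misses first."""
        now = self.clock()
        counts: List[Tuple[str, int]] = []
        for key, q in self._misses.items():
            self._prune(q, now)
            if q:
                counts.append((key, len(q)))
        counts.sort(key=lambda kv: kv[1], reverse=True)
        return counts[:n]
```

Call `top()` during a rollout push. It shows whether misses concentrate on one key, such as api.catalog.new-search-ranking, or spread across many keys.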

Then pull raw counters from the cache tier itself to distinguish the failure mode:

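# Memcached: high cas_badval → write contention; high get_misses burst → stampede
printf 'stats\r\nquit\r\n' | nc localhost 11211 | grep -E 'get_hits|get_misses|cas_badval|cmd_set'

# Redis: live command rate and client count, refreshed every second
redis-cli --stat -i 1

# Redis: cumulative keyspace hits and misses (sample twice and diff for a rate)
redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses'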

A sustained cas_badval rate above 5% of total writes points to write contention. Total writes here means the cmd_set counter, which counts every storage command, including cas. In this failure mode, multiple service replicas each try to CAS-update the api.catalog.new-search-ranking key on a miss, and all but one lose the race. A sudden spike in get_misses without elevated cas_badval points to a stampede from TTL expiry; on Redis, the equivalent stampede signal is a burst in keyspace_misses. Knowing which failure mode you have tells you whether Step 2 alone is sufficient or whether you also need Steps 3 and 4.
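The two rules above can be applied mechanically to a pair of memcached `stats` snapshots taken a few seconds apart. This is a minimal sketch that assumes hypothetical `parse_stats` and `classify` helpers. The baseline miss rate and the 3x spike factor are illustrative assumptions. The guide asks for a sustained contention rate, so you would want the same verdict across several consecutive intervals before acting on it.

```python
from typing import Dict


def parse_stats(raw: str) -> Dict[str, int]:
    """Parse memcached `stats` output ("STAT <name> <value>" lines), keeping integer counters."""
    stats: Dict[str, int] = {}
    for line in raw.splitlines():
        parts = line.split()
        # Skip END, non-STAT lines, and non-integer values such as version strings
        if len(parts) == 3 and parts[0] == "STAT" and parts[2].isdigit():
            stats[parts[1]] = int(parts[2])
    return stats


def classify(before: Dict[str, int], after: Dict[str, int],
             baseline_miss_rate: float,
             contention_threshold: float = 0.05,
             spike_factor: float = 3.0) -> str:
    """Classify the interval between two snapshots (hypothetical thresholds)."""
    delta = {k: after.get(k, 0) - before.get(k, 0)
             for k in ("get_hits", "get_misses", "cas_badval", "cmd_set")}
    writes = delta["cmd_set"]  # all storage commands, including cas
    gets = delta["get_hits"] + delta["get_misses"]
    # Contention: too many CAS attempts found the token already changed
    if writes and delta["cas_badval"] / writes > contention_threshold:
        return "write_contention"
    # Stampede: miss rate well above normal without CAS contention
    if gets and delta["get_misses"] / gets > baseline_miss_rate * spike_factor:
        return "stampede"
    return "healthy"
```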


Step 2 — Stop the stampede with request coalescing

Request coalescing (promise deduplication) ensures that when many concurrent requests miss the cache for the same flag key, only one upstream fetch is issued. The rest await the result of that single fetch. This works identically on both Redis and Memcached and should be the first countermeasure regardless of which backend you are running.

The following pattern applies to any async Python service evaluating the api.catalog.new-search-ranking flag across multiple concurrent request handlers:

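import asyncio

# In-flight upstream fetches, keyed by flag key. fetch_from_backend is the
# service's own upstream call (control plane or remote cache tier).
_pending: dict[str, asyncio.Task] = {}

async def get_flag(key: str) -> dict:
    """Coalesce concurrent fetches for the same flag key."""
    task = _pending.get(key)
    if task is None:
        # First caller for this key starts the single upstream fetch
        task = asyncio.create_task(fetch_from_backend(key))
        _pending[key] = task
        # Forget the task once it settles so the next miss fetches again
        task.add_done_callback(lambda _t, k=key: _pending.pop(k, None))
    # shield() lets any caller (including the first) time out or be cancelled
    # without cancelling the shared fetch the others are still awaiting
    return await asyncio.shield(task)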

Any caller that arrives while a fetch for the same key is in flight receives the same Future via asyncio.shield, which protects the pending fetch from cancellation if an individual caller times out. This alone can reduce upstream calls from hundreds to one per concurrent burst. See the distributed caching overview for how this fits into a layered cache topology with an in-process LRU in front of the remote tier.
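The in-process LRU in that layered topology can be as small as the sketch below. It assumes a hypothetical `TTLCache` built on `OrderedDict`. The illustrative 5-second TTL is in line with the few-second in-process TTLs that many SDKs use. An invalidation handler would call `delete`.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional, Tuple


class TTLCache:
    """Small in-process LRU with a per-entry TTL, placed in front of the remote cache (hypothetical)."""

    def __init__(self, max_entries: int = 1024, ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: "OrderedDict[str, Tuple[float, Any]]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]       # expired: treat as a miss
            return None
        self._data.move_to_end(key)   # mark as most recently used
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry

    def delete(self, key: str) -> None:
        self._data.pop(key, None)
```

On a local miss, the caller falls through to the remote tier (for example, through the coalesced `get_flag` above) and stores the result with `set`. In this sketch, a stored `None` cannot be told apart from a miss.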


Step 3 — Replace CAS writes with Redis atomic Lua scripts

If you are running Memcached and observing sustained cas_badval contention, the root cause is architectural: CAS is optimistic and fails under many simultaneous writers. Each service replica that misses the cache for api.catalog.new-search-ranking tries to write the fresh value, but only the first CAS wins. All others retry, generating another round of upstream reads and further contention. Moving the write path to Redis EVALSHA eliminates this — the Lua script runs atomically inside the Redis process, so concurrent callers serialize against one another without retries.

The version-checked Lua script below reads the current cached value, compares its version field against the caller’s expected version, and returns either the cached payload or an error indicating staleness:

-- flag_version_check.lua  — load at startup via SCRIPT LOAD, call via EVALSHA
-- KEYS[1]: flag cache key (e.g. "{flags}:api.catalog.new-search-ranking")
-- ARGV[1]: expected version number from the control plane response
local val = redis.call('GET', KEYS[1])
if not val then return redis.error_reply('MISS') end
local parsed = cjson.decode(val)
if parsed.version ~= tonumber(ARGV[1]) then return redis.error_reply('STALE') end
return val

Redis executes a Lua script atomically and blocks every other command on that node until the script returns. As a result, no second writer can observe a partially updated state. Pair this with the following Redis client configuration:

# redis-client.yaml — feature flag cache configuration
cache:
  provider: redis-cluster
  ttl_seconds: 300
  compression: gzip          # see Gotchas: disable if payloads < 500 B; gzipped values also break cjson.decode in the Lua script
  eval_script: flag_version_check.lua   # loaded via SCRIPT LOAD at startup
  pubsub_channel: flag_updates_v1

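Load the script at application startup with SCRIPT LOAD. The SHA is derived from the script body, so every replica calls EVALSHA with the same value and the script text is not retransmitted on each call. The script cache lives in each node's memory. It is empty after a restart and may be missing on a newly promoted replica. In Redis Cluster, load the script on every primary, and fall back to EVAL (or reload the script) when EVALSHA returns a NOSCRIPT error. Refer to cache invalidation strategies for a full treatment of how version-checked writes compose with TTL-based and event-driven expiry.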
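The script above only guards reads. The write side that Step 3 describes can be sketched as a second script that stores a payload only when its version is newer than the cached one. This is a minimal sketch based on three assumptions:

- It uses redis-py's `register_script`, which calls EVALSHA and reloads the script on a NOSCRIPT error.
- Payloads are stored as uncompressed JSON with an integer `version` field.
- `VERSIONED_SET_LUA`, `should_replace`, and `FlagWriter` are hypothetical names, not part of the guide.

```python
import json
from typing import Any, Optional

# Hypothetical write-side companion to flag_version_check.lua: SET the payload
# only when its version is newer than the cached one. Redis runs the whole
# script atomically, so concurrent writers serialize instead of retrying.
VERSIONED_SET_LUA = """
local cur = redis.call('GET', KEYS[1])
if cur then
  local cached = cjson.decode(cur)
  if tonumber(cached.version) >= tonumber(ARGV[2]) then
    return 0 -- equal or newer version already cached: no-op
  end
end
redis.call('SET', KEYS[1], ARGV[1], 'EX', ARGV[3])
return 1
"""


def should_replace(cached_raw: Optional[str], incoming_version: int) -> bool:
    """Pure-Python mirror of the script's decision, useful in unit tests."""
    if cached_raw is None:
        return True
    return json.loads(cached_raw)["version"] < incoming_version


class FlagWriter:
    """Version-checked writer around a redis-py client (assumed setup)."""

    def __init__(self, client: Any, ttl_seconds: int = 300) -> None:
        # register_script returns a callable that runs EVALSHA and, on a
        # NOSCRIPT error, loads the script and retries
        self._script = client.register_script(VERSIONED_SET_LUA)
        self._ttl = ttl_seconds

    def write(self, flag_key: str, payload: dict) -> bool:
        """Return True if this call stored the payload, False if it was a no-op."""
        key = f"{{flags}}:{flag_key}"  # same hash-tagged key layout as the read script
        result = self._script(
            keys=[key],
            args=[json.dumps(payload), payload["version"], self._ttl],
        )
        return result == 1
```

The first replica to write version N stores the payload. Every later replica writing the same version gets `False` back and can simply read the fresh entry; nothing needs to be retried.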


Step 4 — Wire pub/sub invalidation for real-time flag propagation

TTL-based expiry means a cache entry can remain stale for up to its full TTL duration after a flag change — up to 300 seconds in the configuration above. For rollout workloads where a flag change should take effect within seconds across all replicas, that gap is unacceptable. Redis pub/sub closes it by broadcasting an invalidation event on every flag write, allowing all subscribers to evict the stale key immediately.

The control plane publishes to the flag_updates_v1 channel on every flag mutation. Each service replica runs the following TypeScript subscriber:

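import { createClient } from 'redis';

// node-redis requires a dedicated connection for Pub/Sub, so `cache` below must
// be a separate client (for example one created with .duplicate()).
// `inProcessLRU` is the SDK's in-process cache, if one is present.
const subscriber = createClient({ url: 'redis://cache.internal:6379' });
await subscriber.connect();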

await subscriber.subscribe('flag_updates_v1', async (message) => {
  const { flagKey } = JSON.parse(message);
  // flagKey will be e.g. "api.catalog.new-search-ranking"

  // 1. Evict from Redis
  await cache.del(`{flags}:${flagKey}`);

  // 2. Evict from in-process LRU if one is present (see Gotchas)
  inProcessLRU.delete(flagKey);
});

This gives you sub-second flag propagation across every replica without any polling overhead. The tradeoff is that pub/sub delivery is fire-and-forget in Redis — if a subscriber is momentarily disconnected, it misses the event. Combine it with a short TTL (60–120 s) as a backstop so stale entries always expire even when a delivery is missed. The polling vs streaming flag synchronization guide covers the transport-layer tradeoffs between these two propagation mechanisms in detail. See also the rule engine performance guide if in-process evaluation latency is the bottleneck after the cache layer is stable.
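One refinement is an assumption layered on top of this guide rather than part of it. If every replica writes entries with exactly the same TTL, entries populated together also expire together, which recreates the stampede pattern from Step 1. Drawing each backstop TTL at random from the 60–120 s range spreads expiry out. `backstop_ttl` is a hypothetical helper.

```python
import random
from typing import Optional


def backstop_ttl(min_seconds: int = 60, max_seconds: int = 120,
                 rng: Optional[random.Random] = None) -> int:
    """Pick a TTL uniformly from [min_seconds, max_seconds] so keys written together expire apart."""
    if not 0 < min_seconds <= max_seconds:
        raise ValueError("need 0 < min_seconds <= max_seconds")
    # Use the module-level generator unless a seeded Random is supplied
    return (rng or random).randint(min_seconds, max_seconds)
```

Pass the result as the expiry when writing each entry, for example as the `EX` argument to SET.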


Verification

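Benchmark Redis against Memcached under comparable get/set load to confirm the migration improves p99 latency. To benchmark the script path itself, append your EVALSHA command and its arguments to the redis-benchmark invocation in place of -t.

# Redis: 100k requests each for SET and GET, 50 concurrent connections, 1 KB values
redis-benchmark -h cache.internal -p 6379 -c 50 -n 100000 -t get,set -d 1024

# Memcached: comparable load via memtier_benchmark
# (1 thread x 50 clients x 4,000 requests = 200k requests, 1:1 SET:GET, 1 KB values)
memtier_benchmark -s cache.internal -p 11211 \
  --protocol=memcache_text -t 1 -c 50 -n 4000 --ratio=1:1 -d 1024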

Target outcomes:

Pre-warm the cache at least 30 seconds before each rollout percentage increment to ensure the warm state is in place before traffic ramps.
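The pre-warm step can be sketched as a coroutine with three stages:

1. Repopulate the cache for the affected flag keys.
2. Wait out the lead time.
3. Apply the rollout increment.

`fetch`, `store`, and `apply_increment` are hypothetical callables standing in for your control-plane client, cache writer, and rollout API.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def prewarm_then_increment(
    flag_keys: Iterable[str],
    fetch: Callable[[str], Awaitable[dict]],
    store: Callable[[str, dict], Awaitable[None]],
    apply_increment: Callable[[], Awaitable[None]],
    lead_seconds: float = 30.0,
) -> None:
    """Warm every flag key, then apply the increment lead_seconds after warming finishes."""

    async def warm(key: str) -> None:
        await store(key, await fetch(key))

    # Warm all keys concurrently; any failure raises before the increment runs
    await asyncio.gather(*(warm(k) for k in flag_keys))
    # Keep the warm state in place for the lead time before traffic ramps
    await asyncio.sleep(lead_seconds)
    await apply_increment()
```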


Gotchas and edge cases


FAQ

Why does Memcached still show high cas_badval even after reducing rollout frequency?

CAS contention is proportional to the number of simultaneous writers, not to how often flag updates are issued. If you have twenty service replicas and all of them miss the cache for api.catalog.new-search-ranking at the same moment, all twenty attempt a CAS write and only one wins. The other nineteen increment cas_badval and retry, generating further reads. Reducing rollout frequency makes the miss event less common but does nothing to reduce contention when a miss does occur. Moving to Redis EVALSHA means the atomic Lua script serializes all writers: the first caller wins, writes the value, and all subsequent callers read the fresh entry without retrying.
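The contrast can be reproduced with a small in-memory model, but neither class below is a real client:

- `CasStore` is a hypothetical stand-in that mimics memcached's gets/cas token check.
- `AtomicStore` stands in for a write-if-newer script that Redis would run atomically.

The model uses the worst-case interleaving the answer describes, where every replica reads the stale entry before any of them writes.

```python
from typing import Any, Dict, Optional, Tuple


class CasStore:
    """In-memory stand-in for memcached gets/cas token semantics."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[int, Any]] = {}  # key -> (cas token, value)
        self.cas_badval = 0

    def gets(self, key: str) -> Tuple[int, Optional[Any]]:
        return self._data.get(key, (0, None))

    def cas(self, key: str, value: Any, token: int) -> bool:
        current, _ = self._data.get(key, (0, None))
        if current != token:
            self.cas_badval += 1  # token changed since gets: this writer lost the race
            return False
        self._data[key] = (current + 1, value)
        return True


class AtomicStore:
    """In-memory stand-in for an atomic write-if-newer script."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def write_if_newer(self, key: str, payload: dict) -> bool:
        # Read, compare, and write happen in one step, so no other writer can
        # interleave; an equal or newer cached version makes this a no-op
        cached = self._data.get(key)
        if cached is not None and cached["version"] >= payload["version"]:
            return False
        self._data[key] = payload
        return True


def simulate(replicas: int = 20) -> Tuple[int, int]:
    """Return (CAS losers, atomic writes) when every replica refreshes the same stale key."""
    key = "api.catalog.new-search-ranking"
    stale, fresh = {"version": 1}, {"version": 2}

    cas = CasStore()
    cas.cas(key, stale, 0)                                # seed the stale entry
    tokens = [cas.gets(key)[0] for _ in range(replicas)]  # all read before any write
    for token in tokens:
        cas.cas(key, fresh, token)                        # only the first succeeds

    atomic = AtomicStore()
    atomic.write_if_newer(key, stale)                     # seed the stale entry
    refreshed = sum(atomic.write_if_newer(key, fresh) for _ in range(replicas))
    return cas.cas_badval, refreshed
```

`simulate(20)` returns `(19, 1)`: nineteen CAS losers versus a single real write. The other nineteen atomic calls return `False` instead of retrying.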

Redis pub/sub delivered the invalidation event but the cache entry is still being served — why?

The pub/sub subscriber received the event and deleted the Redis key, but an in-process LRU cache sitting in front of Redis still holds the stale value. Many SDK implementations maintain a short-lived in-process cache (often a few seconds of TTL) to avoid a round-trip to Redis on every evaluation. If your pub/sub handler only calls cache.del() on the Redis key, the in-process tier continues serving the stale entry until its own TTL expires. The fix is to invalidate both tiers in the subscriber callback, as shown in Step 4 above: delete the Redis key and call inProcessLRU.delete(flagKey) in the same handler.
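The same two-tier eviction can be sketched in Python for services that consume the channel with redis-py. This is a minimal sketch, and `handle_invalidation` is a hypothetical callback. It assumes the `{"flagKey": ...}` message shape and the `{flags}:` key prefix used in Step 4. `remote` is anything with a `delete` method (a redis-py client qualifies), and `local` is any dict-like in-process cache.

```python
import json
from typing import Any, MutableMapping, Union


def handle_invalidation(data: Union[str, bytes], remote: Any,
                        local: MutableMapping[str, Any]) -> str:
    """Evict one flag key from both cache tiers and return the evicted flag key."""
    flag_key = json.loads(data)["flagKey"]  # json.loads accepts str or bytes
    # Evict the in-process tier first: it is local and cannot fail on the network
    local.pop(flag_key, None)
    # Then the remote tier, using the same key layout as the Redis writes
    remote.delete(f"{{flags}}:{flag_key}")
    return flag_key
```

With redis-py, the callback can be registered through `pubsub(ignore_subscribe_messages=True).subscribe(**{"flag_updates_v1": ...})` and driven by `run_in_thread`. Pass it `message["data"]` from each delivered message.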

When is it acceptable to stay on Memcached?

Memcached remains appropriate when three conditions hold. Flag payloads are small (under 1 KB), updates are infrequent (fewer than a few per minute across all flags), and your system has no real-time invalidation requirement, so a TTL-based expiry every few minutes is acceptable. In these conditions, Memcached's lower operational footprint can outweigh the richer feature set of Redis. Its multithreaded design also gives it high per-node throughput for plain get/set workloads. For incremental rollout workloads, flag changes need sub-second propagation across replicas and write contention is a realistic concern, so Redis is the better fit.