Managing Flag Deprecation & Cleanup

This guide is part of the Feature Flag Architecture & Lifecycle Management series.

Every feature flag that ships eventually needs to retire. A flag that served its purpose — gating a rollout, guarding an experiment, backing an ops circuit breaker — becomes dead weight once the decision it encoded is no longer reversible. Left in place, it adds evaluation overhead to every request, bloats your configuration registry, and accumulates into the kind of flag debt that makes on-call shifts unpredictable. This guide covers the full retirement workflow: surfacing stale flags from usage telemetry, scoring and triaging candidates, quarantining safely, removing code references through normal PR review, and purging the registry and downstream caches.

What this guide does not cover: creating flags or choosing evaluation strategies (see backend evaluation & server-side SDKs), designing a metadata schema to prevent proliferation in the first place (see flag taxonomy), or structuring canary rollouts before a flag reaches retirement (see progressive delivery workflows).


Prerequisites

Before starting the retirement workflow, confirm the following:


Flag Lifecycle: Active → Stale → Deprecated → Removed

The diagram below shows the four states a flag moves through and the gates between them. Each transition requires a deliberate action — no flag moves backward from Deprecated to Active without a documented exception.

[Figure: flag deprecation lifecycle state machine. States: Active (flag in use), Stale (no evals ≥ N days), Deprecated (quarantined, default set), Removed (registry deleted). Transition labels: zero traffic, scored & triaged, code cleaned. Step labels: Step 1 instrument, Step 2 score & triage, Steps 3–4 quarantine, Step 5 delete & purge. Exception path: reactivate. Each transition is a deliberate gate, not an automatic promotion.]
Flag deprecation lifecycle. The dashed gold arc shows the exception path back to Active — requires a documented rationale in the audit log.
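The gates in the diagram can be sketched as a small transition check. This is a minimal illustration, not part of any SDK: the `FlagState` enum and the `transition` helper are hypothetical names, and the sketch assumes the only backward move is the documented Deprecated to Active exception (a Stale flag that regains traffic is not modeled here).

```python
from enum import Enum
from typing import Optional


class FlagState(Enum):
    ACTIVE = "active"
    STALE = "stale"
    DEPRECATED = "deprecated"
    REMOVED = "removed"


# Forward transitions from the lifecycle diagram.
FORWARD = {
    FlagState.ACTIVE: FlagState.STALE,
    FlagState.STALE: FlagState.DEPRECATED,
    FlagState.DEPRECATED: FlagState.REMOVED,
}


def transition(
    current: FlagState, target: FlagState, rationale: Optional[str] = None
) -> FlagState:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if FORWARD.get(current) is target:
        return target
    # Exception path: Deprecated -> Active needs a documented rationale.
    if current is FlagState.DEPRECATED and target is FlagState.ACTIVE:
        if rationale and rationale.strip():
            return target
        raise ValueError("reactivation requires a documented rationale")
    raise ValueError(f"illegal transition: {current.value} -> {target.value}")
```

Calling `transition(FlagState.ACTIVE, FlagState.REMOVED)` raises, which is the point: a flag cannot skip the Stale and Deprecated gates.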

Step-by-Step Implementation

Step 1. Instrument Telemetry to Surface Stale Flags

The first gate — Active to Stale — fires when a flag records no evaluations for a configurable window (commonly 14–30 days). Wire your OpenFeature provider to emit an evaluation event on every resolve call, then aggregate by flag key.

# Python · openfeature-sdk + flagd provider
from openfeature import api
from openfeature.contrib.provider.flagd import FlagdProvider
from openfeature.evaluation_context import EvaluationContext
from opentelemetry import metrics

api.set_provider(FlagdProvider())
client = api.get_client()

meter = metrics.get_meter("flagd.evaluations")
eval_counter = meter.create_counter(
    name="feature_flag.evaluations",
    description="Count of feature flag evaluation calls",
    unit="1",
)

def resolve_with_telemetry(flag_key: str, default: bool, ctx: EvaluationContext) -> bool:
    result = client.get_boolean_value(flag_key, default, ctx)
    # One increment per resolve call, labeled by flag key and resolved value
    eval_counter.add(
        1,
        {"flag.key": flag_key, "flag.value": str(result), "service.name": "checkout"},
    )
    return result

# Usage: the SDK expects an EvaluationContext, not a plain dict
ctx = EvaluationContext(targeting_key="u123", attributes={"user_id": "u123"})
enabled = resolve_with_telemetry("checkout.payments.express-pay", False, ctx)

In your observability backend, build a query that surfaces flags with zero evaluation events over the staleness window:

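# PromQL (adapt to your TSDB)
# Returns flag keys whose evaluation counter did not increase in the last 21 days.
# A key with no series at all never appears in this result, so also diff the
# output against the registry's key list.
sum by (flag_key) (
  increase(feature_flag_evaluations_total{flag_key=~"checkout\\..+|web\\..+|api\\..+"}[21d])
) == 0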

Pitfall: Absence of evaluations in your metrics pipeline does not always mean the flag is dead; it may mean the telemetry export is broken. Cross-check against raw SDK logs and confirm that your flag synchronization health metrics (polling or streaming) are green before marking a flag stale.
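Because a flag that has never emitted a metric produces no time series, it helps to compare the registry's key list against last-evaluation timestamps. The sketch below assumes you can export both from your own systems; `find_stale_flags`, `registry_keys`, and `last_evaluated` are hypothetical names, and the 21-day default simply mirrors the query above.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List, Optional


def find_stale_flags(
    registry_keys: Iterable[str],
    last_evaluated: Dict[str, datetime],
    window_days: int = 21,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return registry keys with no evaluation inside the staleness window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=window_days)
    stale = []
    for key in sorted(registry_keys):
        seen = last_evaluated.get(key)  # None means no telemetry was ever recorded
        if seen is None or seen < cutoff:
            stale.append(key)
    return stale


# Usage with a fixed clock so the result is reproducible
now = datetime(2026, 6, 20, tzinfo=timezone.utc)
last_seen = {
    "checkout.payments.express-pay": now - timedelta(days=2),
    "api.search.semantic-rerank": now - timedelta(days=95),
}
registry = ["checkout.payments.express-pay", "api.search.semantic-rerank", "web.dashboard.new-nav"]
candidates = find_stale_flags(registry, last_seen, window_days=21, now=now)
```

Keys that appear only because telemetry is missing deserve the same cross-check as in the pitfall above before they are marked stale.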


Step 2. Score and Triage Candidates (Flag-Debt Metrics)

Not all stale flags carry the same risk. Score each candidate on three axes before committing to removal:

| Axis | Signal | Score (0–3) |
|---|---|---|
| Age | Days since last evaluation | 0 = ≤14 d · 1 = 15–60 d · 2 = 61–180 d · 3 = >180 d |
| Reach | Services referencing the key | 3 = 1 service · 2 = 2–5 · 1 = 6–10 · 0 = >10 |
| Criticality | Flag type (ops kill switch vs. experiment) | Kill switch = 0 · Ops = 1 · Experiment = 2 · Release = 3 |

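Higher totals mean lower removal risk. Flags scoring 7 or higher (out of 9) are ready to advance to Deprecated as soon as the owner acknowledges the notification. Flags scoring below 4 need a full owner review before any action. The example below treats every score under 7 as needing review, which covers the 4–6 band as well.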

// Node.js: compute a debt score from flag metadata.
// Scoring is a pure function and needs no flagd connection. Supply
// daysSinceLastEval, reachCount, and flagType from your telemetry and registry.

function scoreFlagDebt(flagKey, metadata) {
  const { daysSinceLastEval, reachCount, flagType } = metadata;

  // Age: longer silence means lower removal risk
  const ageScore = daysSinceLastEval > 180 ? 3 : daysSinceLastEval > 60 ? 2 : daysSinceLastEval > 14 ? 1 : 0;
  // Reach: fewer referencing services means a smaller cleanup
  const reachScore = reachCount <= 1 ? 3 : reachCount <= 5 ? 2 : reachCount <= 10 ? 1 : 0;
  // Criticality: kill switches (and unrecognized types) score 0
  const critScore = flagType === "release" ? 3 : flagType === "experiment" ? 2 : flagType === "ops" ? 1 : 0;

  const total = ageScore + reachScore + critScore;
  console.log(`${flagKey}: debt score ${total}/9 (${total >= 7 ? "READY TO DEPRECATE" : "NEEDS REVIEW"})`);
  return total;
}

// Example: api.search.semantic-rerank flagged by telemetry
// Age 2 + reach 2 + criticality 2 = 6, so this prints "NEEDS REVIEW"
scoreFlagDebt("api.search.semantic-rerank", {
  daysSinceLastEval: 95,
  reachCount: 2,
  flagType: "experiment",
});

Document the triage decision in your audit trail: who reviewed the flag, the computed score, and the approved timeline for removal. This record can serve as change-management evidence in audits such as SOC 2.

Pitfall: Never auto-promote a flag to Deprecated based on score alone. The scoring algorithm cannot know whether an upcoming sprint plans to reactivate the flag. Always send an async notification to the flag owner and wait for acknowledgement before advancing state.
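One way to enforce the acknowledgement rule is to make it part of the triage record itself. The following is a sketch under stated assumptions: the `TriageRecord` fields and both helper functions are hypothetical, and the threshold of 7 is taken from the scoring guidance in this step.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TriageRecord:
    flag_key: str
    score: int                                   # debt score, 0-9
    reviewer: str                                # who reviewed the flag
    removal_date: str                            # approved removal date (ISO 8601)
    owner_acknowledged_by: Optional[str] = None  # set once the owner replies


def may_advance_to_deprecated(record: TriageRecord, threshold: int = 7) -> bool:
    """A high score alone is not enough: the owner must have acknowledged."""
    return record.score >= threshold and record.owner_acknowledged_by is not None


def to_audit_json(record: TriageRecord) -> str:
    """Serialize the record for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True)


record = TriageRecord(
    flag_key="web.dashboard.new-nav",
    score=8,
    reviewer="platform-eng",
    removal_date="2026-07-11",
)
blocked = may_advance_to_deprecated(record)   # no acknowledgement yet
record.owner_acknowledged_by = "nav-team"
allowed = may_advance_to_deprecated(record)   # acknowledged and above threshold
```

Keeping the acknowledgement on the record means the audit entry and the gate can never disagree about whether the owner signed off.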


Step 3. Quarantine and Set Safe Defaults

Quarantine puts the flag into a read-only state in the registry: evaluations still work, but the flag cannot be modified except to advance to Removed. More importantly, this is when you record the permanent default value in the flag definition and confirm that every call site passes the same value as its code-level default, so that even if the registry becomes unavailable during cleanup, every service returns the correct permanent answer.

# flagd flag definition — flags.yaml
# Before quarantine
flags:
  web.dashboard.new-nav:
    state: ENABLED
    variants:
      "on": true
      "off": false
    defaultVariant: "off"
---
# After quarantine (deprecated state), shown as a separate document
flags:
  web.dashboard.new-nav:
    state: DISABLED          # registry stops serving live rules; callers typically fall back to their code default
    variants:
      "on": true
      "off": false
    defaultVariant: "on"    # permanent winner: new-nav ships to all (code defaults must match)
    metadata:
      deprecatedAt: "2026-06-20"
      deprecatedBy: "platform-eng"
      scheduledRemoval: "2026-07-11"
      jiraTicket: "PLAT-4421"

After pushing the quarantined definition, verify that all services resolve to the permanent default without contacting the registry:

# Simulate provider unavailability: confirm the code-level default survives
from openfeature import api
from openfeature.provider.no_op_provider import NoOpProvider

api.set_provider(NoOpProvider())  # returns the caller's default for every call
client = api.get_client()

# Should return True (our permanent default): not raise, not return False.
# No evaluation context is needed; the SDK expects an EvaluationContext, not a dict.
value = client.get_boolean_value("web.dashboard.new-nav", True)
assert value is True, f"Unexpected default: {value}"
print("Safe default confirmed: code is ready for registry removal")

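This inline call passes by construction, because it supplies True itself. The check becomes meaningful when you run it through each service's own resolution helper or call site. If the assertion fails there, that call site is still relying on the registry to decide the answer; fix its code-level default before proceeding.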

Pitfall: A kill-switch or emergency rollback can still target a Deprecated flag during a live incident. Do not advance to Removed until you have confirmed no active incident involves this flag.
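A small pre-removal check can make this gate explicit. The sketch below reads the `scheduledRemoval` key from the quarantine metadata shown earlier; the `removal_blockers` function and the `active_incident_flags` input are hypothetical, and how you obtain the list of flags involved in open incidents depends on your incident tooling.

```python
from datetime import date
from typing import Dict, Iterable, List


def removal_blockers(
    flag_key: str,
    metadata: Dict[str, str],
    today: date,
    active_incident_flags: Iterable[str],
) -> List[str]:
    """Return the reasons a Deprecated flag must not advance to Removed yet."""
    blockers = []
    scheduled = metadata.get("scheduledRemoval")
    if scheduled is None:
        blockers.append("no scheduledRemoval in metadata")
    elif date.fromisoformat(scheduled) > today:
        blockers.append(f"scheduled removal date {scheduled} not reached")
    if flag_key in set(active_incident_flags):
        blockers.append("flag is referenced by an active incident")
    return blockers


meta = {"deprecatedAt": "2026-06-20", "scheduledRemoval": "2026-07-11"}
too_early = removal_blockers("web.dashboard.new-nav", meta, date(2026, 7, 1), [])
during_incident = removal_blockers(
    "web.dashboard.new-nav", meta, date(2026, 7, 12), ["web.dashboard.new-nav"]
)
clear = removal_blockers("web.dashboard.new-nav", meta, date(2026, 7, 12), [])
```

An empty list means the flag may proceed; anything else should be resolved or explicitly waived in the audit log first.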


Step 4. Remove Code References via PR Review

A static search across the codebase, with every change tracked in a formal PR, is the most reliable way to remove flag references. Automated sed rewrites are fragile on complex evaluation patterns, so use grep or an AST-aware tool to locate every call site, then replace each one with the hardcoded permanent default in a human-reviewed commit.

#!/usr/bin/env bash
# Find every source file referencing the deprecated flag key
FLAG_KEY="web.dashboard.new-nav"
REPO_ROOT="$(git rev-parse --show-toplevel)"

echo "=== References to ${FLAG_KEY} ==="
# -F matches the key as a fixed string; without it, each dot matches any character
grep -rnF --include="*.ts" --include="*.tsx" --include="*.py" --include="*.go" \
  "${FLAG_KEY}" "${REPO_ROOT}/src" "${REPO_ROOT}/services"

Open a PR for each affected service. The PR description should state:

  1. The flag key being retired
  2. The permanent replacement value
  3. A link to the deprecation triage record in the audit log
  4. Confirmation that the flag is in quarantine state (registry will not serve live rules)

Add an OPA policy to your CI pipeline to block any future PR that re-introduces the flag key:

# OPA policy — ci/policies/flag_cleanup.rego
package flag.cleanup

import rego.v1

deprecated_keys := {"web.dashboard.new-nav", "api.search.semantic-rerank"}

deny contains msg if {
  some file in input.changed_files
  some key in deprecated_keys
  contains(file.content, key)
  msg := sprintf("PR blocked: reference to retired flag '%s' in %s", [key, file.path])
}

Pitfall: Monorepo builds may have generated files — GraphQL schema snapshots, OpenAPI docs, protobuf outputs — that include flag keys. Ensure your grep searches compiled and generated directories, not just src/.
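To cover generated output as well as hand-written source, the search can walk the whole repository and skip only directories you explicitly exclude. This is a sketch, not a replacement for an AST-aware tool: `find_flag_references` and the `SKIP_DIRS` set are hypothetical, and the excluded directories are an assumption to adjust per repository.

```python
import os
from typing import Iterable, List, Tuple

# Assumption: directories that never contain flag references worth reviewing.
SKIP_DIRS = frozenset({".git", "node_modules"})


def find_flag_references(
    root: str, flag_key: str, skip_dirs: Iterable[str] = SKIP_DIRS
) -> List[Tuple[str, int]]:
    """Return (path, line_number) for every literal occurrence of flag_key."""
    needle = flag_key.encode("utf-8")  # byte match, so binary files are safe to scan
    skip = set(skip_dirs)
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them
        dirnames[:] = sorted(d for d in dirnames if d not in skip)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if needle in line:
                            hits.append((path, lineno))
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
    return hits
```

Because the match is a literal byte comparison, the dots in the key are not treated as wildcards, and files of any extension (schema snapshots, OpenAPI documents, protobuf output) are included.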


Step 5. Delete from Registry and Purge Caches

Once all PRs from Step 4 are merged and deployed, the flag key no longer exists in production code. Now it is safe to delete the registry entry and force downstream caches to evict the stale definition.

// Node.js: delete the flag through a management API, then bust SDK caches.
// NOTE: flagd normally loads definitions from its sync sources (file, HTTP,
// Kubernetes), so treat the DeleteFlag and /flags URLs below as placeholders
// for your own registry service. With a file or Git source, delete the entry there.
import fetch from "node-fetch";
import { createHash } from "node:crypto";

const FLAG_KEY = "web.dashboard.new-nav";
const FLAGD_HOST = process.env.FLAGD_HOST ?? "http://localhost:8013";

// 1. Delete from the flag configuration store
const del = await fetch(`${FLAGD_HOST}/flagd.evaluation.v1.Management/DeleteFlag`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ key: FLAG_KEY }),
});

if (!del.ok) throw new Error(`Registry delete failed: ${del.status}`);
console.log(`Registry entry deleted: ${FLAG_KEY}`);

// 2. Compute a content-hash ETag for the updated manifest.
//    A hash changes whenever the manifest changes; a truncated base64 prefix would not.
const manifest = await fetch(`${FLAGD_HOST}/flags`).then(r => r.json());
const etag = createHash("sha256").update(JSON.stringify(manifest)).digest("hex").slice(0, 16);

// 3. Broadcast cache invalidation to SDK instances via your event bus
await fetch(process.env.CACHE_INVALIDATION_ENDPOINT, {
  method: "POST",
  headers: { "Content-Type": "application/json", "X-Service": "flag-registry" },
  body: JSON.stringify({ action: "evict", keys: [FLAG_KEY], etag }),
});

console.log(`Cache invalidation broadcast. New manifest ETag: ${etag}`);

After deletion, monitor your evaluation error rate for 30 minutes. Any FlagNotFoundError surfacing now indicates a service that was missed in Step 4: restore the quarantined definition to the registry, fix the missed reference, and re-attempt the deletion.

Pitfall: Edge-cached flag manifests at CDN PoPs may survive a registry delete for several minutes. If your architecture uses a CDN in front of flagd, issue an explicit cache purge call before declaring cleanup complete.


Verification & Testing

Run the following checks immediately after completing Step 5:

1. Evaluation error rate baseline. Query your metrics backend for feature_flag.evaluation.errors on the deleted key. Zero errors for 30 minutes after registry deletion confirms success.

2. Static analysis clean pass. Re-run the grep from Step 4 across all repositories. It must return no output.

3. Registry integrity check. Fetch the full flag manifest from flagd and confirm the key is absent:

curl -s http://localhost:8013/flags | jq '.flags | has("web.dashboard.new-nav")'
# Expected: false (the exact key is gone; other web.dashboard.* flags are unaffected)

4. OPA gate smoke test. Submit a test PR that contains the deleted flag key and confirm the policy in Step 4 blocks it before merge.


Troubleshooting & FAQ {#faq}

Why am I seeing FlagNotFoundError spikes after deleting from the registry? {#faq-not-found}

This almost always means Step 4 (code reference removal) was incomplete. A service still holds a call to client.get_boolean_value("web.dashboard.new-nav", ...), and flagd now answers it with a not-found error. OpenFeature clients do not throw on evaluation: the SDK falls back to the default passed in code and reports the error through logs, hooks, and evaluation details, which is where the spike shows up. To recover quickly, restore the quarantined flag definition to the registry (do not mark it active; keep state: DISABLED) so that evaluations resolve against a known definition again. Then finish the code cleanup and re-attempt deletion.

How long should a flag stay in the Stale state before we score it? {#faq-stale-window}

A minimum of 14 days of zero evaluations is a reasonable threshold for low-risk experiment flags. For release flags and ops toggles, extend the window to 30 days and require two consecutive weeks of zero traffic to rule out sampling gaps or dark launch traffic. Teams with sparse evaluation telemetry (batch jobs, overnight cron tasks) should use 60 days to avoid false positives.
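The thresholds above can be encoded as a lookup so that triage tooling applies them consistently. This is a minimal sketch: the `staleness_window_days` function and the flag-type strings are hypothetical, and because the answer above gives no window for kill switches, the sketch refuses to guess one.

```python
def staleness_window_days(flag_type: str, sparse_telemetry: bool = False) -> int:
    """Return the zero-evaluation window, in days, before a flag is scored."""
    if sparse_telemetry:
        return 60  # batch jobs and overnight cron tasks evaluate rarely
    if flag_type == "experiment":
        return 14
    if flag_type in ("release", "ops"):
        return 30
    # No window is defined for other types (for example kill switches).
    raise ValueError(f"no staleness window defined for flag type {flag_type!r}")
```

For example, `staleness_window_days("release")` yields 30, while any flag evaluated only by a nightly batch job gets 60 regardless of type.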

Can we delete a flag that is referenced in an active kill switch playbook? {#faq-kill-switch}

No. If a flag is named in an emergency kill-switch and instant-rollback runbook, remove it from the runbook first, update the on-call documentation, and notify the incident response team before advancing the flag to Removed. Deleting a flag that your runbook expects to toggle during a live incident is operationally dangerous.


Performance & Scale

Flag retirement can reduce evaluation overhead. Wherever work scales with the size of the ruleset (bulk resolution, manifest sync and parsing, or a provider that scans rules on each call), every retained flag adds cost. In that case, at 10,000 evaluations per second across a fleet, trimming 50 stale flags from a 500-flag ruleset removes 10% of the per-request rule-matching work. Measure at P99 to confirm the effect in your own deployment.

For very large registries (>1,000 flags), batch the deletion step: remove no more than 50 flags per deployment window, monitor for 24 hours, then proceed with the next batch. This pacing limits the blast radius if a deletion uncovers a missed code reference.
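The pacing rule can be sketched as a generator that yields one deployment window's worth of keys at a time. The `deletion_batches` name is hypothetical; the default of 50 comes from the guidance above, and the 24-hour monitoring pause between batches is left to your deployment tooling.

```python
from typing import Iterator, List, Sequence


def deletion_batches(flag_keys: Sequence[str], batch_size: int = 50) -> Iterator[List[str]]:
    """Yield consecutive groups of at most batch_size flag keys."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(flag_keys), batch_size):
        yield list(flag_keys[start:start + batch_size])


# Usage: 120 retired keys are split across three deployment windows
keys = [f"web.legacy.flag-{i:03d}" for i in range(120)]
sizes = [len(batch) for batch in deletion_batches(keys)]
```

Here `sizes` comes out as three batches of 50, 50, and 20 keys, so a missed code reference surfaces while at most 50 deletions are in flight.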

Cache invalidation at scale deserves its own attention. If your architecture fans out flag state to thousands of SDK instances via a pub/sub channel, a mass invalidation event can cause a thundering-herd effect as all instances refetch the manifest simultaneously. Introduce a jittered delay (50–500 ms random) on the client-side re-fetch to spread the load:

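import asyncio
import random


async def refresh_flag_cache() -> None:
    """Placeholder: re-fetch the flag manifest from flagd in your SDK wrapper."""


async def on_cache_invalidation(flag_key: str) -> None:
    # Spread re-fetches over 50-500 ms so instances do not hit the registry at once
    jitter_ms = random.randint(50, 500)
    await asyncio.sleep(jitter_ms / 1000)
    await refresh_flag_cache()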