Automated Flag Cleanup Scripts for Stale Toggles

This how-to is part of the Managing Flag Deprecation & Cleanup guide, which covers the full retirement lifecycle from detection through deletion. Here you will build and schedule a five-stage pipeline that identifies zero-traffic toggles, confirms they are absent from the codebase, quarantines them in the registry, routes a tracked pull request for code removal, and permanently deletes the flag once the PR has merged.

Orphaned flags left in production registries inflate configuration drift, slow evaluation engines, and obscure the true state of your system. An automated pipeline reduces that debt without requiring engineers to manually audit hundreds of flag keys each quarter.

Prerequisites

OpenFeature Python SDK (openfeature-sdk >= 1.2) installed and a server-side provider configured — see server-side SDK integration patterns for provider wiring
Flag registry exposes a REST API with GET /flags, PATCH /flags/{key}, and DELETE /flags/{key} endpoints
Evaluation telemetry writes last_evaluated ISO-8601 timestamps and a rolling traffic_7d counter to each flag record
jq and curl available in your CI runner environment
FLAG_API_TOKEN set as a CI secret; the service account has write permission on flag resources
Each flag key follows the namespace.service.feature schema — for example, checkout.payments.express-pay and web.dashboard.new-nav — consistent with your flag taxonomy
Git remote is accessible from CI so the pipeline can open pull requests
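
The key schema in the prerequisites can be checked mechanically before a key enters the pipeline. The following is a minimal sketch, assuming each segment is lowercase letters, digits, and hyphens; the pattern and the is_valid_flag_key helper are illustrative and not part of any SDK.

```python
import re

# Assumed convention: three dot-separated segments made of lowercase
# letters, digits, and hyphens. Adjust the pattern to your own taxonomy.
_SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"
FLAG_KEY_PATTERN = re.compile(rf"{_SEGMENT}\.{_SEGMENT}\.{_SEGMENT}")


def is_valid_flag_key(key: str) -> bool:
    """Return True if key matches the namespace.service.feature schema."""
    return FLAG_KEY_PATTERN.fullmatch(key) is not None
```

Keys that fail the check are better routed to manual review than passed to the grep and sed steps, which assume the schema holds.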

The five-stage cleanup pipeline: telemetry query identifies candidates, AST scan confirms code absence, quarantine disables the flag, PR review removes source references, and registry delete purges the record.

Step-by-Step Procedure

Step 1. Query evaluation telemetry for zero-traffic flags

Pull all flag records from the registry and filter to those whose last_evaluated timestamp is older than 30 days. The OpenFeature provider exposes provider metadata but does not itself surface flag inventory — use your registry’s management API directly for this step.

# detect_stale_flags.py
import datetime
import requests

FLAG_API = "https://api.flags.example.com/v1"
STALE_DAYS = 30


def get_stale_flag_keys(api_token: str) -> list[str]:
    """Return flag keys with zero evaluation traffic for > STALE_DAYS days."""
    headers = {"Authorization": f"Bearer {api_token}"}
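    resp = requests.get(f"{FLAG_API}/flags", headers=headers, timeout=30)
    resp.raise_for_status()  # fail loudly on auth or server errors
    flags = resp.json()

    cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=STALE_DAYS)
    stale = []
    for flag in flags:
        raw = flag.get("last_evaluated")
        if not raw:
            continue  # no timestamp recorded yet; skip rather than guess
        last_eval = datetime.datetime.fromisoformat(raw)
        if last_eval.tzinfo is None:
            # The registry stores UTC; treat offset-less timestamps as UTC
            last_eval = last_eval.replace(tzinfo=datetime.timezone.utc)
        # Require both time-based staleness and confirmed zero recent traffic
        if last_eval < cutoff and flag.get("traffic_7d", 0) == 0:
            stale.append(flag["key"])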

    return stale


if __name__ == "__main__":
    import os, json
    keys = get_stale_flag_keys(os.environ["FLAG_API_TOKEN"])
    print(json.dumps(keys, indent=2))

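The dual condition, a stale timestamp and zero recent traffic, guards against telemetry that disagrees with itself: if a batched write has not yet updated last_evaluated but the traffic_7d counter shows recent evaluations, the flag is not a candidate. Neither check looks at source code; that is the job of Step 2. Save the output as a JSON file (for example stale_flags.json, the default name the later scripts expect) to pass between pipeline stages.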

Cross-reference your flag keys against your flag taxonomy ownership metadata to verify each candidate has an assigned owner who can approve removal.
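
The ownership check can be sketched as a simple partition. This assumes you can export your taxonomy as a mapping from flag key to owner; the owners mapping and the split_by_owner helper are hypothetical names used only for illustration.

```python
def split_by_owner(
    candidates: list[str], owners: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Split candidate keys into (owned, unowned) using an ownership lookup.

    `owners` is an assumed mapping of flag key -> owner name; a missing or
    empty entry counts as unowned.
    """
    owned = [key for key in candidates if owners.get(key)]
    unowned = [key for key in candidates if not owners.get(key)]
    return owned, unowned
```

Only the owned list would continue to Step 2; unowned keys need an owner assigned before anyone can approve their removal.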

Step 2. Cross-reference flag keys against code references

A flag that no longer receives evaluation traffic might still be referenced in source code on a feature branch or in a dormant service. Run a grep scan first, then fall back to AST analysis for languages where string matching is unreliable.

#!/usr/bin/env bash
# scan_code_refs.sh — exit 1 if any stale flag key still appears in source
set -euo pipefail

STALE_KEYS_FILE="${1:-stale_flags.json}"
REPO_ROOT="${2:-.}"
FOUND=0

while IFS= read -r key; do
  # Search Python, TypeScript, Go, and YAML config files.
  # -F matches the key literally so its dots are not treated as regex wildcards.
  if grep -rqF --include="*.py" --include="*.ts" --include="*.go" \
       --include="*.yaml" --include="*.yml" -- "${key}" "${REPO_ROOT}"; then
    echo "STILL REFERENCED: ${key}"
    FOUND=1
  fi
done < <(jq -r '.[]' "${STALE_KEYS_FILE}")

if [ "${FOUND}" -eq 1 ]; then
  echo "One or more stale flags still have code references. Aborting cleanup." >&2
  exit 1
fi

echo "No live code references found for stale flags."

For repositories where flag keys are constructed dynamically (e.g., f"checkout.payments.{variant}"), a static grep will miss them. In that case run an AST-level scan using ast.parse in Python or ts-morph in TypeScript to trace string templates. See optimizing rule-engine performance for context on how the rule engine resolves keys at evaluation time — the same resolution path determines what “referenced” means.
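
As a rough illustration of the AST approach for Python sources, the sketch below turns every f-string into a regular expression in which each placeholder stands for one key segment, then reports which stale keys could be produced by one of those templates. The function names are illustrative, and the one-placeholder-per-segment rule is an assumption; it does not follow string concatenation or keys loaded from configuration.

```python
import ast
import re


def fstring_patterns(source: str) -> list[re.Pattern]:
    """Compile one regex per f-string that contains some literal text."""
    patterns = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.JoinedStr):
            continue
        parts = []
        has_literal = False
        for value in node.values:
            if isinstance(value, ast.Constant) and isinstance(value.value, str):
                parts.append(re.escape(value.value))
                has_literal = True
            else:
                # Assumption: each {placeholder} expands to one key segment
                parts.append(r"[^.]+")
        if has_literal:
            patterns.append(re.compile("".join(parts)))
    return patterns


def dynamically_referenced(keys: list[str], source: str) -> set[str]:
    """Return the keys that some f-string template in source could produce."""
    patterns = fstring_patterns(source)
    return {key for key in keys if any(p.fullmatch(key) for p in patterns)}
```

A fully generic template such as f"{namespace}.{service}.{feature}" matches every three-segment key, so the scan errs toward keeping flags. Treat a match as a prompt for manual review, not as proof of use.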

Step 3. Quarantine stale flags in the registry

Quarantine disables targeting on the flag without deleting it, preserving the key in the registry so that any missed code reference surfaces as an EvaluationError rather than silently serving the wrong default. This also keeps the key visible in your audit trail during the grace period.

# quarantine_flags.py
import datetime
import os
import requests

FLAG_API = "https://api.flags.example.com/v1"


def quarantine_flags(keys: list[str], api_token: str) -> None:
    headers = {"Authorization": f"Bearer {api_token}"}
    for key in keys:
        resp = requests.patch(
            f"{FLAG_API}/flags/{key}",
            json={
                "status": "quarantined",
                "targeting_active": False,
                "quarantined_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
                "quarantine_reason": "automated-stale-detection",
            },
            headers=headers,
        )
        resp.raise_for_status()
        print(f"Quarantined: {key}")


if __name__ == "__main__":
    import json, sys
    keys = json.loads(sys.stdin.read())
    quarantine_flags(keys, os.environ["FLAG_API_TOKEN"])

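Run this script only after the Step 2 scan passes, feeding it the list saved in Step 1: python quarantine_flags.py < stale_flags.json. Piping python detect_stale_flags.py | python quarantine_flags.py also works but skips the code-reference check. Monitor your error rate dashboards for 24 hours after quarantine; any spike in EvaluationError events for the quarantined keys means a code path was missed.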

Step 4. Remove code references in a tracked pull request

This is the deliberate human-in-the-loop stage. Removing source references automatically without review is a common cause of production regressions in cleanup pipelines. The pipeline should open a PR that lists exactly which flag keys are being removed and which files are affected, then block the deletion step until the PR merges.

#!/usr/bin/env bash
# open_cleanup_pr.sh — create a branch, remove flag constants, open PR
set -euo pipefail

STALE_KEYS_FILE="${1:-stale_flags.json}"
BRANCH="flag-cleanup/$(date +%Y%m%d)"

git checkout -b "${BRANCH}"

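# Example: mark every line that mentions a stale key so reviewers can remove it
while IFS= read -r key; do
  # Escape the dots in the key so sed matches them literally
  pattern="${key//./\\.}"
  # The "#" comment prefix suits Python; adjust it for TypeScript and Go files.
  # "|| true" stops set -e/pipefail from aborting when a key has no matches.
  grep -rlF --include="*.py" --include="*.ts" --include="*.go" -- "${key}" . \
    | xargs -r -I{} sed -i "s|.*${pattern}.*|# REMOVED: ${key} (stale flag deleted $(date +%Y-%m-%d))|g" {} \
    || true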
done < <(jq -r '.[]' "${STALE_KEYS_FILE}")

git add -A
git commit -m "chore: remove stale feature flag references

$(jq -r '.[] | "- \(.)"' "${STALE_KEYS_FILE}")

Quarantined flags with zero traffic > 30 days. Registry deletion pending PR merge."

git push origin "${BRANCH}"

# Requires gh CLI
gh pr create \
  --title "chore: remove stale feature flag code references" \
  --body "$(printf 'Flags quarantined in registry:\n\n%s\n\nReview each removal before merging. Registry deletion runs automatically after merge.' \
    "$(jq -r '.[] | "- `\(.)`"' "${STALE_KEYS_FILE}")")"

Pair this PR with your emergency kill-switch runbook so reviewers understand the rollback path if a removed flag turns out to still be needed. The kill-switch procedure covers re-enabling a flag instantly if a regression surfaces post-merge.

If your team uses an OpenFeature provider abstraction, also remove the provider’s flag-key constants from any shared configuration modules — those are the most commonly missed reference sites.

Step 5. Delete from registry and purge caches

Once the PR from Step 4 has merged and deployed, the pipeline can safely delete the quarantined flags from the registry. This script fetches all flags in quarantined status with confirmed zero traffic and issues idempotent DELETE requests.

#!/usr/bin/env bash
# automated-flag-cleanup.sh — delete quarantined zero-traffic flags
set -euo pipefail

BASE_URL="https://api.flags.example.com/v1"
AUTH_HEADER="Authorization: Bearer ${FLAG_API_TOKEN}"

# Fetch quarantined flags that also show zero traffic in the last 7 days
QUARANTINED_FLAGS=$(curl -sf \
  -H "${AUTH_HEADER}" \
  "${BASE_URL}/flags?status=quarantined&traffic_7d=0" \
  | jq -r '.[].key')

if [ -z "${QUARANTINED_FLAGS}" ]; then
  echo "No quarantined zero-traffic flags found. Exiting."
  exit 0
fi

for flag in ${QUARANTINED_FLAGS}; do
  echo "Deleting: ${flag}"
  curl -sf -X DELETE \
    -H "${AUTH_HEADER}" \
    -H "X-Audit-Reason: automated-stale-cleanup" \
    "${BASE_URL}/flags/${flag}"
  echo "Deleted: ${flag}"
done

# Flush any distributed cache entries for deleted keys
# Replace the line below with your cache provider's invalidation endpoint
curl -sf -X POST \
  -H "${AUTH_HEADER}" \
  -H "Content-Type: application/json" \
  -d "{\"keys\": $(echo "${QUARANTINED_FLAGS}" | jq -Rs 'split("\n") | map(select(. != ""))')}" \
  "${BASE_URL}/cache/invalidate"

echo "Cleanup and cache purge complete."

The X-Audit-Reason header writes a deletion event to your flag registry’s event log, which feeds the immutable audit record required for compliance reviews. Cache invalidation ensures that any distributed caching layer stops serving evaluation results for deleted keys immediately rather than waiting for TTL expiry.
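
The delete script selects on status and traffic only; it does not check how long a flag has been quarantined. A minimal sketch of a grace-period gate, assuming the quarantined_at field written in Step 3 and the 24-hour monitoring window mentioned there (the ready_for_deletion helper itself is hypothetical):

```python
import datetime


def ready_for_deletion(
    flag: dict,
    now: datetime.datetime,
    grace: datetime.timedelta = datetime.timedelta(hours=24),
) -> bool:
    """True if the flag is quarantined, idle, and past the grace period.

    `now` must be timezone-aware, matching the UTC timestamp from Step 3.
    """
    if flag.get("status") != "quarantined" or flag.get("traffic_7d", 0) != 0:
        return False
    quarantined_at = flag.get("quarantined_at")
    if not quarantined_at:
        return False  # no quarantine timestamp: leave the flag alone
    return now - datetime.datetime.fromisoformat(quarantined_at) >= grace
```

Filtering the registry response through a check like this before issuing DELETE requests keeps a flag quarantined only minutes ago from being purged in the same run.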

Verification

After running the full pipeline, confirm the following before closing the cleanup ticket:

GET /flags?status=quarantined returns an empty list — all quarantined flags have been deleted.
Your error-rate dashboard shows zero EvaluationError events for the deleted flag keys over a 2-hour observation window.
The audit log contains a deleted event for each key with reason: automated-stale-cleanup.
CI passes on the default branch with the code-reference removal PR merged.

If any EvaluationError spikes appear post-deletion, the flag key is still referenced somewhere. Re-add the flag in quarantined state immediately (it will serve defaults), then repeat Step 2 with a broader file-type search.

Gotchas & Edge Cases

Shared flags across environments. A flag that shows zero traffic in production may still be active in staging or a canary environment. Query telemetry per environment and require zero traffic across all environments before quarantining. Deleting a production flag whose staging counterpart is still live causes evaluation mismatches that are hard to diagnose.
Dynamically constructed flag keys. Services that build flag keys at runtime from configuration values (e.g., f"{namespace}.{service}.{feature}") will not be caught by grep in Step 2. Add a test that asserts every runtime-constructed key resolves to a key that still exists in the registry — make it a CI gate so the cleanup pipeline surfaces the miss before deletion.
Grace period alignment with deployment cadence. The 30-day staleness threshold assumes flags are evaluated at least once per month under normal load. For services with infrequent batch execution (weekly jobs, quarterly processes), extend the threshold to match the longest reasonable gap between evaluations — otherwise you will quarantine flags that are working correctly.
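
The per-environment and threshold gotchas above can be combined into one candidate check. This is a sketch under stated assumptions: telemetry is available as a mapping from environment name to a record with last_evaluated and traffic_7d, and thresholds are overridden per namespace.service prefix. The STALE_DAYS_OVERRIDES table, its example entry, and both helper names are hypothetical.

```python
import datetime

DEFAULT_STALE_DAYS = 30
# Hypothetical overrides keyed by the namespace.service prefix of the key
STALE_DAYS_OVERRIDES = {"billing.quarterly-close": 120}


def stale_days_for(key: str) -> int:
    """Look up the staleness threshold for a key, defaulting to 30 days."""
    prefix = ".".join(key.split(".")[:2])
    return STALE_DAYS_OVERRIDES.get(prefix, DEFAULT_STALE_DAYS)


def is_stale_everywhere(
    key: str, per_env: dict[str, dict], now: datetime.datetime
) -> bool:
    """True only if every environment is idle past the key's threshold."""
    if not per_env:
        return False  # no telemetry at all is not evidence of staleness
    cutoff = now - datetime.timedelta(days=stale_days_for(key))
    for record in per_env.values():
        # Timestamps are assumed to carry a UTC offset
        last_eval = datetime.datetime.fromisoformat(record["last_evaluated"])
        if last_eval >= cutoff or record.get("traffic_7d", 0) > 0:
            return False
    return True
```

A single active environment is enough to keep the flag out of the candidate list, which matches the rule of requiring zero traffic everywhere before quarantining.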

Troubleshooting & FAQ

Why does the quarantine script report flags as stale but the registry shows recent evaluation timestamps?

The registry’s last_evaluated field is written by your flag provider on each evaluation event. If your provider batches telemetry writes, there can be a lag of several minutes between the actual evaluation and the timestamp update. More commonly, the issue is a timezone mismatch: the registry stores UTC but the comparison script is constructing datetime.now() in local time. Always use datetime.datetime.now(datetime.timezone.utc) for the cutoff, as shown in Step 1. Also confirm the registry’s timestamp format. Some providers omit the +00:00 suffix on UTC timestamps, which causes fromisoformat() to return a naive datetime; comparing a naive datetime with the timezone-aware cutoff raises TypeError, so treat offset-less timestamps as UTC before comparing.
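
One way to make the comparison robust is to normalize every registry timestamp before using it. The helper below is a sketch that assumes offset-less values are UTC, as the registry is described here; parse_registry_timestamp is an illustrative name.

```python
import datetime


def parse_registry_timestamp(raw: str) -> datetime.datetime:
    """Parse an ISO-8601 timestamp, treating offset-less values as UTC."""
    # fromisoformat() accepts a trailing "Z" only on Python 3.11 and later
    if raw.endswith("Z"):
        raw = raw[:-1] + "+00:00"
    parsed = datetime.datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # Assumption: the registry stores UTC but omitted the offset
        parsed = parsed.replace(tzinfo=datetime.timezone.utc)
    return parsed.astimezone(datetime.timezone.utc)
```

Every value returned is timezone-aware, so it can be compared directly with a cutoff built from datetime.datetime.now(datetime.timezone.utc).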

The DELETE request returns 200 but evaluations still return values for the deleted flag key.

This is a cache consistency issue. Your distributed caching layer holds a snapshot of flag state that was valid at the last sync. The registry deletion does not automatically invalidate cache entries unless your provider sends a deletion event over the flag-sync transport. Run the cache invalidation step from Step 5 manually against the specific key, or force a full cache refresh on the affected services. If the problem recurs after future deletions, wire your registry’s deletion webhook to trigger cache invalidation automatically rather than relying on the cleanup script to do it inline.

How do I exclude flags from the cleanup pipeline permanently?

Add a cleanup_exempt: true field to the flag’s metadata in the registry and skip exempt flags in Step 1 by adding a guard at the top of the loop: if flag.get("cleanup_exempt") or flag.get("traffic_7d", 0) > 0: continue. Common cases for exemption include kill switches (which are intentionally dormant until needed) and flags that gate annual or seasonal features. Document the reason for exemption in the flag’s owner_notes field so future maintainers understand why the flag bypasses automated retirement.
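
The exemption guard can also live in a small predicate, which is easier to unit-test than an inline condition. This is a minimal sketch using the cleanup_exempt and traffic_7d fields described here; the eligible_for_cleanup name is illustrative.

```python
def eligible_for_cleanup(flag: dict) -> bool:
    """Return False for exempt flags and for flags with recent traffic."""
    if flag.get("cleanup_exempt"):
        return False  # kill switches, seasonal flags, and similar
    return flag.get("traffic_7d", 0) == 0
```

In Step 1 the loop would call this first and only then apply the last_evaluated cutoff to the flags that remain.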