Detecting Flag Configuration Drift Across Environments

This how-to is part of Multi-Environment Flag Promotion Pipelines.

Configuration drift happens when flag state quietly diverges between dev, staging, and production without anyone noticing. A flag that is on in staging and off in production is a silent time bomb: you ship code that was tested against a world that production never sees. This guide shows you how to export resolved flag state from each environment, diff it programmatically, gate deployments on a clean diff in CI, and fire an alert when unexpected drift surfaces.

Prerequisites

Access to each environment’s flag evaluation API or management API (dev, staging, prod endpoints and tokens available as environment variables or CI secrets)
Python 3.10+ (with PyYAML for the allowlist check) and jq installed in your CI runner image
A flag taxonomy with consistent key naming using the namespace.service.feature schema — drift detection relies on identical keys across environments
A drift allowlist file committed to the repo (even if initially empty) that lists flags intentionally different per environment
A Slack webhook URL (or any HTTP webhook) stored as a CI secret for drift alerts
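The diff matches flags by exact key, so it can help to reject keys that break the namespace.service.feature schema before comparing anything. The sketch below assumes each segment is made of lowercase letters, digits, and hyphens, which is inferred from keys like checkout.payments.express-pay. `invalid_keys` and the pattern are illustrative, so adjust them to your real taxonomy rules.

```python
import re

# Assumed rule: three dot-separated segments of lowercase letters, digits, and
# hyphens, e.g. checkout.payments.express-pay. Adjust to your taxonomy.
KEY_PATTERN = re.compile(r"[a-z0-9-]+\.[a-z0-9-]+\.[a-z0-9-]+")


def invalid_keys(keys):
    """Return, sorted, the flag keys that do not fit namespace.service.feature."""
    return sorted(k for k in keys if KEY_PATTERN.fullmatch(k) is None)


snapshot_keys = ["checkout.payments.express-pay", "platform.search.semantic-rerank", "newCheckoutFlow"]
print(invalid_keys(snapshot_keys))  # ['newCheckoutFlow']
```

You could run this over the keys of each exported snapshot and fail early, since a misnamed key would otherwise show up as a confusing `__missing__` drift entry.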

What flag configuration drift looks like

Figure: flag state matrix for four flags across three environments (dev, staging, prod). Green cells match the dev baseline; burnt-orange cells diverge from it. checkout.payments.express-pay was promoted to staging and prod before dev was updated, which is classic drift.

Step 1 — Export each environment’s resolved flag state

Call the flag management API for each environment and write normalized JSON snapshots to disk. Resolved state means the flat map of flag key → current default variant, not the raw config file — you want what evaluation actually returns.

#!/usr/bin/env bash
# scripts/export-flag-state.sh
# Requires: FLAG_API_URL_DEV, FLAG_API_URL_STG, FLAG_API_URL_PRD (base URLs of each
# environment's flagd-compatible REST API) and FLAG_API_TOKEN_DEV, FLAG_API_TOKEN_STG,
# FLAG_API_TOKEN_PRD.

set -euo pipefail

EXPORT_DIR="${FLAG_EXPORT_DIR:-/tmp/flag-exports}"
mkdir -p "$EXPORT_DIR"

export_env() {
  local env="$1"
  local base_url="$2"
  local token="$3"
  local out="$EXPORT_DIR/${env}.json"

  echo "Exporting $env → $out"
  # The jq filter assumes the response is a JSON object keyed by flag key,
  # e.g. {"checkout.payments.express-pay": {"defaultVariant": "on", ...}}.
  # map_values keeps only defaultVariant; -S sorts keys so snapshots diff cleanly.
  # --show-error prints curl's error message when --fail aborts on an HTTP error.
  curl --silent --show-error --fail \
    --header "Authorization: Bearer $token" \
    --header "Accept: application/json" \
    "${base_url}/flags?resolved=true" \
    | jq -S 'map_values(.defaultVariant)' \
    > "$out"
}

export_env "dev"     "$FLAG_API_URL_DEV"  "$FLAG_API_TOKEN_DEV"
export_env "staging" "$FLAG_API_URL_STG"  "$FLAG_API_TOKEN_STG"
export_env "prod"    "$FLAG_API_URL_PRD"  "$FLAG_API_TOKEN_PRD"

echo "Exports written to $EXPORT_DIR"

The jq filter strips everything except the flag key and its current default variant. Percentage-rollout weights and targeting rules are deliberately excluded here — they are not stable across environments and are handled separately (see Gotchas). This connects directly to how multi-environment promotion pipelines treat each environment as an independent source of truth.
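If you prefer to keep normalization in Python (for instance, to unit-test it alongside the diff script), the jq filter reduces to a dict comprehension. This sketch assumes the same response shape the jq filter does: a JSON object keyed by flag key whose values carry a defaultVariant field. `normalize` is a hypothetical name, and the sample values are illustrative.

```python
import json


def normalize(raw: dict) -> dict:
    """Reduce flag-key -> definition to flag-key -> defaultVariant.

    Targeting rules and rollout weights are dropped simply because they are
    never copied over. A missing defaultVariant becomes None (null in JSON),
    matching what the jq filter produces. Keys are sorted for stable diffs.
    """
    return {key: spec.get("defaultVariant") for key, spec in sorted(raw.items())}


sample = {
    "platform.search.semantic-rerank": {"defaultVariant": "off", "targeting": {"segment": "beta"}},
    "checkout.payments.express-pay": {
        "defaultVariant": "on",
        "rollout": {"percentage": {"on": 30, "off": 70}},
    },
}
print(json.dumps(normalize(sample)))
# {"checkout.payments.express-pay": "on", "platform.search.semantic-rerank": "off"}
```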

Step 2 — Normalize and diff the exported configs

Load all three snapshots, build a unified key set, and emit a structured diff. The script outputs clean JSON so downstream steps can parse it without fragile string matching.

#!/usr/bin/env python3
# scripts/diff_flags.py
# Usage: python scripts/diff_flags.py /tmp/flag-exports/
# Exits 0 with drift=[] when clean, exits 1 when drift found (unless --json-only).

import json
import sys
from pathlib import Path

ENVS = ["dev", "staging", "prod"]
BASELINE = "dev"

def load(export_dir: Path, env: str) -> dict:
    p = export_dir / f"{env}.json"
    with p.open() as f:
        return json.load(f)

def diff_flags(export_dir: Path) -> list[dict]:
    states = {env: load(export_dir, env) for env in ENVS}
    all_keys = sorted(set().union(*[s.keys() for s in states.values()]))
    drift = []
    for key in all_keys:
        values = {env: states[env].get(key, "__missing__") for env in ENVS}
        baseline_val = values[BASELINE]
        drifted_envs = {
            env: val for env, val in values.items()
            if env != BASELINE and val != baseline_val
        }
        if drifted_envs:
            drift.append({
                "flag": key,
                "baseline": {BASELINE: baseline_val},
                "drifted": drifted_envs,
            })
    return drift

if __name__ == "__main__":
    # Separate the optional --json-only flag from the positional export dir, in any order
    args = [a for a in sys.argv[1:] if a != "--json-only"]
    json_only = "--json-only" in sys.argv[1:]
    export_dir = Path(args[0]) if args else Path("/tmp/flag-exports")
    result = diff_flags(export_dir)
    print(json.dumps({"drift": result}, indent=2))
    if result and not json_only:
        sys.exit(1)

Every entry in drift names the flag key, the baseline (dev) value, and the value held by each environment that differs from it; a flag absent from an environment appears as __missing__. The structured output is easy for the CI step to parse and for your audit trail to ingest.
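Because the report is plain JSON, downstream consumers need only a few lines to reshape it. As an illustration, this hypothetical `drift_by_env` helper inverts the per-flag report into a per-environment view, which is handy for a summary such as "what has drifted in prod". The sample report is made up but follows the shape diff_flags.py prints.

```python
import json
from collections import defaultdict

# A report in the shape diff_flags.py prints (values are illustrative only).
report = json.loads("""
{"drift": [
  {"flag": "checkout.payments.express-pay",
   "baseline": {"dev": "off"},
   "drifted": {"staging": "on", "prod": "on"}},
  {"flag": "platform.search.semantic-rerank",
   "baseline": {"dev": "on"},
   "drifted": {"prod": "__missing__"}}
]}
""")


def drift_by_env(drift: list[dict]) -> dict[str, list[str]]:
    """Invert the per-flag report into environment -> sorted drifted flag keys."""
    by_env = defaultdict(list)
    for entry in drift:
        for env in entry["drifted"]:
            by_env[env].append(entry["flag"])
    return {env: sorted(flags) for env, flags in sorted(by_env.items())}


print(drift_by_env(report["drift"]))
# {'prod': ['checkout.payments.express-pay', 'platform.search.semantic-rerank'], 'staging': ['checkout.payments.express-pay']}
```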

Step 3 — Assert no unexpected drift in CI

Run the export and diff as part of every deployment gate. Flags in the allowlist are intentionally different across environments — the check subtracts them before deciding whether to fail.

Drift allowlist (config/flag-drift-allowlist.yaml):

# Flags that are INTENTIONALLY different across environments.
# Each entry must include a reason and may include an expiry date (YYYY-MM-DD).
# Remove entries once the flag has been promoted to all environments.
allowlist:
  - flag: checkout.payments.express-pay
    reason: "Staged rollout — prod promotion scheduled for 2026-07-01"
    expires: "2026-07-15"
  - flag: platform.search.semantic-rerank
    reason: "Prod hold pending infrastructure capacity review"
    expires: "2026-06-30"

GitHub Actions step (.github/workflows/deploy.yml excerpt):

- name: Export flag state from all environments
  env:
    FLAG_API_URL_DEV:  ${{ secrets.FLAG_API_URL_DEV }}
    FLAG_API_URL_STG:  ${{ secrets.FLAG_API_URL_STG }}
    FLAG_API_URL_PRD:  ${{ secrets.FLAG_API_URL_PRD }}
    FLAG_API_TOKEN_DEV: ${{ secrets.FLAG_API_TOKEN_DEV }}
    FLAG_API_TOKEN_STG: ${{ secrets.FLAG_API_TOKEN_STG }}
    FLAG_API_TOKEN_PRD: ${{ secrets.FLAG_API_TOKEN_PRD }}
    FLAG_EXPORT_DIR: /tmp/flag-exports
  run: bash scripts/export-flag-state.sh

- name: Diff flag configs and check against allowlist
  run: |
    # --json-only makes diff_flags.py exit 0 even on drift, so the allowlist check below decides pass/fail
    python scripts/diff_flags.py /tmp/flag-exports --json-only > /tmp/drift.json

    # Subtract allowlisted drift and fail on anything unexpected (needs PyYAML)
    python - <<'PYEOF'
    import json, sys
    import yaml

    with open("/tmp/drift.json") as f:
        drift = json.load(f)["drift"]

    # An empty allowlist file parses as None, and "allowlist:" with no entries as {"allowlist": None}
    with open("config/flag-drift-allowlist.yaml") as f:
        al = yaml.safe_load(f) or {}
    allowed = {e["flag"] for e in al.get("allowlist") or []}

    unexpected = [d for d in drift if d["flag"] not in allowed]

    if unexpected:
        # Save only the unexpected entries so an alert step can report them
        with open("/tmp/unexpected-drift.json", "w") as f:
            json.dump({"drift": unexpected}, f, indent=2)
        print("UNEXPECTED FLAG DRIFT DETECTED:")
        print(json.dumps(unexpected, indent=2))
        sys.exit(1)

    print(f"Drift check passed. {len(drift)} allowlisted differences ignored.")
    PYEOF

The allowlist is version-controlled alongside your flag taxonomy, so every intentional difference has a paper trail that feeds into compliance reporting. This is the enforcement layer for the broader feature flag architecture lifecycle.

Step 4 — Alert and open a ticket on detected drift

When the CI check finds unexpected drift, fire a Slack notification and optionally open a GitHub issue so it lands in someone’s queue rather than disappearing into a failed build log.

#!/usr/bin/env bash
# scripts/alert-drift.sh
# Called only when unexpected drift is confirmed.
# Usage: bash scripts/alert-drift.sh [path/to/drift.json]
# Requires: SLACK_WEBHOOK_URL, GITHUB_TOKEN, GITHUB_REPO (owner/repo)

set -euo pipefail

DRIFT_JSON="${1:-/tmp/drift.json}"
# Comma-separated list of drifted flag keys, read with jq instead of inline Python
DRIFTED_FLAGS=$(jq -r '[.drift[].flag] | join(", ")' "$DRIFT_JSON")
RUN_URL="${GITHUB_SERVER_URL:-https://github.com}/${GITHUB_REPOSITORY:-$GITHUB_REPO}/actions/runs/${GITHUB_RUN_ID:-0}"

# 1. Slack alert (jq turns the \n escapes in its string literal into real newlines)
curl --silent --show-error --fail -X POST "$SLACK_WEBHOOK_URL" \
  --header "Content-Type: application/json" \
  --data "$(jq -n \
    --arg flags "$DRIFTED_FLAGS" \
    --arg run "$RUN_URL" \
    '{text: ("*Flag config drift detected* :rotating_light:\nDrifted flags: `" + $flags + "`\nCI run: " + $run)}')"

# 2. Open a GitHub issue (idempotent: skip if an open flag-drift issue already exists)
EXISTING=$(curl --silent --show-error --fail \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/${GITHUB_REPO}/issues?labels=flag-drift&state=open" \
  | jq 'length')

if [ "$EXISTING" -eq 0 ]; then
  # printf expands \n into newlines; inside bash double quotes it would stay a literal backslash-n
  BODY=$(printf 'Unexpected flag configuration drift detected in CI.\n\nDrifted flags:\n```\n%s\n```\n\nAdd to `config/flag-drift-allowlist.yaml` if intentional, or promote the flag to resolve.\n' "$(cat "$DRIFT_JSON")")
  curl --silent --show-error --fail -X POST \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "Accept: application/vnd.github+json" \
    "https://api.github.com/repos/${GITHUB_REPO}/issues" \
    --data "$(jq -n \
      --arg title "Flag config drift: $DRIFTED_FLAGS" \
      --arg body "$BODY" \
      '{title: $title, body: $body, labels: ["flag-drift"]}')"
fi
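The shell alert only names the drifted flags. If responders should see each environment's value directly in Slack, one option is to build the message in Python from the same drift JSON. This is a sketch, assuming Slack's incoming-webhook format (a JSON body with a text field); `slack_payload` and `post_to_slack` are hypothetical helpers, and the run URL is a placeholder.

```python
import json
import urllib.request


def slack_payload(drift: list[dict], run_url: str) -> dict:
    """Build a Slack incoming-webhook body listing each drifted flag with its values."""
    lines = ["*Flag config drift detected* :rotating_light:"]
    for entry in drift:
        # "baseline" holds exactly one env -> value pair (dev in this guide)
        base_env, base_val = next(iter(entry["baseline"].items()))
        others = ", ".join(f"{env}={val}" for env, val in entry["drifted"].items())
        lines.append(f"- `{entry['flag']}`: {base_env}={base_val}; {others}")
    lines.append(f"CI run: {run_url}")
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: dict) -> None:
    """POST the payload as JSON; urlopen raises HTTPError on a non-2xx response."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()


sample_drift = [
    {"flag": "checkout.payments.express-pay",
     "baseline": {"dev": "off"},
     "drifted": {"staging": "on", "prod": "on"}},
]
print(slack_payload(sample_drift, "https://github.com/OWNER/REPO/actions/runs/1")["text"])
```

In CI you would load the drift file, then pass the result of `slack_payload` to `post_to_slack` with the webhook URL read from the `SLACK_WEBHOOK_URL` secret.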

If you need alerting faster than your CI cadence for production changes that bypass your deployment process, combine this with the backend evaluation runtime’s own flag-change event stream.

Verification

Run this command to confirm the diff exits cleanly. It exits 0 on no unexpected drift and non-zero (printing the offending flags) when drift is present:

bash scripts/export-flag-state.sh && \
  python scripts/diff_flags.py /tmp/flag-exports --json-only > /tmp/drift.json && \
  python - <<'EOF'
import json, sys
import yaml  # PyYAML

# Read the diff from its file: stdin is already occupied by this heredoc script
with open("/tmp/drift.json") as f:
    drift = json.load(f)["drift"]
with open("config/flag-drift-allowlist.yaml") as f:
    allowed = {e["flag"] for e in (yaml.safe_load(f) or {}).get("allowlist") or []}
unexpected = [d for d in drift if d["flag"] not in allowed]
if unexpected:
    print("DRIFT:", json.dumps(unexpected, indent=2)); sys.exit(1)
print("Clean — no unexpected drift.")
EOF

A clean environment prints Clean — no unexpected drift. and exits 0. Any unexpected divergence prints the structured diff and exits 1, which fails the CI step.

Gotchas & edge cases

Intentional per-environment differences must be in the allowlist before you run the check. A flag that is correctly off in production while you test it in staging is not drift — but the tool cannot know that without an explicit allowlist entry. Ship the allowlist update in the same PR that changes the flag in staging.
Percentage rollout weights will always diverge, so keep them out of the export. The jq filter in Step 1 keeps only defaultVariant, so split weights never reach the snapshot. If you extend the export to include full rule sets with percentage splits (e.g. "rollout": {"percentage": {"on": 30, "off": 70}}), strip those fields before writing the snapshot, or the diff will be permanently noisy on any flag with a gradual rollout.
Context-level overrides and environment-specific targeting rules are not config drift. A flag that serves on to users in a beta segment only in prod is behaving correctly. The export captures the default variant, not segment-specific resolutions. If you also export rule sets, add a separate normalization pass that redacts environment-scoped targeting rules before diffing.
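If you do export full rule sets, a small normalization pass can drop the environment-specific parts before diffing. This sketch assumes the field names: `rollout` mirrors the example in the Gotchas above and `targeting` follows flagd's flag definition format. `strip_env_specific` is a hypothetical helper, so map the names onto whatever your API actually returns.

```python
# Assumed field names; adjust to the shape your flag API returns.
ENV_SPECIFIC_FIELDS = ("rollout", "targeting")


def strip_env_specific(flags: dict) -> dict:
    """Return a new flag map without rollout weights or targeting rules.

    Builds fresh per-flag dicts, so the input snapshot is left unmodified.
    """
    return {
        key: {field: value for field, value in spec.items() if field not in ENV_SPECIFIC_FIELDS}
        for key, spec in flags.items()
    }


full = {
    "checkout.payments.express-pay": {
        "state": "ENABLED",
        "variants": {"on": True, "off": False},
        "defaultVariant": "off",
        "rollout": {"percentage": {"on": 30, "off": 70}},
    }
}
print(strip_env_specific(full))
# {'checkout.payments.express-pay': {'state': 'ENABLED', 'variants': {'on': True, 'off': False}, 'defaultVariant': 'off'}}
```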

Troubleshooting & FAQ {#faq}

How do I distinguish intentional environment differences from real drift?

Intentional differences belong in config/flag-drift-allowlist.yaml with a reason and an expiry date. The CI check subtracts allowlisted flags before deciding whether to fail. If a difference is not in the allowlist, it is unexpected by definition. Treat the allowlist as the canonical record of “in-flight promotions” — it should shrink over time as flags are promoted to prod, not grow indefinitely. If you find entries without expiry dates or with stale dates, that is a signal the flag is overdue for cleanup.

My CI drift check is flaky — what causes false positives?

Three common causes: (1) The export keeps more than defaultVariant and the flag API response carries timestamps or generated metadata that change on every call; extend the jq filter to strip those fields. (2) The export script calls three different API endpoints in rapid succession and one environment is mid-deploy with a partially applied config; add retry logic, or at least a short pause (e.g. sleep 3) before re-exporting. (3) Percentage-rollout fields are included in the export; see the Gotchas section above. To isolate the cause, run the export twice in a row and diff the two outputs against each other: if they match, the script is deterministic and the flakiness comes from the environments themselves.
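The retry suggestion can be sketched as a helper that keeps re-fetching one environment until two consecutive snapshots agree, which rides out a config that is still being applied. `stable_snapshot` is hypothetical, and `fetch` stands in for whatever wraps your Step 1 API call and normalization; the 3-second default delay is an arbitrary starting point.

```python
import time
from typing import Callable


def stable_snapshot(fetch: Callable[[], dict], attempts: int = 3, delay: float = 3.0) -> dict:
    """Re-fetch until two consecutive snapshots are equal, then return that snapshot.

    Makes at most attempts + 1 calls to fetch. Raises RuntimeError if the
    state is still changing after the last retry.
    """
    previous = fetch()
    for _ in range(attempts):
        time.sleep(delay)
        current = fetch()
        if current == previous:
            return current
        previous = current
    raise RuntimeError(f"flag state still changing after {attempts} retries")


# Simulated environment that settles after one in-flight change
responses = iter([{"x.y.z": "off"}, {"x.y.z": "on"}, {"x.y.z": "on"}])
print(stable_snapshot(lambda: next(responses), attempts=3, delay=0))  # {'x.y.z': 'on'}
```

If the state never settles, failing with an explicit error is arguably better than diffing a half-applied config and reporting drift that is not really there.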

How often should I run drift detection?

At minimum, gate every deployment on a clean drift check — that catches drift before it reaches production. For higher confidence, run detection on a schedule (every 30–60 minutes) against live environments using a CI cron job or a lightweight scheduled script. Out-of-band changes to flag state (someone toggled a flag in the management UI between deployments) will only surface on the scheduled run, not the deployment gate. Combine both approaches: the deployment gate blocks bad promotions; the scheduled run catches console-level overrides.