Server-Side SDK Integration Patterns
This guide is part of the Backend Evaluation & Server-Side SDKs series. It covers the integration patterns that make a feature flag SDK production-ready: correct initialization sequencing, dependency injection, middleware placement, resilience under control-plane failure, and the observability hooks that let you answer “what variant did this request get?” during an incident.
Server-side evaluation keeps targeting logic and segmentation rules inside your trust boundary. The SDK downloads a compiled rule set once and resolves flags in-process at sub-millisecond cost. This guide does not cover which sync transport to use (polling or streaming) or how to structure a distributed cache across nodes — those decisions are covered in their own deep-dives.
Prerequisites
@openfeature/server-sdk, openfeature Python package, or the Go module)
owner, created, and expiry populated per your flag taxonomy
Core Concept & Architecture
OpenFeature’s provider abstraction separates the evaluation API from the vendor implementation. Your application code calls one interface (client.BooleanValue, client.StringValue) and the provider behind it handles syncing, caching, and reconnection. Swap providers without touching business logic.
The initialization sequence is strict: the provider must finish its first rule download before the container signals readiness. Any evaluation that runs before the provider is Ready returns the code default with errorCode: PROVIDER_NOT_READY — a footgun that silently serves wrong variants in staging and never shows up in tests.
| State | Evaluation behavior | Health probe |
|---|---|---|
| Connecting | Returns code defaults | Fails readiness |
| Ready | In-process rule evaluation | Passes |
| Stale | Last-known-good variants | Passes liveness, fails readiness |
| Closed | Panics / no-ops | Fails both |
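The table above can be turned into probe responses with a small lookup. The following is a minimal sketch, not part of any SDK: `ProviderState` and `probe_status` are hypothetical names, and the liveness result for Connecting is an assumption, since the table only states that readiness fails in that state.

```python
from enum import Enum


class ProviderState(Enum):
    """Hypothetical mirror of the provider states listed in the table."""
    CONNECTING = "connecting"
    READY = "ready"
    STALE = "stale"
    CLOSED = "closed"


# (liveness_ok, readiness_ok) per state.
# Assumption: Connecting passes liveness; the table only says it fails readiness.
_PROBES = {
    ProviderState.CONNECTING: (True, False),
    ProviderState.READY: (True, True),
    ProviderState.STALE: (True, False),
    ProviderState.CLOSED: (False, False),
}


def probe_status(state: ProviderState) -> dict:
    """Map a provider state to HTTP status codes for the two health probes."""
    live, ready = _PROBES[state]
    return {
        "liveness": 200 if live else 503,
        "readiness": 200 if ready else 503,
    }
```

A health endpoint could call `probe_status` with whatever state your provider integration tracks and return the matching status code for each probe.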
Step-by-Step Implementation
Step 1 — Bootstrap the provider before accepting traffic
Initialize the SDK in your startup sequence and block readiness until the provider signals it has loaded its first rule set. An idempotent guard prevents duplicate initialization during hot-reloads.
// startup.ts
import { OpenFeature } from '@openfeature/server-sdk';
import { FlagdProvider } from '@openfeature/flagd-provider';

let initialized = false;

export async function bootstrapFlags(): Promise<void> {
  if (initialized) return; // idempotent: safe to call on hot-reload

  const provider = new FlagdProvider({
    host: process.env.FLAGD_HOST!,
    port: Number(process.env.FLAGD_PORT ?? 8013),
    tls: process.env.NODE_ENV === 'production',
  });

  // Blocks until the first rule set is downloaded; throws on timeout
  await OpenFeature.setProviderAndWait(provider);
  initialized = true;

  process.on('SIGTERM', async () => {
    await OpenFeature.close(); // drain streams, flush telemetry
    process.exit(0);
  });
}
Pitfall: calling setProvider (without AndWait) lets the process start accepting traffic before rules are available. Every evaluation in that window returns the code default and emits PROVIDER_NOT_READY. Wire your readiness probe to the provider state, not just to the HTTP port binding.
Step 2 — Register the client as a singleton
Register the OpenFeature client as a singleton in your service container. Scoped or transient instances create multiple connection pools, multiplying connection handshakes and bypassing the shared rule-set cache.
// Program.cs (.NET)
builder.Services.AddSingleton<IFeatureClient>(sp => {
    var cfg = sp.GetRequiredService<IConfiguration>();
    OpenFeature.Api.Instance.SetProvider(
        new FlagdProvider(new FlagdConfig {
            Host = cfg["FLAGD_HOST"],
            Port = int.Parse(cfg["FLAGD_PORT"] ?? "8013"),
        })
    );
    return OpenFeature.Api.Instance.GetClient("api");
});
# deps.py (FastAPI)
import os

from openfeature import api
# The flagd provider ships in the openfeature-provider-flagd package
from openfeature.contrib.provider.flagd import FlagdProvider

_client = None


def get_flag_client():
    """Return the process-wide client, creating it on first use."""
    global _client
    if _client is None:
        api.set_provider(FlagdProvider(
            host=os.environ["FLAGD_HOST"],
            port=int(os.environ.get("FLAGD_PORT", "8013")),
        ))
        _client = api.get_client("api")
    return _client
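Pitfall: under concurrent runtimes (asyncio tasks, Tokio workers, thread pools), a lazily initialized module-level singleton can be constructed more than once when several callers hit the first-use path at the same time. Guard initialization with a lock or an async-once primitive.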
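One way to apply that guard in Python is double-checked locking around the factory call. This is a sketch under stated assumptions: `get_or_create_client` and its `factory` parameter are hypothetical, and `factory` stands in for the provider setup and client creation shown in deps.py.

```python
import threading
from typing import Callable, Optional

_lock = threading.Lock()
_client: Optional[object] = None


def get_or_create_client(factory: Callable[[], object]) -> object:
    """Return the shared client, building it at most once across threads."""
    global _client
    if _client is None:          # fast path: no lock once initialized
        with _lock:
            if _client is None:  # re-check: another thread may have won the race
                _client = factory()
    return _client
```

The second check inside the lock is what prevents a duplicate provider: threads that were waiting on the lock see the client already set and skip the factory.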
Step 3 — Evaluate in middleware, not in handlers
Resolve the flags your handler needs at the request boundary in a middleware or interceptor. This keeps handler logic free of SDK calls, lets you batch evaluations, and ensures evaluation context is assembled once with a consistent snapshot of the request attributes.
# middleware.py (FastAPI)
from contextvars import ContextVar

from fastapi import Request
from openfeature.evaluation_context import EvaluationContext

from .deps import get_flag_client

# Per-request flag snapshot; handlers read from this instead of calling the SDK
_flags: ContextVar[dict] = ContextVar("request_flags")


async def flag_middleware(request: Request, call_next):
    client = get_flag_client()
    ctx = EvaluationContext(
        targeting_key=request.headers.get("X-User-ID", "anonymous"),
        attributes={
            # getattr avoids an AttributeError when no upstream middleware set the tier
            "tenantTier": getattr(request.state, "tenant_tier", "unknown"),
            "region": request.headers.get("CF-IPCountry", "unknown"),
        },
    )
    flags = {
        "api.search.semantic-rerank": client.get_boolean_value("api.search.semantic-rerank", False, ctx),
        "api.checkout.express-pay": client.get_boolean_value("api.checkout.express-pay", False, ctx),
    }
    _flags.set(flags)
    return await call_next(request)
The evaluation context assembled here is the input to the rule engine; keep it consistent across a request to avoid split evaluations. See the rule engine performance guide for batching strategies under high concurrency.
Pitfall: evaluating inside a database transaction or after acquiring a lock prolongs the critical section. Resolve flags before entering any lock.
Step 4 — Wrap every evaluation in a circuit breaker
Isolate the provider from the rest of your service. A circuit breaker opens after a threshold of evaluation errors and returns a safe default until the provider recovers — so a degraded control plane cannot take down unrelated request paths.
// resilience.go
package resilience

import (
	"context"
	"time"

	openfeature "github.com/open-feature/go-sdk/pkg/openfeature"
	"github.com/sony/gobreaker"
)

// cb trips after five consecutive failures and stays open for 30s
// before letting a single probe request through.
var cb = gobreaker.NewCircuitBreaker(gobreaker.Settings{
	Name:        "flag-provider",
	MaxRequests: 1,
	Interval:    10 * time.Second,
	Timeout:     30 * time.Second,
	ReadyToTrip: func(counts gobreaker.Counts) bool {
		return counts.ConsecutiveFailures >= 5
	},
})

// BoolFlag evaluates a boolean flag through the breaker and returns
// defaultVal when the circuit is open or the evaluation fails.
func BoolFlag(ctx context.Context, client openfeature.IClient, key string, defaultVal bool) bool {
	result, err := cb.Execute(func() (interface{}, error) {
		evalCtx, cancel := context.WithTimeout(ctx, 50*time.Millisecond)
		defer cancel()
		val, err := client.BooleanValue(evalCtx, key, defaultVal,
			openfeature.EvaluationContext{})
		return val, err
	})
	if err != nil {
		return defaultVal // safe default on open circuit or evaluation error
	}
	return result.(bool)
}
Pitfall: a timeout that is too generous (>100ms) hides provider latency and can cascade into your P99 budget. Keep evaluation timeouts at 20–50ms; the rule engine should resolve in well under 1ms on warm state.
Step 5 — Emit structured evaluation telemetry
Attach evaluation metadata to every trace span and emit a structured log entry. This makes it possible to answer “which variant did request X get?” from a trace rather than re-running targeting logic post-incident.
// telemetry.ts
import { OpenFeature, Hook, EvaluationDetails, FlagValue } from '@openfeature/server-sdk';
import { trace } from '@opentelemetry/api';

const telemetryHook: Hook = {
  after(hookContext, evaluationDetails: EvaluationDetails<FlagValue>) {
    const span = trace.getActiveSpan();
    if (span) {
      span.setAttributes({
        'feature_flag.key': hookContext.flagKey,
        // Prefer the variant name; fall back to the resolved value if the provider omits it
        'feature_flag.variant': evaluationDetails.variant ?? String(evaluationDetails.value),
        'feature_flag.reason': evaluationDetails.reason ?? 'UNKNOWN',
      });
    }
  },
  error(hookContext, err) {
    // flag.stream.error counter lives here
  },
};

OpenFeature.addHooks(telemetryHook);
Align attribute names with the OpenTelemetry semantic conventions for feature flags so your tracing backend can correlate them automatically.
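The structured log entry mentioned at the start of this step could be shaped as follows. This is a sketch only: the logger name, the `event` value, the `trace_id` field, and both function names are assumptions chosen for illustration, and the attribute keys simply reuse the ones set on the span.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("flags.evaluation")  # hypothetical logger name


def evaluation_log_record(flag_key: str, variant: str, reason: str,
                          trace_id: Optional[str] = None) -> str:
    """Build one JSON log line describing a single flag evaluation."""
    record = {
        "event": "feature_flag.evaluation",  # assumed event name
        "feature_flag.key": flag_key,
        "feature_flag.variant": variant,
        "feature_flag.reason": reason,
    }
    if trace_id is not None:
        record["trace_id"] = trace_id  # lets the log line be joined to its trace
    return json.dumps(record, sort_keys=True)


def log_evaluation(flag_key: str, variant: str, reason: str,
                   trace_id: Optional[str] = None) -> None:
    """Emit the record at INFO level."""
    logger.info(evaluation_log_record(flag_key, variant, reason, trace_id))
```

Keeping the same keys in logs and spans means a single query on `feature_flag.key` works across both signals.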
Verification & Testing
Confirm the SDK reaches Ready before your container passes its readiness check, and that evaluation fails safe on provider error:
# After startup, confirm the provider state via your debug endpoint
curl -s http://localhost:3000/healthz | jq '.provider_state'
# expect: "READY"
# Simulate provider failure: block egress to the control plane
iptables -A OUTPUT -p tcp --dport 8013 -j DROP
# Confirm evaluation returns safe defaults
curl -s http://localhost:3000/debug/flags/api.search.semantic-rerank | jq .
# expect: { "variant": false, "reason": "DEFAULT", "errorCode": "GENERAL" }
For reconnection behavior after the block lifts, verify that a full resync fires rather than relying on a delta. The how-to on exponential backoff for SDK reconnection covers testing the exact backoff curve.
Troubleshooting & FAQ
Why do evaluations return the default variant right after a deploy?
The provider has not reached Ready yet. Your container probably signals readiness before setProviderAndWait resolves. Confirm by checking the evaluation details: the code default value together with errorCode PROVIDER_NOT_READY is the exact signature. Fix by delaying the HTTP listener start until after provider initialization, or by returning 503 from your readiness probe until the provider state is READY.
How do I test flag behavior without hitting a real control plane?
Use an in-memory provider in tests:
import { InMemoryProvider, OpenFeature } from '@openfeature/server-sdk';

await OpenFeature.setProviderAndWait(new InMemoryProvider({
  'api.search.semantic-rerank': {
    disabled: false,
    defaultVariant: 'on',
    variants: { on: true, off: false },
  },
}));
This keeps CI fast and deterministic without a flagd process.
Can I use the SDK in a serverless function?
Yes, but initialization cost matters. In a cold start, setProviderAndWait adds the full rule-download latency to your first invocation. Consider a short-polling provider with a tight timeout (2–3s), or pre-warm by bootstrapping the SDK in the global scope outside the handler so subsequent warm invocations reuse the state.
How do I confirm the circuit breaker is actually protecting the path?
Expose the circuit breaker state in your metrics: flag.provider.circuit_state (0=closed, 1=open, 2=half-open). Alert on sustained open state — it means evaluation has been falling back to defaults for at least Timeout seconds, which warrants a look at control-plane health.
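To make the numeric encoding concrete, here is a minimal breaker whose `state()` returns the gauge value directly. It is an illustrative sketch, not the gobreaker implementation: `FlagBreaker`, its parameters, and the injectable `clock` are hypothetical, and the class is not thread-safe as written.

```python
import time
from typing import Callable, Optional

CLOSED, OPEN, HALF_OPEN = 0, 1, 2  # gauge values for flag.provider.circuit_state


class FlagBreaker:
    """Tiny circuit breaker that reports its state as a gauge value."""

    def __init__(self, threshold: int = 5, timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = threshold
        self._timeout_s = timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def state(self) -> int:
        if self._opened_at is None:
            return CLOSED
        if self._clock() - self._opened_at >= self._timeout_s:
            return HALF_OPEN  # timeout elapsed: the next call is a probe
        return OPEN

    def call(self, evaluate: Callable[[], bool], default: bool) -> bool:
        if self.state() == OPEN:
            return default  # short-circuit without touching the provider
        try:
            value = evaluate()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self._threshold:
                self._opened_at = self._clock()  # trip, or re-open after a failed probe
            return default
        self._failures = 0
        self._opened_at = None  # success closes the circuit
        return value
```

A metrics exporter would sample `state()` on each scrape and publish it under the gauge name; an alert on a sustained value of 1 then corresponds to the open state described above.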
Performance & Scale
At hundreds of concurrent requests, the per-evaluation cost is the rule engine’s in-process lookup — typically under 1ms. The SDK’s internal rule set is read-only after initialization, so no locking is needed on the evaluation path. Connection overhead is bounded by the number of provider instances (one per process with the singleton pattern). Horizontal scaling adds connections linearly; that is expected and cheap compared to per-request network evaluation.
For cache topology across a fleet, the local in-process rule set is already the first-level cache. See distributed caching for flag evaluations if you need a second-level shared cache to reduce control-plane connection count.