Optimizing Rule Engine Performance
This guide is part of the Backend Evaluation & Server-Side SDKs series. A rule engine sits on every request path in a server-side flag system: it receives an evaluation context, walks the targeting tree for a flag, and returns a variant — all while the calling thread is waiting. When that work is slow it shows up directly in your service’s p99, not just in flag metrics.
This guide covers the engine itself: how rules move from JSON to a compiled AST, how short-circuit evaluation and regex avoidance keep the hot path fast, and how to set and enforce a latency budget. It does not cover the cache topology that sits in front of the engine (see distributed caching for flag evaluations) or the transport that keeps the rule set fresh (see polling vs streaming flag synchronization).
Prerequisites
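- An OpenFeature server SDK installed (@openfeature/server-sdk for Node.js, openfeature for Python, or the Go equivalent)
- Flag metadata (owner, created, expiry) populated per your flag taxonomy
- A metrics pipeline that records flag_eval_duration_seconds with histogram buckets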
Core Concept & Architecture
Every flag evaluation request walks a path: receive context → check cache → enter the rule engine → return a variant. The rule engine phase is the only step with non-trivial complexity: it interprets a targeting rule — usually expressed as a JsonLogic document — and applies it to the evaluation context. Two sources of overhead dominate:
Parse overhead — re-reading the JSON rule definition and building an operator tree from scratch on every call. This is pure waste and belongs at initialisation, not on the hot path.
Traversal overhead — walking the operator tree with an inefficient algorithm: no short-circuiting, regex matching on large strings, or attribute lookups that hydrate a database record.
The architecture that keeps evaluation fast is:
- Parse-once, compile-to-AST: at provider init, parse every flag’s targeting rule into an in-memory AST. Store the compiled tree keyed by flag key and version, and re-parse only when a flag update arrives.
- Short-circuit evaluation: implement AND and OR in the evaluator using native boolean short-circuiting so branches that cannot affect the result are never visited.
- Avoid regex on the hot path: equality and set-membership checks are a single comparison or hash lookup, and prefix/suffix checks are a bounded string comparison. Regex matching runs a general-purpose pattern engine, and backtracking engines can blow the CPU budget on complex patterns. Replace regex with set membership or prefix checks where possible.
- Thin, flat context: the engine iterates context attributes, so a 50-key payload costs proportionally more than a 5-key one. Strip non-targeting keys at the API boundary.
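Stripping non-targeting keys at the API boundary can be as small as an allow-list copy. The sketch below is a minimal example under one assumption: the set of targeting attributes is known up front. TARGETING_KEYS is hypothetical; in practice it could be collected from the compiled rules at load time. Nested objects are dropped because flat EQ/IN predicates cannot compare them meaningfully.

```typescript
type Context = Record<string, unknown>;

// Hypothetical allow-list of attributes referenced by targeting rules
const TARGETING_KEYS: ReadonlySet<string> = new Set(['targetingKey', 'tenantTier', 'region']);

// Copy only allow-listed primitive attributes; the loop is bounded by the
// allow-list, not by the size of the incoming payload
function toTargetingContext(raw: Context, allowed: ReadonlySet<string> = TARGETING_KEYS): Context {
  const out: Context = {};
  for (const key of allowed) {
    const value = raw[key];
    const kind = typeof value;
    if (kind === 'string' || kind === 'number' || kind === 'boolean') out[key] = value;
  }
  return out;
}

const requestPayload = {
  targetingKey: 'user-42',
  tenantTier: 'enterprise',
  region: 'us-east-1',
  cart: { items: 17 },      // not used by any rule
  userAgent: 'Mozilla/5.0', // not used by any rule
};
const evalCtx = toTargetingContext(requestPayload);
console.log(Object.keys(evalCtx)); // [ 'targetingKey', 'tenantTier', 'region' ]
```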
Latency budget breakdown
A realistic per-evaluation budget for a service targeting p99 ≤ 5 ms:
| Phase | Target | Notes |
|---|---|---|
| Context lookup (in-process) | < 0.05 ms | hash map read; no I/O |
| Cache check (local LRU) | < 0.1 ms | before entering the engine |
| AST walk (compiled, simple rule) | < 0.2 ms | AND/OR chains, 3–5 predicates |
| AST walk (complex rule) | < 1.0 ms | segments, nested conditions |
| Serialisation + overhead | < 0.3 ms | variant → caller |
| Total budget | < 2 ms median | leaves headroom for p99 spike |
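Summing the table's per-phase targets shows how much headroom the 2 ms median leaves. The TypeScript check below works through that arithmetic. The phase names are labels for the table rows, and the numbers are the table's targets, not measurements.

```typescript
// Per-phase upper bounds from the budget table, in milliseconds
const phaseTargetsMs = {
  contextLookup: 0.05,
  cacheCheck: 0.1,
  astWalkSimple: 0.2,
  astWalkComplex: 1.0,
  serialisation: 0.3,
};

const MEDIAN_BUDGET_MS = 2;

// Worst-case total for one evaluation path, given that path's AST walk target
function pathTotalMs(astWalkMs: number): number {
  const t = phaseTargetsMs;
  return t.contextLookup + t.cacheCheck + astWalkMs + t.serialisation;
}

const simpleTotal = pathTotalMs(phaseTargetsMs.astWalkSimple);   // ≈ 0.65 ms
const complexTotal = pathTotalMs(phaseTargetsMs.astWalkComplex); // ≈ 1.45 ms

console.log(`simple ${simpleTotal.toFixed(2)} ms, complex ${complexTotal.toFixed(2)} ms`);
console.log(`headroom under ${MEDIAN_BUDGET_MS} ms: ${(MEDIAN_BUDGET_MS - complexTotal).toFixed(2)} ms`);
```

Even the complex path stays about 0.5 ms under the median budget. That margin absorbs the spikes that push p99 toward 5 ms.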
If p99 exceeds 5 ms, the investigation order is: confirm the AST is compiled (not parsed on each call), profile for regex predicates, then check context payload size.
Step-by-Step Implementation
Step 1 — Compile rules to AST at provider initialisation
Move all rule parsing out of the evaluation path. At startup — and again whenever the flag update stream delivers a change — parse each flag’s targeting rule into a compiled tree and store it keyed by flag key + rule version.
// Compiled AST: AND/OR are binary nodes; leaf ops read one context attribute
interface ASTNode {
  op: 'AND' | 'OR' | 'EQ' | 'IN' | 'GT';
  left?: ASTNode;
  right?: ASTNode;
  attr?: string;
  value?: unknown;
  values?: unknown[];
}
type JsonLogic = Record<string, unknown>;
// JsonLogic "and"/"or" accept N operands: fold them into a left-leaning chain of binary nodes
// (reduce without a seed throws on an empty operand list)
function foldBinary(op: 'AND' | 'OR', args: JsonLogic[]): ASTNode {
  return args.map((arg) => compileRule(arg)).reduce((left, right) => ({ op, left, right }));
}
function compileRule(jsonLogic: JsonLogic): ASTNode {
  const [op, rawArgs] = Object.entries(jsonLogic)[0];
  const args = rawArgs as any[]; // JsonLogic operands are always an array
  if (op === 'and') return foldBinary('AND', args);
  if (op === 'or') return foldBinary('OR', args);
  if (op === '==') return { op: 'EQ', attr: args[0].var, value: args[1] };
  if (op === 'in') return { op: 'IN', attr: args[0].var, values: args[1] };
  if (op === '>') return { op: 'GT', attr: args[0].var, value: args[1] };
  throw new Error(`Unsupported op: ${op}`);
}
// compiledRules lives at module scope: built once, read on every evaluation
const compiledRules = new Map<string, ASTNode>();
function loadFlags(flagDefs: Record<string, { targeting: JsonLogic, version: number }>) {
  for (const [key, def] of Object.entries(flagDefs)) {
    compiledRules.set(`${key}@${def.version}`, compileRule(def.targeting));
  }
}
Pitfall: calling JSON.parse + compileRule inside the evaluation function is a common cause of evaluation latency spikes. Profile with console.time or a flag_eval_parse_duration_seconds histogram to confirm the cost before and after moving compilation to init.
Step 2 — Implement short-circuit evaluation in the AST walker
The evaluator must mirror how the host language’s && and || operators work: for AND, stop as soon as one branch returns false; for OR, stop as soon as one branch returns true. Never evaluate both sides unconditionally.
function walkAST(node: ASTNode, ctx: Record<string, unknown>): boolean {
switch (node.op) {
case 'AND':
// Short-circuit: right branch is skipped if left is false
return walkAST(node.left!, ctx) && walkAST(node.right!, ctx);
case 'OR':
return walkAST(node.left!, ctx) || walkAST(node.right!, ctx);
case 'EQ':
return ctx[node.attr!] === node.value;
case 'IN':
return (node.values as unknown[]).includes(ctx[node.attr!]);
case 'GT':
return (ctx[node.attr!] as number) > (node.value as number);
default:
return false; // safe default for unrecognised ops
}
}
Pitfall: an evaluator that materialises the full result of both branches before applying the operator defeats short-circuiting. Watch for Promise.all, eager list comprehensions, or any pattern that forces evaluation of an operand before the logical result is known.
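Counting predicate visits makes the difference between short-circuit and eager evaluation visible. This self-contained sketch uses a trimmed-down AST (binary AND/OR plus an EQ leaf) rather than the full ASTNode above. The plan attribute is a hypothetical example.

```typescript
// Trimmed-down AST for the demo: binary AND/OR plus an EQ leaf
type RuleNode =
  | { op: 'AND' | 'OR'; left: RuleNode; right: RuleNode }
  | { op: 'EQ'; attr: string; value: unknown };
type Ctx = Record<string, unknown>;

// Number of leaf predicates actually evaluated
let visited = 0;

// Short-circuit: && and || skip the right operand once the result is known
function shortCircuit(node: RuleNode, ctx: Ctx): boolean {
  if (node.op === 'EQ') { visited++; return ctx[node.attr] === node.value; }
  if (node.op === 'AND') return shortCircuit(node.left, ctx) && shortCircuit(node.right, ctx);
  return shortCircuit(node.left, ctx) || shortCircuit(node.right, ctx);
}

// Anti-pattern: both operands are computed before the operator is applied
function eager(node: RuleNode, ctx: Ctx): boolean {
  if (node.op === 'EQ') { visited++; return ctx[node.attr] === node.value; }
  const l = eager(node.left, ctx);
  const r = eager(node.right, ctx);
  return node.op === 'AND' ? l && r : l || r;
}

// tenantTier == 'enterprise' AND (region == 'us-east-1' AND plan == 'annual')
const rule: RuleNode = {
  op: 'AND',
  left: { op: 'EQ', attr: 'tenantTier', value: 'enterprise' },
  right: {
    op: 'AND',
    left: { op: 'EQ', attr: 'region', value: 'us-east-1' },
    right: { op: 'EQ', attr: 'plan', value: 'annual' },
  },
};
const freeTier: Ctx = { tenantTier: 'free', region: 'us-east-1', plan: 'annual' };

visited = 0;
const fastResult = shortCircuit(rule, freeTier);
const fastVisits = visited; // 1: the first EQ fails, so the rest of the AND is skipped

visited = 0;
const slowResult = eager(rule, freeTier);
const slowVisits = visited; // 3: every predicate runs although the answer was known after one
```

Both evaluators return the same result. Only the amount of work differs, and the gap grows with the depth of the rule.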
Step 3 — Replace regex predicates with set membership or prefix checks
Regex matching in targeting rules is disproportionately expensive. Even a simple ^enterprise-.* pattern runs through a general-purpose matching engine instead of a single comparison, and backtracking engines can take super-linear time on complex patterns. Replace regex with explicit set membership or prefix operators wherever rule authors have the option.
# flagd rule definition: AVOID regex in targeting
# Slow: a pattern match on every evaluation
# ("regex" here stands in for a custom pattern-matching operator)
targeting:
  if:
    - { "regex": [ { "var": "tenantId" }, "^enterprise-" ] }
    - "on"
    - "off"
# Fast: explicit set membership
flags:
  api.search.semantic-rerank:
    state: ENABLED
    variants: { "on": true, "off": false }
    defaultVariant: "off"
    targeting:
      if:
        - { "in": [ { "var": "tenantTier" }, [ "enterprise", "business-plus" ] ] }
        - "on"
        - "off"
# If a custom evaluator must match patterns, compile once and cache the compiled pattern
import re
from functools import lru_cache

@lru_cache(maxsize=256)
def _compile(pattern: str) -> re.Pattern:
    return re.compile(pattern)

def match_regex_predicate(pattern: str, value: str) -> bool:
    return bool(_compile(pattern).match(value))
Pitfall: a regex predicate compiled fresh inside the evaluator on every call adds 5–50 µs per match depending on pattern complexity. Cache compiled patterns at the rule level, not at call time.
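In a compiled AST, the same replacements can happen at compile time. An IN list becomes a Set, so membership is a hash lookup rather than Array.includes. A ^prefix pattern becomes String.prototype.startsWith. The PREFIX op is a hypothetical extension; it is not part of the ASTNode union above.

```typescript
// Hypothetical compiled leaf ops: IN holds a Set built once at compile time,
// PREFIX replaces a ^prefix regex with startsWith
type Leaf =
  | { op: 'IN'; attr: string; values: ReadonlySet<unknown> }
  | { op: 'PREFIX'; attr: string; prefix: string };

function compileIn(attr: string, values: unknown[]): Leaf {
  return { op: 'IN', attr, values: new Set(values) }; // built once, not per evaluation
}

function compilePrefix(attr: string, prefix: string): Leaf {
  return { op: 'PREFIX', attr, prefix };
}

function evalLeaf(leaf: Leaf, ctx: Record<string, unknown>): boolean {
  const v = ctx[leaf.attr];
  if (leaf.op === 'IN') return leaf.values.has(v);
  // Non-string attributes never match a prefix rule
  return typeof v === 'string' && v.startsWith(leaf.prefix);
}

const tierRule = compileIn('tenantTier', ['enterprise', 'business-plus']);
const tenantRule = compilePrefix('tenantId', 'enterprise-');

console.log(evalLeaf(tierRule, { tenantTier: 'business-plus' }));   // true
console.log(evalLeaf(tenantRule, { tenantId: 'enterprise-acme' }));  // true
console.log(evalLeaf(tenantRule, { tenantId: 'acme-enterprise-' })); // false
```

startsWith('enterprise-') accepts exactly the strings that an anchored ^enterprise- pattern does, so the rewrite keeps the rule's meaning.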
Step 4 — Set and enforce a latency budget with alerting
An SLO without enforcement decays. Add a histogram metric to every evaluation call, set a Prometheus alert at the p99 threshold, and fail CI if a benchmark exceeds the budget.
package flags

import (
	"context"
	"strconv"
	"time"

	"github.com/open-feature/go-sdk/pkg/openfeature"
	"github.com/prometheus/client_golang/prometheus"
)

var evalDuration = prometheus.NewHistogramVec(prometheus.HistogramOpts{
	Name:    "flag_eval_duration_seconds",
	Help:    "Feature flag evaluation latency",
	Buckets: []float64{.0005, .001, .002, .005, .010, .025},
}, []string{"flag_key", "variant"})

func init() {
	// Register with the default registry so the /metrics handler exposes the histogram
	prometheus.MustRegister(evalDuration)
}

// EvaluateWithBudget records the latency of every boolean evaluation.
// The "variant" label holds the returned value; on error the SDK returns defaultVal.
func EvaluateWithBudget(ctx context.Context, client openfeature.IClient, key string, defaultVal bool, evalCtx openfeature.EvaluationContext) bool {
	start := time.Now()
	val, _ := client.BooleanValue(ctx, key, defaultVal, evalCtx)
	evalDuration.WithLabelValues(key, strconv.FormatBool(val)).Observe(time.Since(start).Seconds())
	return val
}
# prometheus-alerts.yaml
groups:
  - name: flag_evaluation
    rules:
      - alert: FlagEvalP99Breach
        # Sum across instances and variants but keep flag_key, so each flag gets its own p99
        expr: histogram_quantile(0.99, sum by (le, flag_key) (rate(flag_eval_duration_seconds_bucket[5m]))) > 0.005
        for: 2m
        labels: { severity: critical }
        annotations:
          summary: "Flag evaluation p99 exceeds 5 ms budget for {{ $labels.flag_key }}"
          description: "Check AST compilation, regex predicates, or context payload size."
Pitfall: aggregating evaluation latency across all flags hides per-flag outliers. Include flag_key as a histogram label so a healthy overall average doesn’t obscure a single slow rule.
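The CI half of the enforcement can be sketched in TypeScript. The gate times an evaluation function over many iterations, computes a nearest-rank p99 and throws when it exceeds the budget. The function names and the nearest-rank method are choices made for this sketch. Prometheus's histogram_quantile interpolates within buckets, so its p99 will differ slightly from this exact-sample figure.

```typescript
import { performance } from 'node:perf_hooks';

// Nearest-rank percentile: smallest sample with at least p% of samples at or below it
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical CI gate: throw (failing the test run) if p99 latency exceeds budgetMs
function assertP99WithinBudget(evaluate: () => unknown, iterations: number, budgetMs: number): number {
  const samples: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    evaluate();
    samples.push(performance.now() - start);
  }
  const p99 = percentile(samples, 99);
  if (p99 > budgetMs) {
    throw new Error(`p99 ${p99.toFixed(3)} ms exceeds ${budgetMs} ms budget`);
  }
  return p99;
}
```

Calling it from a test case makes the thrown error fail the build. Pass the same evaluation entry point that production code uses, so the gate measures the real hot path.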
Step 5 — Tune GC and memory allocation
Evaluation allocates on every call: per-request context maps, boxed attribute values and variant strings. In GC-managed runtimes (JVM, Go, Python), that allocation rate drives more frequent collections, and the resulting GC pauses show up at p99.
package flags

import (
	"net/http"
	"sync"
)

// Pre-allocate a context struct rather than building a new map per evaluation
type EvalContext struct {
	TargetingKey string
	TenantTier   string
	Region       string
	// Only fields the rule engine actually reads
}

// Reuse via a sync.Pool to avoid per-request heap allocation
var ctxPool = sync.Pool{
	New: func() any { return &EvalContext{} },
}

func getContext(req *http.Request) *EvalContext {
	ctx := ctxPool.Get().(*EvalContext)
	ctx.TargetingKey = req.Header.Get("X-User-ID")
	ctx.TenantTier = req.Header.Get("X-Tenant-Tier")
	ctx.Region = req.Header.Get("X-Region")
	return ctx
}

func releaseContext(ctx *EvalContext) {
	*ctx = EvalContext{} // zero before returning to the pool
	ctxPool.Put(ctx)
}
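Pitfall: after releaseContext returns a struct to the pool, any code still holding the pointer shares an object that another request may already be reusing. This causes data races and cross-request data leakage. Pool context objects only if evaluation is synchronous and the struct does not escape the request’s call frame (no goroutines, closures or caches retain it).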
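The same "must not escape" rule can be enforced structurally. The context is handed only to a callback, and it is released in finally. The sketch below is a hypothetical TypeScript analogue of the Go pool. It assumes a plain, single-valued header map with lowercase names. Whether pooling pays off in a JavaScript runtime is something to confirm with a profiler.

```typescript
// Flat context with only the attributes the rule engine reads (illustrative fields)
interface EvalContext {
  targetingKey: string;
  tenantTier: string;
  region: string;
}

// Hypothetical minimal object pool
const pool: EvalContext[] = [];

function acquire(): EvalContext {
  return pool.pop() ?? { targetingKey: '', tenantTier: '', region: '' };
}

function release(ctx: EvalContext): void {
  ctx.targetingKey = ctx.tenantTier = ctx.region = ''; // reset before reuse
  pool.push(ctx);
}

// The context is valid only inside `fn`, which must be synchronous: a Promise
// returned from `fn` would outlive the release in `finally`.
function withPooledContext<T>(headers: Record<string, string | undefined>, fn: (ctx: EvalContext) => T): T {
  const ctx = acquire();
  ctx.targetingKey = headers['x-user-id'] ?? '';
  ctx.tenantTier = headers['x-tenant-tier'] ?? '';
  ctx.region = headers['x-region'] ?? '';
  try {
    return fn(ctx);
  } finally {
    release(ctx); // runs even if evaluation throws
  }
}

const enabled = withPooledContext({ 'x-tenant-tier': 'enterprise' }, (ctx) => ctx.tenantTier === 'enterprise');
console.log(enabled, pool.length); // true 1
```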
Verification & Testing
Confirm the optimisations are working with a benchmark that measures compiled vs. uncompiled throughput:
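package flags

import (
	"encoding/json"
	"testing"
)

// BenchmarkCompiledAST vs BenchmarkParsedRule: run with go test -bench=. -benchtime=5s
// Assumes Go ports of the engine in package flags: compiledRules (map[string]*ASTNode,
// populated at init), walkAST, and a JsonLogic interpreter evaluateJsonLogic.

// sink keeps results live so the evaluated calls can't be optimised away
var sink any

func BenchmarkCompiledAST(b *testing.B) {
	ctx := map[string]interface{}{"tenantTier": "enterprise", "region": "us-east-1"}
	// compiledRules already populated at init
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = walkAST(compiledRules["api.search.semantic-rerank@3"], ctx)
	}
}

func BenchmarkParsedRule(b *testing.B) {
	raw := `{"if":[{"==":[{"var":"tenantTier"},"enterprise"]},"on","off"]}`
	ctx := map[string]interface{}{"tenantTier": "enterprise", "region": "us-east-1"}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var rule map[string]interface{}
		// Parsing on every iteration is the cost we're eliminating
		if err := json.Unmarshal([]byte(raw), &rule); err != nil {
			b.Fatal(err)
		}
		sink = evaluateJsonLogic(rule, ctx)
	}
}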
Expected result: BenchmarkCompiledAST should run 10–50× faster than BenchmarkParsedRule. If the ratio is lower, confirm the compile step runs before the benchmark loop, not inside it.
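For the TypeScript engine, a rough equivalent compares parsing the rule on every call with compiling it once. This sketch compiles to a closure rather than to the ASTNode tree. The closure is a lighter variant of the same compile-once idea, and the rule shape handled here (a single if/== rule) is an assumption of the sketch. Wall-clock numbers from performance.now are indicative only; a dedicated benchmark harness gives steadier results.

```typescript
import { performance } from 'node:perf_hooks';

type Ctx = Record<string, unknown>;
const raw = '{"if":[{"==":[{"var":"tenantTier"},"enterprise"]},"on","off"]}';

// Per-call path: interpret freshly parsed JSON (the cost being eliminated)
function interpret(rule: any, ctx: Ctx): string {
  const [cond, onTrue, onFalse] = rule.if;
  const [lhs, rhs] = cond['=='];
  return ctx[lhs.var] === rhs ? onTrue : onFalse;
}

// Compile once: pull the operands out up front, leaving only the comparison per call
function compile(rule: any): (ctx: Ctx) => string {
  const [cond, onTrue, onFalse] = rule.if;
  const [lhs, rhs] = cond['=='];
  const attr: string = lhs.var;
  return (ctx) => (ctx[attr] === rhs ? onTrue : onFalse);
}

let sink: unknown; // written every iteration so the JIT can't treat the call as dead code
function timeMs(iterations: number, fn: () => unknown): number {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) sink = fn();
  return performance.now() - start;
}

const ctx: Ctx = { tenantTier: 'enterprise', region: 'us-east-1' };
const compiled = compile(JSON.parse(raw)); // compile step runs before the timed loop
const N = 100_000;

const parsedMs = timeMs(N, () => interpret(JSON.parse(raw), ctx));
const compiledMs = timeMs(N, () => compiled(ctx));
console.log(`parse-per-call ${parsedMs.toFixed(1)} ms, compiled ${compiledMs.toFixed(1)} ms`);
```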
Troubleshooting & FAQ
Why is p99 high even though p50 is under 1 ms?
p99 outliers with a healthy median usually point to GC pauses, lock contention, or OS scheduling jitter — not to the rule logic itself. Add a histogram metric for allocation count per evaluation and look for correlation with GC events. In Go, runtime/trace will show you GC stop-the-world pauses alongside your goroutine schedule.
How do I know if AST compilation is actually running at init?
Add a log line and a counter (flag_rules_compiled_total) in loadFlags. On startup you should see one log entry per flag per version, and the counter should reach its final value before the first HTTP request is served. If you see parse-related log lines at evaluation time, the compiled tree is not being used.
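A minimal sketch of that instrumentation follows. A plain object stands in for the flag_rules_compiled_total counter, and a readiness flag is set only after the initial load completes, so a readiness probe can keep traffic away until every rule is compiled. loadFlagsInstrumented and isReady are hypothetical names.

```typescript
type FlagDefs = Record<string, { targeting: unknown; version: number }>;

// Stand-ins for a Prometheus counter (flag_rules_compiled_total) and a readiness flag
const metrics = { rulesCompiledTotal: 0 };
let initialLoadComplete = false;

function loadFlagsInstrumented(
  defs: FlagDefs,
  compile: (targeting: unknown) => unknown,
  store: Map<string, unknown>,
): void {
  for (const [key, def] of Object.entries(defs)) {
    store.set(`${key}@${def.version}`, compile(def.targeting));
    metrics.rulesCompiledTotal++;
    console.log(`compiled rule ${key}@${def.version}`); // one line per flag per version
  }
  initialLoadComplete = true;
}

// Wire into the readiness probe so no request is routed before rules are compiled
function isReady(): boolean {
  return initialLoadComplete;
}
```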
Do I need to recompile all rules when one flag changes?
No. Compile rules keyed by flag_key + rule_version. When the provider receives an update for a single flag, recompile only that flag’s targeting tree. This keeps the recompile cost proportional to the size of the change, not the total number of flags. See precompiling targeting rules into an AST for the incremental update pattern.
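An incremental update can be sketched as a copy-on-write step. It copies the map of compiled rules, recompiles only the changed flag and leaves the previous map untouched. As a variation on key@version map keys, this sketch keys by flag key and stores the version alongside the tree, so stale versions don't accumulate. Copying the map is O(number of flags), but it copies references, not compiled trees. The compile function is passed in as a stand-in for compileRule.

```typescript
type Compiled = { version: number; tree: unknown };

// Returns a map in which only `key` is recompiled; `current` is never mutated
function applyFlagUpdate(
  current: ReadonlyMap<string, Compiled>,
  key: string,
  version: number,
  targeting: unknown,
  compile: (targeting: unknown) => unknown,
): ReadonlyMap<string, Compiled> {
  if (current.get(key)?.version === version) return current; // already compiled at this version
  const next = new Map(current); // copies entries (references), not compiled trees
  next.set(key, { version, tree: compile(targeting) });
  return next;
}

let compileCalls = 0;
const compileStub = (targeting: unknown): unknown => { compileCalls++; return targeting; };

let rules: ReadonlyMap<string, Compiled> = new Map<string, Compiled>([
  ['checkout.new-flow', { version: 4, tree: 'tree-v4' }],
  ['api.search.semantic-rerank', { version: 3, tree: 'tree-v3' }],
]);
const previous = rules;
rules = applyFlagUpdate(rules, 'api.search.semantic-rerank', 4, { in: [] }, compileStub);
// compileCalls is now 1: only the updated flag was compiled
```

The flag key checkout.new-flow is a hypothetical example.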
What happens to evaluations during a recompile?
Swap the compiled map atomically: build the new AST into a fresh map, then replace the pointer in a single atomic store or mutex-protected assignment. In-flight evaluations finish against the old tree; new evaluations pick up the updated tree. Never mutate the live map in place while evaluations may be reading it.
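In Node's single-threaded runtime, the "atomic store" is a single reference assignment. The race to guard against is an async evaluation that spans a swap. The sketch below captures the snapshot once at the start of each evaluation. RuleStore is a hypothetical name. In a multi-threaded runtime, the assignment in publish would become an atomic store, for example Go's atomic.Pointer.

```typescript
class RuleStore<T> {
  // Replaced wholesale on publish; never mutated after it has been published
  private snapshot: ReadonlyMap<string, T> = new Map();

  current(): ReadonlyMap<string, T> {
    return this.snapshot;
  }

  // Build `next` completely before calling publish; the swap is one assignment
  publish(next: ReadonlyMap<string, T>): void {
    this.snapshot = next;
  }

  // Capture the snapshot once so the whole evaluation sees one consistent rule set
  async evaluate<R>(fn: (rules: ReadonlyMap<string, T>) => Promise<R>): Promise<R> {
    const rules = this.snapshot;
    return fn(rules);
  }
}

const store = new RuleStore<string>();
store.publish(new Map([['api.search.semantic-rerank', 'tree-v3']]));

const inFlight = store.evaluate(async (rules) => {
  await new Promise<void>((resolve) => setTimeout(() => resolve(), 10)); // simulated async step
  return rules.get('api.search.semantic-rerank');
});
store.publish(new Map([['api.search.semantic-rerank', 'tree-v4']])); // swap mid-evaluation
inFlight.then((tree) => console.log(tree)); // tree-v3: the in-flight evaluation kept its snapshot
```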
Performance & Scale
At high request rates, the rule engine is evaluated millions of times per minute across the nodes in the fleet. The key insight is that the cost is additive: every extra predicate, every unindexed context key and every avoidable regex adds a fixed overhead to every request. Take a service processing 10,000 req/s with one evaluation per request. A 0.3 ms per-evaluation improvement saves 3 CPU-seconds per second, roughly three cores’ worth of work, which may be enough to remove a replica.
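The saving also scales with the number of flags evaluated per request. Let $\Delta t$ be the per-evaluation improvement and $R$ the request rate. $k$, the number of evaluations per request, is introduced here; the example above assumes $k = 1$:

$$
\text{CPU saved per second} = \Delta t \cdot R \cdot k = 0.3\,\text{ms} \times 10{,}000\,\text{req/s} \times 1 = 3\,\text{CPU-s/s}
$$

At $k = 5$, the same improvement would save 15 CPU-seconds per second.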
Propagating rule changes to all nodes in the fleet without a latency spike requires that recompilation happen in a background goroutine, not inline with the update event. See server-side SDK integration patterns for the lifecycle hooks that make this safe.