Experiment 048
Query-adaptive funnel width + a per-query certificate
Perf record: 048-query-adaptive-funnel.json.
Granite box (Xeon 6975P-C, 8 vCPU). src/bin/adaptive.rs. Cohere v3 cell N=100k,
1000 queries, rotate×2. Direction 5 of the breakout loop.
The idea
Fixed C wastes rerank on easy queries (top-k clearly Hamming-separated) and
under-serves hard ones (many near-ties). Adaptive C keys off a stage-1 signal — the
Hamming margin — spending rerank only where needed:
C_q = #{candidates with hamming ≤ hamming[k-1] + margin}.
A design caveat sets the stakes: in RAM the scan dominates, so C barely moves QPS there. But in the disk-resident regime (045), each reranked vector is an SSD read (~0.4 ms), so mean-C is the latency — adaptive cuts it directly. So the headline metric is mean rerank work (mean C) at matched mean recall.
Result 1 — adaptive cuts mean rerank ~34% at matched high recall
| approach | mean recall | mean C |
|---|---|---|
| fixed C=200 | 0.9754 | 200 |
| fixed C=500 | 0.9951 | 500 |
| fixed C=1000 | 0.9985 | 1000 |
| adaptive margin=32 | 0.9939 | 328 |
| adaptive margin=16 | 0.8931 | 64 |
| adaptive margin=64 | 0.9997 | 1753 |
At ~0.994 recall, adaptive needs mean C=328 vs fixed 500 — ~34% fewer reranks. In the disk regime that's a ~34% latency cut (328 vs 500 SSD reads); in RAM it's marginal (scan-dominated).
But the simple margin rule overshoots at the extreme-recall tail: margin=64 hits 0.9997 at mean C=1753 — worse than fixed C=1000's 0.9985, because a few pathological queries blow C_q up to the M=2000 cap and drag the mean. The rule wins in the high-recall band (~0.97–0.995); past that it needs a per-query cap or a relative (not absolute) margin. Honest: this is a band win, not a global dominance.
Result 2 — the certificate predicts per-query recall
The certificate is the Hamming gap at the funnel boundary,
gap_q = hamming[C-1] − hamming[k-1] (headroom of the boundary above the k-th) —
computable at query time without ground truth. Bucketing queries by gap (at C=100):
| gap bucket | #queries | mean recall |
|---|---|---|
| 7–14 | 38 | 0.776 |
| ≥15 | 962 | 0.941 |
Monotone: a tight gap (boundary close to the k-th) flags a high-miss-risk query; a wide gap certifies low risk. (No query had gap<7 at C=100 — by that depth the Hamming distribution has always spread.) So the gap is a valid per-query miss-risk certificate: the engine can flag the ~4% of queries likely to miss and either widen C for them (which is exactly what the adaptive rule does) or return a confidence with the result for SLA control.
Conclusions
- Adaptive-margin C cuts mean rerank ~34% at matched ~0.994 recall — a real win in the high-recall operating band, marginal in RAM but a direct latency cut where rerank is the cost (disk-resident, per-045).
- The simple absolute-margin rule overshoots at the extreme-recall tail (mean C 1753 for 0.9997); it wants a cap / relative margin to be globally Pareto.
- The Hamming-gap certificate is predictive (0.78 vs 0.94 recall across gap buckets) — a compute-without-truth per-query miss-risk signal, the mechanism behind the adaptive rule and a hook for per-query SLAs.
Caveats
- One cell (N=100k, Cohere). Plain rotated binary (no residual); 046's residual composes and would lift the absolute recall but not the fixed-vs-adaptive shape.
- mean-C is the work metric; the in-RAM QPS impact is small (scan-dominated). The disk translation uses the 045 ~0.4 ms/read constant, not a re-measured disk run.
- Certificate buckets are coarse (two populated); the monotone trend is the claim, not the exact per-bucket numbers.