Experiment 034
ADSampling rerank, stacked on the binary funnel
Perf record: 034-adsampling-rerank-stack.json.
Cohere v3, 1M × 1024, cosine, Granite box. --quant binads --rerank C --eps0 E.
The pivot
031–033 built ADSampling/PDX as standalone exact scans — interesting, but ~37 QPS max, a different Pareto region that will never beat binarization's 851 QPS. The right move (per the project's own findings and the steer to add value on top of binary): stack the research technique onto the binary funnel rather than run it beside.
So: keep the fast rotated binary scan to get C candidates, then rerank them with ADSampling-pruned exact L2 (on the rotated f32 store) instead of full L2 — candidates that provably can't enter the top-k stop early. This lets the funnel use a larger C (more recall) without the rerank cost growing linearly.
Result — cheaper rerank, gain grows with C, near-lossless
Same rotated binary scan; plain exact rerank vs ADSampling rerank:
| C | rerank | recall@10 | QPS | Δ QPS |
|---|---|---|---|---|
| 1000 | plain | 0.9970 | 535 | — |
| 1000 | ADSampling | 0.9962 | 563 | +5% |
| 2000 | plain | 0.9992 | 494 | — |
| 2000 | ADSampling | 0.9984 | 524 | +6% |
| 4000 | plain | 0.9999 | 421 | — |
| 4000 | ADSampling | 0.9991 | 463 | +10% |
eps0=1.5 pushes further (+14% @ C=4000) at recall ~0.99.
Conclusions
- It adds value where rerank matters. The speedup grows with C (+5% → +10%) because rerank's share of the per-query cost grows with C, and ADSampling prunes most of it. At eps0=2.1 it's near-lossless (recall −0.001). So the high-recall funnel (C=2000–4000, recall 0.999+) gets 6–10% cheaper for free.
- This is the correct shape of a research win here: stacked on binary, not competing with it. Standalone the same technique gave ~37 QPS (033); attached to the funnel it lifts an 850-class pipeline at its expensive end.
- Bounded by the same truth as everything since 016: the scan still dominates at low C, so ADSampling rerank does little there (+5%); it only pays once you over-retrieve. Tiling (016) remains the bigger lever; this composes with it.
Caveats
- Untiled here to compare scan-matched plain vs ADSampling rerank cleanly; combined with tiling the absolute QPS rises and the rerank-pruning effect still applies.
- Rerank runs on the rotated f32 store (L2 preserved); eps0 trades recall for prune aggression, delta=32.