Experiment 060
3-tier funnel (PQ-prune): ~8× fewer disk reads at 0.99 recall
Perf record: 060-pq-prune-funnel.json. Granite box.
src/bin/funnel3.rs. Cohere 1M × 1024. Task #1 (experiment B).
The pipeline
binary scan (RAM) → top-C1 by Hamming → PQ-ADC re-rank those C1 (RAM) → top-C2 → exact rerank C2. The metric is exact reads (= C2), because in the disk regime
each exact rerank is a random SSD read — and C=1000 of them per query is what
death-spiraled carousel×disk (052). Can a RAM PQ-prune tier cut C2 ≪ C1 at the same
recall?
Result — yes, if the PQ is accurate enough
2-tier baseline (binary → exact): 0.999 @ 1000 reads. 3-tier (C1=1000):
| PQ M (bytes/vec) | C2=64 (15.6× fewer) | C2=128 (7.8×) | C2=256 (3.9×) | C2=500 (2×) |
|---|---|---|---|---|
| 16 | 0.571 | 0.732 | 0.872 | 0.970 |
| 32 | 0.818 | 0.920 | 0.975 | 0.996 |
| 64 | 0.955 | 0.989 | 0.997 | 0.999 |
- M=64 PQ-prune → 0.989 recall at 7.8× fewer reads (128 vs 1000), 0.955 at 15.6×.
- M=16 is too coarse (0.73 @ C2=128) — the prune tier must be an accurate enough estimate to keep the true neighbours in its top-C2.
Why it matters
This fixes the carousel × disk bottleneck (052): there, the unshared per-query rerank did C=1000 random SSD reads and collapsed under load. Insert a 64 B/vec PQ tier in RAM (64 MB for 1M — cache-resident) and the disk only sees ~128 reads/query at 0.99 recall — an ~8× cut, turning the death-spiral into a viable serve. The PQ codes join the funnel's other RAM-resident parts (128 MB binary codes), with the 4 GB f32 on SSD touched ~8× less.
Conclusion
The 3-tier funnel (binary → PQ-prune → exact) is the disk-regime answer: ~8× fewer SSD reads at ~0.99 recall, gated on a sufficiently accurate PQ (M≈64). It only helps on disk (RAM rerank is already cheap, 015/037), and it composes with the carousel (052) and the disk hybrid (045).
Caveats
- Reads measured as C2 (the exact-rerank count); a live disk run would confirm the latency win, but C2 is the SSD-read count.
- M=64 PQ = 64 B/vec extra RAM; still small vs the f32 it saves reading.
- One dataset (Cohere). PQ trained on a 100k sample, K=256.