
Experiment 060

3-tier funnel (PQ-prune): ~8× fewer disk reads at 0.99 recall

Perf record: 060-pq-prune-funnel.json. Granite box. src/bin/funnel3.rs. Cohere 1M × 1024. Task #1 (experiment B).

The pipeline

binary scan (RAM) → top-C1 by Hamming → PQ-ADC re-rank of those C1 (RAM) → top-C2 → exact rerank of the C2. The metric is the number of exact reads (= C2), because in the disk regime each exact rerank is a random SSD read, and C=1000 of them per query is what death-spiraled carousel×disk (052). The question: can a RAM-resident PQ-prune tier cut C2 ≪ C1 at the same recall?
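The three tiers can be sketched in a few functions. This is a minimal illustration, not the code in src/bin/funnel3.rs: the names `Index` and `three_tier_search` are hypothetical, the exact metric is assumed to be squared L2 (the notes do not name one), PQ sub-codes are assumed to be 8-bit, and the "SSD" tier is just an in-memory slice standing in for the f32 file. The ADC lookup table is taken as an input; building it from a trained codebook is out of scope here.

```rust
use std::cmp::Ordering;

/// Hamming distance between two packed binary codes of equal length.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}

/// PQ asymmetric distance: sum one lookup-table entry per sub-code.
/// Assumes 8-bit sub-codes, so `lut` holds M rows of 256 entries.
fn adc(lut: &[f32], code: &[u8]) -> f32 {
    code.iter()
        .enumerate()
        .map(|(m, &c)| lut[m * 256 + c as usize])
        .sum()
}

/// Squared L2 distance (an assumed metric, see the note above).
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Ids of the `k` smallest scores in ascending order; ties broken by id.
fn top_k<S: PartialOrd>(mut scored: Vec<(S, usize)>, k: usize) -> Vec<usize> {
    scored.sort_by(|a, b| a.partial_cmp(b).unwrap_or(Ordering::Equal));
    scored.truncate(k);
    scored.into_iter().map(|(_, id)| id).collect()
}

/// Hypothetical in-memory stand-in for the three stores.
struct Index<'a> {
    bin: &'a [Vec<u64>],   // RAM: binary codes
    pq: &'a [Vec<u8>],     // RAM: PQ codes
    exact: &'a [Vec<f32>], // SSD in the real setup: full f32 vectors
}

/// Returns the top-`k` ids and the number of exact reads performed.
fn three_tier_search(
    q: &[f32],
    q_bin: &[u64],
    lut: &[f32],
    idx: &Index,
    c1: usize,
    c2: usize,
    k: usize,
) -> (Vec<usize>, usize) {
    // Tier 1: full binary scan, keep the C1 nearest by Hamming distance.
    let by_hamming: Vec<(u32, usize)> = idx
        .bin
        .iter()
        .enumerate()
        .map(|(i, b)| (hamming(q_bin, b), i))
        .collect();
    let t1 = top_k(by_hamming, c1);

    // Tier 2: PQ-ADC re-rank of the C1 survivors, keep C2.
    let by_adc: Vec<(f32, usize)> = t1.iter().map(|&i| (adc(lut, &idx.pq[i]), i)).collect();
    let t2 = top_k(by_adc, c2);

    // Tier 3: exact rerank; each access here is one random read on disk.
    let reads = t2.len();
    let by_exact: Vec<(f32, usize)> =
        t2.iter().map(|&i| (l2_sq(q, &idx.exact[i]), i)).collect();
    (top_k(by_exact, k), reads)
}
```

The second return value is the metric tracked in this experiment: it equals C2 whenever at least C2 candidates survive tier 1. A full sort is used for brevity; a partial selection would do the same job for large C1.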

Result — yes, if the PQ is accurate enough

2-tier baseline (binary → exact): recall 0.999 @ 1000 reads. 3-tier recall with C1=1000, by PQ size M and number of exact reads C2:

PQ M (bytes/vec)   C2=64 (15.6× fewer)   C2=128 (7.8×)   C2=256 (3.9×)   C2=500 (2×)
16                 0.571                 0.732           0.872           0.970
32                 0.818                 0.920           0.975           0.996
64                 0.955                 0.989           0.997           0.999

Why it matters

This fixes the carousel × disk bottleneck (052): there, the unshared per-query rerank did C=1000 random SSD reads and collapsed under load. Insert a 64 B/vec PQ tier in RAM (64 MB for 1M vectors, cache-resident) and the disk sees only ~128 reads/query at ~0.99 recall (0.989 measured at M=64, C2=128). That is a ~8× cut (7.8×), which turns the death-spiral into a viable serve. The PQ codes join the funnel's other RAM-resident part (128 MB of binary codes), while the 4 GB of f32 vectors on SSD is touched ~8× less.
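The footprint and read-cut figures above can be checked with a few lines of arithmetic. The sketch below assumes 1 bit per dimension for the binary codes and one byte per PQ sub-quantizer (both consistent with the sizes quoted here), and uses decimal units; the function names are illustrative only.

```rust
/// Dataset shape from the notes: Cohere 1M × 1024.
const N: u64 = 1_000_000;
const DIM: u64 = 1024;

/// Binary codes, assuming 1 bit per dimension.
fn binary_bytes() -> u64 {
    N * DIM / 8
}

/// PQ codes, assuming one byte per sub-quantizer (M bytes/vec).
fn pq_bytes(m: u64) -> u64 {
    N * m
}

/// Full-precision f32 vectors kept on SSD.
fn f32_bytes() -> u64 {
    N * DIM * 4
}

/// Factor by which exact (SSD) reads drop when C2 replaces C1.
fn read_cut(c1: u64, c2: u64) -> f64 {
    c1 as f64 / c2 as f64
}
```

With these assumptions, `binary_bytes()` is 128 MB, `pq_bytes(64)` is 64 MB, `f32_bytes()` is about 4.1 GB (the "4 GB" above), and `read_cut(1000, 128)` is 7.8125, the ~8× in the headline.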

Conclusion

The 3-tier funnel (binary → PQ-prune → exact) is the disk-regime answer: ~8× fewer SSD reads at ~0.99 recall, gated on a sufficiently accurate PQ (M≈64). It only helps on disk (RAM rerank is already cheap, 015/037), and it composes with the carousel (052) and the disk hybrid (045).

Caveats