notes

Experiment 044

Quantization verdict: the rotated binary funnel is the frontier

Metadata: 044-quantization-frontier.json. Synthesis entry (no new run): closes questions/quantization.md by collecting the recall-for-bytes results already measured across history/008–037.

The question

Quantization cuts the bytes streamed per query, targeting the memory-bandwidth bound from 001/002. Which point on the recall↔bytes tradeoff is best for the within-cell scan — and is shrinking bytes always worth the recall cost?
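For scale, the bytes streamed per vector at each precision can be worked out directly. This is only arithmetic, not a measurement from any entry; the dimension `DIM = 768` is a hypothetical placeholder, since these notes do not state the dataset's dimensionality.

```python
# Hypothetical dimension: an assumption, not taken from the experiments.
DIM = 768

def bytes_per_vector(dim: int, bits_per_dim: int) -> int:
    """Bytes needed to store one vector at the given precision, rounded up."""
    return (dim * bits_per_dim + 7) // 8

for name, bits in [("f32", 32), ("bf16", 16), ("int8", 8), ("1-bit sign", 1)]:
    size = bytes_per_vector(DIM, bits)
    ratio = bytes_per_vector(DIM, 32) / size
    print(f"{name:>10}: {size:5d} bytes/vector, {ratio:.0f}x fewer bytes than f32")
```

The 1-bit code streams 32x fewer bytes than f32 whatever the dimension, which is why it is the natural floor for a bandwidth-bound scan.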

What we built and learned

| Quant family | Entries | Verdict |
|---|---|---|
| Scalar int8 (full-precision scan) | 008 | Works, but the binary funnel dominates it on QPS at equal recall. |
| Binary 1-bit (sign) funnel | 009, 012, 013, 016 | The winner. Sign-bit Hamming scan → top-C → f32 rerank. Scan runs at the popcount/bandwidth floor (012: popcount autovectorizes to VPOPCNTDQ; multi-accumulator hurts, −2.2×). |
| Multi-bit prefix codes | 014, 027 | More bits per dim don't pay their bandwidth vs spending the same budget on rotation. |
| Asymmetric / LUT distance | 010, 011 | LUT lost once popcount was re-priced (012). |
| Rotation (FWHT / SRHT, RaBitQ-style) | 026, 029 | Random rotation before sign-binarization is the recall knob: rescues recall at fixed bit budget, at popcount speed. Orthogonal to the scan kernel. |
| RaBitQ unbiased estimator | 028 | The (2·maskedΣ − Σq)/‖o′‖₁ estimator works and is unbiased, but doesn't beat the plain funnel's QPS. |
| Smaller rerank store (int8 / bf16) | 015, 037 | Negative. Shrinking the rerank vectors doesn't help: we're popcount-bound on the scan, not bandwidth-bound on the rerank. 037 (bf16) lost recall for no QPS. |
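The funnel in the winning row (sign-bit Hamming scan → top-C → f32 rerank) can be sketched in NumPy as below. This is an illustration of the pipeline shape, not the engine's kernel: the function names, the byte-table popcount, and the parameters `k` and `c` are assumptions made for the example, and the real scan relies on hardware popcount rather than a lookup table.

```python
import numpy as np

# Popcount of every possible byte value: a portable stand-in for hardware popcount.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def sign_codes(x: np.ndarray) -> np.ndarray:
    """Pack the sign bit of each dimension into bytes (1 bit per dimension)."""
    return np.packbits(x > 0, axis=-1)

def funnel_search(q, base, codes, k=10, c=100):
    """Hamming scan over sign codes -> top-C candidates -> exact f32 rerank."""
    q_code = sign_codes(q[None, :])[0]
    # Stage 1: Hamming distance from the query code to every stored code.
    hamming = POPCOUNT[np.bitwise_xor(codes, q_code)].sum(axis=1, dtype=np.int32)
    c = min(c, len(base))
    cand = np.argpartition(hamming, c - 1)[:c]
    # Stage 2: exact squared L2 on the original float32 vectors, candidates only.
    d2 = ((base[cand] - q) ** 2).sum(axis=1)
    return cand[np.argsort(d2)[:k]]

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
base = rng.standard_normal((2000, 128)).astype(np.float32)
codes = sign_codes(base)
q = base[42] + 0.05 * rng.standard_normal(128).astype(np.float32)
print(funnel_search(q, base, codes, k=5, c=50))
```

Only `codes` is touched for all 2000 vectors; the float32 rows are read for the C survivors alone, which is where the byte savings come from.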

Conclusion

The recall-for-bytes question is answered end-to-end: the rotated binary funnel (1-bit sign scan + f32 rerank, with random rotation as the recall dial) is the Pareto frontier for the within-cell scan. Going below 1 bit isn't meaningful here; going above 1 bit (prefix, int8) doesn't pay; shrinking the rerank tier (int8/bf16) is a dead end because the rerank isn't the bottleneck. This is the same engine that beat HNSW-in-cell at high dimension in 043.
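A minimal sketch of the rotation step, assuming the simplest SRHT-style construction (random sign flip followed by an orthonormal Hadamard transform, no subsampling) and a power-of-two dimension. It also spells out one reading of the (2·maskedΣ − Σq)/‖o′‖₁ estimator from 028, assuming maskedΣ is the sum of the rotated query over dimensions whose sign bit is set and o′ is a unit-norm rotated data vector. Under that reading the expression equals ⟨s, q′⟩/⟨s, o′⟩ with s = sign(o′). All function names are hypothetical.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform over the last axis."""
    x = np.asarray(x, dtype=np.float64)
    d = x.shape[-1]
    if d < 1 or d & (d - 1):
        raise ValueError("dimension must be a power of two")
    lead = x.shape[:-1]
    h = 1
    while h < d:
        # Butterfly step: combine adjacent blocks of length h.
        y = x.reshape(*lead, d // (2 * h), 2, h)
        a, b = y[..., 0, :], y[..., 1, :]
        x = np.stack([a + b, a - b], axis=-2).reshape(*lead, d)
        h *= 2
    return x / np.sqrt(d)

def rotate(x, signs):
    """Sign flip then Hadamard: an orthogonal map, so norms and dot products survive."""
    return fwht(x * signs)

def sign_estimate(o_rot, q_rot):
    """(2 * maskedΣ − Σq) / ||o'||_1, with the mask given by the sign bits of o'."""
    bits = o_rot > 0
    masked_sum = q_rot[bits].sum()
    return (2.0 * masked_sum - q_rot.sum()) / np.abs(o_rot).sum()

# Synthetic vectors, for illustration only.
rng = np.random.default_rng(0)
d = 256
signs = rng.choice([-1.0, 1.0], size=d)
o = rng.standard_normal(d)
o /= np.linalg.norm(o)
q = o + 0.5 * rng.standard_normal(d) / np.sqrt(d)  # a query correlated with o
o_rot, q_rot = rotate(o, signs), rotate(q, signs)
print("exact <o, q>:", float(o @ q))
print("estimate    :", float(sign_estimate(o_rot, q_rot)))
```

The rotation changes which bits the sign code keeps, not the geometry, which is consistent with the table's note that it is orthogonal to the scan kernel.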

Remaining open branch (untested)

Product Quantization (k-means subspace codebooks + ADC table lookup) and extended RaBitQ B-bit codes were never built. PQ is a fundamentally different mechanism — codebook reconstruction via lookup tables, not bit-truncation of the raw vector — so it isn't covered above. It's the one quantization family worth a future entry if we revisit this axis. Caution: the LUT-distance result (011) is the bar — table lookups lost to autovectorized popcount once re-priced, so PQ's ADC must clear that same bar to be worth it.
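For reference, the untested PQ mechanism (per-subspace k-means codebooks, then ADC via per-query lookup tables) looks roughly like the sketch below. Nothing here was built or measured in these entries; the function names, the plain Lloyd iterations, and the sizes `m` and `ksub` are all illustrative assumptions.

```python
import numpy as np

def sq_dists(a, b):
    """Pairwise squared L2 distances between rows of a and rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)

def train_codebooks(x, m, ksub, iters=10, seed=0):
    """Per-subspace k-means (plain Lloyd) -> codebooks of shape (m, ksub, dsub)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    dsub = d // m
    books = np.empty((m, ksub, dsub), dtype=np.float32)
    for j in range(m):
        sub = x[:, j * dsub:(j + 1) * dsub]
        cent = sub[rng.choice(n, ksub, replace=False)].copy()
        for _ in range(iters):
            assign = sq_dists(sub, cent).argmin(axis=1)
            for c in range(ksub):
                members = sub[assign == c]
                if len(members):  # leave empty clusters where they are
                    cent[c] = members.mean(axis=0)
        books[j] = cent
    return books

def encode(x, books):
    """Replace each subvector by the index of its nearest centroid (one byte each)."""
    m, ksub, dsub = books.shape
    codes = np.empty((len(x), m), dtype=np.uint8)
    for j in range(m):
        codes[:, j] = sq_dists(x[:, j * dsub:(j + 1) * dsub], books[j]).argmin(axis=1)
    return codes

def adc_distances(q, codes, books):
    """Asymmetric distance: float query against encoded vectors via table lookups."""
    m, ksub, dsub = books.shape
    # One table per subspace: squared distance from the query slice to each centroid.
    lut = np.stack([((books[j] - q[j * dsub:(j + 1) * dsub]) ** 2).sum(axis=1)
                    for j in range(m)])
    # Distance to each encoded vector is the sum of m table entries.
    return lut[np.arange(m), codes].sum(axis=1)

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
x = rng.standard_normal((1000, 32)).astype(np.float32)
books = train_codebooks(x, m=4, ksub=16)
codes = encode(x, books)
q = rng.standard_normal(32).astype(np.float32)
print(adc_distances(q, codes, books)[:5])
```

The inner loop of `adc_distances` is m gathers per vector, the same access pattern as the LUT distance in 010/011, which is why the caution above applies.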