Experiment 044

Quantization verdict: the rotated binary funnel is the frontier

Metadata: 044-quantization-frontier.json. Synthesis entry (no new run) — closes questions/quantization.md by collecting the recall-for-bytes results already measured across history/008–037.

The question

Quantization cuts the bytes streamed per query, targeting the memory-bandwidth bound from 001/002. Which point on the recall↔bytes tradeoff is best for the within-cell scan — and is shrinking bytes always worth the recall cost?
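To make the bytes side of the tradeoff concrete, here is a minimal arithmetic sketch of bytes streamed per stored vector at each code width. The dimension `DIM = 768` and the helper name `bytes_per_vector` are illustrative assumptions, not values taken from the experiments.

```python
def bytes_per_vector(dim: int, bits_per_dim: int) -> int:
    """Bytes needed to store one vector at the given code width, rounded up."""
    return (dim * bits_per_dim + 7) // 8


DIM = 768  # hypothetical dimension, chosen only for illustration

for name, bits in [("f32", 32), ("int8", 8), ("1-bit sign", 1)]:
    print(f"{name:>10}: {bytes_per_vector(DIM, bits)} bytes per vector")
```

At any dimension the 1-bit code streams 32× fewer bytes than f32 and 8× fewer than int8, which is the headroom the scan tier is competing for.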

What we built and learned

Quant family	Entries	Verdict
Scalar int8 (full-precision scan)	`008`	Works, but the binary funnel dominates it on QPS at equal recall.
Binary 1-bit (sign) funnel	`009`, `012`, `013`, `016`	The winner. Sign-bit Hamming scan → top-C → f32 rerank. Scan runs at the popcount/bandwidth floor (`012`: popcount autovectorizes to VPOPCNTDQ; a multi-accumulator variant hurts, running 2.2× slower).
Multi-bit prefix codes	`014`, `027`	Extra bits per dimension don't repay their bandwidth cost; the same budget is better spent on rotation.
Asymmetric / LUT distance	`010`, `011`	LUT distance lost to popcount once popcount was re-priced as an autovectorized kernel (`012`).
Rotation (FWHT / SRHT, RaBitQ-style)	`026`, `029`	Random rotation before sign-binarization is the recall knob — rescues recall at fixed bit budget, at popcount speed. Orthogonal to the scan kernel.
RaBitQ unbiased estimator	`028`	The (2·maskedΣ − Σq)/‖o′‖₁ estimator works and is unbiased, but doesn't beat the plain funnel's QPS.
Smaller rerank store (int8 / bf16)	`015`, `037`	Negative. Shrinking the rerank vectors doesn't help, because we're popcount-bound on the scan, not bandwidth-bound on the rerank. `037` (bf16) lost recall for no QPS gain.
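The funnel in the winning row (sign-bit Hamming scan → top-C → rerank) can be sketched as follows. This is a minimal pure-Python illustration, not the measured kernel: all function names are hypothetical, `bin(...).count("1")` stands in for a hardware popcount, Python floats are f64 rather than f32, and inner product as the rerank metric is an assumption.

```python
import heapq


def sign_code(vec):
    """Pack sign bits into one int: bit i is set when vec[i] > 0."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Count differing bits; a stand-in for a popcount instruction."""
    return bin(a ^ b).count("1")


def dot(u, v):
    """Exact inner product (assumed rerank metric)."""
    return sum(x * y for x, y in zip(u, v))


def funnel_search(query, vectors, codes, k, c):
    """Scan all codes by Hamming distance, keep the c nearest, rerank them exactly."""
    q_code = sign_code(query)
    candidates = heapq.nsmallest(
        c, range(len(codes)), key=lambda i: hamming(q_code, codes[i])
    )
    return heapq.nlargest(k, candidates, key=lambda i: dot(query, vectors[i]))


vectors = [[1, 1, 1, 1], [1, 1, 1, -1], [-1, -1, -1, -1], [1, -1, 1, -1]]
codes = [sign_code(v) for v in vectors]
print(funnel_search([0.9, 0.8, 1.0, 0.7], vectors, codes, k=1, c=2))
```

In this sketch the scan touches only the packed codes; the full-precision vectors are read for the `c` survivors alone, so `c` trades rerank work against recall.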

Conclusion

The recall-for-bytes question is answered end-to-end: the rotated binary funnel (1-bit sign scan + f32 rerank, with random rotation as the recall dial) is the Pareto frontier for the within-cell scan. Going below 1 bit isn't meaningful here; going above 1 bit (prefix, int8) doesn't pay; shrinking the rerank tier (int8/bf16) is a dead end because the rerank isn't the bottleneck. This is the same engine that beat HNSW-in-cell at high dimension in 043.
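The rotation half of the rotated binary funnel can be sketched as a randomized Hadamard transform: flip each coordinate by a random sign, then apply an orthonormal fast Walsh-Hadamard transform. This is a minimal sketch under assumptions: a single sign-flip layer, power-of-two dimension, and hypothetical names throughout. It is not necessarily the exact rotation used in 026/029.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def make_signs(dim, seed):
    """Fixed random +/-1 diagonal, shared by all vectors and all queries."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def rotate(vec, signs):
    """Sign flip then Hadamard: an orthogonal map, so norms and dot products survive."""
    return fwht([s * x for s, x in zip(signs, vec)])


signs = make_signs(8, seed=0)
one_hot = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(rotate(one_hot, signs))  # every coordinate has the same magnitude
```

A one-hot vector is a worst case for sign binarization, since seven of its eight sign bits carry nothing. After rotation its energy is spread evenly across coordinates, which is the usual motivation for rotating before taking signs.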

Remaining open branch (untested)

Product Quantization (k-means subspace codebooks + ADC table lookup) and extended RaBitQ B-bit codes were never built. PQ is a fundamentally different mechanism (codebook reconstruction via lookup tables, not bit-truncation of the raw vector), so the results above don't cover it. PQ is the one quantization family worth a future entry if we revisit this axis. Caution: the LUT-distance result (011) is the bar. Table lookups lost to autovectorized popcount once popcount was re-priced, so PQ's ADC must clear that same bar to be worth it.
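For reference, the ADC mechanism that a future PQ entry would have to price against popcount looks roughly like this. It is a minimal sketch of the lookup step only: the codebooks are hand-written placeholders rather than k-means output, squared L2 is an assumed metric, and the function names are hypothetical.

```python
def build_adc_table(query, codebooks):
    """table[m][j] = squared L2 distance from query subvector m to centroid j."""
    sub = len(query) // len(codebooks)
    table = []
    for m, centroids in enumerate(codebooks):
        q_sub = query[m * sub:(m + 1) * sub]
        table.append(
            [sum((a - b) ** 2 for a, b in zip(q_sub, c)) for c in centroids]
        )
    return table


def adc_distance(code, table):
    """Approximate distance to an encoded vector: one table lookup per subspace."""
    return sum(table[m][j] for m, j in enumerate(code))


# Two subspaces of width 2, two placeholder centroids each.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [2.0, 2.0]],
]
table = build_adc_table([1.0, 1.0, 0.0, 0.0], codebooks)
print(adc_distance((1, 0), table), adc_distance((0, 1), table))
```

The per-vector cost is one dependent table load per subspace, the same access pattern as the LUT distance that already lost to popcount, which is why that result sets the bar.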