Experiment 028
RaBitQ unbiased estimator: best recall-per-bit (slow scan)
Perf record: 028-rabitq-estimator.json.
Cohere v3, 1M × 1024, cosine, Granite box. --quant rabitq (rotation auto, 2 rounds).
Research → implementation
The full RaBitQ method (not just the rotation of 026) replaces Hamming with an unbiased inner-product estimator. In the rotated space, for unit vectors:
⟨q, o⟩ ≈ ⟨q', code⟩ / ⟨code, o'⟩ = (2·Σ_{set bits} q'ᵢ − Σ q') / ‖o'‖₁
where q' is the rotated query (kept full precision), the code is the {±1/√D}
sign vector, and ‖o'‖₁ is stored per data vector — that per-vector factor is
what unbiases the estimate (the √D cancels). Implemented as RaBitQ::build
(rotated sign bits + norm1[]) and knn_rabitq (the estimator).
Result — far better recall-per-bit, but the scan is slow
| method | stage-1 | C=100 | C=500 | C=1000 | scan QPS |
|---|---|---|---|---|---|
| plain binary | 0.442 | — | — | — | 583 |
| rot-symmetric (026) | 0.463 | 0.899 | 0.988 | 0.998 | ~580 |
| RaBitQ | 0.606 | 0.985 | 1.000 | 1.000 | 14.7 |
Conclusions
- RaBitQ has the best stage-1 recall by a wide margin — 0.606 vs 0.463 for rotated-symmetric (+14 pts) and 0.442 for plain binary. The unbiased estimator, keeping the query's magnitude and correcting per-vector, is a much sharper ranker than sign-vs-sign Hamming. This reproduces the paper's central claim on our data.
- It needs ~5–10× smaller rerank C for the same recall. RaBitQ hits 0.985 at C=100; symmetric needs C≈500–1000 to get there. At billion scale, where the f32 rerank tier dominates cost, that smaller C is the whole game.
- But as implemented the scan is 40× slower (14.7 vs ~580 QPS) — the estimator
is an asymmetric set-bit gather (no popcount), the same kernel wall as 010/011.
So today RaBitQ is recall-best, not throughput-best. RaBitQ's real
implementation closes this by quantizing the query to a few bits and computing
the estimate with bit-sliced popcount /
vpshufbLUTs — the parked SIMD kernel.
Where this leaves it
- For max recall / smallest C (billion-scale, or when the rerank tier is the cost): RaBitQ is the method, if the fast estimator kernel is built.
- For throughput today at 1M: rotated-symmetric Hamming + tiling (free, popcount-speed) remains the deployable winner; RaBitQ's recall edge isn't worth the 40× scan penalty at this scale.
Caveats
- 300 queries, reps=1 (recall is deterministic; QPS single-shot but CV 0 here).
- The fast bit-quantized RaBitQ kernel is the same unbuilt SIMD-LUT work parked at 011; recall here is the upper bound that kernel would deliver at popcount speed.