Experiment 026

Random rotation before binarization (RaBitQ/ITQ): free recall

Perf record: 026-rabitq-rotation.json. Cohere v3, 1M × 1024, cosine, Granite box. --quant binary --rotate N.

Research → idea

Searched the literature for something that beats plain sign-bit binary. The standout is RaBitQ (Gao & Long, SIGMOD 2024, arXiv:2405.12497): it quantizes to 1 bit/dim but with an unbiased estimator and a sharp error bound, and a big part of why it works is a random orthogonal rotation applied before sign-binarization. ITQ (Gong & Lazebnik, 2011) also rotates before binarizing, though it learns the rotation rather than drawing it at random. Intuition: our naive sign bit keeps only the sign of each raw coordinate; if variance is concentrated in a few dims, the bits from the remaining low-variance dims carry little signal. A random rotation spreads the variance roughly evenly, so every bit carries a comparable share of the signal.

Implemented as a fast structured rotation: alternating rounds of random ±1 sign-flips and a fast Walsh–Hadamard transform (FWHT), in the style of the fast Johnson–Lindenstrauss transform. Cost is O(D log D) per vector, and the seed is fixed so base and query share the rotation. Rerank still uses the original f32.
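
A minimal sketch of such a rotation, in plain Python for readability. The names (fwht, make_signs, rotate, sign_bits) and the 1/sqrt(D) normalization are assumptions for illustration, not the code behind --rotate; a real implementation would be vectorized.

```python
import math
import random

def fwht(x):
    """In-place, unnormalized fast Walsh-Hadamard transform (len(x) must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # butterfly step
        h *= 2
    return x

def make_signs(dim, rounds, seed):
    """One random +/-1 vector per round; a fixed seed gives base and query the same rotation."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(rounds)]

def rotate(vec, signs):
    """Apply len(signs) rounds of (sign-flip, FWHT, scale by 1/sqrt(D)); each round is orthogonal."""
    x = list(vec)
    scale = 1.0 / math.sqrt(len(x))
    for s in signs:
        x = [v * sv for v, sv in zip(x, s)]
        fwht(x)
        x = [v * scale for v in x]
    return x

def sign_bits(vec):
    """1 bit per dimension: 1 where the coordinate is non-negative, else 0."""
    return [1 if v >= 0 else 0 for v in vec]
```

Because each round is orthogonal, norms and inner products are preserved, so cosine on the rotated vectors equals cosine on the originals; only the sign pattern that gets binarized changes.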

Result — recall up at every C, QPS unchanged

C (rerank candidates)	no-rot recall	rot×2 recall	Δ
stage-1 (no rerank)	0.4681	0.4925	+2.4 pts
100	0.8865	0.9118	+2.5 pts
500	0.9826	0.9891	+0.65 pts
1000	0.9943	0.9970	+0.27 pts

QPS is unchanged within noise (581.8 vs 584.9 at stage-1): the rotation is baked into the codes at prep time, and the per-query rotation is two sign-flip + FWHT rounds (~20K add/subs at D=1024), negligible beside the 1M-doc scan. rot×3 ≈ rot×2, so 2 rounds suffice.
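
One way to arrive at a figure near 20K: a butterfly FWHT does D·log2(D) add/subs per round, so two rounds at D=1024 give 20,480. The scan-side count below assumes codes are compared with 64-bit XOR + popcount words, which is an assumption about the scan kernel, not something the perf record states. The sign-flips (D multiplies per round) are ignored.

```python
import math

def rotation_addsubs(dim, rounds):
    """Add/subtract count for `rounds` butterfly FWHT passes over a length-`dim` vector."""
    return rounds * dim * int(math.log2(dim))

def scan_word_ops(n_docs, dim, word_bits=64):
    """XOR+popcount word pairs needed to compare one query code against every doc code."""
    return n_docs * (dim // word_bits)

rot = rotation_addsubs(1024, rounds=2)   # 20480
scan = scan_word_ops(1_000_000, 1024)    # 16000000
print(rot, scan, scan / rot)             # 20480 16000000 781.25
```

Under these assumptions the per-query rotation is well under 1% of the scan work, consistent with the flat QPS.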

Conclusions

Rotation is a free recall improvement: better at every C, no QPS or memory cost. The gain is largest at low C (+2.4 pts at stage-1, +2.5 pts at C=100) and shrinks as C grows (at C=1000, rot×2 reaches 0.9970 vs no-rot's 0.9943). Low C is the useful regime: a higher recall curve lets you hit a target recall at a smaller rerank C, which is where throughput is spent.
The effect is modest on Cohere, likely because Cohere is already well-conditioned: v3 is compression-aware and fairly isotropic, so there's less variance imbalance for the rotation to fix. On raw/un-tuned embeddings the gain should be larger (not measured here); that is the case RaBitQ's error bound is designed to cover.
It's the right foundation for the next steps: spreading info evenly across dims should also make prefix truncation (014) lose less recall, and it's the substrate for RaBitQ's unbiased asymmetric estimator. Both tested next.
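
For reference, the two-stage search that C refers to can be sketched as follows: a Hamming scan over the 1-bit codes picks C candidates, then the original floats rerank them. All names here (pack_bits, search, transform) are hypothetical, and the sketch assumes unit-normalized vectors so that dot product ranks the same as cosine.

```python
import heapq

def pack_bits(vec):
    """Pack the sign bits of vec into one Python int (bit i set when vec[i] >= 0)."""
    code = 0
    for i, v in enumerate(vec):
        if v >= 0:
            code |= 1 << i
    return code

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search(query, docs, codes, c, k, transform=lambda v: v):
    """Stage 1: top-c by Hamming distance on codes. Stage 2: rerank those c by exact dot product.

    `transform` is whatever was applied to the docs before binarization
    (identity here; the shared rotation in the rotated setup).
    """
    q_code = pack_bits(transform(query))
    candidates = heapq.nsmallest(
        c, range(len(docs)), key=lambda i: bin(codes[i] ^ q_code).count("1")
    )
    # Rerank uses the original, un-rotated float vectors.
    return heapq.nlargest(k, candidates, key=lambda i: dot(query, docs[i]))

docs = [
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, 0.5, -0.5],
    [-0.5, -0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, -0.5],
]
codes = [pack_bits(d) for d in docs]
top = search([0.6, 0.4, -0.5, -0.48], docs, codes, c=2, k=1)
```

A better stage-1 ordering means the true neighbours land inside a smaller candidate set, which is why the rotation's gain shows up most at low C.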

Caveats

Requires power-of-two dim for the FWHT (Cohere 1024 ✓, SIFT 128 ✓); a general dim would pad or use a dense random orthogonal matrix.
This is the rotation half of RaBitQ; the unbiased estimator (027) is separate.
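
A minimal sketch of the padding option for a non-power-of-two dim, with hypothetical helper names. Zero-padding leaves norms and inner products unchanged, but the rotated code then has one bit per padded dimension (e.g. 768 → 1024 bits), so the memory cost would need checking; that trade-off is not measured here.

```python
def next_pow2(n):
    """Smallest power of two >= n (n must be >= 1)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n - 1).bit_length()

def pad_to_pow2(vec):
    """Append zeros so the length is a power of two, ready for an FWHT."""
    return list(vec) + [0.0] * (next_pow2(len(vec)) - len(vec))
```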