Experiment 030

Research capstone (RaBitQ block, 026–030)

Perf record: 030-research-capstone.json. Cohere v3, 1M × 1024, cosine, Granite box.

Head-to-head (same run, reps=8)

config	recall@10	QPS	p50
009 baseline — plain binary, tile=1, C=1000	0.9931	516	11.51 ms
research best — rot×2, tile=16, C=1000	0.9960	845	11.40 ms
research best-recall — rot×2, tile=16, C=2000	0.9990	730	13.14 ms

The research adds +0.29 pts recall and +64% QPS over the baseline (and the +64% is on top of identical recall being available), or 0.999 @ 730.

What the research found

Searched the literature for something that beats plain sign-bit binary. The winner was RaBitQ (Gao & Long, SIGMOD 2024) — and its key, free idea (shared with ITQ, 2011): a random orthogonal rotation before binarizing.

entry	idea	outcome
026	random rotation (FWHT) before sign bits	free recall +2.4 pts stage-1, 0.994→0.997 @ C=1000, 0 QPS cost
027	rotation + prefix truncation	rotation rescues the prefix: +4.6 pts @ 512/C=1000
028	RaBitQ unbiased estimator	best recall-per-bit (stage-1 0.606 vs 0.463, ~5–10× smaller C) but 40× slower scan
029	rotated combined Pareto	frontier shifts out at every tier vs non-rotated 018
030	capstone	0.996 @ 845 / 0.999 @ 730 vs 009's 0.993 @ 516

The two takeaways

The rotation is the deployable win — and it's free. Spreading information evenly across dimensions makes every sign bit count, lifting recall at zero QPS or memory cost, and it rescues prefix truncation (027), which reshapes the whole batch frontier (029). Random rotation buys most of the Matryoshka benefit without a Matryoshka-trained model — the standout practical result of the block.
RaBitQ's estimator is the recall ceiling, gated on a kernel. Its unbiased per-vector estimate (028) is far sharper than Hamming (~5–10× smaller rerank C for the same recall), but as a set-bit gather it's 40× slower. Making it fast is the same parked SIMD-LUT work as the asymmetric kernel (011) — the one remaining high-value build, and where the payoff concentrates at billion scale.

Project arc (all blocks)

001–011: built the funnel (exact → serving → binary+rerank 009 → asymmetric).
012–025: performance — tiling (+76% QPS, 016), prefix dial (014), serving (50× vs f32, 017), intra-query latency (13→3 ms, 021). Net: 0.993 @ 851, latency 2.6 ms.
026–030: research — RaBitQ/ITQ rotation (free recall, rescues prefix) → 0.996 @ 845 or 0.999 @ 730, best of the project; RaBitQ estimator maps the recall ceiling pending the fast kernel.

Caveats

Rotation requires power-of-two dim (FWHT). RaBitQ estimator recall is the upper bound a fast kernel would hit at popcount speed.
Spot box; compare within-table. Cohere is well-conditioned, so rotation's gain is conservative — it would be larger on raw embeddings.