Experiment 055

ITQ: a learned rotation beats random, but only by ~1.5 pts

Perf record: 055-itq-learned-rotation.json. Granite box (8 vCPU). src/bin/itq.rs. Cohere 1M × 1024, recall@10 vs exact GT (binary funnel + exact rerank). 50k training sample, 30 ITQ iterations.

What ITQ is

Iterative Quantization (Gong & Lazebnik 2011): instead of applying a random rotation before sign-binarizing, learn an orthogonal rotation R that minimizes the binarization error ‖sign(VR) − VR‖_F, where V holds the training vectors as rows. The fit alternates two steps: fix R and set B = sign(VR), then fix B and apply the orthogonal-Procrustes update R = U·Wᵀ, where U·S·Wᵀ = SVD(Vᵀ·B). We test at b = 256 and 512 bits. The b-dim projection is the first b dims of our FWHT-rotated vector, and ITQ learns a b×b rotation on top of it; the baseline is R = identity, i.e. the current random-rotation codes.
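The alternation can be sketched in std-only Rust as below. This is a minimal illustration, not the code in src/bin/itq.rs: all function names are hypothetical, matrices are plain Vec<Vec<f64>> for readability, and the Procrustes step uses a Newton-Schulz iteration to get the orthogonal polar factor of Vᵀ·B (which equals U·Wᵀ when Vᵀ·B is non-singular) in place of an SVD, purely to avoid a linear-algebra dependency.

```rust
// Hypothetical helper names; row-major matrices chosen for clarity, not speed.
type Mat = Vec<Vec<f64>>;

fn matmul(a: &Mat, b: &Mat) -> Mat {
    let (n, k, m) = (a.len(), b.len(), b[0].len());
    let mut c = vec![vec![0.0; m]; n];
    for i in 0..n {
        for p in 0..k {
            for j in 0..m {
                c[i][j] += a[i][p] * b[p][j];
            }
        }
    }
    c
}

fn transpose(a: &Mat) -> Mat {
    let (n, m) = (a.len(), a[0].len());
    let mut t = vec![vec![0.0; n]; m];
    for i in 0..n {
        for j in 0..m {
            t[j][i] = a[i][j];
        }
    }
    t
}

/// Elementwise sign, mapping 0 to +1 so every code entry is +1 or -1.
fn sign(a: &Mat) -> Mat {
    a.iter()
        .map(|row| row.iter().map(|&x| if x >= 0.0 { 1.0 } else { -1.0 }).collect())
        .collect()
}

/// Squared Frobenius binarization error ‖sign(VR) − VR‖².
/// Each entry contributes (1 − |x|)², since sign(x) − x = ±(1 − |x|).
fn quant_error(v: &Mat, r: &Mat) -> f64 {
    let vr = matmul(v, r);
    vr.iter().flatten().map(|x| (1.0 - x.abs()).powi(2)).sum()
}

/// Orthogonal polar factor of a square matrix M (U·Wᵀ for M = U·S·Wᵀ).
/// Assumption: M is non-singular. Newton-Schulz stands in for the SVD here.
fn polar_factor(m: &Mat, iters: usize) -> Mat {
    let b = m.len();
    // Scale by the Frobenius norm so all singular values lie in (0, 1].
    let fro = m.iter().flatten().map(|e| e * e).sum::<f64>().sqrt();
    let mut x: Mat = m.iter().map(|row| row.iter().map(|e| e / fro).collect()).collect();
    for _ in 0..iters {
        // X ← 0.5 · X · (3I − XᵀX)
        let xtx = matmul(&transpose(&x), &x);
        let mut t = vec![vec![0.0; b]; b];
        for i in 0..b {
            for j in 0..b {
                let diag = if i == j { 3.0 } else { 0.0 };
                t[i][j] = 0.5 * (diag - xtx[i][j]);
            }
        }
        x = matmul(&x, &t);
    }
    x
}

/// Fit a b×b rotation on training rows V (n×b), starting from R = identity.
fn fit_itq(v: &Mat, iters: usize) -> Mat {
    let b = v[0].len();
    let mut r: Mat = (0..b)
        .map(|i| (0..b).map(|j| if i == j { 1.0 } else { 0.0 }).collect())
        .collect();
    let vt = transpose(v);
    for _ in 0..iters {
        let codes = sign(&matmul(v, &r)); // B = sign(VR), R fixed
        let m = matmul(&vt, &codes);      // VᵀB, a b×b matrix
        r = polar_factor(&m, 40);         // Procrustes update, B fixed
    }
    r
}
```

Calling fit_itq(&v, 30) mirrors the 30 iterations used here. Starting from R = identity means the first evaluation is exactly the baseline, and each exact step can only lower ‖B − VR‖_F, so quant_error for the fitted R should not exceed the identity value. At b = 256 or 512 a real SVD or an optimized matrix library would be the practical choice.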

Result — consistent but small

bits	C	random	ITQ	Δ (pts)
256	200	0.5003	0.5190	+1.9
256	1000	0.7151	0.7344	+1.9
512	200	0.7846	0.7998	+1.5
512	1000	0.9286	0.9403	+1.2

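ITQ beats random rotation in all four configurations, by +1.2 to +1.9 pts of recall@10. The gain is largest at 256 bits (+1.9) and shrinks to +1.2 at 512 bits with C=1000, where the baseline is already at 0.93.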
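As a quick check on the arithmetic, the Δ column and the headline figure can be recomputed from the recall values in the table. The constant and function names below are illustrative only.

```rust
/// (bits, C, recall with random rotation, recall with ITQ), copied from the table.
const RESULTS: [(u32, u32, f64, f64); 4] = [
    (256, 200, 0.5003, 0.5190),
    (256, 1000, 0.7151, 0.7344),
    (512, 200, 0.7846, 0.7998),
    (512, 1000, 0.9286, 0.9403),
];

/// Recall gain of ITQ over random rotation, in percentage points.
fn delta_pts(random: f64, itq: f64) -> f64 {
    (itq - random) * 100.0
}

/// Mean gain across all four rows, in percentage points.
fn mean_delta_pts() -> f64 {
    let total: f64 = RESULTS.iter().map(|&(_, _, r, i)| delta_pts(r, i)).sum();
    total / RESULTS.len() as f64
}
```

The per-row gains come out at 1.87, 1.93, 1.52 and 1.17 pts, for a mean of about 1.6 pts, in line with the rounded ~1.5 pts headline.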

The catch: dense rotation

ITQ's R is dense (b×b), so applying it costs O(b²) per vector, versus O(b log b) for the FWHT random rotation. At 256 bits that is b/log₂b = 256/8 = 32× more rotation work (still a small absolute per-query cost, ~1–2% of the scan), plus an offline SVD fit. So it's not quite "free" like the FWHT rotation.
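The 32× figure follows from a simple operation count, sketched below. The function names are hypothetical, and the counts are idealized (b² multiply-adds for the dense rotation, b·log₂b butterfly add/subtracts for an FWHT of length b); they say nothing about measured wall time.

```rust
/// Multiply-adds to apply a dense b×b rotation to one vector.
fn dense_ops(b: u64) -> u64 {
    b * b
}

/// Butterfly add/subtracts for an FWHT of length b (b must be a power of two).
fn fwht_ops(b: u64) -> u64 {
    assert!(b.is_power_of_two());
    // For a power of two, trailing_zeros() is exactly log2(b).
    b * b.trailing_zeros() as u64
}

/// How many times more work the dense rotation does than the FWHT.
fn cost_ratio(b: u64) -> f64 {
    dense_ops(b) as f64 / fwht_ops(b) as f64
}
```

For b = 256 this gives 65,536 vs 2,048 operations, a ratio of 32. By the same count, b = 512 gives 262,144 vs 4,608, roughly 57×, so the relative overhead grows with code length.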

Conclusion

A learned rotation is a real but modest recall gain (~1.5 pts) over random. For context, residual (046) gives +3 to +9 pts on the same recall axis at zero apply cost, which makes it a much bigger lever. So ITQ is worth it only when you are squeezing out the last point and can afford the dense rotation; otherwise random-rotation + residual already captures most of what a learned rotation would.

Caveats

The b-dim projection is the FWHT-rotated prefix, whereas classic ITQ projects with PCA first; a PCA init might lift ITQ a little more (untested).
The fit used a 50k sample and 30 iterations; more of either could help marginally, but the verdict (small gain) should hold.