Experiment 056

OPQ: learned rotation lifts PQ recall, but build cost is brutal & QPS unchanged

Perf record: 056-opq.json. Granite box (8 vCPU). src/bin/pq.rs --opq. Cohere N=200k × 1024, K=256, recall@10 vs exact GT.

What OPQ is

Optimized PQ (OPQ) learns an orthogonal rotation R that aligns the data with PQ's subspaces, so less is lost to quantization. Training alternates a PQ fit (k-means per subspace) with an orthogonal-Procrustes update of R from SVD(Xᵀ·X̂), where X̂ is the current PQ reconstruction of the rotated sample. The final PQ is then trained on the rotated data.
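The alternation can be written out as a short derivation. This is a sketch, assuming a row-vector convention (X is the n×D training sample, R is D×D, and the rotated data is XR); the notes do not state the convention used in pq.rs, so treat the transposes as illustrative.

```latex
\begin{aligned}
&\text{Fix } R: && \hat{X} = \mathrm{PQ}(XR)
  \quad \text{(k-means per subspace, then reconstruct)} \\
&\text{Fix } \hat{X}: && R^\star = \arg\min_{R^\top R = I}
  \lVert XR - \hat{X} \rVert_F^2 \\
&\text{Expand:} && \lVert XR - \hat{X} \rVert_F^2
  = \lVert X \rVert_F^2 + \lVert \hat{X} \rVert_F^2
  - 2\,\mathrm{tr}\!\left(R^\top X^\top \hat{X}\right) \\
&\text{Solve:} && X^\top \hat{X} = U \Sigma V^\top
  \;\Longrightarrow\; R^\star = U V^\top
\end{aligned}
```

Only the trace term depends on R, and it is maximized when VᵀRᵀU is the identity, which gives R = UVᵀ. Each iteration therefore needs one D×D SVD plus the matmuls that form XR and XᵀX̂.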

Result — recall up, but at a cost

M	bytes/vec	PQ recall@10	OPQ recall@10	Δ (pts)	QPS	OPQ build
16	16	0.9531	0.9765	+2.3	~1140	1273 s
32	32	0.9960	0.9975	+0.15	~630	1293 s

Pure ADC (C=0, no rerank), PQ → OPQ: 0.22 → 0.30 at M=16 and 0.41 → 0.44 at M=32. OPQ helps most where quantization error dominates (small M, no rerank).
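For reference, a minimal sketch of what a pure ADC score is: one lookup table per query, then M table lookups per code. The function names and the flat codebook layout (`codebooks[(sub * k + c) * dsub ..]`) are assumptions for illustration, not the layout used in pq.rs.

```rust
/// Build the per-query lookup table.
/// lut[sub * k + c] = squared L2 distance between the query's sub-vector
/// `sub` and centroid `c` of that subspace.
/// Assumed layout: codebooks is flat, m * k centroids of dsub floats each.
pub fn build_lut(query: &[f32], codebooks: &[f32], m: usize, k: usize) -> Vec<f32> {
    let dsub = query.len() / m;
    assert_eq!(query.len(), m * dsub);
    assert_eq!(codebooks.len(), m * k * dsub);
    let mut lut = vec![0.0f32; m * k];
    for sub in 0..m {
        let q = &query[sub * dsub..(sub + 1) * dsub];
        for c in 0..k {
            let start = (sub * k + c) * dsub;
            let cent = &codebooks[start..start + dsub];
            lut[sub * k + c] = q
                .iter()
                .zip(cent)
                .map(|(a, b)| (a - b) * (a - b))
                .sum::<f32>();
        }
    }
    lut
}

/// Approximate distance to one encoded vector: one table lookup per subspace.
/// The scan cost depends only on m, not on how the codes were trained.
pub fn adc_distance(lut: &[f32], code: &[u8], k: usize) -> f32 {
    code.iter()
        .enumerate()
        .map(|(sub, &c)| lut[sub * k + c as usize])
        .sum::<f32>()
}
```

With OPQ the query would be rotated before `build_lut`; the scan itself does not change.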

Two catches

Build is ~80× slower: 1273 s vs PQ's ~16 s per M. The cost is the dense 1024×1024 learned rotation: an SVD plus full matmuls over the sample on each of the 5 iterations, then rotating the whole base. (Applying R at query time is also a dense 1024×1024 matrix-vector product per query.)
QPS is unchanged: ~1140 (M=16) and ~630 (M=32), identical to plain PQ, because OPQ changes only the codes, not the ADC scan. So OPQ inherits PQ's QPS loss vs the binary funnel (963 QPS at 128 B): still gather-bound, still slower.
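To make the query-time rotation concrete, here is a minimal sketch of a dense rotation applied to one query. It assumes R is stored row-major and the query is a row vector (out = q·R); `rotate_query` and `rotation_macs` are hypothetical names, not functions from pq.rs.

```rust
/// Rotate one query: out = q · R, with R a row-major d×d matrix.
/// Assumption: row-vector convention; the real code may store R transposed.
pub fn rotate_query(q: &[f32], r: &[f32]) -> Vec<f32> {
    let d = q.len();
    assert_eq!(r.len(), d * d);
    let mut out = vec![0.0f32; d];
    // Accumulate row i of R scaled by q[i], so the inner loop walks
    // contiguous memory.
    for (i, &qi) in q.iter().enumerate() {
        let row = &r[i * d..(i + 1) * d];
        for (o, &rij) in out.iter_mut().zip(row) {
            *o += qi * rij;
        }
    }
    out
}

/// Multiply-accumulates needed to rotate a single d-dimensional query.
pub fn rotation_macs(d: usize) -> usize {
    d * d
}
```

At d = 1024 that is 1,048,576 multiply-accumulates per query, paid once before the lookup table is built.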

Conclusion

OPQ is a real recall-per-byte improvement over PQ (+2.3 pts at M=16), strongest at aggressive compression. But it does nothing for QPS and requires an enormous offline fit. So it sharpens PQ's footprint advantage without touching PQ's throughput disadvantage. Family verdict stands (054/056): PQ/OPQ win footprint and lose throughput to the binary funnel, so they are useful only when RAM is the hard constraint.

Caveats

N=200k (the dense rotation makes a 1M build impractical here); absolute recall is higher than at 1M, but the OPQ−PQ delta is the result that matters.
5 OPQ iters, K=256; more iters help marginally. SIMD ADC (parked) is what OPQ would need to also win QPS.