Experiment 056
OPQ: learned rotation lifts PQ recall, but build cost is brutal and QPS is unchanged
Perf record: 056-opq.json. Granite box (8 vCPU). src/bin/pq.rs --opq.
Cohere N=200k × 1024, K=256, recall@10 vs exact GT.
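For reference, a minimal sketch of how recall@10 against exact ground truth is typically computed. The function names and the `u32` id type are assumptions for illustration, not taken from pq.rs, and ids within each list are assumed unique.

```rust
use std::collections::HashSet;

/// Recall@k for one query: fraction of the exact top-k ids that also appear
/// in the approximate top-k list. Assumes ids within each list are unique.
fn recall_at_k(approx: &[u32], exact: &[u32], k: usize) -> f64 {
    let truth: HashSet<u32> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 0.0;
    }
    let hits = approx
        .iter()
        .take(k)
        .filter(|id| truth.contains(*id))
        .count();
    hits as f64 / truth.len() as f64
}

/// Mean recall@k over a query set (one id list per query).
fn mean_recall_at_k(approx: &[Vec<u32>], exact: &[Vec<u32>], k: usize) -> f64 {
    assert_eq!(approx.len(), exact.len());
    if approx.is_empty() {
        return 0.0;
    }
    let sum: f64 = approx
        .iter()
        .zip(exact)
        .map(|(a, e)| recall_at_k(a, e, k))
        .sum();
    sum / approx.len() as f64
}
```

With k=10, a value such as 0.9531 means that on average about 9.5 of the 10 exact neighbours are returned.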
What OPQ is
Optimized PQ: learn an orthogonal rotation R that aligns the data with PQ's subspaces
(so less is lost to quantization) by alternating PQ (k-means per subspace) with an
orthogonal-Procrustes update of R from SVD(Xᵀ·X̂), where X̂ is the PQ reconstruction
of the rotated sample. PQ is then run on the rotated data.
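The Procrustes step can be written out as follows. This assumes the rows of X are the training vectors and X̂ holds the current PQ reconstructions of XR; that is the usual convention, but the layout in pq.rs is not stated here.

```latex
\begin{aligned}
R^\star &= \arg\min_{R^\top R = I} \lVert X R - \hat{X} \rVert_F^2 \\
\lVert X R - \hat{X} \rVert_F^2
  &= \lVert X \rVert_F^2 + \lVert \hat{X} \rVert_F^2
     - 2\,\operatorname{tr}\!\left(R^\top X^\top \hat{X}\right) \\
X^\top \hat{X} &= U \Sigma V^\top
  \quad\Longrightarrow\quad R^\star = U V^\top
\end{aligned}
```

The first two norm terms do not depend on R (an orthogonal R preserves the Frobenius norm of X), so minimizing the error is the same as maximizing the trace term, which the SVD solves in closed form.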
Result — recall up, but at a cost
| M | bytes/vector | PQ recall@10 | OPQ recall@10 | Δ (pts) | QPS | OPQ build |
|---|---|---|---|---|---|---|
| 16 | 16 | 0.9531 | 0.9765 | +2.3 | ~1140 | 1273 s |
| 32 | 32 | 0.9960 | 0.9975 | +0.15 | ~630 | 1293 s |
Pure ADC (C=0, no rerank), PQ → OPQ: M=16 0.22 → 0.30, M=32 0.41 → 0.44. OPQ helps most where quantization error dominates (small M, no rerank).
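A minimal sketch of what a pure ADC scan looks like, to make concrete why the scan itself is the same for PQ and OPQ codes. The function names, the flattened `[subspace][centroid][dsub]` codebook layout, and the squared-L2 metric are assumptions for illustration; pq.rs may differ on all of them.

```rust
/// Per-query lookup table: lut[sub * k + c] is the squared L2 distance between
/// the query's sub-vector `sub` and centroid `c` of that subspace.
/// `codebooks` is flattened as [subspace][centroid][dsub] (hypothetical layout).
fn build_lut(query: &[f32], codebooks: &[f32], m: usize, k: usize) -> Vec<f32> {
    let dsub = query.len() / m;
    assert_eq!(codebooks.len(), m * k * dsub);
    let mut lut = vec![0.0f32; m * k];
    for sub in 0..m {
        let q = &query[sub * dsub..(sub + 1) * dsub];
        for c in 0..k {
            let off = (sub * k + c) * dsub;
            let cent = &codebooks[off..off + dsub];
            lut[sub * k + c] = q
                .iter()
                .zip(cent)
                .map(|(a, b)| (a - b) * (a - b))
                .sum::<f32>();
        }
    }
    lut
}

/// ADC scan: one table lookup and one add per code byte, m bytes per vector.
/// Returns the approximate distance for every encoded vector.
fn adc_scan(codes: &[u8], lut: &[f32], m: usize, k: usize) -> Vec<f32> {
    codes
        .chunks_exact(m)
        .map(|code| {
            code.iter()
                .enumerate()
                .map(|(sub, &c)| lut[sub * k + c as usize])
                .sum::<f32>()
        })
        .collect()
}

/// The only query-time step OPQ adds: rotate the query once, q' = q R,
/// with R stored row-major as a d x d matrix.
fn rotate(query: &[f32], r: &[f32]) -> Vec<f32> {
    let d = query.len();
    assert_eq!(r.len(), d * d);
    (0..d)
        .map(|j| (0..d).map(|i| query[i] * r[i * d + j]).sum::<f32>())
        .collect()
}
```

Under these assumptions, plain PQ calls `build_lut` on the raw query and OPQ calls it on `rotate(query, r)`; `adc_scan` is identical in both cases. With C=0 the ranking comes straight from these table sums, with no rerank against full vectors.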
Two catches
- Build is ~80× slower: 1273 s vs PQ's ~16 s per M. The cost is the dense 1024×1024 learned rotation (SVD plus full matmuls over the sample, × 5 iters, then rotating the whole base). Applying R at query time is also a dense 1024² rotation per query.
- QPS is unchanged: ~1140 (M=16) / ~630 (M=32), identical to plain PQ, because OPQ changes only the codes, not the ADC scan. OPQ therefore inherits PQ's QPS loss vs the binary funnel (963 QPS at 128 B): still gather-bound, still slower.
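A back-of-envelope operation count for the two catches, using only N, D and M from this experiment. The helper names are hypothetical, and this counts operations only; it is not a timing model. The SVD and the per-iteration training matmuls are left out because the sample size is not recorded here.

```rust
/// Multiply-adds needed to apply a dense d x d rotation to `n` vectors.
fn rotation_macs(n: u64, d: u64) -> u64 {
    n * d * d
}

/// Table lookups for one ADC scan over `n` codes of `m` bytes each.
/// Note that this does not depend on the rotation at all.
fn adc_lookups(n: u64, m: u64) -> u64 {
    n * m
}
```

For this setup, `rotation_macs(200_000, 1024)` is 209,715,200,000 (about 2.1e11) to rotate the base once, `rotation_macs(1, 1024)` is 1,048,576 per query, and `adc_lookups(200_000, 16)` is 3,200,000 per query at M=16, the same for PQ and OPQ.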
Conclusion
OPQ is a real recall-per-byte improvement over PQ (+2.3 pts at M=16), strongest at aggressive compression. But it does nothing for QPS and costs an enormous offline fit. So it sharpens PQ's footprint advantage without touching PQ's throughput disadvantage. Family verdict stands (054/056): PQ/OPQ win footprint, lose throughput to the binary funnel — useful only when RAM is the hard constraint.
Caveats
- N=200k (the dense rotation makes a 1M build impractical here). Absolute recall is higher than it would be at 1M, but the OPQ−PQ delta is the result that matters.
- 5 OPQ iters, K=256; more iters help marginally. SIMD ADC (parked) is what OPQ would need to also win QPS.