notes

Experiment 056

OPQ: learned rotation lifts PQ recall, but build cost is brutal & QPS unchanged

Perf record: 056-opq.json. Granite box (8 vCPU). src/bin/pq.rs --opq. Cohere N=200k × 1024, K=256, recall@10 vs exact GT.

What OPQ is

Optimized PQ (OPQ) learns an orthogonal rotation R that aligns the data with PQ's subspaces, so less is lost to quantization. Training alternates a PQ fit (k-means per subspace) with an orthogonal-Procrustes update of R computed from SVD(Xᵀ·X̂), where X̂ is the PQ reconstruction of the rotated data. The final codes come from running PQ on the rotated data.
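
The alternation can be written out as a short derivation. This is a sketch in the standard OPQ formulation, not a transcript of what src/bin/pq.rs does. It assumes a row-vector convention (X is n×D, one training vector per row), which is what makes the SVD argument Xᵀ·X̂.

```latex
% Assumed convention: X \in \mathbb{R}^{n \times D}, one training vector per row.
\begin{aligned}
&\min_{R,\ \text{codebooks}} \ \lVert X R - \hat{X} \rVert_F^2
  \quad \text{s.t. } R^\top R = I, \qquad \hat{X} = \mathrm{PQ}(XR) \\[4pt]
&\text{(a) fix } R:\ \text{run k-means per subspace on } XR \text{ to get } \hat{X} \\
&\text{(b) fix } \hat{X}:\ U \Sigma V^\top = \mathrm{SVD}(X^\top \hat{X}),
  \qquad R \leftarrow U V^\top
\end{aligned}
```

Step (b) follows because R is orthogonal, so ‖XR‖ equals ‖X‖ and minimizing the error is the same as maximizing tr(Rᵀ·Xᵀ·X̂), which is largest at R = U·Vᵀ.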

Result — recall up, but at a cost

M    bytes   PQ recall   OPQ recall   Δ (pts)   QPS     OPQ build
16   16      0.9531      0.9765       +2.3      ~1140   1273 s
32   32      0.9960      0.9975       +0.15     ~630    1293 s

Pure ADC (C=0, no rerank), PQ → OPQ: 0.22 → 0.30 at M=16, 0.41 → 0.44 at M=32. OPQ helps most where quantization error dominates (small M, no rerank).

Two catches

  1. Build is ~80× slower: 1273 s vs PQ's ~16 s per M (1273 / 16 ≈ 80). The cost is the dense 1024×1024 learned rotation: an SVD plus full matmuls over the sample on each of 5 iterations, then rotating the whole base. Applying R at query time is also a dense 1024² matvec per query.
  2. QPS is unchanged: ~1140 (M=16) and ~630 (M=32), identical to plain PQ, because OPQ changes only the codes, not the ADC scan. OPQ therefore inherits PQ's QPS loss vs the binary funnel (963 QPS at 128 B): still gather-bound, still slower.
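
Both catches show up in a minimal ADC sketch. The code below is illustrative only: the function names, the flattened [m][k][dsub] codebook layout, and the squared-L2 metric are assumptions, not taken from src/bin/pq.rs. The only OPQ-specific step is `rotate`; the table build and the scan are the same for PQ and OPQ codes.

```rust
// Hypothetical sketch; names and memory layout are illustrative assumptions.

/// Rotate one query by a dense row-major D×D matrix R (row-vector form: q' = q·R).
/// This is the only query-time work OPQ adds: D*D multiply-adds.
fn rotate(q: &[f32], r: &[f32]) -> Vec<f32> {
    let d = q.len();
    assert_eq!(r.len(), d * d);
    let mut out = vec![0.0f32; d];
    for (i, &qi) in q.iter().enumerate() {
        let row = &r[i * d..(i + 1) * d];
        for (o, &rij) in out.iter_mut().zip(row) {
            *o += qi * rij;
        }
    }
    out
}

/// Build the M×K table of squared L2 distances (assumed metric) from each
/// query sub-vector to each centroid. `codebooks` is [m][k][dsub], flattened.
fn build_lut(q: &[f32], codebooks: &[f32], m: usize, k: usize) -> Vec<f32> {
    let dsub = q.len() / m;
    assert_eq!(codebooks.len(), m * k * dsub);
    let mut lut = vec![0.0f32; m * k];
    for sub in 0..m {
        let qs = &q[sub * dsub..(sub + 1) * dsub];
        for c in 0..k {
            let off = (sub * k + c) * dsub;
            let cen = &codebooks[off..off + dsub];
            lut[sub * k + c] = qs
                .iter()
                .zip(cen)
                .map(|(a, b)| (a - b) * (a - b))
                .sum::<f32>();
        }
    }
    lut
}

/// ADC scan: one table lookup (gather) per code byte, M lookups per vector.
/// Nothing here depends on whether the codes came from PQ or OPQ.
fn adc_scan(codes: &[u8], lut: &[f32], m: usize, k: usize) -> Vec<f32> {
    codes
        .chunks_exact(m)
        .map(|code| {
            code.iter()
                .enumerate()
                .map(|(sub, &c)| lut[sub * k + c as usize])
                .sum::<f32>()
        })
        .collect()
}

fn main() {
    let (m, k) = (2usize, 2usize);
    // A 4×4 permutation matrix stands in for a learned orthogonal rotation.
    let r: [f32; 16] = [
        0.0, 0.0, 1.0, 0.0,
        0.0, 0.0, 0.0, 1.0,
        1.0, 0.0, 0.0, 0.0,
        0.0, 1.0, 0.0, 0.0,
    ];
    // Two subspaces, two centroids each, two dims per subspace.
    let codebooks: [f32; 8] = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 2.0, 2.0];
    // Three encoded base vectors, M bytes each.
    let codes: [u8; 6] = [1, 1, 0, 0, 1, 0];

    let q = rotate(&[2.0f32, 2.0, 1.0, 1.0], &r);
    let lut = build_lut(&q, &codebooks, m, k);
    let dists = adc_scan(&codes, &lut, m, k);
    println!("{:?}", dists);
}
```

At D=1024 the `rotate` step is 1024² = 1,048,576 multiply-adds per query, and rotating the 200k-vector base is about 2.1 × 10¹¹ multiply-adds. The scan stays at M gathers per vector, which is consistent with the unchanged QPS reported above.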

Conclusion

OPQ is a real recall-per-byte improvement over PQ (+2.3 pts at M=16, +0.15 pts at M=32), strongest at aggressive compression. But it does nothing for QPS and costs an offline fit of roughly 1270 to 1290 s per M. So it sharpens PQ's footprint advantage without touching PQ's throughput disadvantage. Family verdict stands (054/056): PQ/OPQ win footprint and lose throughput to the binary funnel, so they are useful only when RAM is the hard constraint.

Caveats