notes

Experiment 046

Cell size × residual encoding: centering is a free recall gain

Perf record: 046-cell-size-residual.json. Granite box (Xeon 6975P-C, 8 vCPU). src/bin/cell.rs. Cohere v3 1M × 1024, cosine. Direction 7 of the breakout loop (the 042 within-cell seam). Energy axis dropped (J/query ≈ package_power / QPS on a fixed-power CPU — QPS in disguise, not an independent signal); the salvaged metric is bytes/query, the bandwidth currency.

The hypothesis (042 seam #2)

Inside one IVF cell the vectors cluster around the cell centroid, so the raw sign-bits are dominated by the shared DC/centroid direction and carry little within-cell information. Subtracting the centroid first (residual encoding) should make the sign-bits encode the actual within-cell variation → higher stage-1 recall at the same bit budget. Residual affects stage-1 selection only; rerank and ground truth stay exact L2 on the raw vectors, isolating the effect.

Result — confirmed, and the gain grows with cell size AND bit pressure

recall@10, raw → residual:

Full 1024 bits (128 B/code):

NC=50C=100C=200
1,0000.929 → 0.944 (+1.5)0.983 → 0.9870.998 → 0.999
20,0000.889 → 0.912 (+2.3)0.955 → 0.9670.986 → 0.991
100,0000.858 → 0.894 (+3.6)0.934 → 0.957 (+2.2)0.975 → 0.986 (+1.1)

Tight 256 bits (32 B/code, ¼ the scan traffic):

NC=50C=100C=200C=500
1,0000.582 → 0.632 (+5.0)0.741 → 0.7810.876 → 0.9010.983 → 0.988
20,0000.421 → 0.502 (+8.1)0.537 → 0.624 (+8.7)0.665 → 0.7340.808 → 0.862
100,0000.360 → 0.433 (+7.2)0.465 → 0.548 (+8.3)0.576 → 0.664 (+8.8)0.717 → 0.797 (+8.0)

Why it matters (bytes/query)

The residual win is largest in the low-bit, bandwidth-saving regime — the one we care about for cost/throughput. At N=100k, the 256-bit residual code is 32 B/vec (4× less scan traffic than full 128 B) yet recovers ~8 pts of the recall lost to truncation, so it dominates raw on recall-per-byte at the low-recall end (0.797 @ 5.25 MB/q residual vs 0.717 @ 5.25 MB/q raw, C=500). Full bits still wins the very-high-recall corner; residual shifts the entire low-bit curve up for free.

Conclusion

Residual encoding (centroid subtraction before rotation+binary) is a free, strictly-positive recall gain for the within-cell scan, costing one centroid vector per cell and nothing at query time beyond a subtract. It's a clean Pareto push, biggest where the engine is most bandwidth-constrained (large cells, tight bit budgets). It composes with everything in the funnel (rotation, tiling, the disk-resident split) since it only changes how the stage-1 codes are formed.

Caveats