notes

Experiment 061

Final capstone: the best of the best

The definitive synthesis (001–060), after the PQ/ITQ/OPQ/SIMD-ADC round. Cohere v3 1M × 1024, one 8-core box (Granite, Xeon 6975P-C).

The production engine (ship this)

Rotated-residual binary funnel + tiling, served by the carousel.

objectiverecall@10QPSp50
max recall0.998688310.6 ms
balanced0.99529638.0 ms
predictable tail (p99≈p50)0.995~35012 ms
serving0.998no-cliff to ~94012→49 ms

Footprint: 128 MB codes in RAM (+ f32 for rerank, RAM or SSD). This is the shipped answer.

The open frontier (the one thing that could beat it)

SIMD ADC — PQ4 + core::simd::swizzle_dyn (=pshufb), safe Rust (059). This round overturned the long-held "popcount always beats gather": a SIMD LUT makes 4-bit-PQ ADC ~13× faster than scalar PQ and 2–9× faster than untiled popcount, because its 8 B/vec codes (16× smaller) stay cache-resident while popcount's 128 B/vec hits the DRAM wall. It's the first technique that could rival the funnel on QPS. Not yet a clean win (the funnel tiles; 4-bit recall needs a rerank) — but the decisive next build is pq4-scan (tiled) → exact rerank, end-to-end recall/QPS.

The disk answer

3-tier funnel: binary → PQ-prune (M=64, RAM) → exact (060). Cuts the exact/SSD reads ~8× at 0.989 recall (128 vs 1000), fixing the carousel×disk death-spiral (052). Stacks with the carousel + disk hybrid (045).

What's settled (don't retry)

Bit-floor (058, fewer bits ≠ QPS), HNSW-in-cell (043), scalar PQ/OPQ (054/056, footprint only), ITQ (055, +1.5 pts — residual's +3–9 dominates), bf16/int8 rerank (037/015), register-tiling (023). All measured, all negative or dominated.

The two laws that explain everything (049)

The one-line verdict

A 1-bit binary funnel (rotation+residual) served by the carousel does ~0.995 recall @ ~960 QPS @ 8 ms on a single 8-core box — and the only thing that might beat it is SIMD-ADC PQ4 (the open frontier), while PQ-prune makes the disk regime viable.