Experiment 049
Capstone: a predictive roofline model for the funnel (QPS & recall)
Perf record: 049-roofline-model.json. Model fit and validated
against the measured history (scale 047, cell 046, crossover 043, tile-retune
038); no new run. Granite box (Xeon 6975P-C, 8 vCPU). The capstone of the breakout
loop (10a→7→8→5): two closed-form laws that predict the engine's performance for any
(N, bits, C, cores), validated against independent runs.
The two laws
Throughput (compute roofline):
QPS(N, cores) ≈ 1.10e9 · (cores/8) / N [tiled, tile≥8; tile=1 uses 0.63e9]
The tiled binary scan is popcount-compute-bound at a fixed aggregate rate G₈ ≈ 1.10 Gcmp/s (10⁹ comparisons per second) on 8 cores (047), so QPS is that rate divided by the cell size N. This is also why 047 shows no memory cliff: the roofline is compute, not bandwidth, so the compare rate stays flat as the working set spills from L3 to DRAM. Rerank adds a per-query term. In RAM it is small next to the scan (038: 921 QPS measured vs 1101 scan-only at N=1M, C=1000); on EBS SSD it is ≈ C × 0.4 ms (045) and dominates.
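The throughput law is short enough to write down directly. The sketch below is illustrative only: the function names are hypothetical, and the way the rerank term is combined with the scan (adding per-query times, then inverting) is an assumption of this sketch, not something the perf records confirm.

```python
# Aggregate compare rates on 8 cores, taken from the law above.
G8_TILED = 1.10e9    # comparisons/s, tile >= 8
G8_UNTILED = 0.63e9  # comparisons/s, tile = 1


def scan_qps(n, cores=8, tiled=True):
    """Scan-only roofline: aggregate compare rate divided by cell size."""
    g8 = G8_TILED if tiled else G8_UNTILED
    return g8 * (cores / 8) / n


def qps_with_rerank(n, c, rerank_s_per_candidate, cores=8, tiled=True):
    """Scan plus rerank.

    ASSUMPTION: scan and rerank costs add in per-query time (1/QPS).
    """
    per_query_s = 1.0 / scan_qps(n, cores, tiled) + c * rerank_s_per_candidate
    return 1.0 / per_query_s


print(scan_qps(1_000_000))                      # scan-only, 8 cores, tiled
print(scan_qps(1_000_000, tiled=False))         # tile = 1
print(qps_with_rerank(1_000_000, 500, 0.4e-3))  # EBS-style rerank, C = 500
```

With the 0.4 ms per-candidate figure from 045, the last call shows how completely the rerank term takes over once candidates live on disk.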
Recall (power law):
miss = 1 − recall ≈ e^12.45 · N^0.28 · C^−1.08 · bits^−1.97 (raw rotated binary)
Fit on 38 points (log-miss R² = 0.90). The exponents are physical:
- C^−1.08: miss ≈ 1/C, so each doubling of the funnel width roughly halves the miss rate (2^−1.08 ≈ 0.47).
- bits^−1.97: miss ≈ 1/bits², so doubling the scan bits roughly quarters the miss (2^−1.97 ≈ 0.26); more bits ⇒ much tighter Hamming↔true-distance concordance. The strongest lever.
- N^0.28: miss grows only sublinearly with cell size (10× the cell raises miss by about 1.9×); big cells are survivable.
(Residual encoding (046) lowers the constant e^12.45; rotation (026) is baked into the fit. So the law describes the current best stage-1.)
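As a minimal sketch, the recall law can be evaluated like this. The function names are hypothetical, and clamping recall at zero is an addition of the sketch (the raw power law can exceed miss = 1 far outside the fitted range).

```python
import math

# Fitted constants from the law above (raw rotated binary).
LN_A, P_N, P_C, P_BITS = 12.45, 0.28, -1.08, -1.97


def predicted_miss(n, c, bits):
    """Power-law miss rate; may exceed 1 far outside the fitted range."""
    return math.exp(LN_A) * n**P_N * c**P_C * bits**P_BITS


def predicted_recall(n, c, bits):
    # Clamping at 0 is an addition of this sketch, not part of the fit.
    return max(0.0, 1.0 - predicted_miss(n, c, bits))


# Sweep the funnel width at a fixed cell size and code length.
for c in (50, 100, 200, 500):
    print(f"N=1M bits=1024 C={c:>3}  recall={predicted_recall(1_000_000, c, 1024):.3f}")
```

Sweeping C at fixed N and bits makes the 1/C behaviour visible: each doubling of C removes roughly half of the remaining miss.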
Validation
QPS vs 047 (tile=8), predicted = G₈/N with G₈ = 1.1013e9 cmp/s (quoted as 1.10e9 in the law):
| N | actual QPS | predicted QPS |
|---|---|---|
| 100k | 11246 | 11013 |
| 1M | 1130 | 1101 |
| 10M | 114 | 110 |
| 100M | 11.2 | 11.0 |
Within ~2–4% across a 1000× range, with the measured QPS slightly above the prediction in every row. (038 measured 921 QPS at 1M, tile=8, versus the scan-only 1101; the gap is the C=1000 rerank overhead, i.e. the rerank term.)
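The check behind this table is simple arithmetic, sketched below on the rounded values as printed. The helper names are hypothetical, and the rerank estimate assumes that scan and rerank costs add in per-query time, which is an assumption of this sketch.

```python
# (N, measured QPS from 047, predicted QPS), copied from the table.
ROWS = [
    (100_000, 11246, 11013),
    (1_000_000, 1130, 1101),
    (10_000_000, 114, 110),
    (100_000_000, 11.2, 11.0),
]


def relative_errors(rows):
    """Signed error of the measurement relative to the prediction."""
    return [(n, (actual - pred) / pred) for n, actual, pred in rows]


def implied_rerank_seconds(measured_qps, scan_only_qps):
    """ASSUMPTION: per-query times add, so rerank = difference of reciprocals."""
    return 1.0 / measured_qps - 1.0 / scan_only_qps


for n, err in relative_errors(ROWS):
    print(f"N={n:>11,}  error={err:+.1%}")

gap_s = implied_rerank_seconds(921, 1101)
print(f"implied rerank share at C=1000: {gap_s * 1e3:.3f} ms/query")
```

Because the table values are rounded to three or four significant digits, the smallest rows carry the most rounding noise; the 10M row in particular is sensitive to the last digit.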
Recall vs 043 — an independent crossover run, full 1024 bits:
mean abs recall error 0.0134 (1.3 pts) across N=1k–100k × C=50–500. Systematic ~+4 pts over-prediction only at C=50 (the power law is slightly optimistic in the very-low-C / low-recall corner); ≤1 pt everywhere recall ≥ 0.95 — the operating range.
What it unifies
The model collapses the whole history into one picture:
- Scan = compute roofline G/N (047, 038) → QPS predictable from N and cores alone.
- Recall = an orthogonal knob set by C, bits, rotation, residual (009/026/043/046) → the power law. You move along it without touching the QPS roofline (reranking C is ~free in RAM).
- HNSW-in-cell loses at high D (043) because a graph walk can't beat the G/N compute roofline at cell scale while paying full-D distances — the model says the funnel's QPS is set by popcount throughput, which HNSW can't undercut.
- Disk economics (045) = the rerank term turning on: C × read-latency. The adaptive funnel (048) shrinks mean C, so it cuts exactly that term.
- Scaling (047): since QPS = G/N with no cliff, doubling the corpus halves QPS predictably — capacity planning is arithmetic.
Using it (capacity planning becomes arithmetic)
To hit a target recall R at cell size N: pick bits then solve
C ≈ (e^12.45 · N^0.28 · bits^−1.97 / (1−R))^(1/1.08), and read off
QPS ≈ 1.10e9·(cores/8)/N. Example predictions (8 cores, tiled): 1M/1024-bit/C=500 →
recall ~0.98 @ ~1100 QPS; 10M/1024-bit/C=500 → ~0.97 @ ~110 QPS.
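The planning recipe above can be sketched as a small helper. The names `required_c` and `plan` are hypothetical, and rounding C up to an integer is a choice of this sketch.

```python
import math

# Constants from the two laws (8 cores, tiled scan, raw rotated binary).
LN_A, P_N, P_C, P_BITS = 12.45, 0.28, -1.08, -1.97
G8 = 1.10e9  # comparisons/s on 8 cores


def required_c(n, bits, target_recall):
    """Invert the recall power law for the funnel width C."""
    if not 0.0 < target_recall < 1.0:
        raise ValueError("target_recall must be strictly between 0 and 1")
    miss = 1.0 - target_recall
    return (math.exp(LN_A) * n**P_N * bits**P_BITS / miss) ** (1.0 / -P_C)


def plan(n, bits, target_recall, cores=8):
    """Return (C rounded up, scan-only QPS) for a target recall."""
    c = math.ceil(required_c(n, bits, target_recall))
    return c, G8 * (cores / 8) / n


for n in (1_000_000, 10_000_000):
    c, qps = plan(n, 1024, 0.98)
    print(f"N={n:,}: C={c}, scan-only QPS={qps:.0f}")
```

For 1M cells at 1024 bits and a 0.98 target this gives a C of about 440, consistent with the C=500 → ~0.98 example. The returned QPS is scan-only, so a costly rerank still has to be added separately.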
Conclusions
- QPS is a compute roofline, G/N: flat through the memory hierarchy, predictable within ~2–4%.
- Recall is a clean power law in N, C, bits (miss ~ N^0.28·C^−1.08·bits^−1.97), predictive to ~1 pt on an independent run in the operating range.
- The two axes are separable, which is the funnel's whole design virtue: tune recall (C/bits/rotation/residual) without moving the throughput roofline.
Caveats
- Fit on Cohere v3, rotate×2; the constant (and likely, to a smaller degree, the exponents) is dataset-dependent. The form (G/N; miss ~ N^a·C^p·bits^b) is the portable claim.
- Power law is optimistic at very low C (≤50) / low recall; trust it where recall ≥ 0.95.
- QPS law is scan-only; add the rerank term explicitly when rerank is costly (disk).
- Extrapolation beyond the fitted ranges (e.g. 256-bit at 100M) is unverified — the model correctly flags such configs as low-recall but the magnitude is untested.
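
Since the constants are dataset-dependent, porting the model means refitting it. One way to do that, sketched here under assumptions, is ordinary least squares on ln(miss); this is not necessarily the procedure used for the 38-point fit. The function names are hypothetical and the demo data is synthetic, generated from the published law purely to show that the fit recovers known exponents. It is not measured data.

```python
import math


def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_power_law(points):
    """Fit ln(miss) = ln_a + p_n*ln(N) + p_c*ln(C) + p_bits*ln(bits).

    points: list of (n, c, bits, miss) with miss > 0.
    Returns [ln_a, p_n, p_c, p_bits] via the normal equations.
    """
    rows = [[1.0, math.log(n), math.log(c), math.log(bits)]
            for n, c, bits, _ in points]
    y = [math.log(miss) for *_, miss in points]
    k = 4
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(ata, aty)


# SYNTHETIC, noise-free points generated from the published law.
LAW = (12.45, 0.28, -1.08, -1.97)
synthetic = [
    (n, c, bits, math.exp(LAW[0]) * n**LAW[1] * c**LAW[2] * bits**LAW[3])
    for n in (1_000, 10_000, 100_000)
    for c in (50, 100, 500)
    for bits in (512, 1024)
]
print(fit_power_law(synthetic))
```

On real measurements the recovered exponents will carry noise, so the log-miss R² should be reported alongside them, as it is for the fit above.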