notes

Experiment 049

Capstone: a predictive roofline model for the funnel (QPS & recall)

Perf record: 049-roofline-model.json. The model is fit to and validated against the measured history (scale 047, cell 046, crossover 043, tile-retune 038); no new run was made. Granite box (Xeon 6975P-C, 8 vCPU). This is the capstone of the breakout loop (10a→7→8→5): two closed-form laws that predict the engine's performance for any (N, bits, C, cores) and that are validated against independent runs.

The two laws

Throughput (compute roofline):

QPS(N, cores) ≈ 1.10e9 · (cores/8) / N        [tiled, tile≥8; tile=1 uses 0.63e9]

The tiled binary scan is popcount-compute-bound at a fixed aggregate rate G₈ ≈ 1.10 Gcmp/s on 8 cores (047). QPS is just that rate divided by the cell size. This is why there is no memory cliff (047): the roofline is compute, not bandwidth, so it's flat through L3→DRAM. Rerank adds a term — negligible in RAM (scan-dominated), but ≈ C × 0.4 ms on EBS SSD (045), where it dominates.
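A minimal sketch of the throughput law in Python, for readers who want to plug in numbers. The constants come from the text above; the function and parameter names are hypothetical, and the rerank helper only encodes the approximate 0.4 ms per candidate quoted for EBS SSD.

```python
# Constants are taken from the notes; all names here are illustrative.
G8_TILED = 1.10e9     # aggregate comparisons/s on 8 cores, tile >= 8 (047)
G8_UNTILED = 0.63e9   # same rate with tile = 1
RERANK_MS_EBS = 0.4   # approx. rerank cost per candidate on EBS SSD (045)


def scan_qps(n, cores=8, tiled=True):
    """Scan-only QPS for a cell of n vectors: aggregate rate divided by N."""
    g8 = G8_TILED if tiled else G8_UNTILED
    return g8 * (cores / 8) / n


def rerank_seconds_ebs(c):
    """Approximate rerank time per query on EBS SSD for c candidates."""
    return c * RERANK_MS_EBS / 1000.0


if __name__ == "__main__":
    for n in (100_000, 1_000_000, 10_000_000, 100_000_000):
        print(f"N={n:>11,d}  scan QPS ~ {scan_qps(n):,.1f}")
    print(f"EBS rerank at C=500 ~ {rerank_seconds_ebs(500) * 1000:.0f} ms per query")
```

At N = 1M the scan costs about 0.9 ms of aggregate machine time per query, while an EBS rerank at C = 500 is about 200 ms, which is why the rerank term dominates on SSD and is negligible in RAM.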

Recall (power law):

miss = 1 − recall ≈ e^12.45 · N^0.28 · C^−1.08 · bits^−1.97       (raw rotated binary)

Fit on 38 points (log-miss R² = 0.90). The exponents are physical:

(Residual encoding (046) lowers the constant e^12.45; rotation (026) is baked into the fit. So the law describes the current best stage-1.)
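The recall law can be evaluated directly. The sketch below uses the fitted constants from the formula above; the function names are hypothetical, and clamping the miss rate to 1.0 is an added assumption so that points far outside the fitted range do not return a negative recall.

```python
import math

# Fitted constants from the power law (raw rotated binary).
LN_A = 12.45
EXP_N, EXP_C, EXP_BITS = 0.28, -1.08, -1.97


def predicted_miss(n, c, bits):
    """miss = e^12.45 * N^0.28 * C^-1.08 * bits^-1.97, evaluated in log space."""
    log_miss = (LN_A
                + EXP_N * math.log(n)
                + EXP_C * math.log(c)
                + EXP_BITS * math.log(bits))
    # Clamp is an assumption of this sketch, not part of the fit.
    return min(1.0, math.exp(log_miss))


def predicted_recall(n, c, bits):
    return 1.0 - predicted_miss(n, c, bits)


if __name__ == "__main__":
    print(f"1M / 1024-bit / C=500: recall ~ {predicted_recall(1_000_000, 500, 1024):.3f}")
    # Sensitivities implied by the exponents:
    print(f"doubling bits divides miss by ~{2 ** -EXP_BITS:.2f}")
    print(f"doubling C divides miss by ~{2 ** -EXP_C:.2f}")
    print(f"10x N multiplies miss by ~{10 ** EXP_N:.2f}")
```

Read this way, the exponents say that bits are the strongest lever (nearly quadratic), C is roughly linear, and the penalty for a larger cell is mild.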

Validation

QPS vs 047 (tile=8), predicted = G₈·1e9/N:

N        actual QPS    predicted QPS
100k          11246            11013
1M             1130             1101
10M             114              110
100M           11.2             11.0

Within a few percent (roughly 2 to 3.5%) across a 1000× range. (038 measured 921 QPS at 1M with tile=8, against a scan-only prediction of 1101; the gap is the C=1000 rerank overhead, which is exactly the rerank term.)
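The comparison against 047 is simple enough to recompute. The sketch below hard-codes the four measured QPS values reported for 047 and uses the rounded rate 1.10e9; the variable and function names are hypothetical.

```python
G8 = 1.10e9  # tiled aggregate rate (comparisons/s on 8 cores), rounded as in the text

# (N, measured QPS) pairs reported for run 047 at tile=8.
MEASURED_047 = [
    (100_000, 11246.0),
    (1_000_000, 1130.0),
    (10_000_000, 114.0),
    (100_000_000, 11.2),
]


def relative_errors(points, g=G8):
    """Return (N, actual, predicted, (actual - predicted) / actual) per point."""
    rows = []
    for n, actual in points:
        pred = g / n
        rows.append((n, actual, pred, (actual - pred) / actual))
    return rows


for n, actual, pred, err in relative_errors(MEASURED_047):
    print(f"N={n:>11,d}  actual={actual:>9.1f}  pred={pred:>9.1f}  err={err:+.1%}")
```

Every measured point sits slightly above the prediction, so on this data the rounded constant is a mildly conservative estimate of throughput.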

Recall vs 043 — an independent crossover run, full 1024 bits:

Mean absolute recall error is 0.0134 (1.3 pts) across N=1k–100k × C=50–500. There is a systematic over-prediction of about +4 pts only at C=50 (the power law is slightly optimistic in the very-low-C / low-recall corner); the error is ≤1 pt everywhere recall ≥ 0.95, which is the operating range.

What it unifies

The model collapses the whole history into one picture:

Using it (capacity planning becomes arithmetic)

To hit a target recall R at cell size N: pick bits, then solve the recall law for the candidate count, C ≈ (e^12.45 · N^0.28 · bits^−1.97 / (1−R))^(1/1.08), and read off QPS ≈ 1.10e9·(cores/8)/N. Example predictions (8 cores, tiled): 1M/1024-bit/C=500 → recall ~0.98 @ ~1100 QPS; 10M/1024-bit/C=500 → ~0.97 @ ~110 QPS.
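The planning recipe above can be sketched as a small helper. The constants are the fitted values from the two laws; the function names are hypothetical, and rounding C up to the next integer is an assumption made so the target is met rather than narrowly missed.

```python
import math


def candidates_for_recall(target_recall, n, bits):
    """Invert the recall power law for C at a given target recall, N and bits."""
    if not 0.0 < target_recall < 1.0:
        raise ValueError("target_recall must be strictly between 0 and 1")
    miss = 1.0 - target_recall
    c = (math.exp(12.45) * n ** 0.28 * bits ** -1.97 / miss) ** (1 / 1.08)
    return math.ceil(c)  # round up (assumption of this sketch)


def plan(target_recall, n, bits, cores=8):
    """Return (C, scan-only QPS) for the tiled scan; rerank cost is not included."""
    c = candidates_for_recall(target_recall, n, bits)
    qps = 1.10e9 * (cores / 8) / n
    return c, qps


if __name__ == "__main__":
    for n in (1_000_000, 10_000_000):
        c, qps = plan(0.98, n, 1024)
        print(f"N={n:>10,d}, 1024-bit, recall 0.98 -> C ~ {c}, scan QPS ~ {qps:,.0f}")
```

Because C appears only in the recall law and cores only in the throughput law, the two halves of `plan` never interact, which mirrors the separability the notes describe.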

Conclusions

  1. QPS is a compute roofline, G/N: flat through the memory hierarchy and predictable within a few percent (roughly 2 to 3.5% against 047).
  2. Recall is a clean power law in N, C, bits (miss ~ N^0.28·C^−1.08·bits^−1.97), predictive to ~1 pt on an independent run in the operating range.
  3. The two axes are separable, which is the funnel's whole design virtue: tune recall (C/bits/rotation/residual) without moving the throughput roofline.

Caveats