Search on an FPGA (AWS F2 / AMD Virtex UltraScale+ HBM)

Status: parked. Written down so the direction is on record without anyone building it. This is a sibling of accelerator-gemm.md (the GPU/ASIC question), but it reaches the opposite conclusion, which is why it is kept as a separate note.

The test (from the Exa concepts notes)

The test in prep/exa/concepts.md §"hardware lottery" asks: does the workload reduce to multiply-add? For an FPGA the more useful question is: what primitive does the hardware reward? An FPGA rewards a different primitive than a tensor engine does.

Why this inverts the GPU/ASIC conclusion

| | GPU / ASIC (accelerator-gemm.md) | FPGA (this note) |
|---|---|---|
| Rewards | dense exact GEMM: discard the funnel | the funnel itself: 1-bit XNOR-popcount |
| Binary code | must inflate ±1 → int8 (8× bytes) | native 1-bit, no penalty |
| Dataflow | fixed; you adapt to the engine | custom: build the funnel as a pipeline |
| Latency | single query = GEMV, <5% util, must batch | deterministic streaming, low single-query latency |
| Top-K | awkward off-engine (WarpSelect) | a heap/selection unit is synthesizable inline |
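
To make the FPGA column concrete, here is a minimal software model of the primitive it rewards: a 1-bit XNOR-popcount scan with an inline bounded top-K unit. This is a sketch in plain Python, not HLS or RTL, and every name in it (`pack_bits`, `xnor_popcount`, `scan_top_k`, `code_bytes`) is hypothetical. It assumes one bit per dimension with +1 mapped to 1 and -1 mapped to 0.

```python
import heapq


def pack_bits(signs):
    """Pack a sequence of +1/-1 values into an int, one bit per dimension (+1 -> 1)."""
    code = 0
    for i, s in enumerate(signs):
        if s > 0:
            code |= 1 << i
    return code


def xnor_popcount(a, b, n_bits):
    """Count the bit positions where two n_bits-wide codes agree."""
    mask = (1 << n_bits) - 1
    return bin(~(a ^ b) & mask).count("1")


def signed_dot(a, b, n_bits):
    """Recover the +1/-1 dot product: matches minus mismatches."""
    return 2 * xnor_popcount(a, b, n_bits) - n_bits


def scan_top_k(query, codes, n_bits, k):
    """Stream over the codes once, keeping the k best in a bounded min-heap.

    Ties are broken in favour of the lower index.
    Returns (index, score) pairs, best first.
    """
    heap = []  # entries are (score, -index); the weakest kept entry is heap[0]
    for idx, code in enumerate(codes):
        item = (xnor_popcount(query, code, n_bits), -idx)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    return [(-neg_idx, score) for score, neg_idx in sorted(heap, reverse=True)]


def code_bytes(dims, bits_per_dim=1):
    """Storage for one vector, rounded up to whole bytes."""
    return (dims * bits_per_dim + 7) // 8
```

The scan touches each code exactly once and never materialises a full score array, which is the streaming shape the Dataflow and Top-K rows describe. `code_bytes(128, 8) // code_bytes(128, 1)` gives the 8× inflation from the Binary code row for a 128-dimensional vector.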

So an FPGA is the accelerator that says "your clever do-less funnel was right — let me wire it into silicon" rather than "throw it away and brute-force everything." That's the appeal.

AWS F2 specifically

F2 (successor to F1) uses AMD Virtex UltraScale+ HBM FPGAs (VU47P-class) with on-package HBM2 — high bandwidth feeding the binary codes straight into the popcount pipeline. (Verify live — specs below are approximate as of 2026-06:)

Why it's parked — the development cost is the wall

What would decide it (if ever un-parked)

  1. A binary XNOR-popcount scan kernel on f2.6xlarge over SIFT1M 1-bit codes — measure QPS/latency and $/QPS at fixed recall vs the CPU funnel and vs the GPU dense baseline from accelerator-gemm.md. Three-way $/QPS is the real comparison.
  2. Whether the rerank + top-K stage stays on-FPGA (full pipeline) or hands back to host CPU, and which keeps the latency win.
  3. Whether HLS gets a credible design standing without dropping to hand-written Verilog. The split between HLS and hand-written RTL decides whether this is ever economically sane vs the GPU path.
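
The three-way comparison in item 1 can be sketched as a small helper. The `Run` record, its field names, and the reading of "$/QPS" as hourly instance price divided by sustained QPS are all assumptions for illustration. No measurements exist yet, so the sketch ships without numbers.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One benchmark result for a platform (all fields are measured inputs)."""
    name: str
    qps: float           # sustained queries per second
    recall: float        # recall achieved at that QPS, in [0, 1]
    usd_per_hour: float  # instance price, taken from live pricing


def usd_per_qps(run):
    """Hourly cost per unit of sustained throughput (lower is better)."""
    if run.qps <= 0:
        raise ValueError("qps must be positive")
    return run.usd_per_hour / run.qps


def rank_at_fixed_recall(runs, min_recall):
    """Drop runs below the recall floor, then sort the rest by $/QPS."""
    eligible = [r for r in runs if r.recall >= min_recall]
    return sorted(eligible, key=usd_per_qps)
```

Filtering on recall before ranking matters: a platform that looks cheap per query only because it misses the recall floor is not a valid point in the comparison.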

Related