
Experiment 065

Matryoshka-256 binary funnel + rerank (OpenAI text-embedding-3)

Perf record: 065-matryoshka-256-openai-funnel.json. c8a.4xlarge (Zen5, 16 vCPU) spot, us-east-2. First run on a genuine Matryoshka embedding, which closes the question left open by the prefix experiments (014/027): Cohere v3 isn't Matryoshka-trained, so truncating it to 256 dims collapsed recall to ~0.72. Here we use an embedding trained to be truncatable.
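For orientation, a minimal sketch of what a "Matryoshka-256 binary funnel + rerank" could look like. Assumptions (not stated in these notes): stage 1 sign-binarizes the first 256 dims and ranks the whole corpus by Hamming distance, keeping the top C (the "rerank C" swept below); stage 2 rescores those C survivors with full-dimension float dot products. All function names are hypothetical, and random unit vectors stand in for real text-embedding-3 outputs.

```python
import heapq
import math
import random

def binarize(vec, dims=256):
    """Sign-quantize the Matryoshka prefix vec[:dims] into an int bitmask.

    Sign bits are scale-invariant, so the prefix is not re-normalized first.
    """
    code = 0
    for i, x in enumerate(vec[:dims]):
        if x > 0:
            code |= 1 << i
    return code

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def funnel_search(query, corpus, codes, k=10, rerank_c=2000, dims=256):
    """Two-stage search: Hamming top-C on binary codes, then exact rerank of those C."""
    q_code = binarize(query, dims)
    # Stage 1: cheap candidate generation over the 256-bit codes.
    candidates = heapq.nsmallest(
        rerank_c, range(len(codes)), key=lambda i: hamming(q_code, codes[i])
    )
    # Stage 2 (assumption): rescore survivors with full-dimension float dot
    # products (cosine, since the vectors are unit-norm) and keep the best k.
    candidates.sort(key=lambda i: dot(query, corpus[i]), reverse=True)
    return candidates[:k]

def random_unit(d, rng):
    """Hypothetical stand-in for a real embedding: a random unit vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

# Toy usage: 300 "documents" of 512 dims, funnel keeps 50 candidates.
rng = random.Random(0)
corpus = [random_unit(512, rng) for _ in range(300)]
codes = [binarize(v) for v in corpus]
query = random_unit(512, rng)
top10 = funnel_search(query, corpus, codes, k=10, rerank_c=50)
```

Setting `rerank_c` to the corpus size degenerates to exact search; shrinking it trades recall for speed, which is the knob the table below sweeps.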

Setup

Result: recall@10 tunes from 0.95 to 0.998 via the rerank width C

rerank C   recall@10   QPS    p50       p99
500        0.9474      6041   2.05 ms   2.10 ms
1000       0.9731      5275   2.41 ms   2.49 ms
2000       0.9878      4227   3.09 ms   3.15 ms
4000       0.9944      3092   4.40 ms   4.58 ms
8000       0.9979      2026   6.54 ms   6.88 ms
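These notes don't spell out how recall@10 is measured. Assuming the standard definition (overlap between the funnel's top-10 and the brute-force top-10, averaged over queries), it can be computed as below; `recall_at_k` is a hypothetical helper name.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Mean over queries of |approx top-k ∩ exact top-k| / k (standard recall@k)."""
    if len(approx_ids) != len(exact_ids) or not exact_ids:
        raise ValueError("need one approximate and one exact result list per query")
    hits = [len(set(a[:k]) & set(e[:k])) / k for a, e in zip(approx_ids, exact_ids)]
    return sum(hits) / len(hits)

# Two toy queries: the first recovers both true neighbors, the second only one.
print(recall_at_k([[0, 1], [2, 3]], [[0, 1], [2, 9]], k=2))  # 0.75
```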

Chosen operating point: C=2000, at recall@10 0.9878, 4227 QPS, p50 3.09 ms, p99 3.15 ms. Relative to C=500 that is +4 recall points for ~30% fewer QPS and ~1 ms more latency (p50 +1.04 ms, p99 +1.05 ms).
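The comparison against C=500 can be rederived from the table rows. A small sketch: `SWEEP` just transcribes the table, and `tradeoff` / `marginal_steps` are hypothetical helper names, not part of the experiment's tooling.

```python
# (rerank C, recall@10, QPS, p50 ms, p99 ms), transcribed from the results table
SWEEP = [
    (500, 0.9474, 6041, 2.05, 2.10),
    (1000, 0.9731, 5275, 2.41, 2.49),
    (2000, 0.9878, 4227, 3.09, 3.15),
    (4000, 0.9944, 3092, 4.40, 4.58),
    (8000, 0.9979, 2026, 6.54, 6.88),
]

def tradeoff(base, point):
    """Recall points gained, fraction of QPS lost, extra p50 and p99 latency (ms)."""
    recall_pts = (point[1] - base[1]) * 100
    qps_drop = 1 - point[2] / base[2]
    return recall_pts, qps_drop, point[3] - base[3], point[4] - base[4]

def marginal_steps(sweep):
    """For each doubling of C: recall points gained per 1% of QPS given up."""
    out = []
    for prev, cur in zip(sweep, sweep[1:]):
        pts, drop, _, _ = tradeoff(prev, cur)
        out.append((cur[0], pts / (drop * 100)))
    return out

pts, drop, d50, d99 = tradeoff(SWEEP[0], SWEEP[2])
print(f"C=2000 vs C=500: +{pts:.2f} recall pts, -{drop:.0%} QPS, "
      f"+{d50:.2f} ms p50, +{d99:.2f} ms p99")
# C=2000 vs C=500: +4.04 recall pts, -30% QPS, +1.04 ms p50, +1.05 ms p99
```

Across the sweep, the recall bought per 1% of QPS given up falls from about 0.20 points (500→1000) to about 0.01 points (4000→8000).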

What it shows

Caveats