notes

♫ Note

ANN-Benchmarks

What it is

The de-facto standard benchmark for approximate nearest-neighbor search. Runs many ANN implementations through a reproducible (Docker-based) harness and plots the recall vs. queries-per-second Pareto frontier per dataset, plus index build time and index size. This is the reference frontier to compare our engine's recall/QPS against.

Algorithms it compares (~37)

faiss-ivf, scann, hnswlib, hnsw (several variants), glass, NGT (qg/panng/onng), vamana(diskann), pynndescent, annoy, n2, flann, pgvector, qdrant, milvus(knowhere), weaviate, vald(NGT-anng), vearch, and baselines bruteforce-blas / bf.

Useful angle: bruteforce-blas is the exact baseline (BLAS GEMM) — directly relevant to our batched/GEMM-style direction. The graph indexes (hnsw/glass/NGT) and IVF (faiss-ivf) sit at the high-recall/high-QPS frontier we'd eventually be measured against.

Datasets (name — dim — metric — k)

DatasetDimMetrick
sift-128-euclidean128L210
gist-960-euclidean960L210
fashion-mnist-784-euclidean784L210
glove-100-angular100angular/cosine10
glove-25-angular25angular/cosine10
nytimes-256-angular256angular/cosine10
sift-256-hamming256Hamming10
word2bits-800-hamming800Hamming10
kosarak-jaccardJaccard10

We use sift-128-euclidean (as .fvecs). Note the variety of metrics — angular/Hamming/Jaccard would need different distance kernels than our L2.

Methodology notes

Related