Billion-Scale ANN

http://big-ann-benchmarks.com/

Install

The only prerequisites are Python (tested with 3.6) and Docker. Newer versions of Python also work, but they probably require an updated requirements.txt on the host. (Suggestion: copy requirements.txt to requirements${PYTHON_VERSION}.txt and remove all pinned versions. requirements.txt itself has to be kept for the Docker containers.)

1. Clone the repo.
2. Run pip install -r requirements.txt (use requirements_py38.txt instead if you have Python 3.8).
3. Install Docker by following the official Docker installation instructions. You may also want to follow the post-installation steps for running Docker as a non-root user.
4. Run python install.py to build all the libraries inside Docker containers.
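For newer Python versions, the suggestion above (copy requirements.txt and remove all pinned versions) can be scripted. The sketch below is not part of the repository: unpin, default_target and write_unpinned are hypothetical helpers, and the output name requirements_py310.txt (for Python 3.10) is an assumed naming scheme modeled on the existing requirements_py38.txt. requirements.txt itself is only read, never modified.

```python
import re
import sys
from pathlib import Path

# A version specifier after the package name, e.g. "==1.2.3" or ">=2.7,<3".
# It stops at ";" so PEP 508 environment markers are kept.
_SPEC = re.compile(r"\s*(==|>=|<=|~=|!=|>|<)[^;]*")
# An inline comment introduced by whitespace followed by "#".
_COMMENT = re.compile(r"\s+#.*$")


def unpin(lines):
    """Return requirement lines with version specifiers and inline comments removed."""
    out = []
    for line in lines:
        stripped = line.strip()
        # Blank lines and full-line comments are kept as they are.
        if not stripped or stripped.startswith("#"):
            out.append(stripped)
            continue
        stripped = _COMMENT.sub("", stripped)
        out.append(_SPEC.sub("", stripped, count=1))
    return out


def default_target(version_info=sys.version_info):
    # Assumed (hypothetical) naming scheme, modeled on requirements_py38.txt.
    return f"requirements_py{version_info[0]}{version_info[1]}.txt"


def write_unpinned(src="requirements.txt", dst=None):
    """Write an unpinned copy of `src`; requirements.txt itself is left untouched."""
    dst = Path(dst or default_target())
    lines = Path(src).read_text().splitlines()
    dst.write_text("\n".join(unpin(lines)) + "\n")
    return dst
```

Running write_unpinned() from the repository root would produce the unpinned file, which can then be passed to pip install -r. Because all pins are dropped, pip resolves the newest compatible versions, so results may differ from the pinned Docker environment.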

Storing Data

The framework assumes that all data is stored in data/. If your datasets and indices should live somewhere else, use a symlink. The location of the linked folder matters a great deal for SSD-based search performance in T2: a local SSD, such as the one found on Azure Ls-series VMs, performs better than remote disks, even premium ones. See T1/T2 for more details.
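A minimal sketch of creating that symlink from Python, assuming it is run against the repository root and that the target (for example a hypothetical mount point such as /mnt/local-ssd/big-ann-data) sits on the disk you want to benchmark from. link_data_dir is a hypothetical helper, not part of the framework; the shell equivalent is simply ln -s <target> data.

```python
from pathlib import Path


def link_data_dir(target, repo_root="."):
    """Point <repo_root>/data at `target`, e.g. a directory on a local SSD."""
    target = Path(target).resolve()
    target.mkdir(parents=True, exist_ok=True)
    link = Path(repo_root) / "data"
    if link.is_symlink():
        # Already linked: accept it only if it points at the same directory.
        if link.resolve() != target:
            raise FileExistsError(f"{link} already links to {link.resolve()}")
        return link
    if link.exists():
        # A real data/ directory would be shadowed; make the user move it first.
        raise FileExistsError(f"{link} exists and is not a symlink; move it first")
    link.symlink_to(target, target_is_directory=True)
    return link
```

The function refuses to overwrite an existing data/ directory so that previously downloaded datasets are never silently hidden behind a new link.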

Data sets

See http://big-ann-benchmarks.com/ for details on the different datasets.

Dataset Preparation

Before running experiments, datasets have to be downloaded. All preparation can be carried out by calling

python create_dataset.py --dataset [bigann-1B | deep-1B | text2image-1B | ssnpp-1B | msturing-1B | msspacev-1B]

Note that downloading the datasets can potentially take many hours.

For local testing, there are small random datasets, random-xs and random-range-xs. Furthermore, most datasets have 1M, 10M and 100M versions; run python create_dataset.py -h to get an overview.
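Preparing several datasets is just repeated calls to create_dataset.py. The sketch below, assuming it is run from the repository root, builds one command per dataset from the list given above and only prints them unless dry_run=False. prepare_commands and prepare_all are hypothetical helper names chosen for illustration.

```python
import subprocess
import sys

# The six billion-scale datasets accepted by create_dataset.py (see above).
DATASETS = ["bigann-1B", "deep-1B", "text2image-1B",
            "ssnpp-1B", "msturing-1B", "msspacev-1B"]


def prepare_commands(datasets=DATASETS, python=sys.executable):
    """Build one create_dataset.py invocation per dataset."""
    return [[python, "create_dataset.py", "--dataset", ds] for ds in datasets]


def prepare_all(datasets=DATASETS, dry_run=True):
    for cmd in prepare_commands(datasets):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failed download instead of continuing.
            subprocess.run(cmd, check=True)
```

For a quick local check, prepare_all(["random-xs"], dry_run=False) would prepare only the small random dataset instead of starting a multi-hour download.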

Running the benchmark

Run python run.py --dataset $DS --algorithm $ALGO, where DS is the dataset you are running on and ALGO is the name of the algorithm. Use python run.py --list-algorithms to get an overview of the available algorithms; python run.py -h lists further options.

The parameters used by the implementation to build and query the index can be found in algos.yaml.

Running the track 1 baseline

After completing the installation, the baseline can be evaluated as follows.

for DS in bigann-1B  deep-1B  text2image-1B  ssnpp-1B  msturing-1B  msspacev-1B;
do
    python run.py --dataset $DS --algorithm faiss-t1;
done
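The shell loop above can also be driven from Python, which makes it easier to record which runs failed over a multi-day sweep. This is a sketch under the assumption that it runs from the repository root; baseline_commands and run_baseline are hypothetical helpers, and the only flags used are the --dataset and --algorithm options shown above.

```python
import subprocess
import sys

DATASETS = ["bigann-1B", "deep-1B", "text2image-1B",
            "ssnpp-1B", "msturing-1B", "msspacev-1B"]


def baseline_commands(algorithm="faiss-t1", datasets=DATASETS, python=sys.executable):
    """One run.py invocation per dataset, mirroring the shell loop."""
    return [[python, "run.py", "--dataset", ds, "--algorithm", algorithm]
            for ds in datasets]


def run_baseline(algorithm="faiss-t1", datasets=DATASETS, dry_run=True):
    """Run each dataset in turn; return the datasets whose run failed."""
    failed = []
    for ds, cmd in zip(datasets, baseline_commands(algorithm, datasets)):
        print(" ".join(cmd))
        if dry_run:
            continue
        # No check=True here: one failing dataset should not abort the whole sweep.
        if subprocess.run(cmd).returncode != 0:
            failed.append(ds)
    return failed
```

Like the shell loop, this keeps going after a failure; unlike it, the returned list tells you which datasets need to be rerun.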

On a 28-core Xeon E5-2690 v4 machine with 100 MB/s download bandwidth, running the baseline experiments took roughly 7 days.

To evaluate the results, run

sudo chmod -R 777 results/
python data_export.py --output res.csv
python3.8 eval/show_operating_points.py --algorithm faiss-t1 --threshold 10000

Including your algorithm and evaluating the results

See Track T1/T2 for more details on evaluation for Tracks T1 and T2.

See Track T3 for more details on evaluation for Track T3.

Credits

This project is a version of ann-benchmarks by Erik Bernhardsson and contributors, targeting billion-scale datasets.
