[NeurIPS 2023 OOD Track] FDUx2-mysteryann-dif Submission #219

Merged
merged 5 commits into from
Nov 1, 2023

matchyc
Contributor

@matchyc matchyc commented Oct 31, 2023

If allowed, here is another entry, FDUx2-mysteryann-dif, for the OOD track.

Its graph implementation and settings differ from those of FDUx2-mysteryann.

We thank the organizers for their patient evaluation.

Signed-off-by: Meng Chen <[email protected]>
Signed-off-by: Meng Chen <[email protected]>
Signed-off-by: Meng Chen <[email protected]>
@matchyc
Contributor Author

matchyc commented Oct 31, 2023

All workflow checks passed at the same commit in the forked repo (https://github.com/matchyc/big-ann-benchmarks/actions), so I do not think the failing checks here are an actual issue.

@maumueller maumueller self-requested a review October 31, 2023 12:28
@maumueller
Collaborator

@matchyc How much disk space does a successful run use? I ran it on a c6i.2xlarge with 32 GB of disk space, and it ran out of disk space during graph building.

@maumueller
Collaborator

As an additional comment, are your two entries sharing the same index file structure? I see data/indices/ood/mysteryann/ being used here, and I wonder if there is going to be a problem with running both of your entries on the same machine?

@matchyc
Contributor Author

matchyc commented Nov 1, 2023

@maumueller Thank you for checking. I think there are two points:

  1. For Text-to-Image-10M, the original dataset takes 7.5GB. A complete index (like Vamana) holds the graph structure, additional information, and the original dataset. So for the mystery index, the folder for a single index, say data/indices/ood/mysteryann/Text2Image1B-10000000/M_bp35_L_pq800_NoT5_ord, should occupy "dataset size + graph structure size + additional information", which is about 15GB on our AWS EC2 c6i.2xlarge. The competition framework also downloads the data into big-ann-benchmarks/data/text2image1B, so at least two copies of the original dataset are on disk. To sum up, I think an entry needs at least "original dataset size + index", that is, 7.5GB + 15GB = 22.5GB. Since the system itself also occupies part of the disk, 32GB may not be enough. Once the index has been built we could delete some files, but our code does not do that for now. The memory footprint during search is relatively low (about 5-6GB is enough), but we do need this disk space.

  2. Because I evaluated the two entries on different AWS EC2 machines, I suppose I forgot to change the index saving directory for mysteryann-dif; the two entries should not share the same directory. We need to modify line 94 of big-ann-benchmarks/neurips23/ood/mysteryann-dif/mysteryann-dif.py, replacing mysteryann with mysteryann-dif, so that the mysteryann-dif index is written to data/indices/ood/mysteryann-dif/Text2Image1B-10000000/. The two mysteryann entries will then be saved in different directories when they run on the same machine. I can commit this change to mysteryann-dif.py. Thanks a lot!
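
The disk estimate in point 1 can be sketched as a small calculation. This is only an illustration using the figures quoted above; the function names and the `headroom_gb` parameter are hypothetical and not part of the submission or the benchmark framework.

```python
import shutil

# Figures quoted in the comment above (approximate).
DATASET_GB = 7.5  # original Text-to-Image-10M dataset downloaded by the framework
INDEX_GB = 15.0   # index folder: dataset copy + graph structure + additional information


def required_disk_gb(dataset_gb: float, index_gb: float, headroom_gb: float = 0.0) -> float:
    """Minimum disk for one entry; headroom_gb is a hypothetical extra margin."""
    return dataset_gb + index_gb + headroom_gb


def free_disk_gb(path: str = ".") -> float:
    """Free space on the filesystem that contains `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


need = required_disk_gb(DATASET_GB, INDEX_GB)
print(f"need at least {need} GB, free here: {free_disk_gb():.1f} GB")
```

On a 32GB volume, the OS and the Docker image also consume part of the space, so the remaining margin above 22.5GB can be small.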

@matchyc
Contributor Author

matchyc commented Nov 1, 2023

The latest commit only modifies the index saving directory as described in the previous comment.
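
As a rough sketch of the idea (not the actual code at line 94 of mysteryann-dif.py), deriving the index directory from the entry name keeps the two entries apart. The helper `index_dir` and its parameters are hypothetical; only the directory layout is taken from the comments above.

```python
import os


def index_dir(algo_name: str, dataset_name: str, root: str = "data/indices/ood") -> str:
    """Build a per-entry index directory so entries never share a folder (hypothetical helper)."""
    return os.path.join(root, algo_name, dataset_name)


# Each entry gets its own directory for the same dataset.
print(index_dir("mysteryann", "Text2Image1B-10000000"))
print(index_dir("mysteryann-dif", "Text2Image1B-10000000"))
```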

@harsha-simhadri
Owner

Here is what I see on an Azure VM. I didn't look into the discussion above; I just ran the code as is.

mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 80, 'T': 8}))",text2image-10M,10,25275.19583843993,0.0,14736.402364730835,3755588.0,0,0,ood,0.892301
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 83, 'T': 8}))",text2image-10M,10,24512.149202597633,0.0,14736.402364730835,3755588.0,0,0,ood,0.8944889999999999
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 85, 'T': 8}))",text2image-10M,10,24046.26546827023,0.0,14736.402364730835,3755588.0,0,0,ood,0.895942
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 88, 'T': 8}))",text2image-10M,10,23340.657413143323,0.0,14736.402364730835,3755588.0,0,0,ood,0.897888
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 90, 'T': 8}))",text2image-10M,10,22901.846817624802,0.0,14736.402364730835,3755588.0,0,0,ood,0.899239
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 92, 'T': 8}))",text2image-10M,10,22522.06871203133,0.0,14736.402364730835,3755588.0,0,0,ood,0.9005380000000001
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 93, 'T': 8}))",text2image-10M,10,22300.047563606622,0.0,14736.402364730835,3755588.0,0,0,ood,0.901164
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 95, 'T': 8}))",text2image-10M,10,21872.812192859583,0.0,14736.402364730835,3755588.0,0,0,ood,0.9023680000000001
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 100, 'T': 8}))",text2image-10M,10,21037.447151884575,0.0,14736.402364730835,3755588.0,0,0,ood,0.905259
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 103, 'T': 8}))",text2image-10M,10,20499.346258031015,0.0,14736.402364730835,3755588.0,0,0,ood,0.906837
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 107, 'T': 8}))",text2image-10M,10,19801.79901932018,0.0,14736.402364730835,3755588.0,0,0,ood,0.908883
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 110, 'T': 8}))",text2image-10M,10,19327.226268045328,0.0,14736.402364730835,3755588.0,0,0,ood,0.9103730000000001
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 115, 'T': 8}))",text2image-10M,10,18628.094531127448,0.0,14736.402364730835,3755588.0,0,0,ood,0.912575
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 120, 'T': 8}))",text2image-10M,10,17949.65593422942,0.0,14736.402364730835,3755588.0,0,0,ood,0.914668
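
Rows like these can be scanned for the fastest configuration above a recall threshold. The sketch below is only an illustration: it assumes the fifth field is queries per second and the last field is recall, and it uses a 0.9 threshold as an example; neither assumption is stated in this thread. The three rows are copied from the output above.

```python
import csv
import io

# Three rows copied verbatim from the results above.
ROWS = '''mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 90, 'T': 8}))",text2image-10M,10,22901.846817624802,0.0,14736.402364730835,3755588.0,0,0,ood,0.899239
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 92, 'T': 8}))",text2image-10M,10,22522.06871203133,0.0,14736.402364730835,3755588.0,0,0,ood,0.9005380000000001
mysteryann-dif,"mystery_dif(('M_bp45_L_pq800_NoT5_ord', {'L_pq': 93, 'T': 8}))",text2image-10M,10,22300.047563606622,0.0,14736.402364730835,3755588.0,0,0,ood,0.901164
'''


def best_qps_at_recall(text: str, min_recall: float = 0.9):
    """Return (qps, recall, params) for the fastest row meeting min_recall, or None."""
    best = None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        # Assumed column layout: field 5 is QPS, the last field is recall.
        qps, recall = float(row[4]), float(row[-1])
        if recall >= min_recall and (best is None or qps > best[0]):
            best = (qps, recall, row[1])
    return best


print(best_qps_at_recall(ROWS))
```

Under these assumptions, the row with `'L_pq': 92` is selected, since it is the fastest of the three rows whose last field is at least 0.9.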

@matchyc
Contributor Author

matchyc commented Nov 1, 2023

A few percent lower than expected, but I think it's reasonable, thanks! So I suppose it's ready to merge.

@maumueller maumueller merged commit 762f74d into harsha-simhadri:main Nov 1, 2023
14 of 16 checks passed
@matchyc
Contributor Author

matchyc commented Nov 20, 2023

@harsha-simhadri This entry is from the same team; thank you for the reminder!

Conference Name: Practical Vector Search Challenge: NeurIPS 2023 Competition track
Paper ID: 15
Paper Title: Fast OOD-ANN
