Rebase v1.9.0 #5

Closed
wants to merge 209 commits into from

Conversation

LiuRuoyu01

No description provided.

iotamudelta and others added 30 commits April 17, 2024 03:43
Summary:
The CMakeLists.txt in faiss/gpu uses the $<LINK_LIBRARY:WHOLE_ARCHIVE expression which requires at least cmake 3.24.

Pull Request resolved: facebookresearch#3305

Reviewed By: mlomeli1

Differential Revision: D56234500

Pulled By: algoriddle

fbshipit-source-id: dfe7df3379c5250dedec7d1988cffa889fc1c393
Summary:
In this commit facebookresearch@ab2b7f5, they changed format based on clang-format-18. However, we still use clang-format-11 in our circle ci job which caused the failure. In this PR, we are going to switch to clang-format-18

Pull Request resolved: facebookresearch#3372

Reviewed By: kuarora

Differential Revision: D56280363

Pulled By: junjieqi

fbshipit-source-id: f832ab2112f762e6000b55a155e3e43fe99071d7
Summary:
Pull Request resolved: facebookresearch#3371

This will never happen because N is fixed at compile time and the buffer is large enough. It is misleading to add error handling code for a case that will never happen.

Reviewed By: kuarora

Differential Revision: D56274458

fbshipit-source-id: ca706f1223dbc97e69d5ac9750b277afa4df80a7
Summary:
The current loop goes from 0 to 31. It has an if statement that does one assignment for j < 16 and a different assignment for j >= 16. By unrolling the loop so that the j < 16 and the j >= 16 iterations are done together, the if (j < 16) is eliminated and the number of loop iterations is halved.

Then unroll the loop for the j < 16 and the j >= 16 cases to a depth of 2.

This change results in approximately a 55% reduction in the execution time for the bench_ivf_fastscan.py workload on Power 10 when compiled with CMAKE_INSTALL_CONFIG_NAME=Release.

Removing the if (j < 16) statement and unrolling the loop removes branch stall cycles and register dependencies on instruction issue. As a result, the unrolled code is able to issue instructions earlier, reducing the total number of cycles required to execute the function.
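As an illustration only, the transformation described above can be sketched on a toy loop. The function names, the `lo`/`hi` inputs and the simple copy in the loop body are assumptions made for this sketch; the actual fastscan kernel does different work per element.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the original loop: 32 iterations, with a branch
// that picks a different assignment for the lower and upper halves.
std::array<uint8_t, 32> fill_branchy(
        const std::array<uint8_t, 16>& lo,
        const std::array<uint8_t, 16>& hi) {
    std::array<uint8_t, 32> out{};
    for (int j = 0; j < 32; j++) {
        if (j < 16) {
            out[j] = lo[j];
        } else {
            out[j] = hi[j - 16];
        }
    }
    return out;
}

// Same result without the branch: each step writes one element of the lower
// half and one of the upper half (16 steps instead of 32), and the body is
// then unrolled to a depth of 2 (8 loop iterations).
std::array<uint8_t, 32> fill_unrolled(
        const std::array<uint8_t, 16>& lo,
        const std::array<uint8_t, 16>& hi) {
    std::array<uint8_t, 32> out{};
    for (int j = 0; j < 16; j += 2) {
        out[j] = lo[j];
        out[j + 16] = hi[j];
        out[j + 1] = lo[j + 1];
        out[j + 17] = hi[j + 1];
    }
    return out;
}
```

Both functions produce the same array for any inputs; whether the second is faster depends on the compiler and target, so the sketch makes no performance claim of its own.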

Pull Request resolved: facebookresearch#3364

Reviewed By: kuarora

Differential Revision: D56455690

Pulled By: mdouze

fbshipit-source-id: 490a17a40d9d4439b1a8ea22e991e706d68fb2fa
…kresearch#3345)

Summary:
This pull request is for issue facebookresearch#3330. This patch makes sure that packed code arrays are in big endian format. Kindly let us know if we need any changes or if we can have a better approach.

Pull Request resolved: facebookresearch#3345

Reviewed By: junjieqi

Differential Revision: D55957630

Pulled By: mdouze

fbshipit-source-id: f728f9563f6b942af9d8899b54662d7ceb811206
Summary:
Pull Request resolved: facebookresearch#3361

Fix a few issues in the PR.
Normally all tests should pass on a little-endian machine

Reviewed By: junjieqi

Differential Revision: D56003181

fbshipit-source-id: 405dec8c71898494f5ddcd2718c35708a1abf9cb
Summary:
Pull Request resolved: facebookresearch#3383

In this diff, I am fixing minor issues in bench_fw where certain fields are not accessible when the index is built from a codec. It also requires the index to be discovered using the codec alias, as the index factory is not always available.

A subsequent diff internal to Meta will add a test case that executes this path.

Reviewed By: algoriddle

Differential Revision: D56444641

fbshipit-source-id: b7af7e7bb47b20bbb5515a66f41dd24f42459d52
Summary: Fixes facebookresearch#3343

Reviewed By: kuarora, junjieqi

Differential Revision: D56526842

fbshipit-source-id: b7c4377495db4e68283cf4ce2b7c8fae008cd404
Summary:
The OSX job failed:

https://app.circleci.com/pipelines/github/facebookresearch/faiss/5698/workflows/4e029c32-8d8b-4db7-99e2-8e802aad6653/jobs/32701

Pull Request resolved: facebookresearch#3357

Reviewed By: kuarora

Differential Revision: D56039739

Pulled By: junjieqi

fbshipit-source-id: dd434a8817148364797eae39c09e0e1e9edbe858
Summary: Remove debugging log lines

Reviewed By: mlomeli1

Differential Revision: D56626636

fbshipit-source-id: 2721b84e4e1359d1372df2b2c95cc668c6a75c3f
Summary:
This demonstrates how to query several independent IVF indexes with a trained index in common. This avoids duplicating the coarse quantizer and metadata in memory.

On the Faiss side, it also implements an InvertedListIterator on top of the flat inverted lists, which can prove useful.

Reviewed By: junjieqi

Differential Revision: D56575887

fbshipit-source-id: cc3b26e952ee21f24b10169b5b614066600cf4b8
Summary:
`nullptr` is typesafe. `0` and `NULL` are not. In the future, only `nullptr` will be allowed.

This diff helps us embrace the future _now_ in service of enabling `-Wzero-as-null-pointer-constant`.
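A minimal sketch of why `nullptr` is considered typesafe, using two hypothetical overloads named `describe` (not part of Faiss): the literal `0` is an `int` first and only converts to a pointer, while `nullptr` can only become a pointer.

```cpp
#include <string>

// Two hypothetical overloads: one takes an integer, one takes a pointer.
std::string describe(int) {
    return "int";
}

std::string describe(const char*) {
    return "pointer";
}

// describe(0)       selects the int overload, because 0 is an int literal.
// describe(nullptr) selects the pointer overload, because std::nullptr_t
//                   converts to pointer types but never to int.
// describe(NULL)    is deliberately not called here: depending on how the
//                   platform defines NULL it may pick the int overload or
//                   fail to compile as ambiguous.
```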

Reviewed By: palmje

Differential Revision: D56650318

fbshipit-source-id: 803ae62114c39143b65946f6f0387715eaf7f534
Summary:
This commit is the first in a series that attempts to incrementally enable all jobs currently performed by CircleCI. It includes the main configuration files provided by the GitHub team plus one build.

Original PR: facebookresearch#3325

Reviewed By: junjieqi

Differential Revision: D56671582

fbshipit-source-id: c8a21cd69aabaf86134eb86753e90b1facf51bc3
Summary: GitHub checks

Reviewed By: junjieqi

Differential Revision: D56733297

fbshipit-source-id: fe5a2ca7c67f36a4fe986af78fb6dc8f4f843150
…rch#3381)

Summary:
Fixes facebookresearch#3379

Pull Request resolved: facebookresearch#3381

Reviewed By: junjieqi

Differential Revision: D56570120

Pulled By: kuarora

fbshipit-source-id: 758ea4ab866609d6dd5621e6e6ffda583ba52503
Summary: Migration to GitHub actions

Reviewed By: junjieqi

Differential Revision: D56745520

fbshipit-source-id: 5311a549842f19672ae574edaa8be3ea5a580dbc
…3405)

Summary:
Pull Request resolved: facebookresearch#3405

Migration to GitHub Actions

Reviewed By: junjieqi

Differential Revision: D56843276

fbshipit-source-id: 3d5c7ee9a36a783407dfdcc3574c377da5f9db78
…h#3406)

Summary:
Pull Request resolved: facebookresearch#3406

Migration to GitHub Actions

Reviewed By: junjieqi

Differential Revision: D56848895

fbshipit-source-id: 5a351534d9151369a9104314fee203657ac40043
)

Summary:
Pull Request resolved: facebookresearch#3407

Migration to GitHub Actions

Reviewed By: junjieqi

Differential Revision: D56856565

fbshipit-source-id: d7400eb9cb7bd68e93a712af81c6cbb7e76e2400
… via GitHub Actions (facebookresearch#3409)

Summary:
Pull Request resolved: facebookresearch#3409

Migration to GitHub Actions

Reviewed By: junjieqi

Differential Revision: D56917083

fbshipit-source-id: 93a2358ce5697b26aa40ced505f42c584fa8c46c
… availability (facebookresearch#3410)

Summary:
Pull Request resolved: facebookresearch#3410

Migration to GitHub Actions

Reviewed By: junjieqi

Differential Revision: D56921925

fbshipit-source-id: 64e7a636b47d828110a6d413c8645e5343b578cb
…3411)

Summary:
Pull Request resolved: facebookresearch#3411

Migration to GitHub

Reviewed By: kuarora

Differential Revision: D56923116

fbshipit-source-id: 1e2b493b0dd81ce850f2970e6d28c713f6a9927b
Summary:
Pull Request resolved: facebookresearch#3417

facebookresearch#3351

Reviewed By: junjieqi

Differential Revision: D57120422

fbshipit-source-id: e2e446642e7be8647f5115f90916fad242e31286
…okresearch#3418)

Summary:
Pull Request resolved: facebookresearch#3418

Migration to GitHub Actions

Reviewed By: junjieqi

Differential Revision: D57133934

fbshipit-source-id: 255b7afbbb90cc966916cd900174833416b0bc51
…earch#3416)

Summary:
The code generated for function fvec_L2sqr by OpenXL does not perform as well as the code generated by gcc on Power. The macros that enable imprecise floating point operations don’t cover Power with OpenXL. This patch adds the OpenXL compiler options for the PowerPC macros to achieve better performance.

Pull Request resolved: facebookresearch#3416

Reviewed By: asadoughi

Differential Revision: D57210015

Pulled By: mdouze

fbshipit-source-id: 6b838a2fa4d4996fe52c9f1105827004626fe720
…er libc (facebookresearch#3426)

Summary:
Pull Request resolved: facebookresearch#3426

GitHub Actions only supports Ubuntu 22 and newer and this change is necessary to enable CUDA builds to complete the migration.

Reviewed By: algoriddle

Differential Revision: D57261685

fbshipit-source-id: 34467f57426864ffa8b32f6018ccdc7bb4424b57
…ch#3427)

Summary:
Pull Request resolved: facebookresearch#3427

Migration to GitHub Actions

Reviewed By: algoriddle

Differential Revision: D57261696

fbshipit-source-id: d7b8c26259fd3de1cf59fc460f6af20185ceacfe
…ookresearch#3428)

Summary:
Pull Request resolved: facebookresearch#3428

GitHub Actions currently does not support runners with AVX-512 but has committed to adding this support in early 2025. We will run these jobs on CircleCI until then. This placeholder build configuration will allow us to enable it with a 1-liner when the hosts are available.

Reviewed By: algoriddle

Differential Revision: D57261783

fbshipit-source-id: 1fb985a0c3dbb11851af63c95bde6494d25d0ac2
…h#3430)

Summary:
This PR removes unneeded ARM NEON SIMD instructions for ScalarQuantizer.

The removed instructions are completely redundant, and I believe that it is a funky way of converting two `float32x4_t` variables (which hold 4 float values in a single SIMD register) into a single `float32x4x2_t` variable (two SIMD registers packed together).

The Clang compiler is capable of eliminating these instructions. The only GCC version that can eliminate them is GCC 14, which was released very recently (Apr-May 2024).

mdouze

Pull Request resolved: facebookresearch#3430

Reviewed By: mlomeli1

Differential Revision: D57369849

Pulled By: mdouze

fbshipit-source-id: 09d7cf16e113df3eb9ddbfa54d074b58b452ba7f
Summary:
Pull Request resolved: facebookresearch#3442

fix install instruction for GPU + pytorch

Reviewed By: mlomeli1

Differential Revision: D57376959

fbshipit-source-id: 74caff960be7dbf8102e7593ce1485452a18de6e
Michael Norris and others added 29 commits September 17, 2024 09:34
Summary:
Pull Request resolved: facebookresearch#3854

We need some more functions exposed for use in telemetry wrapper classes. This PR changes some functions in read_index to be non static and exposes them in the header. (Laser can also write IndexIVFPQ and IndexIVFScalarQuantizer, so those are added to read_index).

Reviewed By: asadoughi

Differential Revision: D62623242

fbshipit-source-id: 5b29d986570d4439d066b1815d15a21b45e90482
…ch#3868)

Summary:
This causes an access violation error.

The reason why this was not caught in unit tests for AVX/NEON is that this code branch is unlikely to be used.

The reason why this was not caught in unit tests for a plain non-SIMD binary is unclear.

More ResidualQuantizer patches to follow.

Pull Request resolved: facebookresearch#3868

Reviewed By: mengdilin

Differential Revision: D62882531

Pulled By: mnorris11

fbshipit-source-id: fc50c7409d6064605f783c342b0d313145ffe948
Summary:
replace
``` C++
template <class Codec, bool uniform, int SIMD>
struct QuantizerTemplate {};
```

with
``` C++
enum class QuantizerTemplateScaling { UNIFORM = 0, NON_UNIFORM = 1 };

template <class Codec, QuantizerTemplateScaling SCALING, int SIMD>
struct QuantizerTemplate {};
```

This allows adding more Scalar Quantizer scaling types (such as rowwise or rowwise + non-uniform) in the future.
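The benefit of the enum over the `bool` can be sketched as follows. This is a simplified illustration, not the Faiss code: the `Scaling` enum, the one-parameter `Quantizer` template, the `name()` member and the `ROWWISE` value are all assumptions invented for the example.

```cpp
#include <string>

// Hypothetical scaling modes. ROWWISE is an invented third value, included
// only to show that an enum extends where a bool could not.
enum class Scaling { UNIFORM = 0, NON_UNIFORM = 1, ROWWISE = 2 };

// Primary template left empty, following the pattern quoted above.
template <Scaling S>
struct Quantizer {};

template <>
struct Quantizer<Scaling::UNIFORM> {
    static std::string name() {
        return "uniform";
    }
};

template <>
struct Quantizer<Scaling::NON_UNIFORM> {
    static std::string name() {
        return "non_uniform";
    }
};

// Adding a new mode only needs one more specialization; existing call sites
// that name UNIFORM or NON_UNIFORM are unaffected.
template <>
struct Quantizer<Scaling::ROWWISE> {
    static std::string name() {
        return "rowwise";
    }
};
```

A side effect of the enum is that call sites read `Quantizer<Scaling::UNIFORM>` rather than `Quantizer<true>`, which documents the intent at the point of use.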

Pull Request resolved: facebookresearch#3870

Reviewed By: mengdilin

Differential Revision: D63033311

Pulled By: mnorris11

fbshipit-source-id: f62b3dcdf446251229a863fdd9aa5e00d9b02c07
Summary:
Pull Request resolved: facebookresearch#3873

The previous version required scipy to do the accumulation, which is replaced here with a nifty piece of numpy accumulation.
This removes the need for scipy for non-sparse data.

Reviewed By: junjieqi

Differential Revision: D62884307

fbshipit-source-id: 5443634e487387a2b518fd2a7f9a3d9a40abd4b4
Summary:
Pull Request resolved: facebookresearch#3872

The contrib.torch subdirectory is intended to receive modules in python that are useful for similarity search and that apply to CPU or GPU pytorch tensors.

The current version includes CPU clustering on torch tensors. To be added:
* implementation of PQ

Reviewed By: asadoughi

Differential Revision: D62759207

fbshipit-source-id: 87dbaa5083e3f2f4f60526815e22ded4e83e8559
Summary:
Pull Request resolved: facebookresearch#3876

Demo script for distributed kmeans. It provides a `DatasetAssign` object and shows how to run it with torch.distributed.

Reviewed By: asadoughi, pankajsingh88

Differential Revision: D63013820

fbshipit-source-id: 22c959f3afdc04fd4aa8b9aeed309ea6290b1328
…tions. (facebookresearch#3853)

Summary:
The distance and scalar quantizer functions currently have AVX2 implementations. This patch adds the AVX-512 equivalents for each of the AVX2 implementations.

While preparing to push this PR, I realized that you have already implemented the AVX-512 equivalent for [HNSW::MinimaxHeap::pop_min](https://github.com/facebookresearch/faiss/blob/a166e13a25b2a5fe46adce4d7d06677d5199e598/faiss/impl/HNSW.cpp#L1176-L1265), which is great.

Pull Request resolved: facebookresearch#3853

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

Top of the stack D62993711 is green

Reviewed By: asadoughi

Differential Revision: D62989543

Pulled By: mengdilin

fbshipit-source-id: 913403fadbfc512d195fe3411ee761d8ad025245
Summary:
Pull Request resolved: facebookresearch#3878

Looks like D63013820 broke external CI (example failures: https://github.com/facebookresearch/faiss/actions/runs/10965502942/job/30451466102 and https://github.com/facebookresearch/faiss/actions/runs/10964917863 )

with stacktrace
```
Traceback (most recent call last):
  File "/home/runner/work/faiss/faiss/build/faiss/python/setup.py", line 16, in <module>
    shutil.copytree("contrib/torch", "faiss/contrib/torch")
  File "/home/runner/miniconda3/lib/python3.11/shutil.py", line 573, in copytree
    return _copytree(entries=entries, src=src, dst=dst, symlinks=symlinks,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/miniconda3/lib/python3.11/shutil.py", line 471, in _copytree
    os.makedirs(dst, exist_ok=dirs_exist_ok)
  File "<frozen os>", line 225, in makedirs
FileExistsError: [Errno 17] File exists: 'faiss/contrib/torch'
```

`faiss/contrib/torch` should already be copied over by the line above, which copies `faiss/contrib`.

Reviewed By: asadoughi

Differential Revision: D63145404

fbshipit-source-id: 0c2df0b3a912aeb48671ca0213a1ea4dd8b44510
Summary:
facebookresearch#3870 conflicted with changes in facebookresearch#3853. Rebasing D62989543 for PR 3853 internally did not catch the breakage because, unfortunately, we don't have AVX512 coverage internally :(

=== Test Plan ===
Tested on a local machine and compilation and C++ tests worked
CI for AVX512 and conda build should succeed

Pull Request resolved: facebookresearch#3880

Reviewed By: junjieqi

Differential Revision: D63156374

Pulled By: mengdilin

fbshipit-source-id: 4bf51b2e7795bb55d388a31c79bded742f87d6e9
…JK (facebookresearch#3879)

Summary:
Pull Request resolved: facebookresearch#3879

1. Adds JK `faiss/telemetry:use_faiss_telemetry_core` to the top level logging util in `wrapper_logging_utils.h`. This is currently set to false. I plan to deprecate the other knobs under https://www.internalfb.com/intern/justknobs/?name=faiss%2Ftelemetry and just use one, as Unicorn can't really have their own JK easily (they subclass a lot of FAISS classes too).
2. Copied StringIOReader from Unicorn to telemetry wrapper in `io.h`. This will be deleted from Unicorn in the follow up diff.
3. Updated Laser tests to reflect correct index_read factory string changes.
4. Adds reverse_index_factory. More tests for it in subsequent diff.

Reviewed By: junjieqi

Differential Revision: D62670316

fbshipit-source-id: de1b2ed385593bb43798d29d16d90407920a3251
Summary:
Add `CMakeList` to compile the `faiss/perf_tests` benchmarks. We will run the Google benchmarks as part of CI so people can see benchmarking results (there is no diff-to-diff regression detection in open-sourced CI).

==== Test Plan =====

See logs in CI that look like
```
Run on (4 X 3184.9 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 2.69, 2.84, 1.56
----------------------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------
QT_4bit/iterations:20                 53646755 ns     53643729 ns           20 code_size=1k
QT_4bit_uniform/iterations:20         52248603 ns     52246874 ns           20 code_size=1k
QT_6bit/iterations:20                 63697930 ns     63693459 ns           20 code_size=1.5k
QT_8bit/iterations:20                 43305175 ns     43303946 ns           20 code_size=2k
QT_8bit_direct/iterations:20          30771920 ns     30770261 ns           20 code_size=2k
QT_8bit_direct_signed/iterations:20   30744625 ns     30742891 ns           20 code_size=2k
QT_8bit_uniform/iterations:20         44227773 ns     44224242 ns           20 code_size=2k
QT_bf16/iterations:20                 32758794 ns     32758717 ns           20 code_size=4k
QT_fp16/iterations:20                 41068848 ns     41066492 ns           20 code_size=4k
2024-09-20T23:15:01+00:00
Running ./build/perf_tests/bench_scalar_quantizer_decode
Run on (4 X 3244.56 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 2.43, 2.78, 1.56
----------------------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------
QT_4bit/iterations:20                   338300 ns       338284 ns           20 code_size=64
QT_4bit_uniform/iterations:20           332928 ns       332914 ns           20 code_size=64
QT_6bit/iterations:20                   415683 ns       415674 ns           20 code_size=96
QT_8bit/iterations:20                   266034 ns       266026 ns           20 code_size=128
QT_8bit_direct/iterations:20             37552 ns        37553 ns           20 code_size=128
QT_8bit_direct_signed/iterations:20      39701 ns        39696 ns           20 code_size=128
QT_8bit_uniform/iterations:20           261535 ns       261529 ns           20 code_size=128
QT_bf16/iterations:20                    45518 ns        45506 ns           20 code_size=256
QT_fp16/iterations:20                   334602 ns       334584 ns           20 code_size=256
2024-09-20T23:15:02+00:00
Running ./build/perf_tests/bench_no_multithreading_rcq_search
Run on (4 X 3243.03 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 2.43, 2.78, 1.56
WARNING clustering 65536 points to 65536 centroids: please provide at least 2555904 training points
WARNING clustering 65536 points to 65536 centroids: please provide at least 2555904 training points
WARNING clustering 65536 points to 65536 centroids: please provide at least 2555904 training points
WARNING clustering 65536 points to 65536 centroids: please provide at least 2555904 training points
WARNING clustering 65536 points to 65536 centroids: please provide at least 2555904 training points
WARNING clustering 65536 points to 65536 centroids: please provide at least 2555904 training points
WARNING clustering 65536 points to 65536 centroids: please provide at least 2555904 training points
WARNING clustering 65536 points to 65536 centroids: please provide at least 2555904 training points
WARNING clustering 65536 points to 65536 centroids: please provide at least 2555904 training points
WARNING clustering 65536 points to 65536 centroids: please provide at least 2555904 training points
---------------------------------------------------------------
Benchmark                     Time             CPU   Iterations
---------------------------------------------------------------
search/iterations:20   12763792 ns     10367188 ns           20
2024-09-20T23:15:51+00:00
Running ./build/perf_tests/bench_scalar_quantizer_accuracy
Run on (4 X 3231.04 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 2.85, 2.84, 1.65
----------------------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------
QT_4bit/iterations:20                    0.000 ns        0.000 ns            0 code_size=64 code_size_two=128k ndiff_for_idempotence=0 sql2_recons_error=0.047396
QT_4bit_uniform/iterations:20            0.000 ns        0.000 ns            0 code_size=64 code_size_two=128k ndiff_for_idempotence=0 sql2_recons_error=0.0473931
QT_6bit/iterations:20                    0.000 ns        0.000 ns            0 code_size=96 code_size_two=192k ndiff_for_idempotence=0 sql2_recons_error=2.6899m
QT_8bit/iterations:20                    0.000 ns        0.000 ns            0 code_size=128 code_size_two=256k ndiff_for_idempotence=0 sql2_recons_error=164.317u
QT_8bit_direct/iterations:20             0.000 ns        0.000 ns            0 code_size=128 code_size_two=256k ndiff_for_idempotence=0 sql2_recons_error=42.5514
QT_8bit_direct_signed/iterations:20      0.000 ns        0.000 ns            0 code_size=128 code_size_two=256k ndiff_for_idempotence=0 sql2_recons_error=42.5494
QT_8bit_uniform/iterations:20            0.000 ns        0.000 ns            0 code_size=128 code_size_two=256k ndiff_for_idempotence=0 sql2_recons_error=164.152u
QT_bf16/iterations:20                    0.000 ns        0.000 ns            0 code_size=256 code_size_two=512k ndiff_for_idempotence=0 sql2_recons_error=92.8328u
QT_fp16/iterations:20                    0.000 ns        0.000 ns            0 code_size=256 code_size_two=512k ndiff_for_idempotence=0 sql2_recons_error=1.44838u
2024-09-20T23:15:51+00:00
Running ./build/perf_tests/bench_scalar_quantizer_encode
Run on (4 X 3243.72 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x2)
  L1 Instruction 32 KiB (x2)
  L2 Unified 512 KiB (x2)
  L3 Unified 32768 KiB (x1)
Load Average: 2.85, 2.84, 1.65
----------------------------------------------------------------------------------------------
Benchmark                                    Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------
QT_4bit/iterations:20                   702046 ns       701319 ns           20 code_size=64
QT_4bit_uniform/iterations:20           595889 ns       595880 ns           20 code_size=64
QT_6bit/iterations:20                  1287503 ns      1287542 ns           20 code_size=96
QT_8bit/iterations:20                   511811 ns       511804 ns           20 code_size=128
QT_8bit_direct/iterations:20            152977 ns       152970 ns           20 code_size=128
QT_8bit_direct_signed/iterations:20     185578 ns       185572 ns           20 code_size=128
QT_8bit_uniform/iterations:20           454412 ns       454408 ns           20 code_size=128
QT_bf16/iterations:20                    51331 ns        51324 ns           20 code_size=256
QT_fp16/iterations:20                   390658 ns       390649 ns           20 code_size=256
```

Pull Request resolved: facebookresearch#3793

Reviewed By: junjieqi

Differential Revision: D63147599

Pulled By: mengdilin

fbshipit-source-id: 03165b5acb3b0647a69f7db144ab76efda2fee11
The internal and external repositories are out of sync. This Pull Request attempts to bring them back in sync by patching the GitHub repository. Please carefully review this patch. You must disable ShipIt for your project in order to merge this pull request. DO NOT IMPORT this pull request. Instead, merge it directly on GitHub using the MERGE BUTTON. Re-enable ShipIt after merging.
…esearch#3889)

Summary:
Pull Request resolved: facebookresearch#3889

1. Changing dependency for bench_fw to *_cpu instead of *_gpu
   - faiss_gpu and torch have become incompatible. Once that is fixed, I'll add the gpu dependency back.
   - Today, we are not using gpu in benchmarking yet.
2. Fixing a naming issue in kmeans, which is used when using opaque as false in assemble.
3. Handling codec_name when it is not assigned explicitly, which happens when using assembly.

Reviewed By: satymish

Differential Revision: D62671870

fbshipit-source-id: 4a4ecfeef948c99fffba407cbf69d2349544bdfd
Summary:
GCC7 doesn't support all the necessary NEON intrinsics, which is really a shame. This means that GCC7 on aarch64 cannot compile faiss with NEON intrinsics, so we should avoid using them.

This is similar to facebookresearch#3860 (build issues on GCC7), which I need. This one is a bit uglier, since GCC7 does support NEON, just not all of the intrinsics.

Pull Request resolved: facebookresearch#3869

Reviewed By: asadoughi

Differential Revision: D63081962

Pulled By: junjieqi

fbshipit-source-id: 69827cd447dd405b3ef70d651996f9ad00b6213e
…facebookresearch#3892)

Summary:
Following the current documentation creates the python package without AVX2 or AVX512 support. The updated documentation notes that the corresponding faiss version must be built before swigfaiss.

fixes facebookresearch#3883

Pull Request resolved: facebookresearch#3892

Reviewed By: mengdilin

Differential Revision: D63641111

Pulled By: asadoughi

fbshipit-source-id: 2f0598ead8cc5b82ed34841c185e6d2a1d068ba5
Summary:
Pull Request resolved: facebookresearch#3901

1) remove system time from benchmark as this metric has extremely high jitter (50-100%) and is not useful for us

2) clean up command-line arguments and define a main function the external world can call

3) tweak defaults so the microbenchmark runs fast by default (this does not affect the parameters we pass to microbenchmarks for servicelab)

Reviewed By: mnorris11

Differential Revision: D63650110

fbshipit-source-id: efc81563291f00701a0d1df1d27172adeb3ef231
Summary: Pull Request resolved: facebookresearch#3887

Reviewed By: kuarora

Differential Revision: D63355030

Pulled By: asadoughi

fbshipit-source-id: 38792e49fe678c2811896faca7a3ddcab19f8bd0
Summary:
Pull Request resolved: facebookresearch#3907

same as title.
Fix checking right desc

Reviewed By: satymish

Differential Revision: D63854967

fbshipit-source-id: b8bc48662bc38ac96cf9241bdbe2be2b23f1a37e
Summary: Pull Request resolved: facebookresearch#3921

Reviewed By: pankajsingh88

Differential Revision: D64005877

Pulled By: ramilbakhshyiev

fbshipit-source-id: 663c7ab752db04751c7675095d2545adec4be173
Summary:
Similar to .github/workflows/nightly.yml

Pull Request resolved: facebookresearch#3910

Reviewed By: kuarora, pankajsingh88

Differential Revision: D63923478

Pulled By: asadoughi

fbshipit-source-id: df92a86ba48aa0d19aae40d7ca11aeedf4dfac51
Summary:
Pull Request resolved: facebookresearch#3919

These tests passed in `dev` mode during my local development when I added them, but I recently noticed they are failing on contbuild, which runs them in opt mode: https://www.internalfb.com/intern/test/281475152762853/

Upon further inspection, 2 of these failures came from floating point comparisons, which we can fix with `EXPECT_NEAR`. The other one stems from nondeterminism of the results in opt mode, so we will relax the test until we figure out a way to deal with the nondeterminism.
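To illustrate why an exact floating point comparison can fail where a tolerance-based one passes, here is a minimal sketch that does not depend on GoogleTest. The helper `is_near` is hypothetical and only mirrors the idea behind `EXPECT_NEAR(a, b, abs_error)`, which passes when the absolute difference does not exceed `abs_error`.

```cpp
#include <cmath>

// Hypothetical helper mirroring a tolerance-based check:
// true when |a - b| <= abs_error.
bool is_near(double a, double b, double abs_error) {
    return std::fabs(a - b) <= abs_error;
}

// In binary floating point, 0.1 + 0.2 is not exactly equal to 0.3, so an
// exact comparison of the two reports a mismatch.
bool exact_comparison_differs() {
    double sum = 0.1 + 0.2;
    return sum != 0.3;
}
```

The tolerance has to be chosen per test: too tight and optimized builds may still fail, too loose and real regressions are hidden.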

Reviewed By: junjieqi

Differential Revision: D63942329

fbshipit-source-id: 60f1c0b8a0db93015cd32bf991ab983ff2d1af13
Summary:
Pull Request resolved: facebookresearch#3916

Add the missing wrapper to the torch wrappers in Faiss and test it.

Also factored out a bit of code shared between the search functions.

Reviewed By: algoriddle

Differential Revision: D63974821

fbshipit-source-id: a0415a57a763e2d1896956c503e503615c167860
Summary:
Sometime between Sept 25 and Oct 2, downloading and linking against the `openblas=*=*openmp*` package to run tests started causing a 4-7x slowdown. Link instead with the regular openblas package, which is not compiled with `USE_OPENMP=1`. We will set the number of openblas threads via the environment variable `OPENBLAS_NUM_THREADS` according to https://github.com/OpenMathLib/OpenBLAS/wiki/Faq#multi-threaded

Pull Request resolved: facebookresearch#3918

Test Plan: SVE CI should finish within 40 minutes

Reviewed By: ramilbakhshyiev

Differential Revision: D64059860

Pulled By: mengdilin

fbshipit-source-id: 3ba2bda5fce5122f051421f459692f15ad5360a4
…rch#3928)

Summary:
Pull Request resolved: facebookresearch#3928

Fix issue in T203425107

Reviewed By: asadoughi

Differential Revision: D64068971

fbshipit-source-id: 56db439793539570a102773ff2c7158d48feb7a9
…arch#3929)

Summary:
* Replaced 1.8.0 with 1.9.0.
* Fixed x86-64 architecture reference: https://en.wikipedia.org/wiki/X86-64

Tested with: `conda install -c pytorch/label/staging faiss-cpu`

Pull Request resolved: facebookresearch#3929

Reviewed By: ramilbakhshyiev

Differential Revision: D64082430

Pulled By: asadoughi

fbshipit-source-id: 8a1427a7c14b8c3de4a341533b138d9d8f8490f2
Speed up rebuilding the reverse map when deleting IDs, while retaining the option of forcing a full reconstruction.

For example:
/**
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This source code is licensed under the MIT license found in the
 * LICENSE file in the root directory of this source tree.
 */

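#include <cstdio>
#include <random>
#include <vector>

#include <faiss/IndexFlat.h>
#include <faiss/IndexIDMap.h>
#include <faiss/impl/FaissAssert.h>
#include <faiss/impl/IDSelector.h>

// 64-bit int
using idx_t = faiss::idx_t;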

int main() {
    int d = 8;   // dimension
    int nb = 10; // database size

    std::mt19937 rng;
    std::uniform_real_distribution<> distrib;

    float* xb = new float[d * nb];

    for (int i = 0; i < nb; i++) {
        for (int j = 0; j < d; j++)
            xb[d * i + j] = distrib(rng);
        xb[d * i] += i / 1000.;
    }

    faiss::IndexFlatL2 index(d);
    faiss::IndexIDMap2 index_id_map2(&index);
    idx_t* xids = new idx_t[nb]();

    // data map
    // [0,  1,  2,  3,  4,  5,  6,  7,  8,  9]
    // [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
    // [0->10,  1->11,  2->12,  3->13,  4->14,  5->15,  6->16,  7->17,  8->18,
    // 9->19]
    for (int i = 0; i < nb; i++) {
        xids[i] = nb + i;
    }

    // test 1  // delete head
    {
        index_id_map2.add_with_ids(nb, xb, xids);

        for (const auto& [xid, index] : index_id_map2.rev_map) {
            printf("xid=%zd, index=%zd\n", xid, index);
        }
        printf("ntotal = %zd\n", index_id_map2.ntotal);

        // delete head
        {
            std::vector<idx_t> ids{10, 11};
            faiss::IDSelectorArray sel{ids.size(), ids.data()};
            index_id_map2.remove_ids(sel);
        }

        auto rev_map_1 = index_id_map2.rev_map;

        for (const auto& [xid, index] : index_id_map2.rev_map) {
            printf("xid=%zd, index=%zd\n", xid, index);
        }

        // construct_rev_map
        { index_id_map2.construct_rev_map(); }

        auto rev_map_2 = index_id_map2.rev_map;

        FAISS_ASSERT(rev_map_1 == rev_map_2);
        printf("compare equal\n\n");

        index_id_map2.reset();
    }

    // test 2  // delete tail
    {
        index_id_map2.add_with_ids(nb, xb, xids);

        for (const auto& [xid, index] : index_id_map2.rev_map) {
            printf("xid=%zd, index=%zd\n", xid, index);
        }
        printf("ntotal = %zd\n", index_id_map2.ntotal);

        // delete tail
        {
            std::vector<idx_t> ids{18, 19};
            faiss::IDSelectorArray sel{ids.size(), ids.data()};
            index_id_map2.remove_ids(sel);
        }

        auto rev_map_1 = index_id_map2.rev_map;

        for (const auto& [xid, index] : index_id_map2.rev_map) {
            printf("xid=%zd, index=%zd\n", xid, index);
        }

        // construct_rev_map
        { index_id_map2.construct_rev_map(); }

        auto rev_map_2 = index_id_map2.rev_map;

        FAISS_ASSERT(rev_map_1 == rev_map_2);
        printf("compare equal\n\n");

        index_id_map2.reset();
    }

    // test 3  // delete middle  continuous
    {
        index_id_map2.add_with_ids(nb, xb, xids);

        for (const auto& [xid, index] : index_id_map2.rev_map) {
            printf("xid=%zd, index=%zd\n", xid, index);
        }
        printf("ntotal = %zd\n", index_id_map2.ntotal);

        // delete middle  continuous
        {
            std::vector<idx_t> ids{15, 16, 17};
            faiss::IDSelectorArray sel{ids.size(), ids.data()};
            index_id_map2.remove_ids(sel);
        }

        auto rev_map_1 = index_id_map2.rev_map;

        for (const auto& [xid, index] : index_id_map2.rev_map) {
            printf("xid=%zd, index=%zd\n", xid, index);
        }

        // construct_rev_map
        { index_id_map2.construct_rev_map(); }

        auto rev_map_2 = index_id_map2.rev_map;

        FAISS_ASSERT(rev_map_1 == rev_map_2);
        printf("compare equal\n\n");

        index_id_map2.reset();
    }

    // test 4  // delete middle  not continuous
    {
        index_id_map2.add_with_ids(nb, xb, xids);

        for (const auto& [xid, index] : index_id_map2.rev_map) {
            printf("xid=%zd, index=%zd\n", xid, index);
        }
        printf("ntotal = %zd\n", index_id_map2.ntotal);

        // delete middle  not continuous
        {
            std::vector<idx_t> ids{12, 14, 17};
            faiss::IDSelectorArray sel{ids.size(), ids.data()};
            index_id_map2.remove_ids(sel);
        }

        auto rev_map_1 = index_id_map2.rev_map;

        for (const auto& [xid, index] : index_id_map2.rev_map) {
            printf("xid=%zd, index=%zd\n", xid, index);
        }

        // construct_rev_map
        { index_id_map2.construct_rev_map(); }

        auto rev_map_2 = index_id_map2.rev_map;

        FAISS_ASSERT(rev_map_1 == rev_map_2);
        printf("compare equal\n\n");

        index_id_map2.reset();
    }

    // test 5  // delete head to tail
    {
        index_id_map2.add_with_ids(nb, xb, xids);

        for (const auto& [xid, index] : index_id_map2.rev_map) {
            printf("xid=%zd, index=%zd\n", xid, index);
        }
        printf("ntotal = %zd\n", index_id_map2.ntotal);

        // delete head to tail
        {
            std::vector<idx_t> ids{10, 14, 19};
            faiss::IDSelectorArray sel{ids.size(), ids.data()};
            index_id_map2.remove_ids(sel);
        }

        auto rev_map_1 = index_id_map2.rev_map;

        for (const auto& [xid, index] : index_id_map2.rev_map) {
            printf("xid=%zd, index=%zd\n", xid, index);
        }

        // construct_rev_map
        { index_id_map2.construct_rev_map(); }

        auto rev_map_2 = index_id_map2.rev_map;

        FAISS_ASSERT(rev_map_1 == rev_map_2);
        printf("compare equal\n\n");

        index_id_map2.reset();
    }

    delete[] xids;
    delete[] xb;
    return 0;
}
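The property the example checks (the reverse map left by `remove_ids` equals the one produced by a full `construct_rev_map`) can be sketched without Faiss. The `TinyIdMap` struct below is a hypothetical miniature written for this illustration; it is not the `IndexIDMap2` implementation, and it assumes all IDs are unique.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

using idx_t = int64_t;

// Hypothetical miniature of an ID map: ids[i] is the user ID stored at
// internal position i, and rev_map maps a user ID back to its position.
struct TinyIdMap {
    std::vector<idx_t> ids;
    std::unordered_map<idx_t, idx_t> rev_map;

    void add(const std::vector<idx_t>& new_ids) {
        for (idx_t id : new_ids) {
            rev_map[id] = static_cast<idx_t>(ids.size());
            ids.push_back(id);
        }
    }

    // Forced reconstruction: clear the map and re-insert every entry.
    void construct_rev_map() {
        rev_map.clear();
        for (size_t i = 0; i < ids.size(); i++) {
            rev_map[ids[i]] = static_cast<idx_t>(i);
        }
    }

    // Incremental removal: compact ids in place, erase removed IDs and
    // rewrite only the entries whose position actually shifted.
    size_t remove_ids(const std::unordered_set<idx_t>& to_remove) {
        size_t j = 0;
        for (size_t i = 0; i < ids.size(); i++) {
            if (to_remove.count(ids[i]) > 0) {
                rev_map.erase(ids[i]);
            } else {
                if (i != j) {
                    ids[j] = ids[i];
                    rev_map[ids[j]] = static_cast<idx_t>(j);
                }
                j++;
            }
        }
        size_t nremove = ids.size() - j;
        ids.resize(j);
        return nremove;
    }
};
```

In this sketch, entries located before the first removed ID are never touched, which is where an incremental update saves work compared with a full rebuild.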
@LiuRuoyu01 LiuRuoyu01 closed this Feb 10, 2025