[integration] Add bm25s library for bm25 retrieval models/indices (#763)

Hello!

## Pull Request overview
* Add BM25S as a library on the Hub. 

## Links
* Repository: https://github.com/xhluca/bm25s
* Models: https://huggingface.co/models?other=bm25s

## Details
BM25S is a very new library for efficient BM25, an important algorithm
for full-text search. It can be combined with vector search (e.g.
Sentence Transformers' domain) for hybrid search, which is commonly used
and very powerful. I'm considering mentioning this library more in
Sentence Transformers and its examples/docs, so it'd be nice to get the
"Use in Library" button & download count for these models/indices. I've
set `filter` to `false` as it's not a big library currently.

cc @xhluca for context: this should add a "Use this model" button in the
top right of all models on the Hub that have a `bm25s` tag in the model
card metadata. It'll then show the following snippet:
```python
from bm25s.hf import BM25HF

retriever = BM25HF.load_from_hub("${model.id}")
```
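
To illustrate how that snippet is produced from the `${model.id}` placeholder, here is a minimal, self-contained sketch. It assumes a stripped-down `ModelData` interface with only an `id` field (the real type has more fields), and the model id used is hypothetical.

```typescript
// Simplified stand-in for the real ModelData type.
// Assumption: only `id` matters for rendering this snippet.
interface ModelData {
	id: string;
}

// Same shape as the snippet function added in this commit:
// it returns a list of code snippets, here a single Python one.
const bm25s = (model: ModelData): string[] => [
	`from bm25s.hf import BM25HF
retriever = BM25HF.load_from_hub("${model.id}")`,
];

// Hypothetical model id, used purely for illustration.
const rendered: string[] = bm25s({ id: "user/my-bm25s-index" });
console.log(rendered[0]);
```

The template literal is kept flush left because any indentation inside it would end up in the snippet shown to users.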

- Tom Aarsen
tomaarsen authored Jun 19, 2024
1 parent 3e6f756 commit 392a091
Showing 2 changed files with 16 additions and 0 deletions.
6 changes: 6 additions & 0 deletions packages/tasks/src/model-libraries-snippets.ts
@@ -56,6 +56,12 @@ export const bertopic = (model: ModelData): string[] => [
model = BERTopic.load("${model.id}")`,
];

export const bm25s = (model: ModelData): string[] => [
`from bm25s.hf import BM25HF
retriever = BM25HF.load_from_hub("${model.id}")`,
];

const diffusers_default = (model: ModelData) => [
`from diffusers import DiffusionPipeline
10 changes: 10 additions & 0 deletions packages/tasks/src/model-libraries.ts
@@ -111,6 +111,16 @@ export const MODEL_LIBRARIES_UI_ELEMENTS = {
			wildcard: { path: "*.npz" },
		},
	},
	bm25s: {
		prettyLabel: "BM25S",
		repoName: "bm25s",
		repoUrl: "https://github.com/xhluca/bm25s",
		snippets: snippets.bm25s,
		filter: false,
		countDownloads: {
			term: { path: "params.index.json" },
		},
	},
	chat_tts: {
		prettyLabel: "ChatTTS",
		repoName: "ChatTTS",
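
The `countDownloads` entries above come in two shapes, `term` and `wildcard`. The sketch below shows one plausible reading of them: `term` as an exact path match and `wildcard` as a glob where `*` matches any run of characters. This is an assumption for illustration only; the `DownloadFilter` type and the `globToRegExp` and `countsAsDownload` helpers are hypothetical and not part of the commit, and the Hub's actual evaluation of these filters may differ.

```typescript
// Hypothetical, simplified model of the two filter shapes seen in the diff.
type DownloadFilter =
	| { term: { path: string } }
	| { wildcard: { path: string } };

// Convert a simple glob ("*" = any run of characters) into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
	const escaped = glob
		.replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters except "*"
		.replace(/\*/g, ".*"); // then expand "*" into ".*"
	return new RegExp(`^${escaped}$`);
}

// Assumed semantics: a request counts as a download when its path matches the filter.
function countsAsDownload(filter: DownloadFilter, requestedPath: string): boolean {
	if ("term" in filter) {
		return requestedPath === filter.term.path; // exact match
	}
	return globToRegExp(filter.wildcard.path).test(requestedPath); // glob match
}

const bm25sFilter: DownloadFilter = { term: { path: "params.index.json" } };
const npzFilter: DownloadFilter = { wildcard: { path: "*.npz" } };

console.log(countsAsDownload(bm25sFilter, "params.index.json"));
console.log(countsAsDownload(bm25sFilter, "corpus.jsonl"));
console.log(countsAsDownload(npzFilter, "weights.npz"));
```

Under this reading, keying the count on a single file such as `params.index.json` would mean each load of an index is counted once, regardless of how many other files the repository contains.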
