Add new article on how to choose embedder type #3058

Open · wants to merge 4 commits into main
5 changes: 5 additions & 0 deletions config/sidebar-learn.json
@@ -54,6 +54,11 @@
"label": "Deactivate AI-powered search",
"slug": "deactivate_ai_powered_search"
},
{
"source": "learn/ai_powered_search/choose_an_embedder.mdx",
"label": "Which embedder should I choose?",
"slug": "choose_an_embedder"
},
{
"source": "learn/ai_powered_search/difference_full_text_ai_search.mdx",
"label": "Differences between full-text and AI-powered search",
32 changes: 32 additions & 0 deletions learn/ai_powered_search/choose_an_embedder.mdx
@@ -0,0 +1,32 @@
---
title: Which embedder should I choose? — Meilisearch documentation
description: General guidance on how to choose the embedder best suited for projects using AI-powered search.
---

# Which embedder should I choose?

Meilisearch officially supports many different embedders, such as OpenAI, Hugging Face, and Ollama, as well as most embedding generators that expose a RESTful API.

This article contains general guidance on how to choose the embedder best suited for your project.

## When in doubt, choose OpenAI

OpenAI returns relevant search results across different subjects and datasets. It is well suited to the majority of applications, and Meilisearch actively supports and improves its OpenAI integration with every new release.

In the majority of cases, and especially if this is your first time working with LLMs and AI-powered search, choose OpenAI.
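
As a rough sketch of how simple the setup can be, the request below configures an OpenAI embedder through an index's `embedders` setting. The index name (`movies`), the `title` field used in the document template, the model, and the keys are placeholders, and exact defaults vary between Meilisearch versions:

```sh
# Sketch only: assumes a local Meilisearch instance and an index named "movies"
# whose documents have a "title" field. Defaults vary by Meilisearch version.
curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings/embedders' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_API_KEY' \
  --data-binary '{
    "default": {
      "source": "openAi",
      "apiKey": "OPENAI_API_KEY",
      "model": "text-embedding-3-small",
      "documentTemplate": "A document titled {{doc.title}}"
    }
  }'
```
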
Contributor:
(Feel free to ignore this one as it might be just me) I wonder if this doesn't sound a bit too biased towards OpenAI, making it sound like it's our provider of choice over others. Maybe we can phrase it around "ease of config" or "easiest for beginners" as it only requires pasting the OpenAI key.


## If you are already using a specific AI service, choose the REST embedder

If you are already using a specific model from a compatible embedding provider, choose Meilisearch's REST embedder. This lets you keep building on the tooling and workflows you already have in place, with minimal additional configuration.
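
As an illustration, the sketch below declares a REST embedder pointing at a hypothetical embedding service. The `url`, `request`, and `response` fields must be adapted to whatever API your service actually exposes, and field names can differ between Meilisearch versions:

```sh
# Sketch only: "url", "request", and "response" must mirror your existing
# embedding service's API. {{text}} and {{embedding}} are filled in by
# Meilisearch; the endpoint and key below are placeholders.
curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings/embedders' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_API_KEY' \
  --data-binary '{
    "default": {
      "source": "rest",
      "url": "https://your-embedding-service.example.com/v1/embed",
      "apiKey": "EMBEDDING_SERVICE_API_KEY",
      "request": { "input": "{{text}}" },
      "response": { "embedding": "{{embedding}}" }
    }
  }'
```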

## If dealing with non-textual content, choose the user-provided embedder
Contributor:
Suggested change
## If dealing with non-textual content, choose the user-provided embedder
## If dealing with non-textual content, choose the Custom (user-provided) embedder


Meilisearch does not support searching images, audio, or any other content not presented as text. This limitation applies to both queries and documents. For example, Meilisearch's built-in embedder sources cannot search using an image instead of text. They also cannot use text to search for images without attached textual metadata.
Contributor:
pinging @dureuill as I'm not 100% sure about this one

Contributor:
yes that is correct. We may want to specify that, by supplying the embeddings generated using their own embedder, the user can indeed achieve these use cases.


In these cases, you will have to supply your own embedder.
Contributor:
Suggested change
In these cases, you will have to supply your own embedder.
In these cases, you will have to supply your own embeddings.
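
Building on the reviewers' point above, here is a rough sketch of the user-provided flow: declare a `userProvided` embedder with the dimensions of your vectors, then send each document with a precomputed `_vectors` field. The index name, embedder name, dimensions, and document fields below are placeholders:

```sh
# Sketch only: no embeddings are generated by Meilisearch for this source.
# "dimensions" must match your vectors (3 here only to keep the example short;
# real models produce hundreds of dimensions).
curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings/embedders' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_API_KEY' \
  --data-binary '{
    "image-embedder": {
      "source": "userProvided",
      "dimensions": 3
    }
  }'

# Each document then carries its precomputed embedding in "_vectors".
curl \
  -X POST 'http://localhost:7700/indexes/movies/documents' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_API_KEY' \
  --data-binary '[
    {
      "id": 1,
      "title": "Poster artwork",
      "_vectors": { "image-embedder": [0.12, 0.38, 0.04] }
    }
  ]'
```
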


## Only choose Hugging Face when self-hosting small static datasets
Contributor:
This was initially advice for Cloud users using HF embedders because we were generating the embeddings locally (running on our Cloud infra). This is no longer the case: we removed the option on the Cloud and replaced it with Hugging Face Inference Endpoints using the REST embedder option.

Self-hosted users can still use HuggingFace as an embedder option, as they can tweak their infrastructure to fit their specific needs.

Contributor:
We can either remove this section, or point users in the direction of how to set a HF embedder using the REST option (for Cloud) and the API reference (for self hosted)


Although it returns very relevant search results, the Hugging Face embedder must run directly on your server. This may lead to lower performance and extra costs when you host Meilisearch on a service like DigitalOcean or AWS.

That said, Hugging Face can be a good embedder for datasets under 10k documents that you don't plan to update often. Meilisearch Cloud does not support Hugging Face embedders.
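
For self-hosted instances, a local Hugging Face embedder can be declared as sketched below. The model shown is only an example; as the review comments above note, Meilisearch Cloud users would instead reach Hugging Face Inference Endpoints through the REST embedder:

```sh
# Sketch only, for a self-hosted instance: the model is downloaded and run
# locally by Meilisearch, so CPU/RAM usage grows with your dataset. The index
# name, model, and "title" field are placeholders.
curl \
  -X PATCH 'http://localhost:7700/indexes/movies/settings/embedders' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer MEILISEARCH_API_KEY' \
  --data-binary '{
    "default": {
      "source": "huggingFace",
      "model": "BAAI/bge-base-en-v1.5",
      "documentTemplate": "A document titled {{doc.title}}"
    }
  }'
```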