From bd4a8a758d2ccca4ea49b6145702e630c05a89e0 Mon Sep 17 00:00:00 2001 From: gui machiavelli Date: Thu, 28 Nov 2024 19:58:42 +0100 Subject: [PATCH 1/5] add rough draft --- .../ai_powered_search/choose_an_embedder.mdx | 32 +++++++++++++++++++ 1 file changed, 32 insertions(+) create mode 100644 learn/ai_powered_search/choose_an_embedder.mdx diff --git a/learn/ai_powered_search/choose_an_embedder.mdx b/learn/ai_powered_search/choose_an_embedder.mdx new file mode 100644 index 000000000..d135deccf --- /dev/null +++ b/learn/ai_powered_search/choose_an_embedder.mdx @@ -0,0 +1,32 @@ +--- +title: How to choose an embedder — Meilisearch documentation +description: This article contains general guidance on how to choose the embedder best suited for projects using AI-powered search. +--- + +# How to choose an embedder + +Meilisearch officially supports many different embedders, such as OpenAI, Hugging Face, and Ollama, as well as the majority of embedding generators with a RESTful API. It can be difficult to understand their differences and how to pick one. + +This article contains general guidance on how to choose the embedder best suited for your project. + +## When in doubt, choose OpenAI + +OpenAI returns relevant search results across different subjects and datasets. It is suited for the majority of applications and Meilisearch actively supports and improves OpenAI functionality with every new release. + +In the majority of cases, and especially if this is your first time working with LLMS, choose OpenAI. + +## If you are already using a specific AI service, choose the REST embedder + +If you are already using a specific model from a compatible embedder, choose Meilisearch's REST embedder. This ensures you continue building upon tooling and workflows already in place with minimal configuration necessary. 
+ +## If dealing with non-textual content, choose the user-provided embedder + +Meilisearch does not support searching images, audio, or any other content not presented as text. This limitation applies to both queries and documents—for example, you cannot search using an image instead of text, and you cannot use text to search for images without attached textual metadata. + +In these cases, you will have to supply your own embedder. + +## If working with small static datasets, consider choosing Hugging Face + +Although it returns very relevant search results, the Hugging Face embedder must run directly in your server. This may lead to lower performance and extra costs when you are hosting Meilisearch in a service like DigitalOcean or AWS. + +That said, Hugging Face can be a good embedder for datasets under 10k documents that you don't intend to update often. \ No newline at end of file From 12492a12611790c8e0e685a1c2dc40ba3f2b51cc Mon Sep 17 00:00:00 2001 From: gui machiavelli Date: Tue, 21 Jan 2025 15:58:27 +0100 Subject: [PATCH 2/5] minor copy changes --- learn/ai_powered_search/choose_an_embedder.mdx | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/learn/ai_powered_search/choose_an_embedder.mdx b/learn/ai_powered_search/choose_an_embedder.mdx index d135deccf..4fd751504 100644 --- a/learn/ai_powered_search/choose_an_embedder.mdx +++ b/learn/ai_powered_search/choose_an_embedder.mdx @@ -1,11 +1,11 @@ --- title: How to choose an embedder — Meilisearch documentation -description: This article contains general guidance on how to choose the embedder best suited for projects using AI-powered search. +description: General guidance on how to choose the embedder best suited for projects using AI-powered search. --- # How to choose an embedder -Meilisearch officially supports many different embedders, such as OpenAI, Hugging Face, and Ollama, as well as the majority of embedding generators with a RESTful API. 
It can be difficult to understand their differences and how to pick one. +Meilisearch officially supports many different embedders, such as OpenAI, Hugging Face, and Ollama, as well as the majority of embedding generators with a RESTful API. This article contains general guidance on how to choose the embedder best suited for your project. @@ -13,7 +13,7 @@ This article contains general guidance on how to choose the embedder best suited OpenAI returns relevant search results across different subjects and datasets. It is suited for the majority of applications and Meilisearch actively supports and improves OpenAI functionality with every new release. -In the majority of cases, and especially if this is your first time working with LLMS, choose OpenAI. +In the majority of cases, and especially if this is your first time working with LLMs and AI-powered search, choose OpenAI. ## If you are already using a specific AI service, choose the REST embedder @@ -21,12 +21,12 @@ If you are already using a specific model from a compatible embedder, choose Mei ## If dealing with non-textual content, choose the user-provided embedder -Meilisearch does not support searching images, audio, or any other content not presented as text. This limitation applies to both queries and documents—for example, you cannot search using an image instead of text, and you cannot use text to search for images without attached textual metadata. +Meilisearch does not support searching images, audio, or any other content not presented as text. This limitation applies to both queries and documents. For example, Meilisearch's built-in embedder sources cannot search using an image instead of text. They also cannot use text to search for images without attached textual metadata. -In these cases, you will have to supply your own embedder. +In these cases, you will have to supply your own embedder. Meilisearch Cloud does not support custom embedders. 
-## If working with small static datasets, consider choosing Hugging Face +## Only choose Hugging Face when self-hosting small static datasets Although it returns very relevant search results, the Hugging Face embedder must run directly in your server. This may lead to lower performance and extra costs when you are hosting Meilisearch in a service like DigitalOcean or AWS. -That said, Hugging Face can be a good embedder for datasets under 10k documents that you don't intend to update often. \ No newline at end of file +That said, Hugging Face can be a good embedder for datasets under 10k documents that you don't plan to update often. Meilisearch Cloud does not support Hugging Face embedders. From 411793547f5cc542705209e1488534902ea016e4 Mon Sep 17 00:00:00 2001 From: gui machiavelli Date: Tue, 21 Jan 2025 16:09:54 +0100 Subject: [PATCH 3/5] change title, add article to sidebar --- config/sidebar-learn.json | 5 +++++ learn/ai_powered_search/choose_an_embedder.mdx | 4 ++-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/config/sidebar-learn.json b/config/sidebar-learn.json index f487ea6ca..a10658def 100644 --- a/config/sidebar-learn.json +++ b/config/sidebar-learn.json @@ -54,6 +54,11 @@ "label": "Deactivate AI-powered search", "slug": "deactivate_ai_powered_search" }, + { + "source": "learn/ai_powered_search/choose_an_embedder.mdx", + "label": "Which embedder should I choose?", + "slug": "choose_an_embedder" + }, { "source": "learn/ai_powered_search/difference_full_text_ai_search.mdx", "label": "Differences between full-text and AI-powered search", diff --git a/learn/ai_powered_search/choose_an_embedder.mdx b/learn/ai_powered_search/choose_an_embedder.mdx index 4fd751504..2396de8ba 100644 --- a/learn/ai_powered_search/choose_an_embedder.mdx +++ b/learn/ai_powered_search/choose_an_embedder.mdx @@ -1,9 +1,9 @@ --- -title: How to choose an embedder — Meilisearch documentation +title: Which embedder should I choose? 
— Meilisearch documentation description: General guidance on how to choose the embedder best suited for projects using AI-powered search. --- -# How to choose an embedder +# Which embedder should I choose? Meilisearch officially supports many different embedders, such as OpenAI, Hugging Face, and Ollama, as well as the majority of embedding generators with a RESTful API. From 1b12476a6f589ceb5b94bba9d75e8049961c06e0 Mon Sep 17 00:00:00 2001 From: gui machiavelli Date: Tue, 21 Jan 2025 16:21:32 +0100 Subject: [PATCH 4/5] fix incorrect statement regarding user-provided embedders on the cloud --- learn/ai_powered_search/choose_an_embedder.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/learn/ai_powered_search/choose_an_embedder.mdx b/learn/ai_powered_search/choose_an_embedder.mdx index 2396de8ba..15da304e0 100644 --- a/learn/ai_powered_search/choose_an_embedder.mdx +++ b/learn/ai_powered_search/choose_an_embedder.mdx @@ -23,7 +23,7 @@ If you are already using a specific model from a compatible embedder, choose Mei Meilisearch does not support searching images, audio, or any other content not presented as text. This limitation applies to both queries and documents. For example, Meilisearch's built-in embedder sources cannot search using an image instead of text. They also cannot use text to search for images without attached textual metadata. -In these cases, you will have to supply your own embedder. Meilisearch Cloud does not support custom embedders. +In these cases, you will have to supply your own embedder. 
## Only choose Hugging Face when self-hosting small static datasets From 856e7dd70a56fbd5094eae21c3934803e4ba7dd6 Mon Sep 17 00:00:00 2001 From: gui machiavelli Date: Thu, 23 Jan 2025 13:55:42 +0100 Subject: [PATCH 5/5] Update learn/ai_powered_search/choose_an_embedder.mdx Co-authored-by: macraig --- learn/ai_powered_search/choose_an_embedder.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/learn/ai_powered_search/choose_an_embedder.mdx b/learn/ai_powered_search/choose_an_embedder.mdx index 15da304e0..6e5402d94 100644 --- a/learn/ai_powered_search/choose_an_embedder.mdx +++ b/learn/ai_powered_search/choose_an_embedder.mdx @@ -23,7 +23,7 @@ If you are already using a specific model from a compatible embedder, choose Mei Meilisearch does not support searching images, audio, or any other content not presented as text. This limitation applies to both queries and documents. For example, Meilisearch's built-in embedder sources cannot search using an image instead of text. They also cannot use text to search for images without attached textual metadata. -In these cases, you will have to supply your own embedder. +In these cases, you will have to supply your own embeddings. ## Only choose Hugging Face when self-hosting small static datasets
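The guidance in the final version of this article can be sketched as a small decision helper. This is only an illustration, not part of the patch series or of Meilisearch itself. The `Project` class, its field names, and the order in which the rules are checked are all assumptions, since the article does not rank its recommendations. The returned strings are the embedder `source` values Meilisearch uses in its settings (`openAi`, `rest`, `userProvided`, `huggingFace`).

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical project description; every field name here is an assumption."""
    non_textual_content: bool = False       # images, audio, or other non-text data
    uses_existing_ai_service: bool = False  # already relies on a REST-compatible embedding service
    self_hosted: bool = False               # not running on Meilisearch Cloud
    document_count: int = 0                 # approximate dataset size
    updates_often: bool = True              # whether documents change frequently


def choose_embedder_source(project: Project) -> str:
    """Return an embedder `source` value following the article's guidance.

    The precedence of the checks is an assumption made for this sketch.
    """
    # Non-textual content: built-in embedders cannot handle it, so supply your own embeddings.
    if project.non_textual_content:
        return "userProvided"
    # Already using a compatible AI service: keep existing tooling via the REST embedder.
    if project.uses_existing_ai_service:
        return "rest"
    # Hugging Face only for self-hosted, small (under 10k documents), rarely updated datasets.
    if project.self_hosted and project.document_count < 10_000 and not project.updates_often:
        return "huggingFace"
    # When in doubt, choose OpenAI.
    return "openAi"


if __name__ == "__main__":
    print(choose_embedder_source(Project()))
    print(choose_embedder_source(Project(self_hosted=True, document_count=5_000, updates_often=False)))
```

The Hugging Face branch treats "can be a good embedder" as a recommendation. A stricter reading of the article would keep `openAi` as the answer there and list Hugging Face only as an alternative.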