From e046b59ebf9e7354eca85c2beb1fb3329f124506 Mon Sep 17 00:00:00 2001
From: Zachary Tong
Date: Wed, 1 Oct 2014 09:27:57 -0400
Subject: [PATCH] Add conclusion to Relevancy chapter

---
 170_Relevance.asciidoc               |  1 +
 170_Relevance/80_Conclusion.asciidoc | 40 ++++++++++++++++++++++++++++
 2 files changed, 41 insertions(+)
 create mode 100644 170_Relevance/80_Conclusion.asciidoc

diff --git a/170_Relevance.asciidoc b/170_Relevance.asciidoc
index 0b3c32442..bb83867ef 100644
--- a/170_Relevance.asciidoc
+++ b/170_Relevance.asciidoc
@@ -28,3 +28,4 @@ include::170_Relevance/70_Pluggable_similarities.asciidoc[]
 
 include::170_Relevance/75_Changing_similarities.asciidoc[]
 
+include::170_Relevance/80_Conclusion.asciidoc[]
diff --git a/170_Relevance/80_Conclusion.asciidoc b/170_Relevance/80_Conclusion.asciidoc
new file mode 100644
index 000000000..390f1938d
--- /dev/null
+++ b/170_Relevance/80_Conclusion.asciidoc
@@ -0,0 +1,40 @@
+
+[[relevance-conclusion]]
+=== Relevance tuning is the last 10%
+
+In this chapter, we looked at how Lucene generates scores based on TF/IDF.
+Understanding the score generation process is critical so you can
+tune, modulate, attenuate, and manipulate the score for your particular
+business domain.
+
+In practice, simple combinations of queries will get you good search results.
+But to get _great_ search results, you'll often have to start tinkering with
+the tuning methods described in this chapter.
+
+Often, applying a boost to a strategic field or rearranging a query to
+emphasize a particular clause will be sufficient to make your results great.
+Sometimes you'll need more invasive changes. This is usually the case if your
+scoring requirements diverge heavily from Lucene's word-based TF/IDF model
+(e.g., you want to score based on time or distance).
+
+With that said, relevance tuning is a rabbit hole that you can easily fall into
+and never emerge from. The concept of "most relevant" is a nebulous target, and
+different people often have different ideas about how documents should be ranked.
+It is easy to get into a cycle of constant fiddling without any apparent progress.
+
+We encourage you to avoid this (very tempting) behavior and instead properly
+instrument your search results. Monitor how often your users click the top
+result, the top ten, or the first page; how often they execute a secondary query
+without selecting a result first; and how often they click a result and then
+immediately return to the search results, and so on.
+
+These are all indicators of how relevant your search results are to the user.
+If your query is returning highly relevant results, the user will select one of
+the top five results, find what they want, and leave. Irrelevant results cause
+the user to click around and try new search queries.
+
+Once you have instrumentation in place, tuning your query is simple. Make a change,
+monitor its effect on your users, and repeat as necessary. The tools outlined in
+this chapter are just that: tools. You have to use them appropriately to propel
+your search results into the _great_ category, and the only way to do that is with
+strong measurement of user behavior.
\ No newline at end of file