
Add conclusion to Relevancy chapter
polyfractal committed Oct 1, 2014
1 parent edf1d66 commit e046b59
Showing 2 changed files with 41 additions and 0 deletions.
1 change: 1 addition & 0 deletions 170_Relevance.asciidoc
@@ -28,3 +28,4 @@ include::170_Relevance/70_Pluggable_similarities.asciidoc[]

include::170_Relevance/75_Changing_similarities.asciidoc[]

include::170_Relevance/80_Conclusion.asciidoc[]
40 changes: 40 additions & 0 deletions 170_Relevance/80_Conclusion.asciidoc
@@ -0,0 +1,40 @@

[[relevance-conclusion]]
=== Relevance tuning is the last 10%

In this chapter, we looked at how Lucene generates scores based on TF/IDF.
Understanding the score generation process is critical so that you can
tune, modulate, attenuate, and manipulate the score for your particular
business domain.
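
When you need to see exactly how a score was derived, you can ask Elasticsearch
to expose Lucene's scoring calculation for each hit by adding the `explain`
flag to a search request. A minimal sketch (the index and field names here are
placeholders):

[source,js]
--------------------------------------------------
GET /my_index/_search?explain <1>
{
    "query": { "match": { "title": "quick brown fox" }}
}
--------------------------------------------------
<1> The `explain` flag adds an `_explanation` section to every hit, breaking
    the score down into its TF/IDF components.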

In practice, simple combinations of queries will get you good search results.
But to get _great_ search results, you'll often have to start tinkering with
the tuning methods described in this chapter.

Often, applying a boost to a strategic field or rearranging a query to
emphasize a particular clause is sufficient to make your results great.
Sometimes you'll need more invasive changes. This is usually the case if your
scoring requirements diverge heavily from Lucene's word-based TF/IDF model (for
example, you want to score based on time or distance).
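
A field boost, for example, is often a one-line change. A minimal sketch,
assuming a hypothetical `my_index` with `title` and `body` fields:

[source,js]
--------------------------------------------------
GET /my_index/_search
{
    "query": {
        "multi_match": {
            "query":  "quick brown fox",
            "fields": [ "title^2", "body" ] <1>
        }
    }
}
--------------------------------------------------
<1> The `^2` gives matches in the `title` field twice the weight of matches
    in the `body` field.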

That said, relevancy tuning is a rabbit hole that you can easily fall into and
never emerge from. The concept of "most relevant" is a nebulous target, and
different people often have different ideas about document ranking. It is easy
to get into a cycle of constant fiddling without any apparent progress.

We would encourage you to avoid this (very tempting) behavior and instead
properly instrument your search results. Monitor how often your users click the
top result, the top ten, or the first page; how often they execute a secondary
query without selecting a result first; how often they click a result and
immediately go back to the search results; and so forth.

These are all indicators of how relevant your search results are to the user.
If your query is returning highly relevant results, the user will select one of
the top five results, find what they want and leave. Non-relevant results cause
the user to click around and try new search queries.

Once you have instrumentation in place, tuning your query is simple. Make a
change, monitor its effect on your users, and repeat as necessary. The tools
outlined in this chapter are just that: tools. You have to use them
appropriately to propel your search results into the _great_ category, and the
only way to do that is with strong measurement of user behavior.
