
Suggester module in Solr builds up the dictionary based on sentences and not on words #2

Open
innovationchef opened this issue Aug 29, 2018 · 0 comments


@innovationchef
Member

I added the following XML to the solrconfig.xml file to set up the Suggester module.

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <!-- Fuzzy lookup: matches the query against the start of each whole entry -->
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <!-- Build the dictionary from the values of a document field -->
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">BioChemEntity.name</str>
    <!-- "string" is not analyzed: no lowercasing and no splitting into words -->
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

This configuration reads the BioChemEntity.name values and uses them to populate the dictionary that Solr uses internally to provide suggestions.

I requested suggestions for words starting with "sam" with the following request:

http://localhost:8983/solr/solr_core_name/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=sam

However, it does not return suggestions such as "sample" or "sample SAMD00007", even though these values are indexed in the core field "name", which was used to build the dictionary. I found the following issues:
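The request above passes `suggest.dictionary` and `suggest.build=true` on every call. A minimal sketch, assuming the `/suggest` handler shown earlier: the dictionary name can be set as a handler default instead. Note that `suggest.build=true` rebuilds the dictionary on each request, which is convenient while testing but likely too costly for regular use on a large core.

```xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <!-- Default dictionary, so requests need not pass suggest.dictionary=mySuggester -->
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

After building the dictionary once, a request such as `/suggest?suggest.q=sam` should then be enough.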

  1. As configured, the suggester is case-sensitive, because the `string` field type used for `suggestAnalyzerFieldType` does not lowercase the input. An entry with BioChemEntity.name = "Sample 1" cannot appear in the suggestions for a lowercase query such as "sam".
  2. The suggester uses the whole phrase in the BioChemEntity.name field as a single dictionary entry and matches only from the first character of the first word. So if we index BioChemEntity.name = "Source GSM00089" and ask for suggestions for "GSM", no results are returned, because GSM00089 is the second word of the entry.
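One possible direction for both problems, sketched under assumptions and not tested against this project's Solr version or schema: analyze the entries with a field type that splits on words and lowercases, and switch to `AnalyzingInfixLookupFactory`, which matches the query against words anywhere in the entry rather than only at its start. The field type name `text_suggest` and the `indexPath` value are hypothetical. The `fieldType` belongs in the schema and the `searchComponent` in solrconfig.xml.

```xml
<!-- Schema (hypothetical name "text_suggest"): split into words, then lowercase,
     so "Source GSM00089" is indexed as the words "source" and "gsm00089" -->
<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml: infix suggester over the same field -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <!-- Matches the query as a prefix of any word in the entry, not only the first -->
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <!-- Directory for the suggester's own index; the name is arbitrary -->
    <str name="indexPath">suggester_infix_dir</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">BioChemEntity.name</str>
    <!-- Analyze entries and queries with the lowercasing field type above -->
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <!-- Return plain suggestions without <b> highlighting markup -->
    <str name="highlight">false</str>
  </lst>
</searchComponent>
```

With this, `suggest.q=gsm` should match "Source GSM00089" and `suggest.q=sam` should match "Sample 1", once the dictionary has been rebuilt (for example with `suggest.build=true`). The trade-off is that the infix lookup does not do the typo-tolerant matching that FuzzyLookupFactory provides.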
innovationchef added a commit that referenced this issue Aug 29, 2018
…n our project (Note: the script works fine in general for Suggester implementation, however in our case, there are other issues as explained in issue #1 and #2)