Improve behaviour for stopwords-only search #623

Open
gerritbruening opened this issue Nov 19, 2018 · 3 comments

Comments

@gerritbruening
Contributor

gerritbruening commented Nov 19, 2018

The search
http://dev.faustedition.net/search?q=%22wo+ist+sie%22
does not find the passage I want to navigate to:
http://dev.faustedition.net/print/faust.33#l7056

However, the search
http://dev.faustedition.net/search?q=%22kennst+du+den%22
does find http://dev.faustedition.net/print/faust.3#l299, so antilabic verses are not the cause of this problem.

@thvitt
Member

thvitt commented Nov 19, 2018

It works when using the StandardAnalyzer ('alle Wortformen') instead of the GermanAnalyzer. I guess the phrase search fails because "ist" is a stopword in the GermanAnalyzer?

@gerritbruening
Contributor Author

Hm, I have never dealt with stopwords.

@thvitt
Member

thvitt commented Nov 21, 2018

Stopwords are a fixed list of very frequent words (function words etc.) that are not indexed, since they would produce a large number of results that are insignificant for the usual use cases of a search engine. When a query contains such a word, it is usually ignored, so a query like "wer ist deutsche bundeskanzlerin" is reduced to "deutsche bundeskanzlerin". Unfortunately, "wo", "ist", and "sie" are all stopwords in the default index configuration, so nothing is left of the query and we get no results.
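
To make the effect concrete, here is a minimal sketch of stopword removal in plain Python. It is not the code the search actually runs: the stopword set is a small illustrative subset chosen to match the examples above, and `remove_stopwords` is a hypothetical helper name.

```python
# Illustrative subset only (assumption); the analyzer's real stopword list
# is much longer.
GERMAN_STOPWORDS = {"wo", "ist", "sie", "wer"}


def remove_stopwords(query, stopwords=GERMAN_STOPWORDS):
    """Lowercase the query, split it on whitespace, and drop every stopword."""
    return [token for token in query.lower().split() if token not in stopwords]


# A mixed query keeps its content words.
print(remove_stopwords("wer ist deutsche bundeskanzlerin"))
# A stopwords-only query ends up with no terms at all.
print(remove_stopwords("wo ist sie"))
```

The second call returns an empty list, which is the situation in this issue: after filtering there is no term left to look up in the index.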

The "alle Wortformen" query doesn't use stopwords. Possible solutions are therefore to (a) disable stopwords for the default index as well, or (b) show a hint to search with the other index when there are no results.
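
Option (b) could look roughly like the following sketch. Everything here is an assumption for illustration: `search_with_hint`, the two toy search functions, and the hint text are hypothetical and do not reflect the project's actual search code.

```python
def search_with_hint(query, search_default, search_all_forms):
    """Run the default search; if it finds nothing but the other index does,
    return a hint pointing the user to the other index."""
    results = search_default(query)
    hint = None
    if not results and search_all_forms(query):
        hint = 'No results in the default index. Try the "alle Wortformen" search.'
    return results, hint


# Toy stand-ins (assumption) for the two indexes, mimicking this issue.
def toy_default_search(query):
    return [] if query == '"wo ist sie"' else ["faust.3#l299"]


def toy_all_forms_search(query):
    return ["faust.33#l7056"] if query == '"wo ist sie"' else ["faust.3#l299"]


results, hint = search_with_hint('"wo ist sie"', toy_default_search, toy_all_forms_search)
print(results, hint)
```

The second index is only queried when the default search comes back empty, so ordinary searches pay no extra cost.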

thvitt changed the title from "Some search result is missing" to "Improve behaviour for stopwords-only search" on Sep 23, 2022
thvitt added this to the 1.3 milestone on Sep 23, 2022
thvitt modified the milestones: 1.3, 1.4 on Oct 11, 2022