Sanitizing the 'rows' request parameter results in no documents #2
Comments
Do you not see this behavior without the component active?
Correct, I do not see this behavior if the component is inactive.
The only thing I can think of is that, when doing distributed paging, each sub-shard is somehow asked to return more rows than the client requested, and the "entry point" shard then limits the merged response. I will need to debug and read some more code to validate this. Can you attempt some debug logging (…)? It could be that the request sanitizer should only do its magic for top-level requests and leave local shard requests alone.
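To make this hypothesis concrete, here is a minimal Python simulation. It assumes, as in Solr's standard distributed search, that the coordinating node asks every shard for its top `start + rows` documents and then merges them, and that the sanitizer also clamps `rows` on those shard sub-requests. The shard data, scores and the `distributed_page` helper are hypothetical; this is not the plugin's code.

```python
import heapq
from typing import List, Optional, Tuple

Doc = Tuple[int, str]  # (score, doc_id); higher score ranks first

def shard_request_rows(start: int, rows: int) -> int:
    # The coordinator asks each shard for its top start + rows documents
    # (from offset 0), because any of them could land on the requested page.
    return start + rows

def distributed_page(shards: List[List[Doc]], start: int, rows: int,
                     rows_cap: Optional[int] = None) -> List[Doc]:
    """Simulate one distributed page over shard results sorted best-first.

    rows_cap models a sanitizer that clamps 'rows' on every request,
    including the per-shard sub-requests.
    """
    if rows_cap is not None:
        rows = min(rows, rows_cap)  # intended effect on the top-level request
    per_shard = shard_request_rows(start, rows)
    if rows_cap is not None:
        per_shard = min(per_shard, rows_cap)  # suspected side effect on shards
    partial = [docs[:per_shard] for docs in shards]
    # Merge the per-shard lists best-first, then apply the top-level offset.
    merged = list(heapq.merge(*partial, key=lambda d: -d[0]))
    return merged[start:start + rows]

# 16 shards, 2000 matching docs each, interleaved so hits are evenly spread.
NUM_SHARDS, DOCS_PER_SHARD = 16, 2000
shards = [
    [(1_000_000 - (i * NUM_SHARDS + s), f"shard{s}-doc{i}")
     for i in range(DOCS_PER_SHARD)]
    for s in range(NUM_SHARDS)
]

print(len(distributed_page(shards, 15000, 1000, rows_cap=1000)))  # 1000
print(len(distributed_page(shards, 16000, 1000, rows_cap=1000)))  # 0
print(len(distributed_page(shards, 16000, 1000)))                 # 1000
```

With 16 evenly filled shards and a 1000-row cap, the simulation reproduces the reported pattern: `start=15000` still returns 1000 documents, `start=16000` returns none, and removing the cap brings that page back. If hits were unevenly spread across shards, the cap would also silently drop documents from pages that are not empty.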
Hi @janhoy, we've determined that a single shard will be sufficient for our SolrCloud implementation. If I have some free time in the coming weeks I can take another look at this and help debug it, but I'm afraid I can't spend much more time on it now. The issue should be easy to reproduce with two shards and a low rows sanitization value, say 100. I'm happy to provide our Solr configs. Thanks!
@sjtower did you ever dig into this? Thinking some more, requesting 1000 hits from offset 16000 means that each shard must be asked for 17000 rows (start + rows), since in the worst case all of the hits could come from a single shard. So a fix would likely be to disable the plugin on the distributed (per-shard) requests somehow. On the other hand, if you do deep paging like this, cursorMark is perhaps a better alternative, and it may not cause the same behavior.
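A fix along the lines of disabling the sanitizer on distributed requests might follow the decision logic in this Python sketch (not the plugin's Java code). It assumes shard sub-requests can be recognised by the `isShard=true` parameter that Solr adds when it fans a query out to shards; the `sanitize_rows` function and its dict-of-strings model of request parameters are hypothetical.

```python
from typing import Dict

DEFAULT_ROWS = 10  # Solr's default when 'rows' is absent

def sanitize_rows(params: Dict[str, str], max_rows: int) -> Dict[str, str]:
    """Hypothetical sanitizer that clamps 'rows' on top-level requests only."""
    cleaned = dict(params)
    if cleaned.get("isShard") == "true":
        # Shard sub-request: keep rows = start + rows so the merge step
        # has enough candidates for deep pages.
        return cleaned
    if int(cleaned.get("rows", DEFAULT_ROWS)) > max_rows:
        cleaned["rows"] = str(max_rows)
    return cleaned

top = sanitize_rows({"q": "*:*", "start": "16000", "rows": "5000"}, 1000)
shard = sanitize_rows({"q": "*:*", "start": "0", "rows": "17000", "isShard": "true"}, 1000)
print(top["rows"], shard["rows"])  # 1000 17000
```

The top-level request is clamped to 1000 rows, while the shard sub-request for `start=16000&rows=1000`, which asks each shard for 17000 rows, passes through untouched. One caveat: a client could send `isShard=true` itself to bypass the limit, so a real implementation may need a more trustworthy signal.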
I was never able to dig into this, apologies. |
I have a SolrCloud setup with 16 shards.
I've set up the request sanitizer to limit rows to 1000 with the following in solrconfig.xml:
This works as expected and limits rows to 1000. However, the rows sanitization is affecting the `start` request parameter as well. When I query this URL I see a valid response containing documents:
http://solr-901:8983/solr/journals_dev/select?fl=id&fq=doc_type:full&q=*:*&rows=1000&start=15000&wt=json
However, when I query this URL I see a response containing no documents:
http://solr-901:8983/solr/journals_dev/select?fl=id&fq=doc_type:full&q=*:*&rows=1000&start=16000&wt=json
Notice that the only difference is the `start` value. I have determined that this behavior is dictated by the number of shards multiplied by the rows sanitization limit. So in my case, 16 shards × a 1000-row limit means I get no results when I query with a `start` of 16,000 or more.

Is this expected behavior, and is there any way I can work around it? We use paging on our website, and this will affect any search that goes beyond result 16,000. We still need to limit rows, though.
Thanks!
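As a possible workaround for deep paging, cursorMark can walk the result set page by page instead of using `start`. The Python sketch below uses Solr's cursor parameters (`cursorMark=*` to begin, a sort that includes the uniqueKey field as a tiebreaker, `nextCursorMark` in the response, and no `start`). It assumes the uniqueKey field is `id`, and the `http_fetch`/`iter_pages` helpers are hypothetical. Since each shard should then only be asked for `rows` documents, a 1000-row cap would likely not truncate the shard responses, but this has not been verified against the plugin.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List

Fetch = Callable[[Dict[str, str]], dict]

def http_fetch(base_url: str) -> Fetch:
    """Return a fetch function that GETs base_url/select with the given params."""
    def fetch(params: Dict[str, str]) -> dict:
        url = f"{base_url}/select?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)
    return fetch

def iter_pages(fetch: Fetch, query: Dict[str, str],
               rows: int = 1000) -> Iterator[List[dict]]:
    """Walk a result set with cursorMark; start is never sent.

    The sort must include the uniqueKey field (assumed to be 'id').
    """
    cursor = "*"
    while True:
        params = dict(query, rows=str(rows), sort="id asc",
                      cursorMark=cursor, wt="json")
        data = fetch(params)
        docs = data["response"]["docs"]
        if docs:
            yield docs
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:
            break  # cursor did not advance: no more results
        cursor = next_cursor
```

For example, `iter_pages(http_fetch("http://solr-901:8983/solr/journals_dev"), {"q": "*:*", "fq": "doc_type:full", "fl": "id"})` would yield the matching documents 1000 at a time. Cursors only move forward, so a website that lets users jump straight to an arbitrary page number would need to keep the cursor values of pages it has already visited.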