Selecting all for particular search results in error #3701
Comments
I'm wondering if the load balancer has something to do with this problem. If I bypass the load balancer and go directly to one of the |
If you can track down any of those 503s to a specific app server, it's unlikely that this is the same issue we're seeing with DSA; there, the app servers never even get the traffic. @jcoyne's thought that this is a capacity issue looks more likely to me absent other evidence. I don't know how those app servers are configured, but there's likely headroom available just from a config change. (Although if 'select all' here is generating 1,187 unique GETs to the backend server from a single client, that seems less desirable from a scalability perspective. That's an arms race I don't think we can win with server configs.) |
@julianmorley we determined the max number is actually 100 requests. |
@jcoyne I just tried it - I see what you mean. I think your first assumption that this is blatting the app server is right, though. Those 503s came back fast; it looks like whichever server the LB sent this client to ran out of available connections. |
@julianmorley @jcoyne this was simple enough to test and confirm that the POST requests resulting in the 503 response are never making it to any of the I tailed each of the Apache request logs on the 5 At the same time on each of the SearchWorks prod VMs I tailed the request logs: I guess there could be conditions where a request makes it to the VM, but something goes wrong and it's never recorded in the request log. But I'm not sure how to determine that. |
Were the requests that got through spread evenly amongst the 5 VMs? The LB is set to balance via 'least connections', so I'd expect the 76 that got through to be on at least 2, preferably 4 or 5 VMs. The 503s were definitely coming from the LB:
Anyhow, I tried this out with a different search (my search was https://searchworks.stanford.edu/?per_page=100&q=nelson&search_field=search). Retries of the 503s worked fine, so that's more fuel for the "too many requests too fast" theory. I'd suggest trying to batch those requests client-side so you're not generating 100 simultaneous requests to the same URL. EDIT: a subsequent trial of my 'nelson' test case gave 503s, identical to OP. I'm thinking that's the LB's anti-DOS deciding that I'm a naughty person after initially allowing my tomfoolery. |
Thanks @julianmorley, that's helpful, and it makes complete sense that we're hitting some kind of anti-DOS protection at the load balancer. And yes, the 76 successful POST requests were nicely distributed across the 5 VMs. With this info we'll look at changes in the app so we're not firing off so many requests for this feature. |
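For reference, the client-side batching suggested above could look roughly like the sketch below. It is not the SearchWorks implementation: `chunk` and `runInBatches` are hypothetical helper names, and the batch size is an assumption to be tuned against whatever the load balancer tolerates. The idea is only that at most one batch of requests is in flight at a time, instead of 100 simultaneous ones.

```typescript
// Hypothetical helper (not from SearchWorks): split a list into fixed-size batches.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical helper: run `task` for every item, one batch at a time.
// Each batch is awaited before the next starts, so no more than `size`
// requests are ever in flight. Results come back in input order.
// If any task rejects, the returned promise rejects and later batches are not started.
async function runInBatches<T, R>(
  items: T[],
  size: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of chunk(items, size)) {
    const settled = await Promise.all(batch.map(task));
    results.push(...settled);
  }
  return results;
}
```

With something like this, "Select all" would call `runInBatches(ids, 10, selectOne)`, where `selectOne` stands in for whatever function issues the per-record POST today and `10` is an arbitrary starting point. Combining the selections into a single request server-side would avoid the problem entirely, but that is a larger change than this sketch covers.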
We received a report from Charles Fosselman in East Asia that he tried to "Select all" for this search and received an error: https://searchworks.stanford.edu/?f%5Baccess_facet%5D%5B%5D=Online&per_page=100&q=Zhongguo+li+shi&search_field=search

Cory/Chris indicated in Gryphon Core that Access team members will need to look into it.
Noting two details:
Jira issue with original feedback: SW-4254