admin interface hangs on 24-core machine #523
So this behaviour actually seems to be coming from Jetty. The only solutions (without changes to Jetty) that I can see are:
A patch to allow Jetty to use different executors for handling requests versus acceptors and selectors would be fairly simple; perhaps that is the best route here?
I've submitted a bug for this against Jetty, though it is arguable what the correct solution is, and indeed whether this is a bug in Jetty at all, or whether they expect us to be doing this calculation ourselves.
@reines I haven't had a chance to look, but does this issue still apply in the latest version of Jetty (9.2.0)?
I believe so, or at least it does in master; I didn't specifically check the 9.2.0 tag.
OK, let's see what the Jetty folks say and take things from there.
I think the original design of the thread pool was to share all threads in one pool, to avoid the cost of thread context switching.
It looks like Jetty has changed their defaults to be sensible (`jetty/jetty.project@2d52280`), thus resolving this issue in jetty-9.2.2.v20140723.
This should be fixed when #453 gets merged in |
The issue should be fixed in the current master, which is using Jetty 9.2.3.v20140905 (commit 93d3ee5). Please add a comment if the problem is still occurring.
This seems to be fixed for maxThreads but not adminMaxThreads. If maxThreads is set too low, it errors as expected. If adminMaxThreads is set to something far too low (say 2), the service will happily start without displaying an error message from Jetty and then not be able to deal with requests. It also hangs on shutdown for the full 30-second timeout while it waits for Jetty to shut down. I suspect either the admin connector is taking a different route through Jetty and skipping the exception, or the exception is getting eaten somewhere on the Dropwizard side. Using Dropwizard 0.9.1, FWIW.
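For reference, the misconfiguration described in this comment could be expressed in a config along these lines (a sketch only; the port value is illustrative, and the exact behaviour depends on the Dropwizard 0.9.x version in use):

```yaml
# Hypothetical Dropwizard 0.9.x config sketch: adminMaxThreads set far too low.
# As described above, the service starts without a Jetty error, then the admin
# interface accepts connections but never serves requests.
server:
  adminMaxThreads: 2
  adminConnectors:
    - type: http
      port: 8081
```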
The issue is reported in #523. The OS on CI Linux machines reports an astonishingly large number of CPUs (~128). Jetty changed their algorithm for calculating the maximum number of selector and acceptor threads, see `eclipse/jetty.project@2d52280`. But in Dropwizard we still set the number of acceptors to #CPUs/2 and selectors to #CPUs. It looks like that's too much for Jetty, and it can't handle such a large number of threads. Because of that we have random errors on our CI environment and, what's worse, possibly hurt users who actually run their applications on machines with many CPUs. A solution is to delegate calculating these numbers to Jetty (which has more sane defaults) and document the defaults.
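As a rough illustration of the arithmetic in this comment (the helper name is hypothetical, not a Dropwizard API; the defaults and CPU count are taken from the comment above):

```java
// Sketch: how many pool threads the pre-fix Dropwizard defaults reserve for a
// single connector (acceptors = #CPUs/2, selectors = #CPUs, per the comment above).
public class ConnectorThreads {
    static int reservedPerConnector(int cpus) {
        int acceptors = cpus / 2; // old Dropwizard default
        int selectors = cpus;     // old Dropwizard default
        return acceptors + selectors;
    }

    public static void main(String[] args) {
        // On a ~128-CPU CI machine, a single connector reserves
        // 64 acceptors + 128 selectors = 192 threads -- already far above the
        // admin pool's default of 64 threads.
        System.out.println(reservedPerConnector(128)); // prints 192
    }
}
```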
* Upgrade Jetty to 9.4.0.v20161208. Hopefully, it's just a version bump. Resolves #1874.
* Increase waiting interval for batch HTTP/2 test.
* Delegate calculation of the number of acceptor and selector threads to Jetty (rationale as in the previous comment; the issue is reported in #523).
which includes this fix dropwizard/dropwizard#523.
Primary symptom: on a machine with 22 or more cores and the default server configuration, the admin interface accepts TCP connections and then never processes the requests, causing a browser to hang forever. This can happen on any machine given the wrong config parameters.
Details:

- Unless `maxThreads` > ∑ (`acceptorThreads` + `selectorThreads`) over all `applicationConnectors`, and `adminMaxThreads` > ∑ (`acceptorThreads` + `selectorThreads`) over all `adminConnectors`, the `applicationConnectors` or the `adminConnectors` (respectively) will accept TCP connections but then never handle them.
- This is because `maxThreads` includes all the `acceptorThreads` and `selectorThreads`, and what's left over is used for handling requests. If there's nothing left over, the requests queue up and never get handled.
- The defaults for `maxThreads` (1024) and `adminMaxThreads` (64) are fixed, while the defaults for `acceptorThreads` (#CPUs/2) and `selectorThreads` (#CPUs) vary on different machines.

There are potentially 3 parts to this bug:

- Using `maxThreads` only for processing requests (and not for selector threads and acceptor threads) might be more intuitive.

Tested with version `v0.7.0.rc3`.

How to reproduce: use the `dropwizard-example` application on a 24-core machine, or use the following server config on any machine. The app starts successfully, but requests to either the application or the admin interface hang.
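The invariant from the details above can be sketched as follows (the method name is illustrative, not part of Dropwizard or Jetty):

```java
// Sketch of the thread-budget invariant described in this issue: a pool can
// serve requests only if its maxThreads exceeds the total acceptor and
// selector threads across its connectors; otherwise connections are accepted
// but never processed.
public class ThreadBudget {
    static boolean canServeRequests(int poolMaxThreads, int totalAcceptors, int totalSelectors) {
        // Acceptors and selectors are drawn from the same pool; whatever is
        // left over handles requests.
        return poolMaxThreads > totalAcceptors + totalSelectors;
    }

    public static void main(String[] args) {
        // Default maxThreads (1024) with one connector on a 24-core box
        // (12 acceptors, 24 selectors): threads remain for requests.
        System.out.println(canServeRequests(1024, 12, 24)); // prints true
        // adminMaxThreads forced down to 2: nothing left over, requests hang.
        System.out.println(canServeRequests(2, 1, 2));      // prints false
    }
}
```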