Server stop will wait until all background models are unloaded #323
Conversation
@@ -341,6 +341,7 @@ InferenceServer::Stop(const bool force)
       }
     } else {
       const auto& live_models = model_repository_manager_->LiveModelStates();
+      size_t bg_models_size = model_repository_manager_->BackgroundModelsSize();
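This hunk only adds the background-model counter; as a point of reference, here is a minimal standalone sketch (not Triton source; all names below are illustrative stand-ins) of the shutdown condition it enables: a model that is still loading or unloading in the background keeps the server waiting exactly like a live model or an in-flight request.

// Standalone illustration of the extended shutdown condition; compiles with
// any C++11 compiler. RepositoryCounts stands in for the counters the server
// reads from its model repository manager.
#include <cstddef>
#include <iostream>

struct RepositoryCounts {
  std::size_t live_models;        // models still reported as live
  std::size_t background_models;  // models transitioning in the background
  std::size_t inflight_requests;  // in-flight non-inference requests
};

// Shutdown may complete only when all three counters reach zero.
bool ReadyToExit(const RepositoryCounts& c) {
  return c.live_models == 0 && c.background_models == 0 &&
         c.inflight_requests == 0;
}

int main() {
  RepositoryCounts counts{0, 1, 0};  // 0 live, 1 background, 0 in-flight
  std::cout << std::boolalpha << ReadyToExit(counts) << "\n";  // false: keep waiting
  counts.background_models = 0;      // background unload finished
  std::cout << std::boolalpha << ReadyToExit(counts) << "\n";  // true: safe to stop
}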
It would probably be good to amend the log output just below as well for clarity on shutdown.
"Found X live models, Y unloading models, and Z inflight non-inference requests"
Otherwise, in the event of 0 live and 0 inflight but 1 unloading, the logs could be confusing:
Timeout 30: Found 0 live and 0 inflight
Timeout 29: Found 0 live and 0 inflight
Timeout 28: Found 0 live and 0 inflight
...
Timeout failed...
Following up on triton-inference-server/server#6835 (comment):
A live model could also be unloading. How about
"Found X live models, Y ghost models, and Z inflight non-inference requests"
or
"Found (X + Y) live models and Z in-flight non-inference requests"?
I would slightly prefer the latter one, because I don't know if there is any test depending on that timeout statement.
I think it's fine to sum them like the latter, as long as a background model is enough to block the shutdown in the same way as a live model.
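To make the agreed format concrete, here is a small standalone snippet (plain iostream rather than the server's own logging macros; purely an illustration) that produces the summed form the log output later in this thread follows:

// Prints the timeout line with background (loading/unloading) models summed
// into the live-model count, so a single number gates shutdown.
#include <cstddef>
#include <iostream>

void LogTimeout(int timeout, std::size_t live, std::size_t background,
                std::size_t inflight) {
  std::cout << "Timeout " << timeout << ": Found " << (live + background)
            << " live models and " << inflight
            << " in-flight non-inference requests\n";
}

int main() {
  LogTimeout(30, 1, 1, 0);  // Timeout 30: Found 2 live models and 0 in-flight non-inference requests
  LogTimeout(18, 0, 0, 0);  // Timeout 18: Found 0 live models and 0 in-flight non-inference requests
}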
Updated: Show bg model size when printing timeout info
Signal (15) received.
I0127 01:32:27.144715 128 server.cc:307] Waiting for in-flight requests to complete.
I0127 01:32:27.144726 128 model_lifecycle.cc:223] StopAllModels()
I0127 01:32:27.144735 128 model_lifecycle.cc:241] InflightStatus()
I0127 01:32:27.144743 128 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0127 01:32:27.144750 128 model_lifecycle.cc:390] AsyncUnload() 'identity_fp32'
I0127 01:32:27.144792 128 server.cc:338] All models are stopped, unloading models
I0127 01:32:27.144799 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:27.144808 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:27.144815 128 server.cc:347] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0127 01:32:27.144820 128 backend_model_instance.cc:795] Stopping backend thread for identity_fp32_0...
I0127 01:32:27.144822 128 server.cc:353] identity_fp32 v1: UNLOADING
I0127 01:32:27.144861 128 identity.cc:730] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0127 01:32:27.145098 128 identity.cc:638] TRITONBACKEND_ModelFinalize: delete model state
I0127 01:32:27.145113 128 model_lifecycle.cc:618] OnDestroy callback() 'identity_fp32' version 1
I0127 01:32:27.145121 128 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0127 01:32:28.145006 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:28.145051 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:28.145060 128 server.cc:347] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
I0127 01:32:29.145238 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:29.145299 128 model_lifecycle.cc:265] BackgroundModelsSize()
...
I0127 01:32:38.150694 128 server.cc:347] Timeout 19: Found 1 live models and 0 in-flight non-inference requests
I0127 01:32:38.188567 128 python_be.cc:2342] TRITONBACKEND_ModelFinalize: delete model state
I0127 01:32:38.188650 128 model_lifecycle.cc:618] OnDestroy callback() 'identity_fp32' version 1
I0127 01:32:38.188665 128 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0127 01:32:39.151129 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:39.151212 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:39.151340 128 server.cc:347] Timeout 18: Found 0 live models and 0 in-flight non-inference requests
...
Related PR: triton-inference-server/server#6835
When the server is stopping, it will now also check the background for any models that are still loading or unloading. If there are models in transition in the background, the server will wait until all of those transitions have completed before finishing the shutdown.