
Server stop will wait until all background models are unloaded #323

Merged 4 commits into main from jacky-shutdown-hang on Jan 27, 2024

Conversation

Contributor
@kthui kthui commented Jan 26, 2024

Related PR: triton-inference-server/server#6835

When the server is stopping, it now also checks for any background models that are still loading or unloading. If any models are in transition in the background, the server waits until all transitions have completed before commencing shutdown.
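The waiting behavior described above can be sketched as a polling loop with a countdown timeout, matching the "Timeout N: Found ..." log lines later in this thread. This is a simplified, self-contained mock, not Triton's actual implementation: `MockModelManager`, `Tick`, and `StopAndWait` are hypothetical names invented here; only `BackgroundModelsSize` and the live-model query correspond to methods visible in the diff.

```cpp
#include <cstdio>

// Hypothetical stand-in for Triton's ModelRepositoryManager: tracks
// foreground (live) models and background models still in transition.
struct MockModelManager {
  size_t live_ = 0;
  size_t background_ = 0;
  size_t LiveModelCount() const { return live_; }
  size_t BackgroundModelsSize() const { return background_; }
  // Each polling tick, pretend one unload completes (simulates progress
  // that would happen while the real server sleeps for one second).
  void Tick() {
    if (live_ > 0) --live_;
    else if (background_ > 0) --background_;
  }
};

// Returns true if shutdown completed before the timeout expired.
// Mirrors the PR's behavior: shutdown proceeds only once BOTH the live
// model count and the background model count reach zero.
bool StopAndWait(MockModelManager& mgr, int timeout_secs) {
  for (int t = timeout_secs; t > 0; --t) {
    size_t remaining = mgr.LiveModelCount() + mgr.BackgroundModelsSize();
    std::printf("Timeout %d: Found %zu live models\n", t, remaining);
    if (remaining == 0) return true;
    mgr.Tick();
  }
  return mgr.LiveModelCount() + mgr.BackgroundModelsSize() == 0;
}
```

With one live and one background model, the loop needs a few ticks before both counts drain to zero; if the timeout expires first, the stop fails, which is the hang this PR addresses.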

@kthui kthui changed the title Server stop to wait until all background models are unloaded Server stop will wait until all background models are unloaded Jan 26, 2024
@kthui kthui requested review from GuanLuo, nnshah1 and rmccorm4 January 26, 2024 17:44
@kthui kthui marked this pull request as ready for review January 26, 2024 17:44
@@ -341,6 +341,7 @@ InferenceServer::Stop(const bool force)
}
} else {
const auto& live_models = model_repository_manager_->LiveModelStates();
size_t bg_models_size = model_repository_manager_->BackgroundModelsSize();
Contributor
It would probably be good to amend the log output just below as well for clarity on shutdown.

"Found X live models, Y unloading models, and Z inflight non-inference requests"

Otherwise, in the event of 0 live and 0 inflight but 1 unloading, the logs could be confusing:

Timeout 30: Found 0 live and 0 inflight
Timeout 29: Found 0 live and 0 inflight
Timeout 28: Found 0 live and 0 inflight
...
Timeout failed...

Contributor Author
@kthui kthui Jan 27, 2024

Following up on triton-inference-server/server#6835 (comment) :

A live model could also be unloading. How about
"Found X live models, Y ghost models, and Z inflight non-inference requests"
or
"Found (X + Y) live models and Z in-flight non-inference requests"?

I would slightly prefer the latter one, because I don't know if there is any test depending on that timeout statement.

Contributor

I think it's fine to sum them as in the latter option, since a background model blocks the shutdown in the same way a live model does.

Contributor Author
@kthui kthui Jan 27, 2024

Updated: Show bg model size when printing timeout info

Signal (15) received.
I0127 01:32:27.144715 128 server.cc:307] Waiting for in-flight requests to complete.
I0127 01:32:27.144726 128 model_lifecycle.cc:223] StopAllModels()
I0127 01:32:27.144735 128 model_lifecycle.cc:241] InflightStatus()
I0127 01:32:27.144743 128 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0127 01:32:27.144750 128 model_lifecycle.cc:390] AsyncUnload() 'identity_fp32'
I0127 01:32:27.144792 128 server.cc:338] All models are stopped, unloading models
I0127 01:32:27.144799 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:27.144808 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:27.144815 128 server.cc:347] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0127 01:32:27.144820 128 backend_model_instance.cc:795] Stopping backend thread for identity_fp32_0...
I0127 01:32:27.144822 128 server.cc:353] identity_fp32 v1: UNLOADING
I0127 01:32:27.144861 128 identity.cc:730] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0127 01:32:27.145098 128 identity.cc:638] TRITONBACKEND_ModelFinalize: delete model state
I0127 01:32:27.145113 128 model_lifecycle.cc:618] OnDestroy callback() 'identity_fp32' version 1
I0127 01:32:27.145121 128 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0127 01:32:28.145006 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:28.145051 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:28.145060 128 server.cc:347] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
I0127 01:32:29.145238 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:29.145299 128 model_lifecycle.cc:265] BackgroundModelsSize()
...
I0127 01:32:38.150694 128 server.cc:347] Timeout 19: Found 1 live models and 0 in-flight non-inference requests
I0127 01:32:38.188567 128 python_be.cc:2342] TRITONBACKEND_ModelFinalize: delete model state
I0127 01:32:38.188650 128 model_lifecycle.cc:618] OnDestroy callback() 'identity_fp32' version 1
I0127 01:32:38.188665 128 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0127 01:32:39.151129 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:39.151212 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:39.151340 128 server.cc:347] Timeout 18: Found 0 live models and 0 in-flight non-inference requests
...
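Per the discussion above, the "Found N live models" count printed in these logs folds the background (unloading) models into the live-model count, so a shutdown blocked only by a background unload no longer reads as "0 live". A minimal sketch of such a summed status line (`TimeoutStatusLine` is a hypothetical name, not Triton's actual code):

```cpp
#include <string>

// Formats the shutdown status line with background models folded into
// the live count, as agreed in the review discussion above.
std::string TimeoutStatusLine(int timeout, size_t live, size_t background,
                              size_t inflight) {
  return "Timeout " + std::to_string(timeout) + ": Found " +
         std::to_string(live + background) + " live models and " +
         std::to_string(inflight) + " in-flight non-inference requests";
}
```

For example, one live model plus one background model prints as "Found 2 live models", matching the "Timeout 30" line in the log excerpt.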

@kthui kthui requested a review from rmccorm4 January 27, 2024 00:49
@kthui kthui merged commit 34d2650 into main Jan 27, 2024
1 check passed
@kthui kthui deleted the jacky-shutdown-hang branch January 27, 2024 02:53