Server stop will wait until all background models are unloaded #323

Merged · 4 commits · Jan 27, 2024
8 changes: 8 additions & 0 deletions src/model_repository_manager/model_lifecycle.cc
@@ -259,6 +259,14 @@ ModelLifeCycle::InflightStatus()
return inflight_status;
}

size_t
ModelLifeCycle::BackgroundModelsSize()
{
LOG_VERBOSE(2) << "BackgroundModelsSize()";
std::lock_guard<std::mutex> map_lock(map_mtx_);
return background_models_.size();
}

const ModelStateMap
ModelLifeCycle::ModelStates()
{
3 changes: 3 additions & 0 deletions src/model_repository_manager/model_lifecycle.h
@@ -221,6 +221,9 @@ class ModelLifeCycle {
// that don't have in-flight inferences will not be included.
const std::set<std::tuple<ModelIdentifier, int64_t, size_t>> InflightStatus();

// Return the number of model(s) in the background.
size_t BackgroundModelsSize();

private:
struct ModelInfo {
ModelInfo(
6 changes: 6 additions & 0 deletions src/model_repository_manager/model_repository_manager.cc
@@ -981,6 +981,12 @@ ModelRepositoryManager::InflightStatus()
return model_life_cycle_->InflightStatus();
}

size_t
ModelRepositoryManager::BackgroundModelsSize()
{
return model_life_cycle_->BackgroundModelsSize();
}

const ModelStateMap
ModelRepositoryManager::LiveModelStates(bool strict_readiness)
{
3 changes: 3 additions & 0 deletions src/model_repository_manager/model_repository_manager.h
@@ -438,6 +438,9 @@ class ModelRepositoryManager {
/// if it doesn't have in-flight inferences.
const std::set<std::tuple<ModelIdentifier, int64_t, size_t>> InflightStatus();

/// \return the number of model(s) in the background.
size_t BackgroundModelsSize();

/// \param strict_readiness If true, only models that have at least one
/// ready version will be considered as live. Otherwise, the models that
/// have loading / unloading versions will also be live.
9 changes: 5 additions & 4 deletions src/server.cc
@@ -341,10 +341,11 @@ InferenceServer::Stop(const bool force)
}
} else {
     const auto& live_models = model_repository_manager_->LiveModelStates();
+    size_t bg_models_size = model_repository_manager_->BackgroundModelsSize();
Contributor commented:
It would probably be good to amend the log output just below as well for clarity on shutdown.

"Found X live models, Y unloading models, and Z inflight non-inference requests"

Otherwise, in the event of 0 live and 0 inflight but 1 unloading, the logs could be confusing:

Timeout 30: Found 0 live and 0 inflight
Timeout 29: Found 0 live and 0 inflight
Timeout 28: Found 0 live and 0 inflight
...
Timeout failed...

@kthui (author) commented on Jan 27, 2024:
Following up on triton-inference-server/server#6835 (comment):

A live model could also be unloading. How about
"Found X live models, Y ghost models, and Z inflight non-inference requests"
or
"Found (X + Y) live models and Z in-flight non-inference requests"?

I would slightly prefer the latter one, because I don't know if there is any test depending on that timeout statement.

Contributor replied:
I think it's fine to sum them like the latter, if a background model blocks the shutdown in the same way as live models.

@kthui (author) replied on Jan 27, 2024:
Updated: Show bg model size when printing timeout info

Signal (15) received.
I0127 01:32:27.144715 128 server.cc:307] Waiting for in-flight requests to complete.
I0127 01:32:27.144726 128 model_lifecycle.cc:223] StopAllModels()
I0127 01:32:27.144735 128 model_lifecycle.cc:241] InflightStatus()
I0127 01:32:27.144743 128 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0127 01:32:27.144750 128 model_lifecycle.cc:390] AsyncUnload() 'identity_fp32'
I0127 01:32:27.144792 128 server.cc:338] All models are stopped, unloading models
I0127 01:32:27.144799 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:27.144808 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:27.144815 128 server.cc:347] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0127 01:32:27.144820 128 backend_model_instance.cc:795] Stopping backend thread for identity_fp32_0...
I0127 01:32:27.144822 128 server.cc:353] identity_fp32 v1: UNLOADING
I0127 01:32:27.144861 128 identity.cc:730] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0127 01:32:27.145098 128 identity.cc:638] TRITONBACKEND_ModelFinalize: delete model state
I0127 01:32:27.145113 128 model_lifecycle.cc:618] OnDestroy callback() 'identity_fp32' version 1
I0127 01:32:27.145121 128 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0127 01:32:28.145006 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:28.145051 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:28.145060 128 server.cc:347] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
I0127 01:32:29.145238 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:29.145299 128 model_lifecycle.cc:265] BackgroundModelsSize()
...
I0127 01:32:38.150694 128 server.cc:347] Timeout 19: Found 1 live models and 0 in-flight non-inference requests
I0127 01:32:38.188567 128 python_be.cc:2342] TRITONBACKEND_ModelFinalize: delete model state
I0127 01:32:38.188650 128 model_lifecycle.cc:618] OnDestroy callback() 'identity_fp32' version 1
I0127 01:32:38.188665 128 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0127 01:32:39.151129 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:39.151212 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:39.151340 128 server.cc:347] Timeout 18: Found 0 live models and 0 in-flight non-inference requests
...

+    size_t num_models = live_models.size() + bg_models_size;
+
-    LOG_INFO << "Timeout " << exit_timeout_iters << ": Found "
-             << live_models.size() << " live models and "
-             << inflight_request_counter_
+    LOG_INFO << "Timeout " << exit_timeout_iters << ": Found " << num_models
+             << " live models and " << inflight_request_counter_
              << " in-flight non-inference requests";
if (LOG_VERBOSE_IS_ON(1)) {
for (const auto& m : live_models) {
@@ -355,7 +356,7 @@ InferenceServer::Stop(const bool force)
}
}

-    if ((live_models.size() == 0) && (inflight_request_counter_ == 0)) {
+    if ((num_models == 0) && (inflight_request_counter_ == 0)) {
return Status::Success;
}
}