Server stop will wait until all background models are unloaded #323
Conversation
@@ -341,6 +341,7 @@ InferenceServer::Stop(const bool force)
       }
     } else {
       const auto& live_models = model_repository_manager_->LiveModelStates();
+      size_t bg_models_size = model_repository_manager_->BackgroundModelsSize();
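This hunk only adds the background-model counter; as a point of reference, here is a minimal standalone sketch (not Triton source; all names below are illustrative stand-ins) of the shutdown condition it enables: a model that is still loading or unloading in the background keeps the server waiting exactly like a live model or an in-flight request.

// Standalone illustration of the extended shutdown condition; compiles with
// any C++11 compiler. RepositoryCounts stands in for the counters the server
// reads from its model repository manager.
#include <cstddef>
#include <iostream>

struct RepositoryCounts {
  std::size_t live_models;        // models still reported as live
  std::size_t background_models;  // models transitioning in the background
  std::size_t inflight_requests;  // in-flight non-inference requests
};

// Shutdown may complete only when all three counters reach zero.
bool ReadyToExit(const RepositoryCounts& c) {
  return c.live_models == 0 && c.background_models == 0 &&
         c.inflight_requests == 0;
}

int main() {
  RepositoryCounts counts{0, 1, 0};  // 0 live, 1 background, 0 in-flight
  std::cout << std::boolalpha << ReadyToExit(counts) << "\n";  // false: keep waiting
  counts.background_models = 0;      // background unload finished
  std::cout << std::boolalpha << ReadyToExit(counts) << "\n";  // true: safe to stop
}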
It would probably be good to amend the log output just below as well for clarity on shutdown.
"Found X live models, Y unloading models, and Z inflight non-inference requests"
Otherwise, in the event of 0 live and 0 inflight but 1 unloading, the logs could be confusing:
Timeout 30: Found 0 live and 0 inflight
Timeout 29: Found 0 live and 0 inflight
Timeout 28: Found 0 live and 0 inflight
...
Timeout failed...
Following up on triton-inference-server/server#6835 (comment):
A live model could also be unloading. How about
"Found X live models, Y ghost models, and Z inflight non-inference requests"
or
"Found (X + Y) live models and Z in-flight non-inference requests"?
I would slightly prefer the latter one, because I don't know if there is any test depending on that timeout statement.
I think it's fine to sum them like the latter, as long as a background model is enough to block the shutdown in the same way as a live model.
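To make the agreed format concrete, here is a small standalone snippet (plain iostream rather than the server's own logging macros; purely an illustration) that produces the summed form the log output later in this thread follows:

// Prints the timeout line with background (loading/unloading) models summed
// into the live-model count, so a single number gates shutdown.
#include <cstddef>
#include <iostream>

void LogTimeout(int timeout, std::size_t live, std::size_t background,
                std::size_t inflight) {
  std::cout << "Timeout " << timeout << ": Found " << (live + background)
            << " live models and " << inflight
            << " in-flight non-inference requests\n";
}

int main() {
  LogTimeout(30, 1, 1, 0);  // Timeout 30: Found 2 live models and 0 in-flight non-inference requests
  LogTimeout(18, 0, 0, 0);  // Timeout 18: Found 0 live models and 0 in-flight non-inference requests
}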
Updated: Show bg model size when printing timeout info
Signal (15) received.
I0127 01:32:27.144715 128 server.cc:307] Waiting for in-flight requests to complete.
I0127 01:32:27.144726 128 model_lifecycle.cc:223] StopAllModels()
I0127 01:32:27.144735 128 model_lifecycle.cc:241] InflightStatus()
I0127 01:32:27.144743 128 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0127 01:32:27.144750 128 model_lifecycle.cc:390] AsyncUnload() 'identity_fp32'
I0127 01:32:27.144792 128 server.cc:338] All models are stopped, unloading models
I0127 01:32:27.144799 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:27.144808 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:27.144815 128 server.cc:347] Timeout 30: Found 2 live models and 0 in-flight non-inference requests
I0127 01:32:27.144820 128 backend_model_instance.cc:795] Stopping backend thread for identity_fp32_0...
I0127 01:32:27.144822 128 server.cc:353] identity_fp32 v1: UNLOADING
I0127 01:32:27.144861 128 identity.cc:730] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0127 01:32:27.145098 128 identity.cc:638] TRITONBACKEND_ModelFinalize: delete model state
I0127 01:32:27.145113 128 model_lifecycle.cc:618] OnDestroy callback() 'identity_fp32' version 1
I0127 01:32:27.145121 128 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0127 01:32:28.145006 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:28.145051 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:28.145060 128 server.cc:347] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
I0127 01:32:29.145238 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:29.145299 128 model_lifecycle.cc:265] BackgroundModelsSize()
...
I0127 01:32:38.150694 128 server.cc:347] Timeout 19: Found 1 live models and 0 in-flight non-inference requests
I0127 01:32:38.188567 128 python_be.cc:2342] TRITONBACKEND_ModelFinalize: delete model state
I0127 01:32:38.188650 128 model_lifecycle.cc:618] OnDestroy callback() 'identity_fp32' version 1
I0127 01:32:38.188665 128 model_lifecycle.cc:620] successfully unloaded 'identity_fp32' version 1
I0127 01:32:39.151129 128 model_lifecycle.cc:190] LiveModelStates()
I0127 01:32:39.151212 128 model_lifecycle.cc:265] BackgroundModelsSize()
I0127 01:32:39.151340 128 server.cc:347] Timeout 18: Found 0 live models and 0 in-flight non-inference requests
...
Related PR: triton-inference-server/server#6835
When the server is stopping, it will now also check the background for any models that are still loading or unloading. If there are models in transition in the background, the server will wait until all of those transitions have completed before finishing the shutdown.