
Add test for shutdown while unloading model in background #6835

Merged
merged 1 commit into from
Jan 27, 2024

Conversation

kthui
Contributor

@kthui kthui commented Jan 26, 2024

Related PR: triton-inference-server/core#323

Add a new test that loads a model whose unload takes at least 10 seconds. The model is then placed into the background and unloaded via a model config overwrite. Before the model can finish unloading in the background, the server is stopped. The server must wait until the background model has finished unloading before shutting down.

The new test verifies the wait actually happens by counting the models successfully unloaded after the server is stopped. If the server shuts down without waiting for the background model, only one successful model unload is logged. If the server shuts down after waiting for the background model, two successful model unloads are logged.
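The counting step can be sketched as a small helper that scans the server log for unload-completion messages. This is a minimal sketch, not the test's actual code; the exact log message text shown here is an assumption.

```python
def count_successful_unloads(log_text, pattern="successfully unloaded"):
    """Count how many model unloads completed, per the server log."""
    return sum(1 for line in log_text.splitlines() if pattern in line)


# Assumed log format for illustration. With the fix, both the foreground
# and the background unload complete before shutdown, so two lines appear.
log = (
    "I0127 00:00:00 model_lifecycle.cc successfully unloaded 'identity_fp32' version 1\n"
    "I0127 00:00:10 model_lifecycle.cc successfully unloaded 'identity_fp32' version 1\n"
)
print(count_successful_unloads(log))  # 2 when the server waited for both
```

The test then asserts on this count: one unload means the server shut down too early, two means it waited for the background model.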

@kthui kthui force-pushed the jacky-shutdown-hang branch from cdbd021 to 3e3bae2 Compare January 26, 2024 00:23
@kthui kthui changed the title Add test for shutdown while unloading model(s) in background Add test for shutdown while unloading model in background Jan 26, 2024
@kthui kthui requested review from nnshah1, rmccorm4 and GuanLuo January 26, 2024 17:44
@kthui kthui marked this pull request as ready for review January 26, 2024 17:44
Comment on lines +3338 to +3346
# Load the Identity version, which will put the Python version into the
# background and unload it, the unload will take at least 10 seconds.
override_config = "{\n"
override_config += '"name": "identity_fp32",\n'
override_config += '"backend": "identity"\n'
override_config += "}"
triton_client.load_model(model_name, config=override_config)
identity_model_config = triton_client.get_model_config(model_name)
self.assertEqual(identity_model_config["backend"], "identity")
Contributor
Is doing all this any different than just calling client.unload_model("identity_fp32") ? Maybe I don't understand what the "background" models are.

Contributor Author

Good question. This is probably the trickiest behavior in model_lifecycle.cc.

When a model is unloaded normally, it goes through the AsyncUnload() function, which keeps the model in the foreground. The next AsyncLoad() reuses the object, so a model unloaded this way never goes into the background.

When a model is reloaded (via AsyncLoad()), the old instance is placed into the background if it has not finished unloading by the time the reloaded model is ready, which is almost always the case.

Thus, the hang is only reproducible when a model reload is immediately followed by a server shutdown.
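The foreground/background distinction described above can be modeled with a toy lifecycle class. This is a simulation under the explanation's assumptions, with illustrative names only; it is not Triton's actual C++ API.

```python
class ModelLifecycle:
    """Toy model of the lifecycle behavior described in the comment above."""

    def __init__(self):
        self.foreground = {}   # name -> state ("READY" or "UNLOADING")
        self.background = []   # old instances still unloading after a reload

    def load(self, name):
        self.foreground[name] = "READY"

    def unload(self, name):
        # Normal unload: the model stays in the foreground while unloading,
        # so a later load would reuse the same object.
        self.foreground[name] = "UNLOADING"

    def reload(self, name):
        # Reload: the old instance is almost never done unloading by the
        # time the new instance is ready, so it moves to the background.
        self.background.append(name)
        self.foreground[name] = "READY"

    def shutdown(self):
        # The behavior under test: shutdown must also drain background
        # unloads, not just the foreground models.
        unloads = len(self.foreground) + len(self.background)
        self.foreground.clear()
        self.background.clear()
        return unloads  # number of successful unloads that would be logged


lc = ModelLifecycle()
lc.load("identity_fp32")     # Python version serving in the foreground
lc.reload("identity_fp32")   # old instance moves to the background
print(lc.shutdown())         # 2: foreground identity + background Python model
```

A shutdown right after the reload only sees both unloads if it waits for the background set, which is exactly the race the new test exercises.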

Contributor Author
@kthui kthui Jan 27, 2024
A possible follow-up question: why is it OK for the model to be unloading in the foreground while the server is shutting down?
While the model is unloading, LiveModelStates() still sees it, because its state is neither unknown nor unavailable. Once it completes unloading, its state becomes unknown.
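The filtering described above can be sketched as follows. This is a minimal sketch of the stated rule, with assumed state names; it is not the actual LiveModelStates() implementation.

```python
def live_model_states(states):
    """Keep models whose state is neither unknown nor unavailable,
    per the rule described in the comment above."""
    return {
        name: state
        for name, state in states.items()
        if state not in ("UNKNOWN", "UNAVAILABLE")
    }


# A foreground model that is still unloading counts as live, so the
# shutdown sequence keeps waiting for it; a fully unloaded model does not.
print(live_model_states({"identity_fp32": "UNLOADING", "old_model": "UNKNOWN"}))
# {'identity_fp32': 'UNLOADING'}
```

This is why a foreground unload is safe during shutdown: the model remains visible to the liveness check until the unload finishes.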

Contributor

Makes sense, thanks for explaining!

Just for my own curiosity - when the python model is put in background and identity model starts getting loaded -- does the python model immediately start unloading? Or only when the identity model has successfully finished loading, so we know that we don't need the background model for recovery anymore?

Contributor Author

Since we don't want any glitch during a reload, the Python model is left serving in the foreground while the identity model is loading in the background. Once the identity model finishes loading, the two models swap their places, so the Python model is in the background unloading and the identity model is in the foreground serving.

@kthui kthui requested a review from rmccorm4 January 27, 2024 00:49
@kthui kthui merged commit b0e7e50 into main Jan 27, 2024
3 checks passed
@kthui kthui deleted the jacky-shutdown-hang branch January 27, 2024 02:53