Add test for shutdown while unloading model in background #6835
Conversation
Force-pushed from cdbd021 to 3e3bae2.
# Load the Identity version, which will put the Python version into the
# background and unload it; the unload will take at least 10 seconds.
override_config = "{\n"
override_config += '"name": "identity_fp32",\n'
override_config += '"backend": "identity"\n'
override_config += "}"
triton_client.load_model(model_name, config=override_config)
identity_model_config = triton_client.get_model_config(model_name)
self.assertEqual(identity_model_config["backend"], "identity")
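As a side note, the same override string can be built by serializing a dict instead of concatenating lines; this is just a sketch of an equivalent construction, not part of the PR:

```python
import json

# Build the same override config as the concatenated string above.
# indent=0 produces one field per line, matching the manual version.
override_config = json.dumps(
    {"name": "identity_fp32", "backend": "identity"}, indent=0
)
```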
Is doing all this any different from just calling client.unload_model("identity_fp32")? Maybe I don't understand what the "background" models are.
Good question. This is probably the trickiest behavior in model_lifecycle.cc.
When a model is unloaded normally, it goes through the AsyncUnload() function, which keeps the model in the foreground. On the next AsyncLoad(), the object is reused, so the model never goes into the background if this route is taken.
When a model is reloaded (via AsyncLoad()), the old model is placed into the background if it cannot finish unloading by the time the reloaded model is ready (which is the case almost 100% of the time).
Thus, the case is only reproducible when a model reload is immediately followed by a server shutdown.
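To make the foreground/background distinction concrete, here is a toy Python sketch of the two paths described above. The class and function names are illustrative, not Triton's actual C++ API:

```python
class Model:
    """Toy stand-in for a loaded model (illustrative, not Triton's class)."""
    def __init__(self, name):
        self.name = name
        self.unloading = False

foreground = {}   # model name -> the serving Model
background = []   # replaced models still finishing their unload

def async_unload(name):
    # Normal unload: the model stays in the foreground while unloading,
    # so a later async_load() can reuse the same object.
    foreground[name].unloading = True

def async_load(name):
    old = foreground.get(name)
    if old is not None and old.unloading:
        # Unload-then-load path: reuse the foreground object; it never
        # goes into the background.
        old.unloading = False
        return old
    new = Model(name)
    if old is not None:
        # Reload path: the old model moves into the background to finish
        # unloading, since the new model takes its foreground slot.
        old.unloading = True
        background.append(old)
    foreground[name] = new
    return new
```

Only the reload path ever populates `background`, which is why the shutdown race is reproducible only via a reload, not via a plain unload.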
A possible follow-up question: why is it OK for the model to be unloading in the foreground while the server is shutting down?
While the model is unloading, LiveModelStates() still sees it, because its state is neither unknown nor unavailable. Once it completes unloading, its state transitions to unknown.
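A minimal sketch of that check, assuming a simple name-to-state mapping (the dict shape and helper names are assumptions, not Triton's API):

```python
def live_model_states(models):
    # A model counts as live while its state is neither unknown nor
    # unavailable, so a model that is still unloading blocks shutdown.
    return [name for name, state in models.items()
            if state not in ("unknown", "unavailable")]

def server_can_shut_down(models):
    # Shutdown may complete only once no live models remain.
    return not live_model_states(models)
```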
Makes sense, thanks for explaining!
Just for my own curiosity: when the Python model is put in the background and the identity model starts getting loaded, does the Python model immediately start unloading? Or only when the identity model has successfully finished loading, so we know that we don't need the background model for recovery anymore?
Since we don't want any glitch during a reload, the Python model is left serving in the foreground while the identity model is loading in the background. Once the identity model finishes loading, the two models swap their places, so the Python model is in the background unloading and the identity model is in the foreground serving.
Related PR: triton-inference-server/core#323
Add a new test that loads a model whose unload takes 10 seconds. The model is then placed into the background and unloaded by overwriting its model config. Before the model can finish unloading in the background, the server is stopped. The server must wait until the background model has finished unloading before shutting down.
The new test verifies the wait actually happens by checking the number of models successfully unloaded after the server stops and shuts down. If the server shuts down without waiting for the background model, only one successful model unload is logged. If the server waits for the background model, two successful model unloads are logged.
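The verification step could be sketched as a log scan like the following; the exact log message ("successfully unloaded") and helper name are assumptions for illustration, not quotes from Triton's actual output:

```python
def count_successful_unloads(log_text, marker="successfully unloaded"):
    # Count log lines reporting a successful model unload. With the wait
    # in place, a run of the new test should produce two such lines
    # (foreground identity model + background Python model).
    return sum(1 for line in log_text.splitlines() if marker in line)
```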