Description
An instance of Triton Server is started but will not properly exit or shut down; it just continuously waits for 0 in-flight requests to terminate. Pinned memory also does not get released on destruction, leading to issues when a new instance is created.
Triton Information
What version of Triton are you using?
V2.19.0
Are you using the Triton container or did you build it yourself?
Self Built
To Reproduce
No models are necessary to reproduce this behavior, and no inference requests are made. The inference server is launched in explicit model-control mode with no models loaded. Loading and unloading models and then attempting to terminate the server also reproduces the behavior.
`TRITONSERVER_ServerStop(server_)` is called and produces the following output:
```
I0925 19:37:50.657135 4035 server.cc:252] Waiting for in-flight requests to complete.
I0925 19:37:50.657365 4035 server.cc:267] Timeout 0: Found 0 live models and 0 in-flight non-inference requests.
```
I have added breakpoints at all TRITONSERVER calls to ensure that none are hanging indefinitely, and none are.
Server is declared as:

```cpp
TRITONSERVER_Server* server_ = nullptr;
```
The server setup code:
```cpp
TRITONSERVER_ServerOptions* serverOptions = nullptr;
TRITONSERVER_Error* error = TRITONSERVER_ServerOptionsNew(&serverOptions);
if (error != nullptr) {
    logError("Failed to initialize ServerOptions instance", error);
}
getDefaultServerOptions(serverOptions);
server_ = nullptr;
error = TRITONSERVER_ServerNew(&server_, serverOptions);
if (error != nullptr) {
    logError("Failed to create instance of server", error);
}
error = TRITONSERVER_ServerOptionsDelete(serverOptions);
if (error != nullptr) {  // was `if (error = nullptr)`, which assigns instead of comparing
    logError("Failed to delete the ServerOptions object", error);
}
size_t health_iters = 0;
bool serverHealthy = checkServerHealth();
while (!serverHealthy) {
    serverHealthy = checkServerHealth();
    if (++health_iters >= 10) {
        m_triv_log(lg::error) << "failed to find healthy inference server";
        break;  // bail out instead of spinning forever if the server never becomes healthy
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
}
```
And the code for getDefaultServerOptions():
```cpp
void getDefaultServerOptions(TRITONSERVER_ServerOptions* serverOptions)
{
    TRITONSERVER_Error* error = TRITONSERVER_ServerOptionsSetLogVerbose(serverOptions, verboseLevel_);
    if (error != nullptr) {
        logError("Failed to set Verbose level to " + std::to_string(verboseLevel_), error);
    }
    int timeout = 0;
    error = TRITONSERVER_ServerOptionsSetExitTimeout(serverOptions, timeout);
    if (error != nullptr) {
        logError("Failed to set server timeout to " + std::to_string(timeout), error);
    }
    error = TRITONSERVER_ServerOptionsSetModelControlMode(serverOptions, TRITONSERVER_MODEL_CONTROL_EXPLICIT);
    if (error != nullptr) {
        logError("Failed to set model control mode to explicit", error);
    }
    error = TRITONSERVER_ServerOptionsSetModelRepositoryPath(serverOptions, modelRepoPath_.c_str());
    if (error != nullptr) {
        logError("Failed to set model repo to " + modelRepoPath_, error);
    }
    for (const auto& bcs : backendConfig_) {
        std::string configuration = std::string(std::get<0>(bcs)) + "," + std::get<1>(bcs) + "," + std::get<2>(bcs);
        error = TRITONSERVER_ServerOptionsSetBackendConfig(
            serverOptions, std::get<0>(bcs), std::get<1>(bcs), std::get<2>(bcs)
        );
        if (error != nullptr) {
            logError("Failed to set Backend Config to " + configuration, error);
        }
    }
    error = TRITONSERVER_ServerOptionsSetBackendDirectory(serverOptions, backendPath_.c_str());
    if (error != nullptr) {
        logError("Failed to set Backend Directory to " + backendPath_, error);
    }
    error = TRITONSERVER_ServerOptionsSetRepoAgentDirectory(
        serverOptions, repoAgentPath_.c_str()
    );
    if (error != nullptr) {
        logError("Failed to set Repo Directory to " + repoAgentPath_, error);
    }
    error = TRITONSERVER_ServerOptionsSetStrictModelConfig(
        serverOptions, strictConfig_
    );
    if (error != nullptr) {
        logError("Failed to set Strict Config to " + std::to_string(strictConfig_), error);
    }
#ifdef TRITON_ENABLE_GPU
    double minComputeCapability = TRITON_MIN_COMPUTE_CAPABILITY;
    m_triv_log(lg::info) << "GPU is enabled";
#else
    double minComputeCapability = 0;
#endif  // TRITON_ENABLE_GPU
    error = TRITONSERVER_ServerOptionsSetMinSupportedComputeCapability(
        serverOptions, minComputeCapability
    );
    if (error != nullptr) {
        logError("Failed to set CUDA Min Compute Capability to " + std::to_string(minComputeCapability),
                 error);
    }
}
```
If a Triton instance is created, destroyed, and then a new instance is created, we receive the following output, indicating improper deletion/shutdown of the previous instance:
```
W0925 19:50:28.026635 5745 pinned_memory_manager.cc:221] New pinned memory pool of size 268435456 could not be created since one already exists of size 268435456
W0925 19:50:28.027015 5745 cuda_memory_manager.cc:86] New CUDA memory pools could not be created since they already exists
```
Expected behavior
A properly exited/shut-down Triton server. I have seen the indicator message for a proper shutdown before, but since I cannot reach that state here, I cannot paste the expected shutdown message.
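For reference, the teardown sequence we attempt is sketched below. This is a sketch, not a verified fix: it assumes the in-process C API from `tritonserver.h`, and that `TRITONSERVER_ServerStop` should drain in-flight work while `TRITONSERVER_ServerDelete` should release resources such as the pinned-memory pool. Error handling is abbreviated.

```cpp
#include "tritonserver.h"

// Sketch of the shutdown path; `server_` is the handle created earlier
// with TRITONSERVER_ServerNew.
void shutdownServer(TRITONSERVER_Server*& server_)
{
    TRITONSERVER_Error* error = TRITONSERVER_ServerStop(server_);
    if (error != nullptr) {
        // Log and continue; we still want to release the handle below.
        TRITONSERVER_ErrorDelete(error);
    }
    error = TRITONSERVER_ServerDelete(server_);
    if (error != nullptr) {
        TRITONSERVER_ErrorDelete(error);
    }
    server_ = nullptr;  // clear the member so it is not reused after deletion
}
```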
nathanjacobiOXOS changed the title from "Triton C++ API not properly stopping server instance" to "Triton 2.19 C++ API not properly stopping server instance" on Sep 25, 2023.
Are you using the same version of Triton server and client?
2.19.0 is ~18 months old. Can you try this with a more recent release like 2.37.0 (23.08) and see if you see the same behavior? It's possible this behavior was already fixed. Any bug fixes would only be applied to future releases, so it would be good to check whether this behavior exists in a recent version of Triton.
Hi @dyastremsky, updating to 2.35 has fixed the destruction issue and the memory-pinning issue; however, I am now encountering issues in a custom repository agent. Opened issue #6359.