
Model loading failure: densenet_onnx fails to load due to "pthread_setaffinity_np" failure #86

Open
shrek opened this issue Nov 30, 2021 · 4 comments

Comments


shrek commented Nov 30, 2021

Description

I am testing tritonserver on the example models fetched using this script:
https://github.com/triton-inference-server/server/blob/main/docs/examples/fetch_models.sh

The Triton server is run as follows:

export MODEL_PATH=/tmp/tensorrt-inference-server
/opt/tritonserver/bin/tritonserver  --strict-model-config=false --model-store=$MODEL_PATH/docs/examples/model_repository 2>&1 | tee $MODEL_PATH/svrStatus.txt

The server fails with:

I1130 21:40:16.147155 3120 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

The densenet_onnx model fails to load with:

| densenet_onnx        | 1       | UNAVAILABLE: Internal: onnx runtime error 1: /workspace/onnxruntime/onnxruntime/core/platform/posix/env.cc:173 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 2 error msg: No such file or directory |

The container has a restricted cpuset, which likely contributes to the above failure:

cat /sys/fs/cgroup/cpuset/cpuset.cpus
9-12,49-52

tritonserver works fine in another container whose cpuset looks like this:

cat /sys/fs/cgroup/cpuset/cpuset.cpus
0-255

Likely the ONNX Runtime ThreadOptions affinity setting has to stay within the container's cpuset; see the sketch below.
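
A quick check from inside the affected container supports this. This is a rough sketch; it assumes taskset (util-linux) is available in the image and that CPU 0 is outside the cpuset shown above:

# CPUs the container is allowed to use (same cgroup v1 path as above)
cat /sys/fs/cgroup/cpuset/cpuset.cpus
# 9-12,49-52

# Pinning to a CPU inside the allowed set succeeds
taskset -c 9 true && echo "pinning to CPU 9 works"

# Pinning to a CPU outside the set is rejected by the kernel, which is the
# same class of failure onnxruntime hits in pthread_setaffinity_np
taskset -c 0 true || echo "pinning to CPU 0 rejected"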

Triton Information
What version of Triton are you using?
2.15.0

Are you using the Triton container or did you build it yourself?
Using the NVIDIA NGC container tritonserver:21.10-py3.

To Reproduce

Run the tritonserver container with a restricted cpuset.
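
For example (an illustrative docker invocation, not the exact command used here; the GPU flag depends on your Docker/NVIDIA runtime setup, and the cpuset values mirror the ones shown above):

docker run --rm -it --gpus=all --cpuset-cpus="9-12,49-52" \
    nvcr.io/nvidia/tritonserver:21.10-py3 bash
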
Inside container:

MODEL_PATH=/tmp/tensorrt-inference-server

git clone https://github.com/NVIDIA/tensorrt-inference-server.git ${MODEL_PATH}
cd ${MODEL_PATH}/docs/examples/
bash fetch_models.sh

/opt/tritonserver/bin/tritonserver --strict-model-config=false --model-store=$MODEL_PATH/docs/examples/model_repository 2>&1 | tee $MODEL_PATH/svrStatus.txt

Expected behavior

There should be no failure to load the densenet_onnx model.

@inkinworld

same problem.

@Rikanishu

+1, also subscribing to this issue


scse-l commented Oct 25, 2022

same problem


ruanmk commented Apr 15, 2023

same problem
