
Model loading failure: densenet_onnx fails to load due to "pthread_setaffinity_np" failure #86

Open
shrek opened this issue Nov 30, 2021 · 4 comments

Comments


shrek commented Nov 30, 2021

Description

I am testing tritonserver on the example models fetched using this script:
https://github.com/triton-inference-server/server/blob/main/docs/examples/fetch_models.sh

The Triton server is run as follows:

export MODEL_PATH=/tmp/tensorrt-inference-server
/opt/tritonserver/bin/tritonserver  --strict-model-config=false --model-store=$MODEL_PATH/docs/examples/model_repository 2>&1 | tee $MODEL_PATH/svrStatus.txt

The server fails with:

I1130 21:40:16.147155 3120 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

The densenet_onnx model fails to load with:

| densenet_onnx        | 1       | UNAVAILABLE: Internal: onnx runtime error 1: /workspace/onnxruntime/onnxruntime/core/platform/posix/env.cc:173 onnxruntime::{anonymous}::PosixThread::PosixThread(const char*, int, unsigned int (*)(int, Eigen::ThreadPoolInterface*), Eigen::ThreadPoolInterface*, const onnxruntime::ThreadOptions&) pthread_setaffinity_np failed, error code: 2 error msg: No such file or directory |

The container has a restricted cpuset, which likely contributes to the above failure:

cat /sys/fs/cgroup/cpuset/cpuset.cpus
9-12,49-52

tritonserver works fine in another container whose cpuset looks like this:

cat /sys/fs/cgroup/cpuset/cpuset.cpus
0-255

Likely the ONNX Runtime ThreadOptions affinity setting has to stay within the container's cpuset; see the sketch below.
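
A quick check from inside the affected container supports this. This is a rough sketch; it assumes taskset (util-linux) is available in the image and that CPU 0 is outside the cpuset shown above:

# CPUs the container is allowed to use (same cgroup v1 path as above)
cat /sys/fs/cgroup/cpuset/cpuset.cpus
# 9-12,49-52

# Pinning to a CPU inside the allowed set succeeds
taskset -c 9 true && echo "pinning to CPU 9 works"

# Pinning to a CPU outside the set is rejected by the kernel, which is the
# same class of failure onnxruntime hits in pthread_setaffinity_np
taskset -c 0 true || echo "pinning to CPU 0 rejected"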

Triton Information
What version of Triton are you using?
2.15.0

Are you using the Triton container or did you build it yourself?
Using the NVIDIA NGC container tritonserver:21.10-py3.

To Reproduce

Run the tritonserver container with a restricted cpuset.
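
For example (an illustrative docker invocation, not the exact command used here; the GPU flag depends on your Docker/NVIDIA runtime setup, and the cpuset values mirror the ones shown above):

docker run --rm -it --gpus=all --cpuset-cpus="9-12,49-52" \
    nvcr.io/nvidia/tritonserver:21.10-py3 bash
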
Inside container:

MODEL_PATH=/tmp/tensorrt-inference-server

git clone https://github.com/NVIDIA/tensorrt-inference-server.git ${MODEL_PATH}
cd ${MODEL_PATH}/docs/examples/
bash fetch_models.sh

/opt/tritonserver/bin/tritonserver --strict-model-config=false --model-store=$MODEL_PATH/docs/examples/model_repository 2>&1 | tee $MODEL_PATH/svrStatus.txt

Expected behavior

There should be no failure to load the densenet_onnx model.

@inkinworld

same problem.

@Rikanishu

+1, also subscribing to this issue


scse-l commented Oct 25, 2022

same problem


ruanmk commented Apr 15, 2023

same problem
