ORT backend always returns tensor on CPU #73
Comments
@Tabrizian @tanmayv25, can you look into this?
This issue explains the current limitation and why output is always on CPU: triton-inference-server/server#3364
Once triton-inference-server/server#3364 is merged, we will enable output binding to GPUs in the ORT backend.
Any update?
@Slyne This will be available in the Triton 22.02 release.
@askhade Thank you for the update!
@askhade |
Description
The ORT backend always returns output tensors on CPU, even when the model instance is on GPU, when the model is invoked via BLS through the Python backend.
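A minimal sketch of a Python-backend BLS model that reproduces the observation. The model name `onnx_model` and the tensor names `INPUT0`/`OUTPUT0` are hypothetical placeholders; `pb_utils.Tensor.is_cpu()` is used to check where the output tensor actually landed:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")

            # BLS call into the ONNX model served by the ORT backend.
            infer_request = pb_utils.InferenceRequest(
                model_name="onnx_model",  # hypothetical model name
                requested_output_names=["OUTPUT0"],
                inputs=[input_tensor])
            infer_response = infer_request.exec()
            if infer_response.has_error():
                raise pb_utils.TritonModelException(
                    infer_response.error().message())

            out = pb_utils.get_output_tensor_by_name(infer_response, "OUTPUT0")
            # Observed behavior: is_cpu() is True even when the ONNX model's
            # instance_group kind is KIND_GPU.
            print("OUTPUT0 on CPU:", out.is_cpu())

            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```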
Expected behavior
The output tensor should be on the GPU when the instance kind of the ONNX model is GPU.
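Once outputs can stay on the GPU, a BLS caller should be able to consume them zero-copy through DLPack. A hedged sketch of that consumption, assuming PyTorch is available in the Python backend environment and reusing the hypothetical `OUTPUT0` name from above:

```python
import triton_python_backend_utils as pb_utils
from torch.utils.dlpack import from_dlpack

# Inside execute(), after infer_request.exec() succeeds:
out = pb_utils.get_output_tensor_by_name(infer_response, "OUTPUT0")
if not out.is_cpu():
    # Zero-copy view of the GPU-resident output via DLPack.
    gpu_tensor = from_dlpack(out.to_dlpack())
    assert gpu_tensor.is_cuda  # expected once GPU output binding lands
```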