ORT backend always returns tensor on CPU #73
Comments
@Tabrizian @tanmayv25, can you look into this?
This issue explains the current limitation and why output is always on CPU: triton-inference-server/server#3364
Once triton-inference-server/server#3364 is merged, we will enable output binding to GPUs in the ORT backend.
Any update?
@Slyne This will be available in the Triton 22.02 release.
@askhade Thank you for the update!
@askhade |
Description
The ORT backend always returns output tensors on CPU, even when the model instance is on GPU, when the model is invoked via BLS through the Python backend.
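A minimal sketch of a Python-backend BLS model that reproduces the observation. The model name `onnx_model` and the tensor names `INPUT0`/`OUTPUT0` are hypothetical placeholders; `pb_utils.Tensor.is_cpu()` is used to check where the output tensor actually landed:

```python
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            input_tensor = pb_utils.get_input_tensor_by_name(request, "INPUT0")

            # BLS call into the ONNX model served by the ORT backend.
            infer_request = pb_utils.InferenceRequest(
                model_name="onnx_model",  # hypothetical model name
                requested_output_names=["OUTPUT0"],
                inputs=[input_tensor])
            infer_response = infer_request.exec()
            if infer_response.has_error():
                raise pb_utils.TritonModelException(
                    infer_response.error().message())

            out = pb_utils.get_output_tensor_by_name(infer_response, "OUTPUT0")
            # Observed behavior: is_cpu() is True even when the ONNX model's
            # instance_group kind is KIND_GPU.
            print("OUTPUT0 on CPU:", out.is_cpu())

            responses.append(
                pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```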
Expected behavior
The output tensor should be on the GPU when the instance kind of the ONNX model is GPU.
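Once outputs can stay on the GPU, a BLS caller should be able to consume them zero-copy through DLPack. A hedged sketch of that consumption, assuming PyTorch is available in the Python backend environment and reusing the hypothetical `OUTPUT0` name from above:

```python
import triton_python_backend_utils as pb_utils
from torch.utils.dlpack import from_dlpack

# Inside execute(), after infer_request.exec() succeeds:
out = pb_utils.get_output_tensor_by_name(infer_response, "OUTPUT0")
if not out.is_cpu():
    # Zero-copy view of the GPU-resident output via DLPack.
    gpu_tensor = from_dlpack(out.to_dlpack())
    assert gpu_tensor.is_cuda  # expected once GPU output binding lands
```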