Is your feature request related to a problem? Please describe.
I would like to use the Intel oneDNN Execution Provider (EP) in ONNX Runtime built for Triton Inference Server ONNX Backend.
Describe the solution you'd like
Ideally, the oneDNN EP should be enabled the same way the OpenVINO EP can be enabled in the model configuration:
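For reference, this is roughly how the OpenVINO accelerator is enabled today in a model's config.pbtxt; the commented-out dnnl entry is the syntax I would expect for oneDNN, which is not currently supported:

```
optimization {
  execution_accelerators {
    # Supported today:
    cpu_execution_accelerator : [ { name : "openvino" } ]
    # Requested (hypothetical, not currently accepted by the backend):
    # cpu_execution_accelerator : [ { name : "dnnl" } ]
  }
}
```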
Describe alternatives you've considered
I've tried to pass dnnl under cpu_execution_accelerator, but this is not supported.
oneDNN might yield greater performance improvements for CPU inference than OpenVINO, which is why it would be great to be able to use it within Triton Inference Server.
When using the Python wheel from an ONNX Runtime build that includes the DNNL execution provider, it is automatically prioritized over the CPU execution provider. Python API details are here.
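The prioritization can be sketched as follows. The helper function below is only an illustration of the selection rule (first requested provider that the build supports wins), not ONNX Runtime's actual implementation; the real session call is shown in the comments:

```python
# Sketch of how ONNX Runtime orders execution providers: the first
# requested provider that the installed wheel supports is used. With a
# DNNL-enabled wheel, DnnlExecutionProvider therefore takes priority
# over CPUExecutionProvider when both are available.

def resolve_providers(requested, available):
    """Keep the requested providers that this build supports, in order."""
    return [p for p in requested if p in available]

# A wheel built with DNNL support exposes both providers:
available = ["DnnlExecutionProvider", "CPUExecutionProvider"]

print(resolve_providers(["DnnlExecutionProvider", "CPUExecutionProvider"],
                        available))
# -> ['DnnlExecutionProvider', 'CPUExecutionProvider']

# The equivalent real call against a DNNL-enabled wheel would be:
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "model.onnx",
#       providers=["DnnlExecutionProvider", "CPUExecutionProvider"])
#   sess.get_providers()  # DNNL listed first
```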
Update: Furthermore, it seems that onednn is enabled by default, in preference to the default ONNX Runtime CPU Execution Provider, for an ONNX Runtime wheel built with onednn.
Additional context
ONNX Runtime documentation: https://fs-eire.github.io/onnxruntime/docs/execution-providers/oneDNN-ExecutionProvider.html