Environment
FastDeploy version: e.g 0.8.0 or the latest code in develop branch
OS Platform: e.g. Linux x64 / Windows x64 / Mac OSX 12.1(arm or intel)
Hardware: e.g. Nvidia GPU 3080Ti CUDA 11.8 CUDNN 8.6
Program Language: e.g. C++
Problem description
Please attach the log file if a problem happened.
yolox_deploy.zip
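Problem description:
1. When inference is called from multiple threads:
One thread: 10 inferences inside the thread (4 ms per model inference) take 40 ms, CUDA utilization 59%
Two threads: 10 inferences inside each thread (4 ms per model inference) take 80 ms, CUDA utilization 67%
Three threads: 10 inferences inside each thread (4 ms per model inference) take 110 ms, CUDA utilization 96%
The CUDA utilization figures above all look normal.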
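For anyone trying to reproduce these numbers, a minimal timing harness could look like the sketch below. It is not the code from yolox_deploy.zip: `FakeInfer`, `MeasureWallMs`, and `Throughput` are hypothetical names, and `FakeInfer` only sleeps for 4 ms as a stand-in for one model inference, so it never touches the GPU. It also assumes the reported times are wall-clock times until all threads finish.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for one model inference (about 4 ms, as in the report).
// In a real test this would be replaced by the model's own predict call.
inline void FakeInfer() {
  std::this_thread::sleep_for(std::chrono::milliseconds(4));
}

// Runs `runs_per_thread` inferences on each of `num_threads` threads and
// returns the wall-clock time in milliseconds until every thread has finished.
inline double MeasureWallMs(int num_threads, int runs_per_thread,
                            const std::function<void()>& infer) {
  const auto start = std::chrono::steady_clock::now();
  std::vector<std::thread> workers;
  workers.reserve(num_threads);
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&infer, runs_per_thread] {
      for (int i = 0; i < runs_per_thread; ++i) infer();
    });
  }
  for (auto& w : workers) w.join();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::milli>(end - start).count();
}

// Aggregate throughput in inferences per second across all threads.
inline double Throughput(int num_threads, int runs_per_thread, double wall_ms) {
  return 1000.0 * num_threads * runs_per_thread / wall_ms;
}
```

Under that assumption, the figures above work out to 250 inferences/s for one thread (10 in 40 ms), 250 inferences/s for two threads (20 in 80 ms), and about 273 inferences/s for three threads (30 in 110 ms). In other words, the aggregate throughput barely changes as threads are added, which is the behaviour this issue asks about.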
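Experiment 1: a separate verification test. Profiling with nvvp shows that the models do not run in parallel, and the elapsed time grows linearly with the number of models run in parallel.

Experiment 2 (FastDeploy inference)

Profiling with nvvp shows that the CUDA streams are already running concurrently, so why does the time still grow linearly? In principle, once the CUDA streams run in parallel there should be some speedup.
Could you help analyze where the problem is?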