
Under multithreading, model-parallel inference time grows linearly #1261

Open
longsy316 opened this issue Feb 8, 2023 · 4 comments

Comments


longsy316 commented Feb 8, 2023

Environment

FastDeploy version: e.g 0.8.0 or the latest code in develop branch
OS Platform: e.g. Linux x64 / Windows x64 / Mac OSX 12.1(arm or intel)
Hardware: e.g. Nvidia GPU 3080Ti CUDA 11.8 CUDNN 8.6
Program Language: e.g. C++

Problem description

Please attach the log file if a problem occurred.
yolox_deploy.zip

Problem description:
1. When calling with multiple threads:
1 thread: 10 internal inferences (4 ms per model) take 40 ms, CUDA utilization 59%
2 threads: 10 internal inferences (4 ms per model) take 80 ms, CUDA utilization 67%
3 threads: 10 internal inferences (4 ms per model) take 110 ms, CUDA utilization 96%
The CUDA utilization above is normal in every case.

Experiment 1: a separate verification test. nvpp analysis shows the models do not run in parallel; total time increases linearly with the number of models.
[screenshot]

Experiment 2 (FastDeploy inference):
[screenshot]
nvpp analysis shows the CUDA streams do run concurrently, yet the total time still increases linearly. In theory, once the CUDA streams overlap, inference should get faster.

Could you help analyze where the problem lies?
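For what it's worth, the pattern above (roughly 40/80/110 ms for 1/2/3 threads) is exactly what a backend that serializes inference behind one internal lock would produce, even while the CUDA streams appear to overlap in the profiler. A minimal Python sketch (the lock and the dummy `locked_infer` are my stand-ins, not the FastDeploy API) that reproduces the shape of those numbers:

```python
import threading
import time

def run_threads(num_threads, infers_per_thread, infer_fn):
    """Launch num_threads threads, each calling infer_fn infers_per_thread
    times, and return the total wall-clock time in seconds."""
    def worker():
        for _ in range(infers_per_thread):
            infer_fn()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

# Stand-in for a backend whose execute() is guarded by one internal lock.
_backend_lock = threading.Lock()

def locked_infer(latency_s=0.004):
    with _backend_lock:        # only one inference may run at a time
        time.sleep(latency_s)  # stands in for the ~4 ms GPU inference

if __name__ == "__main__":
    for n in (1, 2, 3):
        elapsed = run_threads(n, 10, locked_infer)
        print(f"{n} thread(s): {elapsed * 1000:.0f} ms")
```

Because every call waits on the same lock, total time grows roughly linearly with the thread count, matching the measurements in the issue.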

@longsy316 (Author)

Solved. Model parallelism now works, and CUDA utilization is very high.

Four models running in parallel can all finish within 20 ms, with CUDA utilization around 70–80%.
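If the fix was giving each thread its own model instance (and with it an independent execution context and CUDA stream), a minimal sketch of that pattern is below. `DummyModel` is a hypothetical stand-in for the real engine, not FastDeploy code; the counter just verifies that inferences from different threads actually overlap:

```python
import threading
import time

class DummyModel:
    """Stand-in for one independent model instance (its own engine and
    CUDA stream). Independent instances share no lock, so their
    inferences can overlap in time."""
    def infer(self, latency_s=0.004):
        time.sleep(latency_s)  # overlaps across threads, like separate streams

_counter_lock = threading.Lock()
_active = 0
max_overlap = 0  # highest number of simultaneously running inferences

def worker(model, num_infers=10):
    global _active, max_overlap
    for _ in range(num_infers):
        with _counter_lock:
            _active += 1
            max_overlap = max(max_overlap, _active)
        model.infer()
        with _counter_lock:
            _active -= 1

# One model instance per thread, so no thread waits on another's engine.
threads = [threading.Thread(target=worker, args=(DummyModel(),))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("max overlapping inferences:", max_overlap)
```

With four independent instances the overlap counter exceeds 1, which is the behavior a shared, lock-guarded backend cannot show.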


Hr-Song commented Mar 16, 2023

I ran into a similar problem. How did you solve it?


luameows commented Oct 9, 2023

@longsy316 @Hr-Song Could you share how you handled this? In my tests I also found that with TensorRT as the backend, multi-threaded inference seems to be serialized by a lock at a lower level, so the latencies add up serially.

@sanersbug

@longsy316 @Hr-Song How was this solved? I'm running into the same problem.
