Can somebody explain why batch inference isn't more efficient in MNN? When I run detection on a single image it takes 7 milliseconds, and when I run on a batch of 32 images it takes 8 milliseconds per image. This is only the inference time measured around runSession, without image preparation and postprocessing. What can I use to get better results?
The issue's bug has been resolved. Batch inference time now depends on the device's compute FLOPS: if a single image already reaches the peak FLOPS, batching images will not be any more efficient.
To measure the device's peak compute FLOPS, you can use ./run_test.out speed/MatMulBConst
The GPU normally has more FLOPS than the CPU, so you can use the OpenCL backend instead of the CPU for the forward pass.
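Switching the forward type to OpenCL (with CPU fallback) and low precision is done through ScheduleConfig and BackendConfig when creating the session. A minimal sketch, assuming a model file named model.mnn (the file name and thread count are placeholders, not from this thread):

```cpp
// Sketch: create an MNN session on the OpenCL backend with low precision,
// falling back to the CPU if OpenCL is unavailable.
#include <MNN/Interpreter.hpp>
#include <memory>

int main() {
    std::shared_ptr<MNN::Interpreter> net(
        MNN::Interpreter::createFromFile("model.mnn"));

    MNN::ScheduleConfig config;
    config.type       = MNN_FORWARD_OPENCL; // GPU backend
    config.backupType = MNN_FORWARD_CPU;    // fallback if OpenCL is missing
    config.numThread  = 4;

    MNN::BackendConfig backendConfig;
    backendConfig.precision = MNN::BackendConfig::Precision_Low; // fp16 where supported
    config.backendConfig = &backendConfig;

    auto session = net->createSession(config);
    // ... fill the input tensor here ...
    net->runSession(session);
    return 0;
}
```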
jxt1234 added the User label on Jan 29, 2025
I've run the modified test with dimensions closer to mine; the results are below. My image is a tensor of shape (3, 413, 413).
Could you confirm whether I understand correctly: there is no large difference in the FLOPS results between the 10x and 100x larger sizes, and that is why batching does not achieve better detection time?
I've also tried quantizing the model, which reduced its size from 6.9 MB to 1.8 MB, but the inference time increased from 7.5 ms to 11 ms, which also seems strange to me. I used low precision in my model's BackendConfig.
Do you have any other advice on how to reduce inference time if I cannot use the GPU?
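For context, the batched run is set up roughly like this (a sketch only; the function name, batch size handling, and tensor layout are assumptions, and the input tensor is fetched via the default name):

```cpp
// Sketch: resize an existing MNN session for a batch of 32 images of shape
// (3, 413, 413) before timing runSession. Names here are placeholders.
#include <MNN/Interpreter.hpp>

void runBatch(MNN::Interpreter* net, MNN::Session* session) {
    auto input = net->getSessionInput(session, nullptr); // default input tensor
    net->resizeTensor(input, {32, 3, 413, 413});         // batch dimension = 32
    net->resizeSession(session);                         // re-allocate buffers
    // ... copy the 32 preprocessed images into `input` ...
    net->runSession(session);                            // only this call is timed
}
```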
(base) daniel@Daniel-PC:~/Desktop/MNN/build$ ./run_test.out speed/MatMulBConst
CPU Group: [ 14 12 15 13 ], 800000 - 3600000
CPU Group: [ 11 8 6 4 2 0 9 10 7 5 3 1 ], 800000 - 4900000
The device supports: i8sdot:0, fp16:0, i8mm: 0, sve2: 0
running speed/MatMulBConstTest.
MatMul B Const (Conv1x1): [540, 540, 320], run 100
_runConst, 203, cost time: 9.487000 ms
[540, 540, 320], Avg time: 1.366700 ms , flops: 68.275406 G
MatMul B Const (Conv1x1): [1024, 1024, 1024], run 100
_runConst, 203, cost time: 18.649000 ms
[1024, 1024, 1024], Avg time: 13.539290 ms , flops: 79.305626 G
MatMul B Const (Conv1x1): [3, 416, 416], run 1000
_runConst, 203, cost time: 0.081000 ms
[3, 416, 416], Avg time: 0.011036 ms , flops: 47.043137 G
MatMul B Const (Conv1x1): [30, 416, 416], run 1000
_runConst, 203, cost time: 0.153000 ms
[30, 416, 416], Avg time: 0.079503 ms , flops: 65.301689 G
MatMul B Const (Conv1x1): [300, 416, 416], run 100
_runConst, 203, cost time: 0.807000 ms
[300, 416, 416], Avg time: 0.753230 ms , flops: 68.925568 G
speed/MatMulBConstTest cost time: 1693.446 ms
√√√ all <speed/MatMulBConst> tests passed.
TEST_NAME_UNIT: 单元测试
TEST_CASE_AMOUNT_UNIT: {"blocked":0,"failed":0,"passed":1,"skipped":0}
TEST_CASE={"name":"单元测试","failed":0,"passed":1}
Originally posted by @mingyunzzu in #673