Hi, say my baseline is 1 instance with TP=4, and its throughput is x.
Suppose I have 8×A100 GPUs and want to deploy a 72B model. I have the following two options:
| method   | instances | TP | throughput      |
|----------|-----------|----|-----------------|
| baseline | 1         | 4  | x               |
| A        | 2         | 4  | 2x (apparently) |
| B        | 1         | 8  | 1.5x            |
I am a little confused: option B gives a throughput of only 1.5x, lower than 2x. Is that normal?
Or how can I get throughput greater than 2x with just 8 A100 GPUs? (Or is that not possible?)
Thanks for helping!
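I think this is normal, especially if your devices lack a fast interconnect (for example, no NVLink between them). TP introduces communication overhead between GPUs, and that overhead grows as the model is split across more devices, so going from TP=4 to TP=8 slows things down enough that the throughput ends up below 2x.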
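To put numbers on that, here is a small sketch that converts the relative throughputs from the question into per-GPU efficiency against the baseline. The helper name `per_gpu_efficiency` and the dictionary layout are made up for illustration; the only inputs are the x, 2x, and 1.5x figures reported above, with x normalized to 1.0.

```python
# Relative throughputs come from the question (baseline x = 1.0).
# The helper and its parameter names are illustrative, not part of any library.

def per_gpu_efficiency(relative_throughput, num_gpus,
                       baseline_throughput=1.0, baseline_gpus=4):
    """Per-GPU throughput divided by the baseline's per-GPU throughput."""
    return (relative_throughput / num_gpus) / (baseline_throughput / baseline_gpus)

# (relative throughput, total GPUs used) for each deployment option
options = {
    "baseline (1 instance, TP=4)": (1.0, 4),
    "A (2 instances, TP=4)": (2.0, 8),
    "B (1 instance, TP=8)": (1.5, 8),
}

efficiency = {name: per_gpu_efficiency(t, g) for name, (t, g) in options.items()}

for name, eff in efficiency.items():
    print(f"{name}: {eff:.0%} of baseline per-GPU throughput")
```

With the reported numbers, option A keeps each GPU as productive as in the baseline, while option B gets 75% of the baseline per-GPU throughput. That gap is consistent with the extra cross-GPU communication at TP=8, though this arithmetic alone cannot confirm the cause.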