Enable Priority Configs (All Gather) (Master) #11734
Labels
feature
op_cat: ccl
Op Generalization
Generalization and relaxations of requirements in Ops
P1
perf
for issues tracking performance problems/improvements
Tasks
Priority Configs:
Prior to sweeping, the priority configs should be added and tested.
From Llama 405B
4 chips, 3 links, input tensor (per chip) = [1,1,32,6.5*1024], dim=1 => input_tensor for test_case = [1,4,32,6.5*1024]
8 chips, {3,4} links, input tensor (per chip) = [1,1,32,2304], dim=1 => input_tensor for test_case = [1,8,32,2304]
8 chips, {3,4} links, input tensor (per chip) = [1,1,32,4k], dim=1 => input_tensor for test_case = [1,8,32,4k]
8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 4k], dim=2 => output shape (per chip) = [1, 1, 32, 4k] -> all-gather concatenates within tile
8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 16k], dim=2 => output shape (per chip) = [1, 1, 32, 16k] -> all-gather concatenates within tile
From Llama 70B
4 chips, 3 links, input tensor (per chip) = [1,1,32,3.5*1024], dim=1 => input_tensor for test_case = [1,4,32,(int)(3.5*1024)]
8 chips, {3,4} links, input tensor (per chip) = [1,1,32,1280], dim=1 => input_tensor for test_case = [1,8,32,1280]
8 chips, {3,4} links, input tensor (per chip) = [1,1,32,2048], dim=1 => input_tensor for test_case = [1,8,32,2048]
8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 2k], dim=2 => output shape (per chip) = [1, 1, 32, 2k] -> all-gather concatenates within tile
8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 4k], dim=2 => output shape (per chip) = [1, 1, 32, 4k] -> all-gather concatenates within tile