
Enable Priority Configs (All Gather) (Master) #11734

Closed
2 tasks
Tracked by #10874 ...
SeanNijjar opened this issue Aug 21, 2024 · 1 comment
Assignees: SeanNijjar
Labels: feature, op_cat: ccl, Op Generalization (generalization and relaxation of requirements in ops), P1, perf (for issues tracking performance problems/improvements)

Comments


SeanNijjar commented Aug 21, 2024

Tasks

  • Enable Priority Configs (All Gather) (T3000)
  • Enable Priority Configs (All Gather) (TG)

Priority Configs:

Prior to sweeping, the priority configs should be added and tested.

From Llama 405B

  • line-all-gather: (4 chips, 3 links, input tensor (per chip) = [1,1,32,6.5*1024] (= [1,1,32,6656]), dim=1 => input_tensor for test_case = [1,4,32,6656])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,32,2304], dim=1 => input_tensor for test_case = [1,8,32,2304])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,32,4k], dim=1 => input_tensor for test_case = [1,8,32,4k])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 4k], dim=2 => output shape (per chip) = [1, 1, 32, 4k] -> all-gather concatenates within tile
    • currently expected to fail as this feature is missing
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 16k], dim=2 => output shape (per chip) = [1, 1, 32, 16k] -> all-gather concatenates within tile
    • currently expected to fail as this feature is missing

From Llama 70B

  • line-all-gather: (4 chips, 3 links, input tensor (per chip) = [1,1,32,3.5*1024] (= [1,1,32,3584]), dim=1 => input_tensor for test_case = [1,4,32,3584])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,32,1280], dim=1 => input_tensor for test_case = [1,8,32,1280])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,32,2048], dim=1 => input_tensor for test_case = [1,8,32,2048])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 2k], dim=2 => output shape (per chip) = [1, 1, 32, 2k] -> all-gather concatenates within tile
    • currently expected to fail as this feature is missing
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 4k], dim=2 => output shape (per chip) = [1, 1, 32, 4k] -> all-gather concatenates within tile
    • currently expected to fail as this feature is missing
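As a golden reference for the dim=1 cases, an all-gather is just a concatenation of per-chip shards along the gather dimension. A minimal NumPy sketch (function name is illustrative, not the ttnn API; the within-tile dim=2 concat cases above have no simple reference here since that feature is the missing piece):

```python
import numpy as np

# Hypothetical reference: each chip contributes its local shard; the
# gathered result concatenates the shards along the gather dimension.
def reference_all_gather(per_chip_shards, dim):
    return np.concatenate(per_chip_shards, axis=dim)

# Llama 70B case: 8 chips, per-chip shape [1, 1, 32, 1280], dim=1.
shards = [np.full((1, 1, 32, 1280), chip, dtype=np.float32) for chip in range(8)]
gathered = reference_all_gather(shards, dim=1)
assert gathered.shape == (1, 8, 32, 1280)
# Chip ordering is preserved along the gather dimension.
assert (gathered[0, 3] == 3).all()
```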
@SeanNijjar SeanNijjar self-assigned this Aug 21, 2024
@SeanNijjar SeanNijjar added P1 feature Op Generalization Generalization and relaxations of requirements in Ops op_cat: ccl perf for issues tracking performance problems/improvements labels Oct 10, 2024
SeanNijjar (author) commented:

These test cases are already running in TG post-commit (frequent).
