Line/Ring All Gather Test Sweeps #10874
Aswinmcw added a commit that referenced this issue, Aug 22, 2024
…10885)
* #10874: Enable test cases for concurrent instances
* #10874: Move test to separate file
* #10874: Add sweep test in new infra
* #10874: Use t3k_device_mesh fixture and remove reused code
* #10874: Use t3k_device_mesh fixture for concurrent instances
* #10874: Use t3k_device_mesh fixture in sweep test
* #10874: Fix test ncalls
* #10874: Add symlink for CI
* #0: Minor change
* #10874: Enable cases
* #10874: Use ttnn calls
* #10874: Use loops
* #10874: Use deprecated version
* #10874: Modify sweep test to use device fixture
Summary (Copy For All CCL Ops)
This set of tasks should be completed for each CCL operation, though each op can progress independently of the others.
For a given op we improve test coverage by adding breadth (adding more shape and argument combinations) and adding depth (running given op variations across more topologies and scale-out configurations). Together these form a 2D matrix of test coverage where each cell is itself a sweep over a multi-variable space.
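As a rough illustration of the 2D coverage matrix described above, the sketch below tracks one status per (breadth, depth) cell. The axis labels are hypothetical names loosely derived from the section headings in this issue, and the status strings are placeholders; none of this reflects actual test infrastructure.

```python
from itertools import product

# Hypothetical axis labels, loosely based on the headings in this issue.
BREADTH = [
    "priority",
    "basic_sweep",
    "tile_padded",
    "padded_shard_grids",
    "large_input",
    "mixed_io_config",
]
DEPTH = ["basic_topology", "advanced_topology", "random_topology"]


def coverage_matrix():
    """Return a dict with one status entry per (breadth, depth) cell."""
    return {(b, d): "not_started" for b, d in product(BREADTH, DEPTH)}


matrix = coverage_matrix()
# Each cell is itself a sweep; here we only record whether it is enabled.
matrix[("priority", "basic_topology")] = "enabled"

remaining = [cell for cell, status in matrix.items() if status != "enabled"]
print(f"{len(remaining)} of {len(matrix)} cells still to enable")
```

Tracking the cells this way makes it easy to see which breadth/depth combinations have not yet been enabled for a given op.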
Links to sub-tasks:
Here are the subtasks. High level info follows:
Enable Priority Configs (All Gather) (Master) #11734
Enable focused "adversarial" All Gather Tests #13565
Enable Basic Sweep Configs (All Gather) (Master) #11735
Enable Tile Padded Sweep Configs (All Gather) (Master) #11736
Enable Padded Shard Grids Sweep Configs (All Gather) (Master) #11737
Enable Large Input Sweep Configs (Master) #11738
Enable Mixed Input/Output Tensor Config Sweeps (All Gather) (Master) #11739
Testing Breadth
Priority Configs:
Prior to sweeping, the priority configs should be added and tested.
From Llama 405B
4 chips, 3 links, input tensor (per chip) = [1,1,32,6.5*1024], dim=1 => input_tensor for test_case = [1,4,32,6.5*1024]
8 chips, {3,4} links, input tensor (per chip) = [1,1,32,2304], dim=1 => input_tensor for test_case = [1,8,32,2304]
8 chips, {3,4} links, input tensor (per chip) = [1,1,32,4k], dim=1 => input_tensor for test_case = [1,8,32,4k]
8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 4k], dim=2 => output shape (per chip) = [1, 1, 32, 4k] -> all-gather concatenates within tile
8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 16k], dim=2 => output shape (per chip) = [1, 1, 32, 16k] -> all-gather concatenates within tile
From Llama 70B
4 chips, 3 links, input tensor (per chip) = [1,1,32,3.5*1024], dim=1 => input_tensor for test_case = [1,4,32,(int)(3.5*1024)]
8 chips, {3,4} links, input tensor (per chip) = [1,1,32,1280], dim=1 => input_tensor for test_case = [1,8,32,1280]
8 chips, {3,4} links, input tensor (per chip) = [1,1,32,2048], dim=1 => input_tensor for test_case = [1,8,32,2048]
8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 2k], dim=2 => output shape (per chip) = [1, 1, 32, 2k] -> all-gather concatenates within tile
8 chips, {3,4} links, input tensor (per chip) = [1, 1, 8[padded to 32], 4k], dim=2 => output shape (per chip) = [1, 1, 32, 4k] -> all-gather concatenates within tile
Basic Sweep tests:
Tile Padding Sweep Tests:
After basic sweep tests are running
Advanced Sharding Sweep Tests:
After basic sweep tests are running
Basic Sweeps (Large Tensors)
Input/Output Tensor Attribute Mixing
Note:
The above should all be runnable on 8 chips with 1 link, and then on 4 chips with 2 links. The 3-link and 4-link variants are runnable on TG. TG generality testing will initially lag t3000 generality testing.
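The dim=1 priority configs listed under Testing Breadth derive the test-case input shape by multiplying the gather dimension of the per-chip shape by the chip count. A minimal sketch of that derivation follows. `PriorityConfig`, `expand`, and `gathered_shape` are hypothetical helpers, not part of ttnn or the test suite. Only the dim=1 entries with explicit numeric widths are included; the dim=2 in-tile cases involve tile padding and are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriorityConfig:
    """Hypothetical container for one priority config from the list above."""
    num_chips: int
    num_links: int
    per_chip_shape: tuple
    dim: int


def expand(num_chips, links, per_chip_shape, dim):
    """Expand a '{3,4} links' style entry into one config per link count."""
    return [PriorityConfig(num_chips, n, per_chip_shape, dim) for n in links]


def gathered_shape(cfg):
    """Full test-case shape: per-chip shape scaled along the gather dim."""
    shape = list(cfg.per_chip_shape)
    shape[cfg.dim] *= cfg.num_chips
    return tuple(shape)


# dim=1 priority configs with explicit widths (Llama 405B, then Llama 70B).
PRIORITY_DIM1 = (
    expand(4, [3], (1, 1, 32, int(6.5 * 1024)), 1)
    + expand(8, [3, 4], (1, 1, 32, 2304), 1)
    + expand(4, [3], (1, 1, 32, int(3.5 * 1024)), 1)
    + expand(8, [3, 4], (1, 1, 32, 1280), 1)
    + expand(8, [3, 4], (1, 1, 32, 2048), 1)
)

for cfg in PRIORITY_DIM1:
    print(cfg.num_chips, "chips,", cfg.num_links, "links ->", gathered_shape(cfg))
```

Keeping the per-chip shape as the source of truth avoids hand-computing the full shape for every chip count.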
Adding Testing Depth
For a given test (list), enable the tests on various multichip configurations. Priorities may change over time and by op.
Basic Topology Configurations
Advanced Topology Configurations
"Random" Topology Configurations
In the case of sweeping, we enumerate the various ways to map lines and rings onto the given cluster. Below are some non-typical but valid test cases that should be included in the sweep: