Line/Ring All Gather Test Sweeps #10874

Open · 1 of 8 tasks · Tracked by #10873
SeanNijjar opened this issue Jul 30, 2024 · 0 comments
Labels: master, op_cat: ccl, Op Generalization, P1


SeanNijjar commented Jul 30, 2024

Summary (Copy For All CCL Ops)

This set of tasks should be completed for each CCL operation, though each op can progress independently.

For a given op, we improve test coverage by adding breadth (more shape and argument combinations) and depth (running the op's variations across more topologies and scale-out configurations). Together these form a 2D matrix of test coverage where each cell is itself a sweep over a multi-variable space.

Links to sub-tasks are tracked in the task list above. High-level info follows:

Testing Breadth

Priority Configs:

Prior to sweeping, the priority configs should be added and tested (a parametrization sketch follows the lists below).

From Llama 405B

  • line-all-gather: (4 chips, 3 links, input tensor (per chip) = [1,1,32,6.5*1024], dim=1 => input_tensor for test_case = [1,4,32,6.5*1024])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,32,2304], dim=1 => input_tensor for test_case = [1,8,32,2304])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,32,4k], dim=1 => input_tensor for test_case = [1,8,32,4k])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,8[padded to 32],4k], dim=2 => output shape (per chip) = [1,1,32,4k]) -> all-gather concatenates within tile
    • currently expected to fail as this feature is missing
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,8[padded to 32],16k], dim=2 => output shape (per chip) = [1,1,32,16k]) -> all-gather concatenates within tile
    • currently expected to fail as this feature is missing

From Llama 70B

  • line-all-gather: (4 chips, 3 links, input tensor (per chip) = [1,1,32,3.5*1024], dim=1 => input_tensor for test_case = [1,4,32,(int)(3.5*1024)])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,32,1280], dim=1 => input_tensor for test_case = [1,8,32,1280])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,32,2048], dim=1 => input_tensor for test_case = [1,8,32,2048])
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,8[padded to 32],2k], dim=2 => output shape (per chip) = [1,1,32,2k]) -> all-gather concatenates within tile
    • currently expected to fail as this feature is missing
  • line-all-gather: (8 chips, {3,4} links, input tensor (per chip) = [1,1,8[padded to 32],4k], dim=2 => output shape (per chip) = [1,1,32,4k]) -> all-gather concatenates within tile
    • currently expected to fail as this feature is missing
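
A minimal, illustrative sketch of how the dim=1 priority configs above could be parametrized in pytest (`run_line_all_gather` is a hypothetical placeholder, not the repo's actual harness):

```python
# Sketch only: dim=1 priority configs as pytest parameters.
import pytest

PRIORITY_CONFIGS = [
    # (num_chips, num_links, per_chip_shape, dim)
    # Llama 405B
    (4, 3, (1, 1, 32, int(6.5 * 1024)), 1),
    (8, 3, (1, 1, 32, 2304), 1),
    (8, 4, (1, 1, 32, 2304), 1),
    (8, 3, (1, 1, 32, 4096), 1),
    (8, 4, (1, 1, 32, 4096), 1),
    # Llama 70B
    (4, 3, (1, 1, 32, int(3.5 * 1024)), 1),
    (8, 3, (1, 1, 32, 1280), 1),
    (8, 4, (1, 1, 32, 1280), 1),
    (8, 3, (1, 1, 32, 2048), 1),
    (8, 4, (1, 1, 32, 2048), 1),
]

@pytest.mark.parametrize("num_chips,num_links,per_chip_shape,dim", PRIORITY_CONFIGS)
def test_line_all_gather_priority(num_chips, num_links, per_chip_shape, dim):
    # The test-case input stacks the per-chip shape along the gather dim,
    # e.g. [1,1,32,2304] on 8 chips with dim=1 -> [1,8,32,2304].
    input_shape = list(per_chip_shape)
    input_shape[dim] *= num_chips
    run_line_all_gather(num_chips, num_links, input_shape, dim)  # hypothetical helper
```

The dim=2 "concatenate within tile" cases would be added as `xfail` until that feature lands.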

Basic Sweep tests:

  • Sweep TensorMemoryLayout: {Single Bank, Interleaved, Width Sharded, Height Sharded, Block Sharded}
  • Dim: {3,2,1,0}
  • BufferType: {DRAM, L1}
  • Layout: {RowMajor, Tile}
  • Shapes: Constrained to tile/page aligned, Shard grids unpadded
    • For inner dims (y, x), increment by 32 in each direction. For outer dims, increment by 1.
      • Outer dims can be swept with small values first, then relatively prime values up to 128.
      • More values can be swept over after all-gather is migrated to make more use of runtime args.
  • Dataformat: {fp16, bfp8}
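
The basic sweep is effectively a Cartesian product over these axes. A minimal enumeration sketch (the string encodings are illustrative; per-op validity filtering is left as a stub):

```python
# Sketch only: enumerate the basic sweep space described above.
import itertools

memory_layouts = ["single_bank", "interleaved", "width_sharded", "height_sharded", "block_sharded"]
dims = [3, 2, 1, 0]
buffer_types = ["DRAM", "L1"]
layouts = ["row_major", "tile"]
dataformats = ["fp16", "bfp8"]
# Inner dims (y, x) step by 32 (tile aligned); outer dims step by 1.
# A later pass can add relatively prime outer-dim values up to 128.
shapes = [
    (w, z, y, x)
    for w in range(1, 3)
    for z in range(1, 3)
    for y in range(32, 129, 32)
    for x in range(32, 129, 32)
]

for mem_layout, dim, buf, layout, fmt, shape in itertools.product(
    memory_layouts, dims, buffer_types, layouts, dataformats, shapes
):
    # Per-op validity filters (e.g. sharding vs. row-major page size) go here.
    ...
```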

Tile Padding Sweep Tests:

After basic sweep tests are running

  • For Layout == Tile, sweep over padded tile configurations/shapes:
    • x-padded, y-aligned (pad x from 1 to 31)
    • x-aligned, y-padded (pad y from 1 to 31)
    • x-padded, y-padded (pad each from 1 to 31)
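
A minimal sketch of generating these padded-tile logical shapes, assuming 32x32 tiles:

```python
# Sketch only: logical (y, x) shapes that exercise tile padding.
TILE = 32

def padded_tile_shapes():
    for pad in range(1, TILE):          # x-padded, y-aligned
        yield (TILE, TILE - pad)
    for pad in range(1, TILE):          # x-aligned, y-padded
        yield (TILE - pad, TILE)
    for pad_y in range(1, TILE):        # x-padded, y-padded
        for pad_x in range(1, TILE):
            yield (TILE - pad_y, TILE - pad_x)
```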

Advanced Sharding Sweep Tests:

After basic sweep tests are running

  • For TensorMemoryLayout == (WIDTH|HEIGHT|BLOCK) sharded, sweep over padded shard grid
    • Sweep all possible shard grids
  • In addition to padded shard grids, also sweep lightly over grid offset
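
A minimal sketch of the shard-grid enumeration, assuming an 8x8 worker grid (the real grid extent and placement constraints are arch- and op-dependent):

```python
# Sketch only: all shard grid extents plus a light sweep over grid offsets
# on an assumed 8x8 worker grid.
GRID_Y, GRID_X = 8, 8

def shard_grids(light_offsets=(0, 1)):
    for gy in range(1, GRID_Y + 1):
        for gx in range(1, GRID_X + 1):
            for oy in light_offsets:            # light offset sweep only
                for ox in light_offsets:
                    if oy + gy <= GRID_Y and ox + gx <= GRID_X:
                        # (offset_y, offset_x, extent_y, extent_x)
                        yield (oy, ox, gy, gx)
```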

Basic Sweeps (Large Tensors)

  • Run the basic sweeps but only for very large tensor shapes (in the GBs); the goal is to make sure we can execute DRAM-filling CCLs
    • Assume 10GB usable space per WH chip
    • Be sure to include very short-and-wide tensors as well as tall-and-narrow tensors so we stress the individual dim sizes too
      • This will help flush out any integer overflow issues that might be lurking
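
A quick sketch of the size-budget math, assuming 10GB usable per chip and 2-byte (fp16) elements:

```python
# Sketch only: does the gathered output fit the assumed 10GB per-chip budget?
USABLE_BYTES = 10 * 1024**3   # assumed usable DRAM per WH chip
BYTES_PER_ELEM = 2            # fp16

def gathered_fits_per_chip(per_chip_shape, num_chips):
    elems = 1
    for d in per_chip_shape:
        elems *= d
    # After all-gather, every chip holds the full gathered tensor.
    return elems * num_chips * BYTES_PER_ELEM <= USABLE_BYTES

# Short-and-wide vs. tall-and-narrow candidates stress individual dim sizes;
# note the gathered element count here (2**32) already overflows 32-bit indexing.
print(gathered_fits_per_chip((1, 1, 32, 2**24), 8))  # wide: 8GB gathered -> True
print(gathered_fits_per_chip((1, 1, 2**24, 32), 8))  # tall: same size -> True
```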

Input/Output Tensor Attribute Mixing

  • Mix and match combinations of the above, but applied differently between the input and output tensors (see the sketch below)
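
A minimal sketch of the mix-and-match enumeration (encodings illustrative only):

```python
# Sketch only: apply memory configs independently to input and output tensors.
import itertools

configs = [
    (mem, buf)
    for mem in ("interleaved", "width_sharded", "height_sharded", "block_sharded")
    for buf in ("DRAM", "L1")
]

for in_cfg, out_cfg in itertools.product(configs, repeat=2):
    if in_cfg == out_cfg:
        continue  # matching configs are already covered by the basic sweeps
    ...
```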

Note:

The above should all be runnable on 8-chip/1-link and then on 4-chip/2-link configurations. 3- and 4-link variants are runnable on TG. TG generality testing will initially lag t3000 generality testing.

Adding Testing Depth

For a given test (or test list), enable it on the various multichip configurations below. Priorities may change over time and per op.

Basic Topology Configurations

| hardware | topology | #links | #chips | #instances | comment |
|----------|----------|--------|--------|------------|---------|
| n300 | line | x1 | 2 | 1 | |
| n300 | ring | x1 | 2 | 1 | |
| t3000 | ring | x1 | 8 | 1 | |
| t3000 | line | x1 | 8 | 1 | |
| t3000 | line | x1 | 4 | 2 | |
| t3000 | line | x1 | 3 | 1 | |
| t3000 | ring | x1 | 4 | 1 | |
| TG | line | x{1,2,3,4} | 8 | 4 | |
| TG | line | x{1,2,3} | 4 | 8 | |
| TGG | line | x{1,2,3} | 8 | 8 | |

Advanced Topology Configurations

| hardware | topology | #links | #chips | #concurrent_instances | comment |
|----------|----------|--------|--------|-----------------------|---------|
| TG | line | x4 | 4 | 8 | each column runs 2 separate line all-gathers; 8 all-gathers total |
| TG | line | x4 | 5 | 4 | |
| TG | line | x4 | 6 | 4 | |
| TG | line | x4 | 7 | 4 | |
| TG | ring | x3 | 4 (2x2) | 8 | 8 2x2 rings |
| TG | ring | x3 | 8 (2x4) | 4 | 4 2x4 rings |
| TG | ring | x3 | 8 (4x2) | 4 | 4 4x2 rings |
| TG | ring | x3 | 16 (8x2) | 2 | 2 8x2 rings |
| TGG | line | x4 | 4 | 16 | 0 |

"Random" Topology Configurations

  • Build a random topology generator (see the sketch below)

In the case of sweeping, we enumerate the various ways to map lines and rings onto the given cluster. Below are some non-typical but valid test cases that should be included in the sweep:

[image: examples of non-typical but valid line/ring mappings onto a cluster]
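
A minimal sketch of such a generator, assuming the cluster is a 2D grid of chips (e.g. TG's 8x4) and modeling a line as a self-avoiding path of adjacent chips and a ring as a line whose endpoints are also adjacent:

```python
# Sketch only: random line/ring placement on a 2D chip grid.
import random

def random_line(grid_y, grid_x, length, rng):
    """Random self-avoiding walk of `length` chips; None on dead end."""
    y, x = rng.randrange(grid_y), rng.randrange(grid_x)
    path = [(y, x)]
    while len(path) < length:
        steps = [
            (y + dy, x + dx)
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= y + dy < grid_y and 0 <= x + dx < grid_x
            and (y + dy, x + dx) not in path
        ]
        if not steps:
            return None  # dead end; caller retries
        y, x = rng.choice(steps)
        path.append((y, x))
    return path

def random_ring(grid_y, grid_x, length, seed=0, retries=1000):
    """A line whose endpoints are adjacent closes into a valid ring
    (only even lengths can close on a grid)."""
    rng = random.Random(seed)
    for _ in range(retries):
        path = random_line(grid_y, grid_x, length, rng)
        if path and abs(path[0][0] - path[-1][0]) + abs(path[0][1] - path[-1][1]) == 1:
            return path
    return None
```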