
#16720: and #14898 update output dims for argmax and move pad for generic reduce #16989

Merged
bbradelTT merged 10 commits into main from bbradel-16720_arg on Jan 25, 2025

Conversation

@bbradelTT (Contributor) commented on Jan 22, 2025

Ticket

Link to Github Issues #16720 and #14898

Problem description

  • when a dim is specified, the argmax output tensor is not one rank smaller than the input
  • transpose seems to insert its own padding, which affects the early tensor dimensions

What's changed

  • for argmax, change the shape of the output tensor to have the right rank (see the sketch below)
  • for generic reduce, move pad filling to right before the reduce op is called
  • also update tests and mark another specialized reduce function as deprecated
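For illustration, a minimal PyTorch sketch of the intended rank behaviour; the ttnn shape in the comment is inferred from the shape-computation diff quoted later in the thread, not taken from the tt-metal API:

```python
import torch

x = torch.randn(2, 3, 32, 64)

# Reducing over the last dim (keepdim defaults to False) drops that dim,
# so a rank-4 input yields a rank-3 golden result.
golden = torch.argmax(x, dim=-1)
assert golden.shape == (2, 3, 32)

# Before this change the ttnn argmax output kept a rank-4 shape
# (roughly [2, 3, 1, 32]); after it, the output rank matches the golden.
```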

Checklist

@bbradelTT mentioned this pull request on Jan 22, 2025
@bbradelTT changed the title from "#16720: update output dims for argmax and update reduce tests" to "#16720: and #14898 update output dims for argmax and move pad for generic reduce" on Jan 22, 2025
@bbradelTT marked this pull request as ready for review on January 22, 2025 22:07
@bbradelTT (Contributor Author) commented:

Depends on #17048 for full functionality.
Will wait for that PR before running more tests / pipelines.

@@ -58,7 +58,7 @@ std::vector<TensorSpec> ArgMax::compute_output_specs(
     ttnn::SimpleShape output_shape({1, 1, 1, 1});
     if (this->dim.has_value()) {
         auto input_shape = input_tensors[0].get_logical_shape();
-        output_shape = ttnn::SimpleShape{input_shape[0], input_shape[1], 1, input_shape[2]};
+        output_shape = ttnn::SimpleShape{input_shape[0], input_shape[1], input_shape[2]};
Contributor
is keepdim always False?

Contributor Author

argmax does not have keepdim

pwd
/Users/bbradel/tt-metal/ttnn/cpp/ttnn/operations/reduction/argmax
bbradel@Borys-Bradel's-Mac argmax % grep keep -r * | wc
       0       0       0


pt_out_tensor = golden_tensor
tt_out_tensor = tt_output_tensor_on_device.cpu().to(ttnn.ROW_MAJOR_LAYOUT).to_torch()
comp_pass, comp_out = comparison_funcs.comp_pcc(pt_out_tensor, tt_out_tensor, pcc=0.99)
assert_with_pcc(pt_out_tensor, tt_out_tensor)
Contributor

please compare shapes too. I found wrong shapes at #16922

Contributor Author

That's why I switched to assert_with_pcc:
tests/ttnn/utils_for_testing.py

def assert_with_pcc(expected_pytorch_result, actual_pytorch_result, pcc=0.9999):
    assert list(expected_pytorch_result.shape) == list(
        actual_pytorch_result.shape
    ), f"list(expected_pytorch_result.shape)={list(expected_pytorch_result.shape)} vs list(actual_pytorch_result.shape)={list(actual_pytorch_result.shape)}"

@bbradelTT force-pushed the bbradel-16720_arg branch 3 times, most recently from e2040ce to 8d7c692 on January 24, 2025 14:25
@bbradelTT merged commit 83145d2 into main on Jan 25, 2025
215 of 217 checks passed
@bbradelTT deleted the bbradel-16720_arg branch on January 25, 2025 04:57
tt-rkim pushed a commit that referenced this pull request Jan 26, 2025
@bbradelTT restored the bbradel-16720_arg branch on January 26, 2025 23:36
bbradelTT added a commit that referenced this pull request Jan 27, 2025
### Ticket
Link to Github Issue #14898 

Subset of the previous PR (#16989), which caused a hang in the
(Single-card) Demo tests and got reverted. Verified that this
pipeline passes for this subset of changes:
https://github.com/tenstorrent/tt-metal/actions/runs/12992459972

### Problem description
- transpose was filling non-logical areas with the default pad value when
called from reduce

### What's changed
- pass in an appropriate pad value for transpose to use (see the sketch below)
- also mark a method that should only be used by pool as deprecated
to deter other uses
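As a rough illustration of why the pad value has to match the reduce op (`pad_for_reduce` and its parameters are hypothetical, not the tt-metal API):

```python
import torch

def pad_for_reduce(t, pad_to, reduce_kind):
    # Use the identity element of the reduction so padded lanes
    # cannot change the result.
    pad_value = {"sum": 0.0, "max": float("-inf"), "min": float("inf")}[reduce_kind]
    return torch.nn.functional.pad(t, (0, pad_to - t.shape[-1]), value=pad_value)

x = torch.tensor([[-3.0, -2.0, -1.0]])
padded = pad_for_reduce(x, pad_to=32, reduce_kind="max")
assert padded.amax(dim=-1).item() == -1.0  # -inf padding leaves the max intact
# Padding with the default 0.0 would have produced 0.0 here, a wrong result.
```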

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12992465641
- [ ] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests pass
- [x] New/Existing tests provide coverage for changes
williamlyTT pushed a commit that referenced this pull request Jan 30, 2025
#16720: and #14898 update output dims for argmax and move pad for generic reduce (#16989)

### Ticket
Link to Github Issues #16720 and #14898

### Problem description
- when a dim is specified, the argmax output tensor is not one rank
smaller than the input
- transpose seems to insert its own padding, which affects the early
tensor dimensions

### What's changed
- for argmax, change the shape of the output tensor to have the right rank
- for generic reduce, move pad filling to right before the reduce op is
called
- also update tests and mark another specialized reduce function as
deprecated

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12956428377
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939182511
- [x] Model regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939185477/job/36091291360
in line with main
https://github.com/tenstorrent/tt-metal/actions/runs/12937069874/job/36084729557
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12939183976
- [ ] **(For models and ops writers)** Full [new
models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml)
tests pass
- [x] New/Existing tests provide coverage for changes
williamlyTT pushed a commit that referenced this pull request Jan 30, 2025
yieldthought pushed a commit that referenced this pull request Jan 31, 2025
yieldthought pushed a commit that referenced this pull request Jan 31, 2025
hschoi4448 pushed a commit that referenced this pull request Feb 20, 2025
hschoi4448 pushed a commit that referenced this pull request Feb 20, 2025
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues: #16720, #14898

5 participants