Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix[dace][next]: Fix for DistributedBufferRelocator #1799

Conversation

philip-paul-mueller
Copy link
Contributor

This PR fixes an error that was reported by Edoardo (@edopao).
The bug was because the DistributedBufferRelocator transformation did not check if its insertion would create a read-write conflict. This commit adds such a check, that is, however, not very sophisticated and needs some improvements. However, the example /model/atmosphere/dycore/tests/dycore_stencil_tests/test_compute_exner_from_rhotheta.py) where it surfaced, does hold more challenges. The main purpose of this PR is to unblock further development in ICON4Py.

Link to ICON4Py PR: C2SM/icon4py#638

The bug was because the `DistributedBufferRelocator` transformation did not check if its insertion would create a read-write conflict.
This commit adds such a check, that is, however, not very sophisticated and needs some improvements.
However, the example /`model/atmosphere/dycore/tests/dycore_stencil_tests/test_compute_exner_from_rhotheta.py`) where it surfaced, does hold more challenges.
The main purpose of this commit is to unblock further development in ICON4Py.

Link to ICON4Py PR: C2SM/icon4py#638
Copy link
Contributor

@edopao edopao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good, only some minor comments.

) -> bool:
"""Tests if read-write conflict would be created for a single location.

Args:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add the Returns section to make clear what True means.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.
Actually when I wrote the thing I was also confused.

target_locations: Location where the new write back should be performed.

Todo:
Refine this checks later.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Refine this checks later.
Refine this check later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a grammar but the function actually performs two different checks, so correct would be these.

state_to_inspect: dace.SDFGState = target_location[1]

# This is the access not on which we will perform the write back.
def_location_of_intermediate: dace_nodes.AccessNode = target_location[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine you do not like this:
def_location_of_intermediate, state_to_inspect = *target_location

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically I am not a fan of it, but here it actually makes sense.

if len(access_to_global_data_in_this_state) == 0:
return False

# For the second test we look if `global_data_name` is referred to in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This second case should never appear in the original SDFG, from the lowering. Can it be an artifact of other transformations?

Copy link
Contributor Author

@philip-paul-mueller philip-paul-mueller Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the Exner test case produces this kind of SDFGs.
This was actually also the reason why it is implemented.
But you are right, it is very unlikely that they appear but we should keep them.

@edopao
Copy link
Contributor

edopao commented Jan 15, 2025

Please merge the latest main to avoid the failure in cartesian tests.

Copy link
Contributor

@edopao edopao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need to rebase once #1801 is merged, because the DaintXC CI is blocked.

It is now refined and also explcitly checks side streams.
It was actually doing that the whole time, just the description was wrong.
@philip-paul-mueller philip-paul-mueller merged commit 1b88276 into GridTools:main Jan 16, 2025
21 checks passed
philip-paul-mueller added a commit to philip-paul-mueller/gt4py that referenced this pull request Jan 22, 2025
…rved in ICON4Py's `TestUpdateThetaAndExner` test.

In essence there was an `assert` that assumed that checked if this temporary was a sink node.
but, the code that finds all write backs was never excluding such cases, i.e. the temporaries that were selected might not be sink nodes in the state where they are defined.
The `assert` was not part of the original implementation and is not a requirement of the transformation, instead it was introduced by [PR#1799](GridTools#1799), that fixed some issues in the analysis of read write dependencies.

There are two solutions for this, either removing the `assert` or prune these kinds of temporaries.
After some consideration, it was realized that handling such cases will not lead to invalid SDFG, as long as the other restrictions on the global data are respected. For that reason the `assert` was removed.
However, we should thinking of doing something more intelligent in that case.
philip-paul-mueller added a commit that referenced this pull request Jan 22, 2025
This PR fixes a bug in `DistributedBufferRelocator` that was observed in
ICON4Py's `TestUpdateThetaAndExner` test.

In essence there was an `assert` that assumed that checked if this
temporary was a sink node, but, the code that finds all write backs was
never excluding such cases, i.e. the temporaries that were selected might
not be sink nodes in the state where they are defined.
The `assert` was not part of the original implementation and is not a
requirement of the transformation, instead it was introduced by
[PR#1799](#1799), that fixed some
issues in the analysis of read write dependencies.

There are two solutions for this, either removing the `assert` or prune
these kinds of temporaries. After some consideration, it was realized
that handling such cases will not lead to invalid SDFG, as long as the
other restrictions on the global data are respected. For that reason the
`assert` was removed.
However, we should thinking of doing something more intelligent in that
case.
edopao pushed a commit to edopao/gt4py that referenced this pull request Jan 22, 2025
This PR fixes a bug in `DistributedBufferRelocator` that was observed in
ICON4Py's `TestUpdateThetaAndExner` test.

In essence there was an `assert` that assumed that checked if this
temporary was a sink node, but, the code that finds all write backs was
never excluding such cases, i.e. the temporaries that were selected might
not be sink nodes in the state where they are defined.
The `assert` was not part of the original implementation and is not a
requirement of the transformation, instead it was introduced by
[PR#1799](GridTools#1799), that fixed some
issues in the analysis of read write dependencies.

There are two solutions for this, either removing the `assert` or prune
these kinds of temporaries. After some consideration, it was realized
that handling such cases will not lead to invalid SDFG, as long as the
other restrictions on the global data are respected. For that reason the
`assert` was removed.
However, we should thinking of doing something more intelligent in that
case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants