Fix for CUDA codegen #1442

edopao · 2023-11-22T15:45:06Z

This PR addresses #1388: it fixes the Python codegen and the SharedToGlobal1D template so that correct code is generated for a write without reduction.

Argument to std::ifloor should be double, otherwise invalid result on gpu target.
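
The commit note does not say how an integer argument goes wrong. One common way, assumed here purely for illustration, is that C/C++ integer division truncates toward zero before the floor is applied, so the floor has nothing left to correct. The Python sketch below emulates that effect; `c_int_div`, `floor_of_int_quotient` and `floor_of_double_quotient` are hypothetical helper names, not DaCe functions.

```python
import math


def c_int_div(a: int, b: int) -> int:
    """Emulate C/C++ integer division, which truncates toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def floor_of_int_quotient(a: int, b: int) -> int:
    # The quotient is already truncated, so floor() is a no-op here.
    return math.floor(c_int_div(a, b))


def floor_of_double_quotient(a: int, b: int) -> int:
    # Casting to floating point first keeps the fractional part for floor().
    return math.floor(float(a) / float(b))
```

With a negative dividend the two variants disagree, and only the floating-point one matches Python's `//` operator.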

Use new template for dace::SharedToGlobal1D

After the uplift to DaCe v0.15, one SDFG that previously worked started to show compilation errors. The SDFG validated both before and after the simplify pass, but it did not compile for CPU; when the simplify pass was skipped, compilation succeeded. The problem has been narrowed down to the scalar-to-symbol promotion, which moves a data access to an inter-state edge. Code generation for that data access needs the symbols that define the array strides. The method _used_symbols_internal therefore needs to be updated to account for data containers, including their symbolic shape and strides. This commit contains a unit test that reproduces the issue and verifies the proposed fix.
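
A minimal sketch of the idea, not the DaCe implementation: when an inter-state edge expression mentions a data container, the symbols in that container's shape and strides count as used as well. `ArrayDesc`, `names_in` and `used_symbols` are hypothetical names introduced only for this illustration, and shapes and strides are represented as plain strings.

```python
import ast
from dataclasses import dataclass


@dataclass
class ArrayDesc:
    """Hypothetical stand-in for a data descriptor with symbolic sizes."""
    shape: tuple
    strides: tuple


def names_in(expr: str) -> set:
    """Return every identifier that appears in a Python expression string."""
    tree = ast.parse(expr, mode="eval")
    return {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}


def used_symbols(edge_expr: str, arrays: dict) -> set:
    """Collect symbols needed to generate code for an inter-state edge."""
    result = set()
    for name in names_in(edge_expr):
        if name in arrays:
            # A data access: indexing it needs the shape and stride symbols.
            desc = arrays[name]
            for dim in (*desc.shape, *desc.strides):
                result |= names_in(str(dim))
        else:
            # An ordinary free symbol of the expression.
            result.add(name)
    return result
```

For an edge condition such as `A[i, j] > 0` on an array of shape `(N, M)` with strides `(M, 1)`, the sketch reports `i`, `j`, `N` and `M`, whereas looking at the expression alone would miss `N` and `M`.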

Keep new logic, fix cuda codegen for 1D shared-to-global

dace/codegen/targets/cuda.py

dace/runtime/include/dace/math.h

edopao

Thank you for the review. I will address your comments in a new commit.

Address review comments on main PR

Add draft of test case

edopao · 2023-12-12T07:16:46Z

@tbennun Test added, please re-review.

tbennun

Thank you. Only minor comments remain

tbennun · 2023-12-12T16:06:11Z

dace/codegen/targets/cuda.py

@@ -1132,10 +1132,22 @@ def _emit_copy(self, state_id, src_node, src_storage, dst_node, dst_storage, dst
                        func=funcname,
                        type=dst_node.desc(sdfg).dtype.ctype,
                        bdims=', '.join(_topy(self._block_dims)),
-                        is_async='true' if state_dfg.out_degree(dst_node) > 0 else 'true',
+                        is_async='true' if state_dfg.out_degree(dst_node) > 0 else 'false',


It should be the other way around (if there is a dependent read after it in the same state, sync).

Correct. I did not pay enough attention to is_async before. Besides correcting the value of this argument, I have also moved the synchronization point in the template function after the thread-level copy (see my last commit on copy.cuh).
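
The rule stated in the review comment can be sketched as a small helper. This is an illustration of the reviewer's wording, not the merged code; `is_async_flag` is a hypothetical name, and `num_dependent_reads` stands in for `state_dfg.out_degree(dst_node)` from the diff.

```python
def is_async_flag(num_dependent_reads: int) -> str:
    """Return the template argument as the string the codegen would emit.

    If something in the same state reads the destination afterwards,
    the copy must synchronize, so it is not asynchronous.
    """
    return 'false' if num_dependent_reads > 0 else 'true'
```

Under this reading, the `+` line in the hunk above has the two branches swapped, which is what the reviewer points out.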

edopao · 2023-12-13T06:45:44Z

@tbennun Thank you for the review. As I commented above, I have made one additional change related to the is_async flag.

edopao · 2023-12-18T07:32:51Z

Gentle reminder for @tbennun: please check my last comment and whether you can approve this PR.

edopao added 8 commits November 21, 2023 17:21

[bug] Fix for floordiv codegen

9049210

Argument to std::ifloor should be double, otherwise invalid result on gpu target.

[fix] Fix lowering to CUDA code

54bf6c1

Use new template for dace::SharedToGlobal1D

[test] Add gpu version of indirection test

880f247

[test] Add missing import for gpu test

47856d8

[test] Minor edit

08a036e

[cuda] Revert SharedToGlobal1D to old codegen

9de8e60

Remove extra changes

0e37b12

edopao linked an issue Nov 22, 2023 that may be closed by this pull request

CUDA compile error on SharedToGlobal1D #1388

Closed

edopao added 4 commits November 23, 2023 09:11

Different solution which keeps new template

87e947a

Keep new logic, fix cuda codegen for 1D shared-to-global

Fix cuda codegen for 1D dynamic copy

12b5321

Merge remote-tracking branch 'origin/master' into bug-gpu-codegen

57f8c02

Fix for broken test

54d0c67

edopao marked this pull request as ready for review November 24, 2023 15:07

edopao added the bug and codegen labels Nov 30, 2023

edopao requested a review from tbennun November 30, 2023 15:26

tbennun requested changes Nov 30, 2023

edopao commented Nov 30, 2023

edopao added 4 commits December 5, 2023 12:24

Merge remote-tracking branch 'origin/master' into bug-gpu-codegen-wip

190f075

Apply type-specialization to template for ifloor

721cb32

Address review comments

d53d25b

Merge pull request #2 from edopao/bug-gpu-codegen-wip

fa967ea

Address review comments on main PR

edopao mentioned this pull request Dec 5, 2023

Fix for floordiv on GPU target #1471

Merged

edopao added 4 commits December 5, 2023 22:45

Revert change for ifloor bugfix

140aad4

Add test case for neighbor reduction

39ef8e8

Merge pull request #3 from edopao/bug-gpu-codegen-wip

289fba2

Add draft of test case

Replace init state with edge assignment

12b3cdd

edopao force-pushed the bug-gpu-codegen branch from 6d75df6 to 12b3cdd Compare December 6, 2023 07:49

Update test case for CUDA-codegen (#6)

150b4c4

Merge branch 'spcl:master' into bug-gpu-codegen

acf0e2d

edopao requested a review from tbennun December 12, 2023 07:16

tbennun requested changes Dec 12, 2023

edopao added 3 commits December 12, 2023 19:34

Merge remote-tracking branch 'origin/master' into bug-gpu-codegen

d839089

Correction for is_async arg in gpu memory copies

8b99971

Move __syncthreads after thread copy

f5cd14b

edopao requested a review from tbennun December 13, 2023 06:45

tbennun approved these changes Dec 18, 2023

Merge branch 'master' into bug-gpu-codegen

0dbaff2

tbennun enabled auto-merge December 18, 2023 15:18

tbennun added this pull request to the merge queue Dec 18, 2023

Merged via the queue into spcl:master with commit 7c06755 Dec 18, 2023
11 checks passed

edopao deleted the bug-gpu-codegen branch December 19, 2023 06:42
