CUDA compile error on SharedToGlobal1D #1388

edopao · 2023-10-09T13:54:39Z

Describe the bug
The CUDA code generated for the attached SDFG cannot be compiled:

.dacecache/calculate_nabla2_for_w_gpu/src/cuda/calculate_nabla2_for_w_gpu_cuda.cu(95): error: too many arguments for class template "dace::SharedToGlobal1D"


 91                     dace::wcr_fixed<dace::ReductionType::Sum, double>::reduce_atomic(__var_174, *(&__var_228));
 92                 }
 93             }
 94         }
 95         dace::SharedToGlobal1D<double, 4, 1, 1, 1, 1, true>(__var_174, 1, __var_230);
 96 
 97     }

The problem disappears if I enable the SharedToGlobal1D function template in copy.cuh, which is currently commented out:

    /*
    template <typename T, int BLOCK_WIDTH, int BLOCK_HEIGHT, int BLOCK_DEPTH,
        int COPY_XLEN, int DST_XSTRIDE,
        bool ASYNC>
        static DACE_DFI void SharedToGlobal1D(
            const T *smem, int src_xstride, T *ptr)
    {
        GlobalToShared3D<T, BLOCK_WIDTH, BLOCK_HEIGHT, BLOCK_DEPTH, 1,
            1, COPY_XLEN, 1, 1, DST_XSTRIDE, ASYNC>(
                smem, 1, 1, src_xstride, ptr);
    }
    */

So it seems to me that the lowering to CUDA code does not make use of the right template construct.
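
The mismatch can be checked mechanically by counting the template arguments in the generated call and comparing them with the parameter list of the commented-out template. The following is only a minimal sketch: the helper `count_template_args` is a hypothetical name, not part of DaCe, and the two input strings are copied from the snippets above.

```python
import re


def count_template_args(call: str, name: str = "SharedToGlobal1D") -> int:
    """Count the top-level template arguments of `name<...>` in one line of C++.

    Hypothetical helper for illustration only; it is not part of DaCe.
    """
    match = re.search(re.escape(name) + r"\s*<", call)
    if match is None:
        raise ValueError(f"{name}<...> not found")
    depth, count, i = 1, 1, match.end()
    # Walk the argument list, tracking nesting so that commas inside
    # nested template arguments are not counted.
    while depth > 0:
        if i >= len(call):
            raise ValueError("unterminated template argument list")
        ch = call[i]
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 1:
            count += 1
        i += 1
    return count


# Line 95 of the generated .cu file, as quoted in this report.
generated = "dace::SharedToGlobal1D<double, 4, 1, 1, 1, 1, true>(__var_174, 1, __var_230);"

# Parameter names of the commented-out template in copy.cuh, as quoted above.
declared = ["T", "BLOCK_WIDTH", "BLOCK_HEIGHT", "BLOCK_DEPTH",
            "COPY_XLEN", "DST_XSTRIDE", "ASYNC"]

print(count_template_args(generated), len(declared))
```

Both counts are 7, which is consistent with the generated call targeting the commented-out template rather than the class template the compiler currently sees.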

To Reproduce
Please load the SDFG using the following program:

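import dace
import os

run_on_gpu = True
sdfg_name = "calculate_nabla2_for_w_gpu.sdfg"
path = os.path.join(os.getcwd(), sdfg_name)

# Load the attached SDFG from the current working directory.
sdfg = dace.SDFG.from_file(path)

if run_on_gpu:
    device = dace.DeviceType.GPU
    sdfg._name = f"{sdfg.name}_gpu"
    # Place every non-transient array, including those of nested SDFGs,
    # in GPU global memory.
    for _, _, array in sdfg.arrays_recursive():
        if not array.transient:
            array.storage = dace.dtypes.StorageType.GPU_Global
else:
    device = dace.DeviceType.CPU

# Code generation and compilation happen here; the CUDA error is raised by this call.
sdfg.compile(validate=True)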

sdfg.zip


This PR addresses #1388: fix the Python codegen and the `SharedToGlobal1D` template to generate correct code for a write without reduction.

edopao mentioned this issue Nov 22, 2023

Fix for CUDA codegen #1442

Merged

edopao self-assigned this Nov 22, 2023

edopao linked a pull request Nov 22, 2023 that will close this issue

Fix for CUDA codegen #1442

Merged

edopao mentioned this issue Dec 5, 2023

test[next][dace]: Enable some GPU tests GridTools/gt4py#1385

Closed

tbennun closed this as completed in #1442 Dec 18, 2023

github-merge-queue bot pushed a commit that referenced this issue Dec 18, 2023

Fix for CUDA codegen (#1442)

7c06755
