Skip to content

Latest commit

 

History

History
20 lines (15 loc) · 902 Bytes

ISSUES.md

File metadata and controls

20 lines (15 loc) · 902 Bytes

Issues

In contrast to many optimizations based on polyhedral compilation, whether to partitioning a kernel for execution on multiple GPUs is a binary decision. The write pattern of a kernel into global memory must be proven to be an injective mapping from thread IDs to array elements. The model of read patterns allow more graduation, more accurate read patterns result in communication that is closer to the optimum but even completely random read accesses to global memory allow the kernel to be partitioned in many cases.

The following conditions are required for an application to compile successful:

  • kernels only used in the translation unit they are defined in
  • all translation units containing cuda calls translated with mekcc

The following conditions are required for a kernel to be partioned:

  • injective-write accesses
  • bounded read-accesses
  • all __device__ function inlinable