Compiler Benchmarks
===================
Steven K. Baum
0.1, Oct. 10, 2021: It begins.
:doctype: book
:toc:
:icons:
:numbered!:
Code that is already instrumented for OpenMP via pragmas can also be offloaded to GPUs
without further changes to the source.
*Analysis of OpenMP 4.5 Offloading in Implementations* - https://www.osti.gov/servlets/purl/1648853[`https://www.osti.gov/servlets/purl/1648853`]
*Best Practices for OpenMP on NVIDIA GPUs* - https://www.youtube.com/watch?v=9w_2tj2uD4M[`https://www.youtube.com/watch?v=9w_2tj2uD4M`]
== GCC
https://gcc.gnu.org/wiki/Offloading[`https://gcc.gnu.org/wiki/Offloading`]
https://fortran-lang.discourse.group/t/gfortran-gpu-offload-with-openmp/314[`https://fortran-lang.discourse.group/t/gfortran-gpu-offload-with-openmp/314`]
GCC 11 supports OpenMP offloading to Intel MIC, Nvidia PTX and AMD GCN targets.
=====
The main option to control offloading is:
* `foffload=<targets>=<options>`
By default, GCC builds offload images for all offload targets specified at configure time, with non-target-specific options passed to the host compiler. (In most Linux distributions, however, offloading is disabled by default, i.e. target regions execute on the host, and `-foffload=<targets>` is required to enable offloading to accelerators.) The `-foffload` option controls the offload targets and the options passed to them. It can be used in a few ways:
* `-foffload=disable` - Tells GCC to disable offload support. Target regions will run in host fallback mode.
* `-foffload=<targets>` - Tells GCC to build offload images for `<targets>`, with non-target-specific options passed to the host compiler.
* `-foffload=<options>` - Tells GCC to build offload images for all targets specified at configure time, with non-target-specific options passed to the host compiler plus `<options>`.
* `-foffload=<targets>=<options>` - Tells GCC to build offload images for `<targets>`, with non-target-specific options passed to the host compiler plus `<options>`.
`<targets>` are separated by commas; several `<options>` can be specified by separating them with spaces. Options specified by `-foffload` are appended to the end of the option set, so in case of conflicts they take priority. The `-foffload` flag can be specified several times, and must be in order to specify different `<options>` for different `<targets>`.
*Note*: You may need to specify `-foffload=-lm` and for Fortran `-foffload=-lgfortran`, if the offloaded code uses math functions or Fortran-library procedures.
*Note*: If you use atomics directly or indirectly, you may need to specify `-foffload=-latomic` or, if only one target needs it, e.g., `-foffload=nvptx-none=-latomic`.
For AMD GCN devices, you must additionally specify the GPU to be used via `-march=<name>`, where `<name>` is either `carrizo` or `fiji` (both third generation), or one of the fifth-generation Vega GPUs (`gfx900` or `gfx906`). To apply this setting only to the AMD GCN offloading target, and not to the host (`-march=…`) or to all other offloading targets (as with `-foffload=-march=…`), use `-foffload=amdgcn-amdhsa=<options>`. For instance: `-foffload=amdgcn-amdhsa="-march=gfx906"`. [NOTE: the target triplet is set when the compiler is built and may differ between vendors; it can also be, e.g., `amdgcn-unknown-amdhsa`.]
=====
== Intel
*Get Started with OpenMP Offload to GPU* - https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-cpp-fortran-compiler-openmp/top.html[`https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-cpp-fortran-compiler-openmp/top.html`]
https://www.intel.com/content/www/us/en/developer/videos/three-quick-practical-examples-openmp-offload-gpus.html#gs.drs907[`https://www.intel.com/content/www/us/en/developer/videos/three-quick-practical-examples-openmp-offload-gpus.html#gs.drs907`]
=====
The OpenMP* Offload to GPU feature of the Intel® oneAPI DPC++/C++ Compiler and the Intel® Fortran Compiler
(Beta) compiles OpenMP source files for a wide range of accelerators. Only the `icx`
and `ifx` compilers support the OpenMP Offload feature.
...
Intel supports two new options:
* `-qopenmp`
* `-fopenmp-targets=spir64`
that support OpenMP and offloading execution on both CPU and GPU. The `-qopenmp`
option enables a middle end that supports the transformation of OpenMP
in LLVM* (but not in the Clang* front end). The `-fopenmp-targets=spir64`
option enables the compiler to generate an x86 + SPIR64
fat binary for the GPU device.
=====
=== Compiling and Running an Example
The following code is instrumented for OpenMP via various pragmas:
-----
// matmul_offload.cpp: Matrix Multiplication Example using OpenMP Offloading
#include <stdio.h>
#include <math.h>
#include <stdlib.h>

#define MAX 128

int A[MAX][MAX], B[MAX][MAX], C[MAX][MAX], C_SERIAL[MAX][MAX];

typedef int BOOL;
typedef int TYPE;

// Compare the offloaded result against the serial reference.
BOOL check_result(TYPE *actual, TYPE *expected, unsigned n) {
    for (unsigned i = 0; i < n; i++) {
        if (actual[i] != expected[i]) {
            printf("Value mismatch at index = %u. Expected: %d"
                   ", Actual: %d.\n", i, expected[i], actual[i]);
            return 0;
        }
    }
    return 1;
}

void __attribute__ ((noinline)) Compute()
{
    // The combined construct must be followed directly by the for loop.
    #pragma omp target teams distribute parallel for map(to: A, B) map(tofrom: C) \
        thread_limit(128)
    for (int i = 0; i < MAX; i++)
        for (int j = 0; j < MAX; j++)
            for (int k = 0; k < MAX; k++)
                C[i][j] += A[i][k] * B[k][j];
}

int main() {
    for (int i = 0; i < MAX; i++)
        for (int j = 0; j < MAX; j++) {
            A[i][j] = i + j - 1;
            B[i][j] = i - j + 1;
        }
    // Serial reference result for verification.
    for (int i = 0; i < MAX; i++)
        for (int j = 0; j < MAX; j++)
            for (int k = 0; k < MAX; k++)
                C_SERIAL[i][j] += A[i][k] * B[k][j];
    Compute();
    if (!check_result((int*) &C[0][0], (int*) &C_SERIAL[0][0], MAX * MAX)) {
        printf("FAILED\n");
        return 1;
    }
    printf("PASSED\n");
    return 0;
}
-----
==== Compiling the Code
This code, saved as `matmul_offload.cpp`, is compiled for GPU offloading via:
-----
icpx -qopenmp -fopenmp-targets=spir64 matmul_offload.cpp -o matmul
-----
C code is compiled for this with `icx`, C++ code with `icpx`, and Fortran code with `ifx`.
==== Set the Environment Variable
The `OMP_TARGET_OFFLOAD` environment variable must be set to force offloading (or fail if no device is available) via:
-----
export OMP_TARGET_OFFLOAD=MANDATORY
-----
==== Run the Compiled Code
The compiled binary `matmul` is run via:
-----
./matmul
PASSED
-----
and will produce the output `PASSED` if successful.
=== Integration of GPU-Optimized LIBM Functions
The compilers provide a way to choose some GPU-optimized math functions as an alternative
to the standard math libraries.
== NVIDIA
*Thinking OpenMP with HPC Compilers* - https://www.nas.nasa.gov/assets/nas/pdf/ams/2021/AMS_20210504_Ozen.pdf[`https://www.nas.nasa.gov/assets/nas/pdf/ams/2021/AMS_20210504_Ozen.pdf`]
The NVIDIA `nvc++`, `nvc`, and `nvfortran` compilers can be enabled for OpenMP, targeting GPU and
multicore CPU, via:
-----
nvc -mp=gpu -gpu=[target] -Minfo=mp
-----
An environment variable can also be set to report offload activity at run time:
-----
export NVCOMPILER_ACC_NOTIFY=3   # 1 = kernel launches, 2 = data transfers, 3 = both
-----
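Putting the pieces together, a compile-and-run session might look like the following sketch. The file `matmul.c` is hypothetical, and `cc80` (an Ampere-class GPU) stands in for whatever `-gpu=[target]` value matches your hardware.

```shell
# Compile for GPU offload with OpenMP; -Minfo=mp reports parallelization decisions.
nvc -mp=gpu -gpu=cc80 -Minfo=mp matmul.c -o matmul
# Report both kernel launches and data transfers at run time.
export NVCOMPILER_ACC_NOTIFY=3
./matmul
```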
== Clang
https://hpc-wiki.info/hpc/Building_LLVM/Clang_with_OpenMP_Offloading_to_NVIDIA_GPUs[`https://hpc-wiki.info/hpc/Building_LLVM/Clang_with_OpenMP_Offloading_to_NVIDIA_GPUs`]
https://github.com/pc2/OMP-Offloading[`https://github.com/pc2/OMP-Offloading`]
https://llvm.org/docs/CompileCudaWithLLVM.html[`https://llvm.org/docs/CompileCudaWithLLVM.html`]
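With a Clang/LLVM build that includes NVPTX offload support (see the build guide linked above), an OpenMP target program is compiled along these lines; `saxpy.c` is a hypothetical source file, and `sm_80` stands in for your GPU's compute capability.

```shell
# Sketch, assuming an offload-enabled Clang build.
clang -O2 -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda \
      -Xopenmp-target -march=sm_80 saxpy.c -o saxpy
```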
== AOMP
https://github.com/ROCm-Developer-Tools/aomp[`https://github.com/ROCm-Developer-Tools/aomp`]
=====
AOMP is a scripted build of LLVM and supporting software. It has support for OpenMP target offload on AMD GPUs. Since AOMP is a clang/llvm compiler, it also supports GPU offloading with HIP, CUDA, and OpenCL.
=====
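A compile sketch with AOMP's bundled Clang might look as follows; the install prefix `/usr/lib/aomp` is AOMP's default but may differ on your system, `saxpy.c` is a hypothetical source file, and `gfx906` stands in for your GPU architecture.

```shell
# Sketch: OpenMP target offload to an AMD GPU with AOMP's clang.
/usr/lib/aomp/bin/clang -O2 -fopenmp --offload-arch=gfx906 saxpy.c -o saxpy
```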