Commit
Updated description of teams reduction, and included a teams only reduction.
milfeld committed Oct 28, 2023
1 parent 9fd7540 commit c8d0068
Showing 5 changed files with 124 additions and 91 deletions.
63 changes: 39 additions & 24 deletions OpenMP_gpu/adv/reduction/README_teams_red.md
@@ -3,49 +3,64 @@
Since teams are different contention groups,
a reduction across the teams is accomplished
with a reduction clause. Effectively, a reduction
-across (FOR) the teams works the same was as
-a reduction across (FOR) the loop.
+across the teams works the same way as
+a reduction across the loop.

If there is a parallel for|do in a target region
-and teams is a component directive, also
-include the reduction clause on the teams.
+which performs a reduction, include a
+reduction clause (of course). For a combined
+```
+target teams distribute parallel for|do
+```
+it is only necessary to include a single reduction clause.
+However, if the distribute parallel for|do is a
+nested construct of a target teams, then it is
+necessary to include a reduction across the teams
+(contention groups) and the parallel work-sharing:
+```
+target teams ... reduction(<op1>:var1)
+...
+  distribute parallel for|do reduction(<op1>:var1)
+  for|do  var1 = var1 <op1> var
+```

-The reduction(.cpp | .F90) code illustrated this
-concept with the separate "target teams" and
-distribute parallel for|do constructs.
+The reduction(.cpp | .F90) codes illustrate the
+latter situation with the separate "target teams"
+and distribute parallel for|do constructs.
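
(For reference, here is a minimal, self-contained sketch of the combined form
described above; it is not part of this commit. It assumes a compiler with
OpenMP offload support, and the explicit map(tofrom: sum) is a conservative
addition for OpenMP 4.5-era compilers.)

```
#include <cstdio>

int main() {
    const int n = 1024;
    float B[1024], C[1024], sum = 0.0f;
    for (int i = 0; i < n; i++) { B[i] = 1.0f; C[i] = 1.0f; }

    // Combined construct: one reduction(+:sum) covers both the cross-team
    // and the intra-team (work-sharing) parts of the reduction.
    #pragma omp target teams distribute parallel for \
            map(to: B[0:n], C[0:n]) map(tofrom: sum) reduction(+:sum)
    for (int i = 0; i < n; i++) sum += B[i] * C[i];

    printf("sum = %f (expected %d)\n", sum, n);
    return 0;
}
```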

1.) Look over the code.

On the target teams construct:

a.) map B and C as array sections with a
-map type of to. TODO 1a
+map type of "to". Just for fun, map all scalar
+variables as "tofrom" with the defaultmap clause. TODO 1a

-b.) For giggles, use a defaultmap clause
-to map all scalars as to from (for the
-sum variable). TODO 1b
+b.) Include a reduction clause on the target... and distribute...
+constructs. TODO 1b

-c.) Note, the reduction for the parallel for|do
-in the next directive. If necessary,
-also include this reduction clause on
-the target teams construct. TODO 1c
+c.) Also, there is a reduction on sum_teams in the
+target teams region (only);
+include a reduction clause for it. The latter just checks
+whether the compiler adheres to the upper bound of the
+num_teams(100) clause. Don't forget to map got_teams as "tofrom". TODO 1c

d.) Compile and run with OpenMP offloading.
Did you get the correct result?
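
(Illustrative build commands only; the right compiler and offload flags
depend on your site's toolchain and GPU. For example:)

```
nvc++ -mp=gpu reduction.cpp && ./a.out                                          # NVIDIA HPC SDK
nvfortran -mp=gpu reduction.F90 && ./a.out                                      # NVIDIA HPC SDK, Fortran
clang++ -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda reduction.cpp && ./a.out  # LLVM/Clang
```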

2.) Questions
Note that the defaultmap(tofrom: scalar) will also
-map the sum as tofrom, as well as N and i.
+map the sum as tofrom, as well as got_teams.

Fortran:
(Assume you set "i" to -1 before the target teams construct.)
-What will be the values of N and i after offload?
+What will be the value of i after offload?
If you answered "the same as before offload", you are correct.
-(i is privatized for the loop indexing with the parallel for|do.
-N is not reassigned.)
+(i is privatized for the loop indexing with the parallel for|do.)

-Would it have been better to not use the default clause,
-(thereby allowing N and i to be firstprivate), and map sum as tofrom?
-Yes. Binary creation is slighly simpler, and overhead of
-transporting and keeping track of mapped objects is slight
-less.
+Is it necessary to include the defaultmap or explicitly map
+sum and sum_teams as "tofrom"? Try removing the defaultmap clause
+and just map got_teams as "tofrom". Did it work? Why/why not?

```
./a.out
```
33 changes: 21 additions & 12 deletions OpenMP_gpu/adv/reduction/reduction.F90
@@ -1,19 +1,30 @@
!unverified
-function dotprod(B,C,N) result(sum)
-real :: B(N), C(N), sum
-integer :: N, i
-sum = 0.0e0
-!! TODO 1a-c map B and C, make scalars tofrom by default
-!! and include a reduction if necessary
+function dotprod(B,C,n) result(sum)
+real :: B(n), C(n), sum
+integer :: n, i
+integer :: sum_teams = 0
+integer :: got_teams = 0
+sum = 0.0e0
+
+!! TODO 1a: map B and C and scalars as "tofrom"
+!! TODO 1b-c include reductions for sum and sum_teams.
+!$omp target teams num_teams(100) &
+!$omp& map... &
+!$omp& reduction...
+
!$omp distribute simd reduction...
do i = 1,N
sum = sum + B(i) * C(i)
end do
-end function
-
-! Note: The variable sum is now mapped with tofrom from the defaultmap
-! clause on the combined target teams construct, for correct
-! execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro.
+
+sum_teams=sum_teams + 1
+if(omp_get_team_num()==0) got_teams=omp_get_num_teams()
+
+write(*,'("sum_teams= ",i4," got_teams=",i4," sum= ",f12.0," (n=",i12,")" )') &
+sum_teams,got_teams,sum,n
+!$omp end target teams
+
+end function

program main
integer, parameter :: N=1024*1024
@@ -24,6 +35,4 @@ program main

sum=dotprod(B,C,N)

-print*,"N= ", N, ", sum= ",sum
-
end program main
39 changes: 22 additions & 17 deletions OpenMP_gpu/adv/reduction/reduction.cpp
@@ -3,16 +3,25 @@

float dotprod(float B[], float C[], int n)
{
-float sum = 0.0f;
-int i=9;
-int k=9;
-// TODO 1a-c map B and C, make scalars tofrom by default
-// and include a reduction if necessary
-for (i=0; i<n; i++) sum += B[i] * C[i];
-
-k=omp_get_team_num();
-
-printf("i AF = %d, k= %d\n",i,k);
+float sum = 0.0f;
+int sum_teams = 0;
+int got_teams = 0;
+
+// TODO 1a: map B and C and scalars as "tofrom"
+// TODO 1b-c include reductions for sum and sum_teams.
+#pragma omp target teams num_teams(100) \
+maps... \
+reductions...
+{ //TODO 1b
+#pragma omp distribute parallel for \
+reduction...
+for (int i=0; i<n; i++) sum += B[i] * C[i];
+
+sum_teams+=1;
+if(omp_get_team_num()==0) got_teams=omp_get_num_teams();
+}
+
+printf("sum_teams = %d, got_teams = %d, sum = %f (n = %d)\n",sum_teams,got_teams,sum,n);
return sum;
}

@@ -22,12 +31,8 @@ int main(){
for(int i=0; i<N; i++){B[i]=1.0f; C[i]=1.0f;}

sum=dotprod(B,C,N);

-printf("N= %d, sum= %f\n", N,sum);
-
}

-/* Note: The variable sum is now mapped with tofrom from the defaultmap
-clause on the combined target teams construct, for correct
-execution with 4.5 (and pre-4.5) compliant compilers.
-See Devices Intro.
-*/
+// Without defaultmap:
+// #pragma omp target teams num_teams(100) \
+//     map(to: B[0:n], C[0:n]) map(tofrom:got_teams) \
+//     reduction(+:sum,sum_teams)
39 changes: 22 additions & 17 deletions OpenMP_gpu/adv/reduction/solutions/ans_reduction.F90
@@ -1,23 +1,30 @@
!unverified

-function dotprod(B,C,N) result(sum)
-real :: B(N), C(N), sum
-integer :: N, i
-sum = 0.0e0
-!! TODO 1a-c map B and C, make scalars tofrom by default
-!! and include a reduction if necessary
-!$omp target teams map(to: B, C) reduction(+:sum) &
-!$omp& defaultmap(tofrom:scalar)
-!$omp distribute simd reduction(+:sum)
+function dotprod(B,C,n) result(sum)
+real :: B(n), C(n), sum
+integer :: n, i
+integer :: sum_teams = 0
+integer :: got_teams = 0
+sum = 0.0e0
+
+!! TODO 1a: map B and C and scalars as "tofrom"
+!! TODO 1b-c include reductions for sum and sum_teams.
+!$omp target teams num_teams(100) &
+!$omp& map(to: B, C) defaultmap(tofrom:scalar) &
+!$omp& reduction(+:sum,sum_teams)
+
+!$omp distribute simd reduction(+:sum)
do i = 1,N
sum = sum + B(i) * C(i)
end do
-end function
-
-! Note: The variable sum is now mapped with tofrom from the defaultmap
-! clause on the combined target teams construct, for correct
-! execution with 4.5 (and pre-4.5) compliant compilers. See Devices Intro.
+
+sum_teams=sum_teams + 1
+if(omp_get_team_num()==0) got_teams=omp_get_num_teams()
+
+write(*,'("sum_teams= ",i4," got_teams=",i4," sum= ",f12.0," (n=",i12,")" )') &
+sum_teams,got_teams,sum,n
+!$omp end target teams
+end function

program main
integer, parameter :: N=1024*1024
@@ -28,6 +35,4 @@ program main

sum=dotprod(B,C,N)

-print*,"N= ", N, ", sum= ",sum
-
end program main
41 changes: 20 additions & 21 deletions OpenMP_gpu/adv/reduction/solutions/ans_reduction.cpp
@@ -3,21 +3,24 @@

float dotprod(float B[], float C[], int n)
{
-float sum = 0.0f;
-int i=9;
-int k=9;
-// TODO 1a-c map B and C, make scalars tofrom by default
-// and include a reduction if necessary
-#pragma omp target teams map(to: B[0:n], C[0:n]) num_teams(104) \
-defaultmap(tofrom:scalar) reduction(+:sum)
-{
-#pragma omp distribute parallel for reduction(+:sum)
-for (i=0; i<n; i++) sum += B[i] * C[i];
-
-k=omp_get_team_num();
-
-printf("i AF = %d, k= %d\n",i,k);
+float sum = 0.0f;
+int sum_teams = 0;
+int got_teams = 0;
+
+// TODO 1a: map B and C and scalars as "tofrom"
+// TODO 1b-c include reductions for sum and sum_teams.
+#pragma omp target teams num_teams(100) \
+map(to: B[0:n], C[0:n]) defaultmap(tofrom:scalar) \
+reduction(+:sum,sum_teams)
+{ //TODO 1b
+#pragma omp distribute parallel for reduction(+:sum)
+for (int i=0; i<n; i++) sum += B[i] * C[i];
+
+sum_teams+=1;
+if(omp_get_team_num()==0) got_teams=omp_get_num_teams();
+}
+
+printf("sum_teams = %d, got_teams = %d, sum = %f (n = %d)\n",sum_teams,got_teams,sum,n);
return sum;
}

@@ -27,12 +30,8 @@ int main(){
for(int i=0; i<N; i++){B[i]=1.0f; C[i]=1.0f;}

sum=dotprod(B,C,N);

-printf("N= %d, sum= %f\n", N,sum);
-
}

-/* Note: The variable sum is now mapped with tofrom from the defaultmap
-clause on the combined target teams construct, for correct
-execution with 4.5 (and pre-4.5) compliant compilers.
-See Devices Intro.
-*/
+// Without defaultmap:
+// #pragma omp target teams num_teams(100) \
+//     map(to: B[0:n], C[0:n]) map(tofrom:got_teams) \
+//     reduction(+:sum,sum_teams)
