Updated timer.hpp and timer.f90 to be consistent. Fleshed out time_ad…
…d gpu exercise.
Showing 22 changed files with 598 additions and 137 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,47 +1,56 @@ | ||
# Class Timer -- collects time measurements and reports them later. | ||
# Class Timer_Collector -- collects time measurements and reports them later. | ||
|
||
## BACKGROUND | ||
Fortran and CPP Timer classes have been created for measuring performance | ||
of code blocks demarcated by the start() and stop() class member functions. | ||
For each block a literal character string argument for start("my label") | ||
is used as a label when the times are reported with print(). | ||
|
||
See timer.hpp (CPP) and timer.f90 (F90) | ||
|
||
``` | ||
Declare Timer_collector as timer | ||
CPP F90 | ||
timer.start("my label") call timer%start("my label") | ||
<something to be timed> <something to be timed> | ||
timer.stop() call timer%stop() | ||
... ... | ||
timer.print() call timer%print() | ||
See timer.hpp (CPP) and timer.f90 (F90) | ||
``` | ||
## Exercises | ||
|
||
1.) | ||
Look over time_print(.cpp|.f90). The todo's guide you | ||
Look over time_print(.cpp|.f90). The TODOs guide you | ||
through instrumenting the code for timing | ||
|
||
a sleep routine, and | ||
|
||
-- That's just to make sure the timer is working.--- | ||
-- That is just to make sure the timer is working.--- | ||
|
||
a write to stdout (CPP: printf, F90: print*,) | ||
|
||
TODO 1: | ||
Include the Timer class file (CPP: timer.hpp, F90: timer.f90) | ||
Include the Timer_collector class file (CPP: timer.hpp, F90: timer.f90) | ||
at the beginning of the code. | ||
|
||
TODO 2: | ||
Instantiate the class in main as timer: | ||
CPP: Just use "Timer time;" | ||
CPP: Just use "Timer_Collector timer;" | ||
F90: Use the module defined in timer.f90 (mod_timer) | ||
and define timer as type(cls_timer). | ||
and define "timer" as type(Timer_Collector). | ||
|
||
TODO 3: | ||
Time the sleep and print statements: | ||
Put time.start("<label>") and time.stop() before and after | ||
Put call time%start("<label>") and call time%stop() before and after | ||
CPP: Put timer.start("my label") and timer.stop() before and after each statement. | ||
F90: Put call timer%start("my label") and call timer%stop() before and after each statement. | ||
|
||
TODO 4: | ||
Report the times at the end. | ||
|
||
Is the timer reporting reasonable numbers? | ||
What are the units? | ||
How accurate is the timer? | ||
How long did the print statement take? | ||
What is the resolution of the timer? | ||
(Note: the print time is only the time to write to an I/O buffer.) | ||
How much time is used in calling the timer? | ||
How would you measure that? | ||
(TODO: Fortran needs to report more digits.) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,19 @@ | ||
CPP code | ||
|
||
Ft1 solutions $ ./a.out | ||
Hello Helsinki. | ||
|
||
Action :: time/s Time resolution = 1e-06 | ||
------ | ||
1 sec sleep :: 1.00009 | ||
printf io :: 5.7e-05 | ||
|
||
1 sec sleep :: 1.000986 | ||
printf io :: 0.000023 | ||
|
||
Fortran code | ||
|
||
Hello Helsinki. | ||
|
||
Action :: time/s Time resolution = 1.0E-06 | ||
------ | ||
1 sec sleep :: 1.000825 | ||
printf io :: 0.000171 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,10 +1,45 @@ | ||
# This exercise is incomplete. | ||
# Time Offload kernel for vector add, and data motion | ||
|
||
## Background | ||
|
||
By putting timers around the "target data" directive (stopping | ||
immediately after the target data region begins), around | ||
the "target" construct, and around the termination of the target | ||
data region, one can measure the time required to | ||
|
||
``` | ||
(allocate device data and) move data "to" the device, | ||
execute the kernel, | ||
and move data "from" the device (and deallocate). | ||
``` | ||
|
||
## Exercise | ||
Only C/C++ code is available at this time | ||
1.) Experiment with the target ... directives | ||
executing the addition. | ||
Some different settings are in comments. | ||
STATUS: Makefile works for CPP only at this time. | ||
|
||
1.) Add the start and stop timers around the appropriate | ||
areas, and run with the default (present) "target" | ||
construct to offload the add function. Make sure it works. | ||
See previous examples (basic/worksharing) for including a timer: | ||
TODO 1: | ||
Note, the data motions due to the target data | ||
construct are measured separately. | ||
construct are measured separately at the beginning and end | ||
of the target data region. | ||
|
||
TODO 2: 2a target ENCLOSING function, 2b constructs IN function | ||
Change 1<<25 to 1<<28 for C/C++; 2**25 to 2**28 for Fortran for timing. | ||
|
||
Note: For the "add" procedure, the target and the target teams can be | ||
hoisted outside the function. The different forms are labeled with | ||
F1, F2, F3, and F4. Time the different forms, | ||
determine which one works best with defaults, and then experiment | ||
with the number of teams and thread limit for the optimal version. | ||
Which one has the highest performance? | ||
|
||
TODO 3: | ||
Now try running the Loop version (which doesn't | ||
call the function) L1, L2, L3, with various different | ||
"target" directives (number of teams and thread limit). | ||
|
||
Which one has the highest performance? |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
include "timer.f90" | ||
|
||
subroutine add(n, x, y) | ||
implicit none | ||
integer :: n,i | ||
real :: x(n),y(n) | ||
|
||
!! TODO 2b: | ||
!F1 !$omp target teams distribute parallel do num_teams(108) num_threads(256) !!1 .4s | ||
!F2 !$omp teams distribute parallel do num_teams(108) num_threads(256) !!2 mins | ||
!F3 !$omp distribute parallel do num_threads(256) !!2 .1s | ||
!$omp distribute parallel do !! num_threads(256) !!2 .1s | ||
do i=1,n; y(i)=x(i)+y(i); enddo | ||
|
||
end subroutine | ||
|
||
program main | ||
use mod_Timer | ||
implicit none | ||
|
||
type(Timer_Collector) timer | ||
|
||
!integer,parameter :: N=2**28 !! use 28 for timing, 1024*1024*2**8 | ||
integer,parameter :: N=2**25 !! use 25 for code validation, 1024*1024*2**5 | ||
real :: x(N), y(N), error | ||
integer :: i | ||
|
||
|
||
do i=1,n; x(i)=1.0e0; y(i)=1.0e0; enddo !init x and y on host | ||
|
||
call timer%start(" Data TO & Alloc "); !! TODO 1 | ||
!$omp target data map(tofrom:y) map(to:x) !allocate and move data | ||
call timer%stop(); !! TODO 1 | ||
|
||
call timer%start(" Add on GPU "); !! TODO 1 | ||
|
||
!! TODO 3: | ||
!! Time Loop on Host | ||
!! !$omp target teams distribute parallel do num_teams(108) num_threads(256) | ||
!!do i=1,n; y(i)=x(i)+y(i); enddo | ||
|
||
!! TODO 2a: | ||
!! Time function call with "target" or "target teams" and num_teams/thread_limit clauses | ||
!F1 -- no construct here !!1 .4s | ||
!F2 !$omp target !!3 mins | ||
!F3 !$omp target teams num_teams(108) !!2 .1s | ||
!F4 !$omp target teams thread_limit(256) !! num_teams(108) !!2 .008s | ||
!$omp target teams thread_limit(512) !! num_teams(108) !!2 | ||
call add(N,x,y); | ||
!$omp end target teams !! Fortran block constructs need a matching end directive | ||
|
||
call timer%stop(); !! TODO 1 | ||
|
||
call timer%start(" Data FROM & Dealloc"); !! TODO 1 | ||
!$omp end target data | ||
call timer%stop(); !! TODO 1 | ||
|
||
error = 0.0e0 !! accumulate the maximum over all elements | ||
do i=1,N | ||
error = max( error, abs(y(i)-2.0e0)) | ||
if ( abs(y(i)-2.0e0) > 0.0000001 ) & | ||
print*, " error 1e-7 exceeded at ",i," val= ",abs(y(i)-2.0e0) | ||
end do | ||
print*, "Max error = ", error, " for count of ", N | ||
|
||
call timer%print(); !! TODO 1 | ||
|
||
end program main |