Updated timer.hpp and timer.f90 to be consistent. Fleshed out time_add gpu exercise.
milfeld committed Oct 21, 2023
1 parent 6d28bba commit 9fd7540
Showing 22 changed files with 598 additions and 137 deletions.
35 changes: 22 additions & 13 deletions OpenMP_cpu/adv_intro/2_timers/README_timer.md
@@ -1,47 +1,56 @@
# Class Timer -- collects time measurements and reports them later.
# Class Timer_Collector -- collects time measurements and reports them later.

## BACKGROUND
Fortran and CPP Timer classes have been created for measuring performance
of code blocks demarked by start() and stop() class member functions.
For each block a literal character string argument for start("my label")
is used as a label when the times are reported with print().

See timer.hpp (CPP) and timer.f90 (F90)

```
Declare Timer_Collector as timer
CPP                          F90
timer.start("my label")      call timer%start("my label")
<something to be timed>      <something to be timed>
timer.stop()                 call timer%stop()
...                          ...
timer.print()                call timer%print()
See timer.hpp (CPP) and timer.f90 (F90)
```
## Exercises

1.)
Look over time_print(.cpp|.f90). The todo's guide you
Look over time_print(.cpp|.f90). The TODOs guide you
through instrumenting the code for timing

a sleep routine, and

-- That's just to make sure the timer is working.---
-- That is just to make sure the timer is working.---

a write to stdout (CPP: printf, F90: print*,)

TODO 1:
Include the Timer class file (CPP: timer.hpp, F90: timer.f90)
Include the Timer_Collector class file (CPP: timer.hpp, F90: timer.f90)
at the beginning of the code.

TODO 2:
Instantiate the class in main as timer:
CPP: Just use "Timer time;"
CPP: Just use "Timer_Collector timer;"
F90: Use the module defined in timer.f90 (mod_timer)
and define timer as type(cls_timer).
and define "timer" as type(Timer_Collector).

TODO 3:
Time the sleep and print statements:
Put time.start("<label>") and time.stop() before and after
Put call time%start("<label>") and call time%stop() before and after
CPP: Put timer.start("my label") and timer.stop() before and after
F90: Put call timer%start("my label") and call timer%stop() before and after

TODO 4:
Report the times at the end.

Is the timer reporting reasonable numbers?
What are the units?
How accurate is the timer?
How long did the print statement take?
What is the resolution of the timer?
(Note: the print time only measures the write to an I/O buffer.)
How much time is used in calling the timer?
How would you measure that?
(TODO, fortran needs to report more digits)
14 changes: 11 additions & 3 deletions OpenMP_cpu/adv_intro/2_timers/solutions/ans_output
@@ -1,11 +1,19 @@
CPP code

Ft1 solutions $ ./a.out
Hello Helsinki.

Action :: time/s Time resolution = 1e-06
------
1 sec sleep :: 1.00009
printf io :: 5.7e-05

1 sec sleep :: 1.000986
printf io :: 0.000023

Fortran code

Hello Helsinki.

Action :: time/s Time resolution = 1.0E-06
------
1 sec sleep :: 1.000825
printf io :: 0.000171

4 changes: 2 additions & 2 deletions OpenMP_cpu/adv_intro/2_timers/solutions/ans_time_print.cpp
@@ -5,8 +5,8 @@
#include "timer.hpp"
int main(){

// TODO 1a: Instantiate a Timer class as timer.
Timer timer;
// TODO 1a: Instantiate a Timer_Collector class as timer.
Timer_Collector timer;

// TODO 1b: Time a 1 second sleep
timer.start(" 1 sec sleep ");
6 changes: 3 additions & 3 deletions OpenMP_cpu/adv_intro/2_timers/solutions/ans_time_print.f90
@@ -4,9 +4,9 @@
program main

!! TODO 1a: Use the module in timer.f90
!! and instantiate a type(cls_timer) class
use mod_timer
type(cls_timer) timer
!! and instantiate Timer_Collector (type) class as timer
use mod_Timer
type(Timer_Collector) timer

!! TODO 1b: Time a 1 second sleep
call timer%start(" 1 sec sleep ");
2 changes: 1 addition & 1 deletion OpenMP_cpu/adv_intro/2_timers/time_print.cpp
@@ -5,7 +5,7 @@
// #include ...
int main(){

// TODO 1a: Instantiate a Timer class as timer.
// TODO 1a: Instantiate a Timer_Collector class as timer.
//

// TODO 1b: Time a 1 second sleep
2 changes: 1 addition & 1 deletion OpenMP_cpu/adv_intro/2_timers/time_print.f90
@@ -4,7 +4,7 @@
program main

!! TODO 1a: Use the module in timer.f90
!! and instantiate a type(cls_timer) class
!! and instantiate Timer_Collector (type) class as timer
!use ...
!type(...) timer

18 changes: 9 additions & 9 deletions OpenMP_cpu/adv_intro/2_timers/timer.f90
@@ -4,7 +4,7 @@ module mod_timer
private
integer, parameter :: m = 20

type, public :: cls_timer
type, public :: Timer_Collector

private
integer :: n = 0
@@ -19,13 +19,13 @@ module mod_timer
procedure, public :: stop
procedure, public :: print

endtype cls_timer
endtype Timer_Collector

contains

subroutine reset(this)

class(cls_timer) :: this
class(Timer_Collector) :: this
this%n = 0
this%it = -1
this%c = 'undef'
@@ -35,7 +35,7 @@ subroutine reset(this)

subroutine start(this, c)

class(cls_timer) :: this
class(Timer_Collector) :: this
character(len=*) :: c
character(len=16) :: c16

@@ -54,7 +54,7 @@ subroutine start(this, c)

subroutine stop(this)

class(cls_timer) :: this
class(Timer_Collector) :: this

call system_clock(this%it(2,this%n))

@@ -63,14 +63,14 @@ subroutine stop(this)

subroutine print(this)

class(cls_timer) :: this
class(Timer_Collector) :: this

write (*,*)
write (*,'(a,es7.1)') 'Action :: time/s Time resolution = ', 1./real(this%itr)
write (*,'(a,es8.1)') 'Action :: time/s Time resolution = ', 1./real(this%itr)
write (*,'(a)') '------'
do i=1, this%n
write (*,'(a,a, f7.3)') this%c(i), ' :: ', &
(real(this%it(2,i) - this%it(1,i))) / real(this%itr)
write (*,'(a,a, f12.6)') this%c(i), ' :: ', & !TODO change .6 to .precision
(real(this%it(2,i) - this%it(1,i))) / real(this%itr)
enddo

end subroutine
29 changes: 16 additions & 13 deletions OpenMP_cpu/adv_intro/2_timers/timer.hpp
@@ -2,44 +2,47 @@
#define SIMPLE_TIMER_H

#include <iostream>
#include <iomanip> //setprecision
#include <cmath>
#include <string>
#include <sys/time.h>
using namespace std;

/* simple timer that stores multiple times
and can print them afterwards.
Does not support nested timing calls.
*/
class Timer
class Timer_Collector
{
public:
Timer(): n(0) { }
void start(std::string label)
Timer_Collector(): n(0) { }
void start(string label)
{
if (n < 20) { labels[n] = label; gettimeofday(&times[2*n], NULL); }
else { std::cerr << "No more timers, " << label << " will not be timed." << std::endl; }
else { cerr << "No more timers, " << label << " will not be timed." << endl; }
}

void stop() { if (n < 20) { gettimeofday(&times[2*n+1], NULL); n++; } }
void reset() { n=0; }
void print();
private:
std::string labels[20];
string labels[20];
timeval times[40];
int n;
};

void Timer::print()
void Timer_Collector::print()
{
std::cout << std::endl;
std::cout << "Action :: time/s Time resolution = " << 1.f/(float)CLOCKS_PER_SEC << std::endl;
std::cout << "------" << std::endl;
cout << endl;
cout << "Action :: time/s Time resolution = " << 1.f/(float)CLOCKS_PER_SEC << endl;
cout << "------" << endl;
for (int i=0; i < n; ++i)
{
time_t seconds = times[2*i+1].tv_sec - times[2*i+0].tv_sec;
suseconds_t ms = times[2*i+1].tv_usec - times[2*i+0].tv_usec;
if (ms < 0) { ms += 1000000; seconds--; }
std::cout << labels[i] << " :: " << (float)seconds + ms/1000000.f << std::endl;
time_t seconds = times[2*i+1].tv_sec - times[2*i+0].tv_sec;
suseconds_t us = times[2*i+1].tv_usec - times[2*i+0].tv_usec;
if (us < 0) { us += 1000000; seconds--; }
double secs = (double)seconds + us/1000000.0f;
cout << labels[i] << " :: " << setiosflags(ios::fixed)<<setprecision(6)<< setw(12)<<secs<<endl;
}
}
#endif
47 changes: 41 additions & 6 deletions OpenMP_gpu/adv/time_add/README_time_add.md
@@ -1,10 +1,45 @@
# This exercise is incomplete.
# Time Offload kernel for vector add, and data motion

## Background

By starting a timer just before the "target data" directive and stopping it
immediately after the target data region begins, by timing the "target"
construct, and by timing the termination of the target data region,
one can measure the time required to:

```
(allocate device data and) move data "to" the device,
execute the kernel,
and move data "from" the device (and deallocate).
```

## Exercise
Only C/C++ code is available at this time
1.) Experiment with the target ... directives
executing the addition.
Some different settings are in comments.
STATUS: Makefile works for CPP only at this time.

1.) Add the start and stop timers around the appropriate
areas, and run with the default (present) "target"
construct to offload the add function. Make sure it works.
See previous examples (basic/worksharing) for including a timer:
TODO 1:
Note, the data motions due to the target data
construct are measured separately.
construct are measured separately at the beginning and end
of the target data region.

TODO 2: 2a target ENCLOSING function, 2b constructs IN function
Change 1<<25 to 1<<28 for C/C++; 2**25 to 2**28 for Fortran for timing.

Note: For the "add" procedure, the "target" and "target teams" constructs can be
hoisted outside the function. The different forms are labeled with
F1, F2, F3, and F4. Experiment with (time) the different forms,
determine which one works best with the defaults, and then experiment
with the number of teams and thread limit for the optimal version.
Which one has the highest performance?

TODO 3:
Now try running the Loop version (which doesn't
call the function) L1, L2, L3, with various different
"target" directives (number of teams and thread limit).

Which one has the highest performance.
66 changes: 66 additions & 0 deletions OpenMP_gpu/adv/time_add/add.F90
@@ -0,0 +1,66 @@
include "timer.f90"

subroutine add(n, x, y)
implicit none
integer :: n,i
real :: x(n),y(n)

!! TODO 2b:
!F1 !$omp target teams distribute parallel do num_teams(108) num_threads(256) !!1 .4s
!F2 !$omp teams distribute parallel do num_teams(108) num_threads(256) !!2 mins
!F3 !$omp distribute parallel do num_threads(256) !!2 .1s
!$omp distribute parallel do !! num_threads(256) !!2 .1s
do i=1,n; y(i)=x(i)+y(i); enddo

end subroutine

program main
use mod_Timer
implicit none

type(Timer_Collector) timer

!integer,parameter :: N=2**28 !! use 28 for timing, 1024*1024*2**8
integer,parameter :: N=2**25 !! use 25 for code validation, 1024*1024*2**5
real :: x(N), y(N), error
integer :: i


do i=1,n; x(i)=1.0e0; y(i)=1.0e0; enddo !init x and y on host

call timer%start(" Data TO & Alloc "); !! TODO 1
!$omp target data map(tofrom:y) map(to:x) !allocate and move data
call timer%stop(); !! TODO 1

call timer%start(" Add on GPU "); !! TODO 1

!! TODO 3:
!! Time Loop on Host
!!$omp target teams distribute parallel do num_teams(108) num_threads(256)
!!do i=1,n; y(i)=x(i)+y(i); enddo

!! TODO 2a:
!! Time function call with "target" or "target teams" and num_teams/thread_limit clauses
!F1 -- no construct here !!1 .4s
!F2 !$omp target !!3 mins
!F3 !$omp target teams num_teams(108) !!2 .1s
!F4 !$omp target teams thread_limit(256) !! num_teams(108) !!2 .008s
!$omp target teams thread_limit(512) !! num_teams(108) !!2
call add(N,x,y);

call timer%stop(); !! TODO 1

call timer%start(" Data FROM & Dealloc"); !! TODO 1
!$omp end target data
call timer%stop(); !! TODO 1

error = 0.0e0
do i=1,N
error = max( error, abs(y(i)-2.0e0))
if ( abs(y(i)-2.0e0) > 0.0000001 ) &
print*, " error 1.0e-7 exceeded at ",i," val= ",abs(y(i)-2.0e0)
end do
print*, "Max error = ", error, " for count of ", N

call timer%print(); !! TODO 1

end program main