layout | title | release_number | author | tutorial |
---|---|---|---|---|
tutorial_page | Thread Stack Size and Thread Binding | UCRL-MI-133316 | Blaise Barney, Lawrence Livermore National Laboratory | OpenMP |
- The OpenMP standard does not specify how much stack space a thread should have. Consequently, implementations differ in their default thread stack size.
- The default thread stack size can be easy to exhaust. It also varies between compilers, so a program that runs with one compiler may run out of stack with another. Using past versions of LC compilers as an example:
Compiler | Approx. Stack Limit | Approx. Array Size (doubles) |
---|---|---|
Linux icc, ifort | 4 MB | 700 x 700 |
Linux pgcc, pgf90 | 8 MB | 1000 x 1000 |
Linux gcc, gfortran | 2 MB | 500 x 500 |
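The array sizes in the table follow from simple arithmetic: an N x N array of 8-byte doubles needs N * N * 8 bytes, and that must fit within the thread's stack. The sh/bash sketch below illustrates the calculation. The function names `array_bytes` and `fits_in_stack` are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```shell
# Bytes needed for an N x N array of 8-byte doubles.
array_bytes() {
    echo $(( $1 * $1 * 8 ))
}

# Exit status 0 if an N x N double array fits in a stack of the given size in MB.
# Assumes 1 MB = 1024 * 1024 bytes. A real stack also holds other data,
# so the practical limit is somewhat lower.
fits_in_stack() {
    n=$1
    stack_mb=$2
    [ "$(array_bytes "$n")" -le $(( stack_mb * 1024 * 1024 )) ]
}

array_bytes 700          # prints 3920000
fits_in_stack 700 4 && echo "700 x 700 fits in 4 MB"
fits_in_stack 1100 8 || echo "1100 x 1100 does not fit in 8 MB"
```

The same check explains why a thread-private array only slightly larger than the table entry can exhaust the default stack.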
- Threads that exceed their stack allocation may or may not trigger a segmentation fault. An application may continue to run while its data is being silently corrupted.
- Statically linked codes may be subject to further stack restrictions.
- A user's login shell may also restrict stack size.
- If your OpenMP environment supports the OpenMP 3.0 OMP_STACKSIZE environment variable (covered in the previous section), you can use it to set the thread stack size prior to program execution. For example:
setenv OMP_STACKSIZE 2000500B
setenv OMP_STACKSIZE "3000 k "
setenv OMP_STACKSIZE 10M
setenv OMP_STACKSIZE " 10 M "
setenv OMP_STACKSIZE "20 m "
setenv OMP_STACKSIZE " 1G"
setenv OMP_STACKSIZE 20000
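The values above follow the OpenMP rule for OMP_STACKSIZE: a number optionally followed by B, K, M or G (case-insensitive, with optional white space), where a number without a letter is taken as kilobytes. The sh/bash sketch below converts such a value to bytes. The function name `omp_stacksize_bytes` is hypothetical, and an actual OpenMP runtime may round the size or reject values differently.

```shell
# Convert an OMP_STACKSIZE-style value to bytes (illustrative helper,
# not part of any OpenMP runtime).
omp_stacksize_bytes() {
    # Remove all white space and upper-case the unit letter.
    v=$(printf '%s' "$1" | tr -d '[:space:]' | tr 'bkmg' 'BKMG')
    case $v in
        *B) num=${v%B}; mult=1 ;;
        *K) num=${v%K}; mult=1024 ;;
        *M) num=${v%M}; mult=$(( 1024 * 1024 )) ;;
        *G) num=${v%G}; mult=$(( 1024 * 1024 * 1024 )) ;;
        *)  num=$v;     mult=1024 ;;   # no unit letter: kilobytes
    esac
    # Reject anything that is not a plain decimal number.
    case $num in
        ''|*[!0-9]*) echo "invalid OMP_STACKSIZE: $1" >&2; return 1 ;;
    esac
    echo $(( num * mult ))
}

omp_stacksize_bytes "3000 k "   # prints 3072000
omp_stacksize_bytes 20000       # prints 20480000 (no unit means kilobytes)
```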
- If OMP_STACKSIZE is not supported, at LC you should be able to use the method below on Linux clusters. The example sets the thread stack size to 12 MB and, as a precaution, sets the shell stack size to unlimited.
Shell | Commands |
---|---|
csh/tcsh | setenv KMP_STACKSIZE 12000000; limit stacksize unlimited |
ksh/sh/bash | export KMP_STACKSIZE=12000000; ulimit -s unlimited |
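Because the login shell's own limit can also restrict stack size, it can help to check that limit before running a program. The sh/bash sketch below reports the current soft stack limit in bytes. The function name `shell_stack_limit` is hypothetical, and the sketch assumes `ulimit -s` reports its value in 1024-byte units, as bash does.

```shell
# Print the shell's current soft stack limit in bytes, or "unlimited".
# Assumes ulimit -s reports the limit in units of 1024 bytes.
shell_stack_limit() {
    limit=$(ulimit -s)
    if [ "$limit" = "unlimited" ]; then
        echo unlimited
    else
        echo $(( limit * 1024 ))
    fi
}

echo "shell stack limit: $(shell_stack_limit)"
```

If the reported value is smaller than the stack your program needs, raise it with the `ulimit -s` command shown in the table above.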
- In some cases, a program will perform better if its threads are bound to processors/cores.
- "Binding" a thread to a processor means that the operating system will always schedule the thread to run on the same processor.
- Otherwise, threads can be scheduled to execute on any processor and may "bounce" back and forth between processors with each time slice.
- Binding is also called "thread affinity" or "processor affinity".
- Binding threads to processors can result in better cache utilization, thereby reducing costly memory accesses. This is the primary motivation for binding threads to processors.
- Depending upon your platform, operating system, compiler and OpenMP implementation, binding threads to processors can be done in several different ways.
- The OpenMP version 3.1 API provides the OMP_PROC_BIND environment variable to turn processor binding "on" or "off". For example:
setenv OMP_PROC_BIND TRUE
setenv OMP_PROC_BIND FALSE
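One way to see whether binding took effect on Linux is to look at the CPUs each thread of a running process is allowed to use. The sh/bash sketch below reads the `Cpus_allowed_list` field from `/proc/<pid>/task/<tid>/status`, which exists only on Linux. The function name `show_thread_affinity` and the program name `./my_omp_program` are hypothetical, and how narrow each thread's list becomes with binding turned on depends on the OpenMP implementation.

```shell
# Print the CPUs each thread of a process is allowed to run on (Linux only).
# Reads the Cpus_allowed_list field from /proc/<pid>/task/<tid>/status.
show_thread_affinity() {
    pid=$1
    for status in /proc/"$pid"/task/*/status; do
        [ -r "$status" ] || continue
        tid=${status#/proc/"$pid"/task/}
        tid=${tid%/status}
        cpus=$(awk '/^Cpus_allowed_list:/ { print $2 }' "$status")
        echo "thread $tid: cpus $cpus"
    done
}

# sh/bash equivalent of the first setenv line above.
export OMP_PROC_BIND=TRUE

# Hypothetical usage; ./my_omp_program stands in for your own executable:
#   ./my_omp_program &
#   show_thread_affinity $!

show_thread_affinity $$    # affinity of the current shell's own thread(s)
```

With binding off, every thread typically reports the full set of CPUs available to the process.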
- At a higher level, processes can also be bound to processors.
- Detailed information about process and thread binding to processors on LC Linux clusters can be found HERE.