forked from chaos/slurm-spank-plugins
Initial import of current spank plugins project to googlecode.
Showing 73 changed files with 17,561 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
This work was produced at the Lawrence Livermore National Laboratory | ||
(LLNL) under Contract No. DE-AC52-07NA27344 (Contract 44) between | ||
the U.S. Department of Energy (DOE) and Lawrence Livermore National | ||
Security, LLC (LLNS) for the operation of LLNL. | ||
|
||
This work was prepared as an account of work sponsored by an agency of | ||
the United States Government. Neither the United States Government nor | ||
Lawrence Livermore National Security, LLC nor any of their employees, | ||
makes any warranty, express or implied, or assumes any liability or | ||
responsibility for the accuracy, completeness, or usefulness of any | ||
information, apparatus, product, or process disclosed, or represents | ||
that its use would not infringe privately-owned rights. | ||
|
||
Reference herein to any specific commercial products, process, or | ||
services by trade name, trademark, manufacturer or otherwise does | ||
not necessarily constitute or imply its endorsement, recommendation, | ||
or favoring by the United States Government or Lawrence Livermore | ||
National Security, LLC. The views and opinions of authors expressed | ||
herein do not necessarily state or reflect those of the United States | ||
Government or Lawrence Livermore National Security, LLC, and shall | ||
not be used for advertising or product endorsement purposes. | ||
|
||
The precise terms and conditions for copying, distribution, and | ||
modification are specified in the file "COPYING". |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
Name: chaos-spankings | ||
Version: 0.34 | ||
Release: 1 | ||
Author: Mark Grondona <[email protected]> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
|
||
CFLAGS = -Wall -ggdb | ||
|
||
all: renice.so \ | ||
oom-detect.so \ | ||
system-safe-preload.so system-safe.so \ | ||
iotrace.so \ | ||
tmpdir.so \ | ||
auto-affinity.so \ | ||
pty.so \ | ||
addr-no-randomize.so \ | ||
preserve-env.so \ | ||
subdirs | ||
|
||
SUBDIRS = use-env overcommit-memory cpuset | ||
|
||
.SUFFIXES: .c .o .so | ||
|
||
.c.o: | ||
$(CC) $(CFLAGS) -o $@ -fPIC -c $< | ||
.o.so: | ||
$(CC) -shared -o $*.so $< $(LIBS) | ||
|
||
subdirs: | ||
@for d in $(SUBDIRS); do $(MAKE) -C $$d; done | ||
|
||
system-safe-preload.so : system-safe-preload.o | ||
$(CC) -shared -o $*.so $< -ldl | ||
|
||
auto-affinity.so : auto-affinity.o lib/split.o lib/list.o lib/fd.o | ||
$(CC) -shared -o $*.so auto-affinity.o lib/split.o lib/list.o -lslurm | ||
|
||
preserve-env.so : preserve-env.o lib/list.o | ||
$(CC) -shared -o $*.so preserve-env.o lib/list.o | ||
|
||
pty.so : pty.o | ||
$(CC) -shared -o $*.so $< -lutil | ||
|
||
clean: subdirs-clean | ||
rm -f *.so *.o lib/*.o | ||
|
||
subdirs-clean: | ||
@for d in $(SUBDIRS); do $(MAKE) -C $$d clean; done | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
Version 0.34 (2008-09-25): | ||
- auto-affinity: Fix for using auto-affinity module with jobs using | ||
--use-cpusets=task. The auto-affinity module now checks whether the | ||
CPU mask has changed in task context and, if so, silently | ||
does nothing. | ||
- preserve-env: New plugin which, when enabled with --preserve-slurm-env | ||
option, will attempt to keep the remote SLURM_* environment variables | ||
the same as in the current context. Useful for invoking | ||
"srun -n1 --pty bash" from within an allocation shell. | ||
|
||
Version 0.33 (2008-09-11): | ||
- Fix for critical locking bug in cpuset plugin. The cpuset plugin | ||
now uses a global lockfile in /var/lock instead of locking files | ||
under /dev/cpuset. | ||
- Fix for generation of SLURM_CMDLINE in use-env plugin. | ||
|
||
Version 0.32 (2008-08-21): | ||
- oom-detect: Optionally log OOM killed jobs via syslog(3), if | ||
the do_syslog parameter is used in plugstack.conf. The syslog | ||
message has the form "slurmd: OOM detected: jobid=JOBID uid=UID" | ||
|
||
Version 0.31 (2008-08-19): | ||
- oom-detect: Delay slightly if an OOM killed process is detected | ||
to give the error message time to make it to srun stderr. | ||
|
||
Version 0.30 (2008-08-04): | ||
- cpuset: Slightly improve config file error messages. | ||
- cpuset: Minor fixes for man pages. | ||
- auto-affinity: Update --auto-affinity=help message. | ||
|
||
Version 0.29 (2008-07-29): | ||
- cpuset: Major overhaul of SLURM cpuset support. Now includes a PAM | ||
module, pam_slurm_cpuset.so, and a global config file in | ||
/etc/slurm/slurm-cpuset.conf. For more information, see the | ||
new manual pages included with the distribution. | ||
- auto-affinity: Do not set CPU affinity by default if the number | ||
of available CPUs is not evenly divisible by the number of tasks. | ||
|
||
Version 0.28 (2008-07-22): | ||
- auto-affinity: Fix error where spank_post_opt hook was incorrectly | ||
run in srun, which caused an immediate error and abort. | ||
|
||
Version 0.27 (2008-07-16): | ||
- cpuset: Expand cpuset support to per-task cpusets via --use-cpusets=tasks. | ||
|
||
Version 0.26 (2008-07-16): | ||
- cpuset: Add support for per-job-step cpusets via the new srun option | ||
'--use-cpusets'. See the README or --use-cpusets=help for more information. | ||
- auto-affinity: Delay detection of current cpuset until after user | ||
option processing in the event that user option changed our cpuset. | ||
|
||
Version 0.25 (2008-07-10): | ||
- cpuset: Added cpuset plugin to constrain jobs to number of CPUs | ||
allocated on shared, but not oversubscribed nodes. | ||
- auto-affinity: Make auto-affinity plugin cpuset-aware. CPU affinity | ||
is assigned as if the job were running on a node the size of the | ||
current cpuset. If cpusets are not enabled, the auto-affinity behavior | ||
is unchanged. | ||
|
||
Version 0.24 (2008-06-10): | ||
- auto-affinity: Query SLURM controller for number of CPUs allocated | ||
to the current job in exclusive_only mode if the environment variable | ||
SLURM_JOB_CPUS_PER_NODE is not set. | ||
|
||
Version 0.23 (2008-06-10): | ||
- auto-affinity: Add 'exclusive_only' flag to auto-affinity plugin | ||
to constrain plugin activity to only those jobs that have exclusive | ||
use of the current node. | ||
|
||
(2008-06-10): | ||
- Started NEWS file. | ||
|
||
$Id: NEWS 7811 2008-09-25 22:21:11Z grondo $ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,156 @@ | ||
SLURM spank plugins README | ||
================================== | ||
|
||
This package includes several SLURM spank plugins developed | ||
at LLNL and used on production compute clusters onsite. A few | ||
of these plugins are only valid when used on LLNL's software | ||
stack (oom-detect.so, for example, requires LLNL-specific patches | ||
to track jobs terminated by the OOM killer). However, the | ||
source for all plugins is provided here in the hope that they | ||
might be useful to other plugin developers. The following | ||
is a short description of most of the plugins in this package. | ||
|
||
addr-no-randomize | ||
----------------- | ||
|
||
The addr-no-randomize plugin allows sysadmins to set a default | ||
policy for address space randomization (when supported and | ||
enabled in the Linux kernel), and provides an option for users | ||
to enable/disable randomization on a per-job basis. | ||
|
||
auto-affinity | ||
----------------- | ||
|
||
Automatically assign CPU affinity using best-guess defaults. | ||
|
||
The default behavior of this plugin attempts to accommodate | ||
multi-threaded apps by assigning more than one CPU per task | ||
if the number of CPUs is evenly divisible by the number of | ||
tasks running on the node. Otherwise, CPU affinity is not enabled | ||
unless the cpus_per_task (cpt) option is specified. The default | ||
behavior may be modified using the --auto-affinity options | ||
listed below. Also, the srun(1) --cpu_bind option is processed | ||
after auto-affinity, and thus may be used to override any CPU | ||
affinity settings from this module. | ||
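
As a rough illustration of the default described above, the following C
sketch computes how many CPUs each task would receive. The function names
and the handling of an explicit cpus-per-task request that does not fit on
the node are assumptions made for this example; they are not taken from
auto-affinity.c.

```c
#include <assert.h>

/*
 * Hypothetical helper, not taken from auto-affinity.c.
 *
 *   ncpus  - CPUs available on the node (or in the current cpuset)
 *   ntasks - tasks of this job running on the node
 *   cpt    - user-requested CPUs per task, 0 if not specified
 *
 * Returns the number of CPUs to bind to each task, or 0 when CPU
 * affinity should be left unset.
 */
int cpus_per_task(int ncpus, int ntasks, int cpt)
{
    assert(ncpus > 0 && ntasks > 0 && cpt >= 0);

    /* Explicit request: honor it only if every task fits (assumption). */
    if (cpt > 0)
        return (cpt <= ncpus / ntasks) ? cpt : 0;

    /* Default: bind only when the CPUs divide evenly among the tasks. */
    if (ncpus % ntasks == 0)
        return ncpus / ntasks;

    return 0;
}

/* First CPU of a task, assuming a simple contiguous block layout. */
int first_cpu(int local_taskid, int per_task)
{
    return local_taskid * per_task;
}
```

With 8 CPUs and 4 tasks, each task would get 2 CPUs. With 8 CPUs and
3 tasks, no affinity is set unless a cpus-per-task value is given.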
|
||
This plugin should not be used alone on systems using node | ||
sharing. In that case, it should be used along with | ||
the cpuset plugin below (and auto-affinity.so should be listed | ||
*after* cpuset.so in the plugstack.conf). | ||
|
||
cpuset | ||
----------------- | ||
|
||
The cpuset plugin uses Linux cpusets to constrain jobs to the | ||
number of CPUs they have been allocated on nodes. The plugin | ||
is specifically designed for systems sharing nodes and using CPU | ||
scheduling (i.e. using the select/cons_res plugin). The plugin | ||
will not work on systems where CPUs are oversubscribed to jobs | ||
(i.e. strict node sharing without the use of select/cons_res). | ||
|
||
The plugin also has a pam_slurm_cpuset counterpart, which | ||
replaces pam_slurm and provides identical functionality, | ||
except that user login sessions are constrained to their | ||
currently allocated CPUs on a node. | ||
|
||
The cpuset plugin requires the SGI libbitmask and libcpuset | ||
libraries available from | ||
|
||
http://oss.sgi.com/projects/cpusets | ||
|
||
(See also cpuset/README) | ||
|
||
iorelay | ||
----------------- | ||
|
||
The iorelay plugin is an experimental proof-of-concept plugin | ||
for remounting required filesystems for a parallel job from | ||
the first allocated node to all others. It is meant to reduce | ||
the load on global NFS servers. | ||
|
||
It has not been used in production. | ||
|
||
|
||
iotrace | ||
----------------- | ||
|
||
The iotrace plugin is another experimental plugin which | ||
uses "plasticfs" to log filesystem access on a per-job | ||
basis. | ||
|
||
|
||
oom-detect | ||
----------------- | ||
|
||
The oom-detect plugin detects jobs that have been victims | ||
of the OOM killer using some special code added to the LLNL | ||
Linux kernel. As tasks exit after having been killed by | ||
the OOM killer, a message is printed to the user's stderr | ||
along with some memory information about the task. | ||
|
||
overcommit-memory | ||
----------------- | ||
|
||
The overcommit-memory plugin is an attempt to allow users | ||
to tune global overcommit behavior of the Linux kernel on | ||
a per-job basis. It is currently buggy and thus not used. | ||
|
||
preserve-env | ||
----------------- | ||
|
||
The preserve-env plugin adds an srun option | ||
|
||
--preserve-slurm-env | ||
|
||
which attempts to preserve the current state of all SLURM_* | ||
environment variables in the remotely executed environment. This | ||
is meant solely to be used from an allocation shell with | ||
the syntax | ||
|
||
srun -n1 -N1 --pty --preserve-slurm-env $SHELL | ||
|
||
as a sort of "remote" allocation shell. | ||
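
The first step of such a plugin, picking the SLURM_* entries out of the
current environment, might look roughly like the C sketch below. The
function name and its interface are hypothetical and are not taken from
preserve-env.c; how the real plugin transports and restores the values is
not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLURM_PREFIX "SLURM_"

/*
 * Hypothetical helper, not taken from preserve-env.c.
 *
 * Scan a NULL-terminated "NAME=value" array and store pointers to the
 * entries whose name starts with SLURM_ in out[] (at most max of them).
 * Returns the total number of matching entries, which may exceed max.
 */
size_t collect_slurm_vars(char *const envp[], const char **out, size_t max)
{
    size_t n = 0;
    size_t i;

    assert(envp != NULL);

    for (i = 0; envp[i] != NULL; i++) {
        if (strncmp(envp[i], SLURM_PREFIX, strlen(SLURM_PREFIX)) != 0)
            continue;               /* not a SLURM_* variable */
        if (out != NULL && n < max)
            out[n] = envp[i];       /* remember the entry, no copy made */
        n++;
    }
    return n;
}
```

In an allocation shell, the caller would pass the process environment
(`environ`) as envp.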
|
||
pty | ||
----------------- | ||
|
||
The pty plugin provides the SLURM --pty option, introduced | ||
in slurm-1.3, for slurm-1.2. It isn't fully functional at this | ||
point, but is a good example of a complex feature added solely | ||
from a spank plugin. | ||
|
||
|
||
renice | ||
----------------- | ||
|
||
The renice plugin is the same as the example code in the | ||
spank(8) man page. It provides a new srun option "--renice=VALUE" | ||
which allows users to set the nice value of their remote | ||
tasks (down to a minimum value configured by sysadmin). | ||
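
The minimum-value policy can be sketched in plain C as follows. This is
not the spank(8) example itself: the helper names are hypothetical, and
clamping an out-of-range request to the configured minimum (rather than
rejecting it) is an assumption made for illustration. Only the standard
setpriority(2) call is used.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/types.h>

#define NICE_LOWEST  (-20)   /* most favorable nice value on Linux */
#define NICE_HIGHEST 19      /* least favorable nice value on Linux */

/*
 * Hypothetical helper: limit a requested nice value so it is never
 * lower than the sysadmin-configured minimum, nor above NICE_HIGHEST.
 */
int clamp_nice(int requested, int min_allowed)
{
    assert(min_allowed >= NICE_LOWEST && min_allowed <= NICE_HIGHEST);

    if (requested < min_allowed)
        return min_allowed;
    if (requested > NICE_HIGHEST)
        return NICE_HIGHEST;
    return requested;
}

/* Apply the clamped value to a task; returns 0 on success, -1 on error. */
int apply_nice(pid_t pid, int requested, int min_allowed)
{
    return setpriority(PRIO_PROCESS, (id_t)pid,
                       clamp_nice(requested, min_allowed));
}
```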
|
||
system-safe | ||
------------------ | ||
|
||
The system-safe plugin provides an MPI-safe system(3) | ||
replacement through an LD_PRELOAD library (most of the work | ||
is done in system-safe-preload.c). The preloaded library | ||
interposes a version of system(3) that does not fork. Instead, | ||
the command line is passed through a pipe to a copy of the | ||
program which was pre-forked before MPI_Init(). The return | ||
value of the real system() call is passed back through the | ||
pipe and returned to the calling application, for which there | ||
is no noticeable difference from the real system(3). | ||
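
The pre-fork and pipe technique described above can be sketched in a few
lines of C. This is a simplified illustration, not the code in
system-safe-preload.c: the names helper_start() and safe_system() are
hypothetical, the LD_PRELOAD interposition of system(3) is omitted, and
EINTR and SIGPIPE handling are left out for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int to_helper = -1;    /* write end: command lines to the helper */
static int from_helper = -1;  /* read end: exit statuses from the helper */

/* Read or write exactly n bytes; returns 0 on success, -1 on error/EOF. */
static int xfer(int fd, void *buf, size_t n, int writing)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = writing ? write(fd, p, n) : read(fd, p, n);
        if (r <= 0)
            return -1;
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Fork the helper. Call this early, before MPI_Init() would run. */
int helper_start(void)
{
    int cmd[2], res[2];
    pid_t pid;

    if (pipe(cmd) < 0)
        return -1;
    if (pipe(res) < 0) {
        close(cmd[0]); close(cmd[1]);
        return -1;
    }
    pid = fork();
    if (pid < 0) {
        close(cmd[0]); close(cmd[1]); close(res[0]); close(res[1]);
        return -1;
    }
    if (pid == 0) {                       /* helper: serve requests */
        close(cmd[1]); close(res[0]);
        for (;;) {
            size_t len;
            char *s;
            int status;
            if (xfer(cmd[0], &len, sizeof len, 0) < 0)
                _exit(0);                 /* parent closed the pipe */
            if (len == 0 || len > 65536 || (s = malloc(len)) == NULL)
                _exit(1);
            if (xfer(cmd[0], s, len, 0) < 0)
                _exit(1);
            status = system(s);           /* the real system(3) */
            free(s);
            if (xfer(res[1], &status, sizeof status, 1) < 0)
                _exit(1);
        }
    }
    close(cmd[0]); close(res[1]);
    to_helper = cmd[1];
    from_helper = res[0];
    return 0;
}

/* Replacement for system(3) that does not fork in the caller. */
int safe_system(const char *cmdline)
{
    size_t len = strlen(cmdline) + 1;     /* include the trailing NUL */
    int status;

    assert(to_helper >= 0 && from_helper >= 0);
    if (xfer(to_helper, &len, sizeof len, 1) < 0 ||
        xfer(to_helper, (void *)cmdline, len, 1) < 0 ||
        xfer(from_helper, &status, sizeof status, 0) < 0)
        return -1;
    return status;
}
```

The caller receives the same wait status that system(3) would have
returned, so WIFEXITED() and WEXITSTATUS() work unchanged.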
|
||
use-env | ||
------------------ | ||
|
||
The use-env plugin allows system administrators and users to | ||
modify the environment of SLURM jobs using a set of simple | ||
yet very flexible config files. Environment variables can | ||
be overridden, set only if unset, set based on conditional | ||
syntax, and even defined in a per-task context. The config | ||
files have access to key slurm variables such as SLURM_NNODES, | ||
SLURM_NPROCS, etc., so variables can even be defined differently | ||
depending on the size of the job. | ||
|
||
See README.use-env for further information. |