slurm

Definition

SLURM stands for "Simple Linux Utility for Resource Management" and is a job manager for high-performance computing.

Commands

Quick Start

It's sometimes easiest to get started by trying out some commands.

What jobs are currently running?

squeue

What jobs am I currently running?

squeue -u username

Launch an interactive session on one node with 16 cores:

srun -N 1 -n 16 --pty bash

Launch a batch job on one node with 16 cores:

sbatch -N 1 -n 16 script.sh

Cancel a batch job:

scancel jobID

Cancel all my jobs:

scancel -u username
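
When scripting around SLURM, it often helps to capture the job ID that `sbatch` reports, so the job can be monitored or cancelled later. The sketch below assumes the usual confirmation line, `Submitted batch job <id>`; the function name `parse_job_id` is made up for illustration and is not part of SLURM.

```shell
# Extract the numeric job ID from an sbatch confirmation line,
# e.g. "Submitted batch job 123456". Prints the ID, or returns 1
# if the line does not have the expected form.
parse_job_id() {
    case "$1" in
        "Submitted batch job "*)
            # Drop the fixed prefix, then keep only the first word.
            _id="${1#Submitted batch job }"
            _id="${_id%% *}"
            ;;
        *)
            return 1
            ;;
    esac
    # Reject anything that is empty or not purely digits.
    case "$_id" in
        ''|*[!0-9]*) return 1 ;;
    esac
    printf '%s\n' "$_id"
}

# Possible use on a cluster (script.sh is a placeholder):
#   job_id=$(parse_job_id "$(sbatch script.sh)") && scancel "$job_id"
```

If your site's version supports it, `sbatch --parsable` prints the job ID without the surrounding text, which may make the parsing step unnecessary.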

Examples

Here are a few more detailed examples.

Batch Scripts

Example batch file with directives that reserve one node in the default partition (queue), with 16 cores, a one-hour time limit, and exclusive use of the node:

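#!/bin/bash
# Reserve one node
#SBATCH -N 1
# Run 16 tasks (one core each by default)
#SBATCH -n 16
# Wall-clock limit of one hour
#SBATCH --time=1:00:00
# Do not share the node with other jobs
#SBATCH --exclusive
# <shell commands that set up and run the job>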
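
If you submit many jobs that differ only in size or time limit, the header can be generated rather than edited by hand. This is a minimal sketch: `make_batch_header` is a hypothetical helper, not a SLURM command, and it emits only the four directives used in the example batch file.

```shell
# Print a batch-script header to stdout.
# Arguments: number of nodes, number of tasks, time limit (H:MM:SS).
make_batch_header() {
    printf '#!/bin/bash\n'
    printf '#SBATCH -N %s\n' "$1"
    printf '#SBATCH -n %s\n' "$2"
    printf '#SBATCH --time=%s\n' "$3"
    printf '#SBATCH --exclusive\n'
}

# Possible use: write the header, append the job's commands, then submit.
#   make_batch_header 1 16 1:00:00 > job.sh
#   echo 'srun ./my_program' >> job.sh    # my_program is a placeholder
#   sbatch job.sh
```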

Tools

The following tools are useful for interacting with or otherwise using SLURM.

  • JobMaker is a small interface that a center can deploy, customized to their slurm.conf. Because slurm.conf is readable on all nodes, a user can also generate the data for the tool themselves.
  • JobStats makes it easy for users to see the status of their jobs and which of the requested resources were actually used.
  • doppler is a complementary web application to JobStats that shows job efficiency and resource wastage for users and accounts.
  • smanage is a tool developed out of Harvard to help with management of job arrays.

References