remove references to flux mini and add batch script pragmas
ryanday36 committed Mar 29, 2023
1 parent 09e8403 commit 8d7e122
Showing 8 changed files with 60 additions and 64 deletions.
4 changes: 2 additions & 2 deletions flux/appendices/appendix2.md
@@ -5,9 +5,9 @@ release_number: LLNL-WEB-822959
author: Ryan Day, Lawrence Livermore National Laboratory
---

Most Flux commands and their options are described in their respective man pages, which can be accessed via the `man` command or via `flux help`. For example, `man flux-mini` or `flux help mini` will describe the options for the various `flux mini` commands.
Most Flux commands and their options are described in their respective man pages, which can be accessed via the `man` command or via `flux help`. For example, `man flux-run` or `flux help run` will describe the options for the `flux run` command.

The primary source for more general Flux documentation is the Flux [readthedocs](https://flux-framework.readthedocs.io/en/latest/index.html) page. There, you will find documentation on everything from [getting started](https://flux-framework.readthedocs.io/en/latest/quickstart.html#) with Flux to [batch jobs](https://flux-framework.readthedocs.io/en/latest/batch.html) to [administering Flux on clusters](https://flux-framework.readthedocs.io/en/latest/adminguide.html). There are also a number of [usage examples](https://flux-framework.readthedocs.io/projects/flux-workflow-examples/en/latest/index.html), descriptions of the [commands and python bindings](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/index.html), and [API and other specifications](https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/index.html).
The primary source for more general Flux documentation is the Flux [readthedocs](https://flux-framework.readthedocs.io/en/latest/index.html) page. There, you will find documentation on everything from [getting started](https://flux-framework.readthedocs.io/en/latest/quickstart.html#) with Flux to [batch jobs](https://flux-framework.readthedocs.io/en/latest/jobs/batch.html) to [administering Flux on clusters](https://flux-framework.readthedocs.io/en/latest/guides/admin-guide.html). There are also a number of [usage examples](https://flux-framework.readthedocs.io/projects/flux-workflow-examples/en/latest/index.html), descriptions of the [commands](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/index.html) and [python bindings](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/python/index.html), and [API and other specifications](https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/index.html).

Finally, the Flux developers are very active in the HPC community and have made slides from various tutorials that they have given available on [GitHub](https://github.com/flux-framework/Tutorials).

2 changes: 1 addition & 1 deletion flux/exercises/exercise1.md
@@ -6,7 +6,7 @@ author: Ryan Day, Lawrence Livermore National Laboratory
---

1. Run `flux resource list` to determine if Flux is already running on your system.
2. If flux is running on the system, use `flux mini alloc` to get a two node allocation. If flux is not running on the system, use allocation commands appropriate to that system to get a two node allocation and start Flux with `flux start`.
2. If flux is running on the system, use `flux alloc` to get a two node allocation. If flux is not running on the system, use allocation commands appropriate to that system to get a two node allocation and start Flux with `flux start`.
3. Use `flux resource list` to query the state of the resources in your allocation.

### Notes / Solutions
2 changes: 1 addition & 1 deletion flux/exercises/exercise5.md
@@ -10,7 +10,7 @@ author: Ryan Day, Lawrence Livermore National Laboratory
$ git clone https://github.com/flux-framework/flux-workflow-examples.git
$ cd flux-workflow-examples/hierarchical-launching
```
2. Run the [hierarchical launching](https://flux-framework.readthedocs.io/projects/flux-workflow-examples/en/latest/hierarchical-launching/README.html) example. Review the launcher scripts to understand which `flux mini` commands are launching Flux instances and which are not.
2. Run the [hierarchical launching](https://flux-framework.readthedocs.io/projects/flux-workflow-examples/en/latest/hierarchical-launching/README.html) example. Review the launcher scripts to understand which `flux` commands are launching Flux instances and which are not.

### Notes / Solutions
2. This workflow example explicitly includes instructions for getting a Slurm allocation and starting flux. See [Section 1](/flux/section1) for general instructions on getting an allocation in flux or starting flux under Slurm.
12 changes: 6 additions & 6 deletions flux/section1.md
@@ -12,22 +12,22 @@ Flux is included in the TOSS operating system on LC systems, so should be availa
[day36@corona211:~]$ which flux
/usr/bin/flux
[day36@corona211:~]$ flux --version
commands: 0.42.0
libflux-core: 0.42.0
libflux-security: 0.7.0
build-options: +hwloc==2.8.0+zmq==4.3.4
commands: 0.48.0
libflux-core: 0.48.0
libflux-security: 0.8.0
build-options: +ascii-only+hwloc==2.8.0+zmq==4.3.4
[day36@corona211:~]$
```
LC clusters running TOSS 4 are tracking Flux releases closely, but on TOSS 3 or non-LC clusters you may want a newer version. You can install a local build of Flux using `spack` or build it from source. See [Appendix I](/flux/appendices/appendixI) for more details on those options.
### Starting Flux
If you're on an LC cluster such as corona or tioga where Flux is running as the system level scheduler, you can skip this step. You can just use the `flux mini alloc` command to get an interactive allocation or any of the batch commands described in [Section 3](/flux/section3).
If you're on an LC cluster such as corona or tioga where Flux is running as the system level scheduler, you can skip this step. You can just use the `flux alloc` command to get an interactive allocation or any of the batch commands described in [Section 3](/flux/section3).

If you are on a cluster that is running another resource manager, such as Slurm or LSF, you can still use Flux to run your workload. You will need to get an allocation using the native resource manager's commands (e.g. `salloc`), then start Flux on all of the nodes in that allocation with the `flux start` command. This will start `flux-broker` processes on all of the nodes; these gather information about the available hardware resources and communicate with each other to assign your workload to those resources. On a cluster running Slurm, this will look like:
```console
[day36@rzalastor2:~]$ salloc -N2 --exclusive
salloc: Granted job allocation 234174
sh-4.2$ srun -N2 -n2 --pty --mpibind=off flux start
sh-4.2$ flux mini run -n 2 hostname
sh-4.2$ flux run -n 2 hostname
rzalastor6
rzalastor5
sh-4.2$
32 changes: 16 additions & 16 deletions flux/section2.md
@@ -8,10 +8,10 @@ author: Ryan Day, Lawrence Livermore National Laboratory
In the previous section, we learned how to find flux, get an allocation, and query the compute resources in that allocation. Now, we are ready to launch work on those compute resources and get some work done. When you launch work in Flux, that work takes the form of jobs that can be either blocking or non-blocking. Blocking jobs will run to completion before more work can be submitted, whereas non-blocking jobs are enqueued, allowing you to immediately submit more work in the allocation.

Before we get into submitting and managing Flux jobs, we should also discuss Flux's jobids as they're a bit different than what you'll find in other resource management software. Remember that an allocation in Flux is a fully featured Flux instance. Rather than create sequential numeric ids within each instance, Flux combines the submit time, an id, and sequence number to create identifiers that are unique for each job in a Flux instance. This avoids requiring a central allocator for jobids which improves the scalability of job submission within instances. There are options to display these identifiers in a number of ways, but the default is an 8 character string prepended by an `f`, e.g. `fBsFXaow5` for the job submitted in the example below. For more details on Flux's identifiers, see the [FLUID documentation](https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/spec_19.html).
### Submit blocking Flux jobs with `flux mini run`
If you want your work to block until it completes, the `flux mini run` command will submit a job and then wait until it is complete before returning. For example, in a two node allocation, we can launch an mpi program with 4 tasks:
### Submit blocking Flux jobs with `flux run`
If you want your work to block until it completes, the `flux run` command will submit a job and then wait until it is complete before returning. For example, in a two node allocation, we can launch an mpi program with 4 tasks:
```
sh-4.2$ flux mini run -n4 ./mpi_hellosleep
sh-4.2$ flux run -n4 ./mpi_hellosleep
task 2 on rzalastor6 going to sleep
MASTER: Number of MPI tasks is: 4
task 0 on rzalastor5 going to sleep
@@ -23,10 +23,10 @@ task 3 on rzalastor6 woke up
task 1 on rzalastor5 woke up
sh-4.2$
```
### Submit non-blocking Flux jobs with `flux mini submit`
If you just want to queue up work in a Flux instance, the `flux mini submit` command will submit the job and return immediately. As in the example above, here we will submit a 4 task mpi program in our two node allocation:
### Submit non-blocking Flux jobs with `flux submit`
If you just want to queue up work in a Flux instance, the `flux submit` command will submit the job and return immediately. As in the example above, here we will submit a 4 task mpi program in our two node allocation:
```
sh-4.2$ flux mini submit -n4 --output=job_{{id}}.out ./mpi_hellosleep
sh-4.2$ flux submit -n4 --output=job_{{id}}.out ./mpi_hellosleep
fBsFXaow5
sh-4.2$ tail -f job_fBsFXaow5.out
MASTER: Number of MPI tasks is: 4
@@ -42,7 +42,7 @@ task 3 on rzalastor6 woke up
sh-4.2$
```
### Submit dependent jobs with `--dependency=`
If you want to submit a Flux job that won't start until another job has completed or reached some other state, you can add a `--dependency=` flag to any `flux mini` command. Flux currently supports five dependency conditions:
If you want to submit a Flux job that won't start until another job has completed or reached some other state, you can add a `--dependency=` flag to any `flux run` or other job submission command. Flux currently supports five dependency conditions:

`--dependency=after:JOBID`
 job will not start until JOBID has started
@@ -55,26 +55,26 @@ If you want to submit a Flux job that won't start until another job has complete
`--dependency=begin-time:TIMESTAMP`
 job will not start until TIMESTAMP

These dependency conditions can be used with `flux mini run`, `flux mini submit`, and the `flux mini batch` and `flux mini alloc` commands described in [Section 3](/flux/section3).
These dependency conditions can be used with `flux run`, `flux submit`, and the `flux batch` and `flux alloc` commands described in [Section 3](/flux/section3).
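
As a rough illustration of chaining jobs, the sketch below captures the jobid that `flux submit` prints and passes it to a second submission with `--dependency=after:JOBID`. The helper name `submit_chain` is hypothetical, `./mpi_hellosleep` is the example program used above, and the guard at the end simply avoids submitting anything when no `flux` command is present.

```shell
#!/bin/sh
# Sketch only: submit_chain is a hypothetical helper, not a Flux command.
submit_chain() {
    # flux submit prints the new jobid on stdout, so capture it.
    first=$(flux submit -n4 ./mpi_hellosleep) || return 1
    # The second job will not start until the first has started.
    second=$(flux submit -n4 --dependency=after:"$first" ./mpi_hellosleep) || return 1
    echo "$first $second"
}

# Only submit when a flux command is actually available.
if command -v flux >/dev/null 2>&1; then
    submit_chain
fi
```

Swapping `after:` for one of the other conditions listed above changes when the second job becomes eligible to run; the capture-and-pass pattern stays the same.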

### Managing Flux jobs with `flux jobs` and `flux job`
If you have multiple Flux jobs running and queued you can list those jobs with the `flux jobs` command, and manage them with `flux job`. For example, in a two node instance with 20 cores per node, we can see the states of the job steps that we've submitted as:
```
sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7AC3114K
sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7As1t7pB
sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7BK65VFu
sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7Bgjnzwy
sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7WUMj9qH
sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7WsybiDu
sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7XUqnpcF
sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep
sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep
f7Y14ABhq
sh-4.2$ flux jobs
JOBID USER NAME ST NTASKS NNODES RUNTIME NODELIST
46 changes: 21 additions & 25 deletions flux/section3.md
@@ -6,16 +6,16 @@ author: Ryan Day, Lawrence Livermore National Laboratory
---

The reality of computing on shared resources is that nodes are rarely available when you're in front of the keyboard, so you need to put your work into a script that the scheduler can run when resources become available. Your batch script may mix basic shell commands and functions, which run serially on the first compute node in your allocation, with parallel programs that run as jobs as described in [Section 2](/flux/section2).
### Submitting a basic job script with `flux mini batch`
The `flux mini batch` command allows you to submit batch scripts to a queue for later execution once resources are available. In the `simplescript.sh` example below, we mix shell commands to log when the job starts and that it has completed with a `flux mini run` command to launch an MPI program.
### Submitting a basic job script with `flux batch`
The `flux batch` command allows you to submit batch scripts to a queue for later execution once resources are available. In the `simplescript.sh` example below, we mix shell commands to log when the job starts and that it has completed with a `flux run` command to launch an MPI program.
```
sh-4.2$ cat simplescript.sh
#!/bin/sh
date
flux mini run -n4 ./mpi_hellosleep
flux run -n4 ./mpi_hellosleep
echo 'job complete'
sh-4.2$ flux mini batch -N2 -n4 ./simplescript.sh
sh-4.2$ flux batch -N2 -n4 ./simplescript.sh
fDkrou8xo
sh-4.2$ flux jobs
JOBID USER NAME ST NTASKS NNODES RUNTIME NODELIST
@@ -37,24 +37,20 @@ job complete
^C
sh-4.2$
```
### Making your job script submission self documenting
Many resource managers allow you to put batch submission scripts in your script as comments. For example, in a Slurm sbatch script, you can use `#SBATCH -N 2` in your script to request two nodes in your allocation. Flux does not yet have directly analogous functionality. If you want to include your job submission flags in your script, you can use a [heredoc](https://en.wikipedia.org/wiki/Here_document) to include the `flux mini batch` command in your script and run it directly.
### Adding job submission directives to your batch script
Many resource managers allow you to put batch submission flags in your script as comments. In Flux, you can do this by prepending the flags with `#flux:` in your script. For example, the job script below will run 4 tasks on two nodes.
```
sh-4.2$ cat simplescript.sh
#!/bin/sh
flux mini batch \
-N 2 \
-n 4 \
<<- 'END_OF_SCRIPT'
#!/bin/sh
#flux: -N 2
#flux: -n 4
date
flux mini run -n4 ./mpi_hellosleep
echo 'job complete'
date
flux mini run -n4 ./mpi_hellosleep
echo 'job complete'
END_OF_SCRIPT
sh-4.2$ ./simplescript.sh
sh-4.2$ flux batch ./simplescript.sh
f3YDA4qqR
sh-4.2$ tail -f flux-f3YDA4qqR.out
Mon Mar 15 13:45:01 PDT 2021
@@ -71,21 +67,21 @@ job complete
^C
sh-4.2$
```
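
When reviewing a batch script, it can help to see at a glance which flags its `#flux:` lines request. The sketch below is only an inspection aid, not how Flux itself parses directives: `list_flux_directives` is a hypothetical helper, and it assumes directives are written exactly in the `#flux:` form used in the example script.

```shell
#!/bin/sh
# Sketch only: list_flux_directives is a hypothetical helper that prints
# the flags found on '#flux:' lines of a script, one directive per line.
list_flux_directives() {
    sed -n 's/^#flux:[[:space:]]*//p' "$1"
}

# Demo on a temporary copy of the example batch script.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
#!/bin/sh
#flux: -N 2
#flux: -n 4
date
flux run -n4 ./mpi_hellosleep
echo 'job complete'
EOF
list_flux_directives "$tmp"   # prints "-N 2" then "-n 4"
rm -f "$tmp"
```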
### Starting an interactive Flux instance with `flux mini alloc`
When you submit a job script with `flux mini batch`, you are actually starting a new Flux instance with its own resources and running your script in that instance. The `flux mini alloc` command will create a new Flux instance and start an interactive shell in it.
### Starting an interactive Flux instance with `flux alloc`
When you submit a job script with `flux batch`, you are actually starting a new Flux instance with its own resources and running your script in that instance. The `flux alloc` command will create a new Flux instance and start an interactive shell in it.
```
[day36@corona211:~]$ flux mini alloc -N2 -n2 -t 1h
[day36@corona211:~]$ flux alloc -N2 -n2 -t 1h
[day36@corona177:~]$ flux resource list
STATE NNODES NCORES NGPUS NODELIST
free 2 96 16 corona[177-178]
allocated 0 0 0
down 0 0 0
[day36@corona177:~]$ flux mini run -N2 -n2 hostname
[day36@corona177:~]$ flux run -N2 -n2 hostname
corona177
corona178
[day36@corona177:~]$
```
Alternatively, you can supply `flux mini alloc` with a command or script and it will run that in a new Flux instance. Unlike `flux mini batch`, `flux mini alloc` will block until the command or script returns and send the standard output and error to the terminal.
Alternatively, you can supply `flux alloc` with a command or script and it will run that in a new Flux instance. Unlike `flux batch`, `flux alloc` will block until the command or script returns and send the standard output and error to the terminal.
```
[day36@corona211:flux_test]$ cat test_batch.sh
#!/bin/bash
@@ -94,8 +90,8 @@ echo "resources"
flux resource list
echo "hosts"
flux mini run -N 2 -n 2 hostname
[day36@corona211:flux_test]$ flux mini alloc -N2 -n2 ./test_batch.sh
flux run -N 2 -n 2 hostname
[day36@corona211:flux_test]$ flux alloc -N2 -n2 ./test_batch.sh
resources
STATE NNODES NCORES NGPUS NODELIST
free 2 96 16 corona[177-178]
@@ -108,7 +104,7 @@ corona178
[day36@corona211:flux_test]
```
### Submitting jobs to an existing instance with `flux proxy`
Some user workflows involve getting an allocation (Flux instance) and submitting work to it from outside of that allocation. Flux can accommodate these types of workflows using the `flux proxy` command. You could, for example, create a two node Flux instance with `flux mini alloc -N2 -n96 -t 1d sleep 1d`, then use a `flux proxy` command to submit work in that instance:
Some user workflows involve getting an allocation (Flux instance) and submitting work to it from outside of that allocation. Flux can accommodate these types of workflows using the `flux proxy` command. You could, for example, create a two node Flux instance with `flux alloc -N2 -n96 -t 1d --bg`, then use a `flux proxy` command to submit work in that instance:
```
[day36@corona212:~]$ flux jobs
JOBID USER NAME ST NTASKS NNODES RUNTIME NODELIST
@@ -119,7 +115,7 @@ corona178
[day36@corona212:~]$
```
### More user facing batch options
There are a number of things like job queues, qos, modifying jobs, holding jobs, etc that aren't in flux yet, but will be described here once they are.
The Flux job submission commands have many more options for doing things like running on specific nodes or queues, modifying your job environment, specifying task mappings, and more. See, for example, `man flux-run` for details on all of the options available.

---
[Section 2](/flux/section2) | Section 3 | [Exercise 3](/flux/exercises/exercise3) | [Section 4](/flux/section4)
2 changes: 1 addition & 1 deletion flux/section4.md
@@ -9,7 +9,7 @@ The top level Flux system instance on LC systems is configured to store informat
### Viewing completed jobs with `flux jobs -a`
By default, the `flux jobs` command only lists your running and pending jobs. You can see all of your jobs, including completed jobs, by adding a `-a` flag. You can also show more information about any jobs with a `-o "FORMAT"` flag. See `man flux-jobs` for a detailed description of how to construct a `"FORMAT"` string and what information may be included in it.
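
A minimal sketch of combining `-a` with a custom `-o` format follows. The wrapper name `list_all_jobs` is hypothetical, and the field names in the default format string (`id`, `name`, `state`) are assumed to match those documented in `man flux-jobs` for your Flux version; check the man page before relying on them.

```shell
#!/bin/sh
# Sketch only: list_all_jobs is a hypothetical wrapper around
# `flux jobs -a -o FORMAT`. The default field names are assumptions;
# see `man flux-jobs` for the fields your version supports.
list_all_jobs() {
    format=${1:-}
    if [ -z "$format" ]; then
        format='{id} {name} {state}'
    fi
    flux jobs -a -o "$format"
}

# Only query when a flux command is actually available.
if command -v flux >/dev/null 2>&1; then
    list_all_jobs
fi
```

Passing a different format string as the first argument overrides the default, e.g. `list_all_jobs '{id} {name}'`.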
### Submitting jobs to a non-default bank
Some LC clusters have Flux's accounting modules enabled. This allows us to use a multi-factor priority system that includes a hierarchical fairshare algorithm. Users of these systems may have access to multiple banks. One bank will be set as your default bank, but you can choose to submit jobs using an alternate bank by adding `-o setattr=system.bank=BANK` to your `flux mini` command.
Some LC clusters have Flux's accounting modules enabled. This allows us to use a multi-factor priority system that includes a hierarchical fairshare algorithm. Users of these systems may have access to multiple banks. One bank will be set as your default bank, but you can choose to submit jobs using an alternate bank by adding `-o setattr=system.bank=BANK` to your `flux run|submit|alloc|batch` command.

##### *More on accounting and usage coming soon*
