diff --git a/flux/appendices/appendix2.md b/flux/appendices/appendix2.md index 26e2210..e4d2762 100644 --- a/flux/appendices/appendix2.md +++ b/flux/appendices/appendix2.md @@ -5,9 +5,9 @@ release_number: LLNL-WEB-822959 author: Ryan Day, Lawrence Livermore National Laboratory --- -Most Flux commands and their options are described in their respective man pages, which can be accessed via the `man` command or via `flux help`. For example, `man flux-mini` or `flux help mini` will describe the options for the various `flux mini` commands. +Most Flux commands and their options are described in their respective man pages, which can be accessed via the `man` command or via `flux help`. For example, `man flux-run` or `flux help run` will describe the options for the `flux run` command. -The primary source for more general Flux documentation is the Flux [readthedocs](https://flux-framework.readthedocs.io/en/latest/index.html) page. There, you will find documentation on everything from [getting started](https://flux-framework.readthedocs.io/en/latest/quickstart.html#) with Flux to [batch jobs](https://flux-framework.readthedocs.io/en/latest/batch.html) to [administering Flux on clusters](https://flux-framework.readthedocs.io/en/latest/adminguide.html). There are also a number of [usage examples](https://flux-framework.readthedocs.io/projects/flux-workflow-examples/en/latest/index.html), descriptions of the [commands and python bindings](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/index.html), and [API and other specifications](https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/index.html). +The primary source for more general Flux documentation is the Flux [readthedocs](https://flux-framework.readthedocs.io/en/latest/index.html) page. 
There, you will find documentation on everything from [getting started](https://flux-framework.readthedocs.io/en/latest/quickstart.html#) with Flux to [batch jobs](https://flux-framework.readthedocs.io/en/latest/jobs/batch.html) to [administering Flux on clusters](https://flux-framework.readthedocs.io/en/latest/guides/admin-guide.html). There are also a number of [usage examples](https://flux-framework.readthedocs.io/projects/flux-workflow-examples/en/latest/index.html), descriptions of the [commands](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/index.html) and [python bindings](https://flux-framework.readthedocs.io/projects/flux-core/en/latest/python/index.html), and [API and other specifications](https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/index.html). Finally, the Flux developers are very active in the HPC community and have made slides from various tutorials that they have given available on [GitHub](https://github.com/flux-framework/Tutorials). diff --git a/flux/exercises/exercise1.md b/flux/exercises/exercise1.md index 5e48b57..e343c37 100644 --- a/flux/exercises/exercise1.md +++ b/flux/exercises/exercise1.md @@ -6,7 +6,7 @@ author: Ryan Day, Lawrence Livermore National Laboratory --- 1. Run `flux resource list` to determine if Flux is already running on your system. -2. If flux is running on the system, use `flux mini alloc` to get a two node allocation. If flux is not running on the system, use allocation commands appropriate to that system to get a two node allocation and start Flux with `flux start`. +2. If flux is running on the system, use `flux alloc` to get a two node allocation. If flux is not running on the system, use allocation commands appropriate to that system to get a two node allocation and start Flux with `flux start`. 3. Use `flux resource list` to query the state of the resources in your allocation. 
### Notes / Solutions diff --git a/flux/exercises/exercise5.md b/flux/exercises/exercise5.md index e7102c9..8567448 100644 --- a/flux/exercises/exercise5.md +++ b/flux/exercises/exercise5.md @@ -10,7 +10,7 @@ author: Ryan Day, Lawrence Livermore National Laboratory $ git clone https://github.com/flux-framework/flux-workflow-examples.git $ cd flux-workflow-examples/hierarchical-launching ``` -2. Run the [hierarchical launching](https://flux-framework.readthedocs.io/projects/flux-workflow-examples/en/latest/hierarchical-launching/README.html) example. Review the launcher scripts to understand which `flux mini` commands are launching Flux instances and which are not. +2. Run the [hierarchical launching](https://flux-framework.readthedocs.io/projects/flux-workflow-examples/en/latest/hierarchical-launching/README.html) example. Review the launcher scripts to understand which `flux` commands are launching Flux instances and which are not. ### Notes / Solutions 2. This workflow example explicitly includes instructions for getting a Slurm allocation and starting flux. See [Section 1](/flux/section1) for general instructions on getting an allocation in flux or starting flux under Slurm. diff --git a/flux/section1.md b/flux/section1.md index 29b932f..a0cd2d9 100644 --- a/flux/section1.md +++ b/flux/section1.md @@ -12,22 +12,22 @@ Flux is included in the TOSS operating system on LC systems, so should be availa [day36@corona211:~]$ which flux /usr/bin/flux [day36@corona211:~]$ flux --version -commands: 0.42.0 -libflux-core: 0.42.0 -libflux-security: 0.7.0 -build-options: +hwloc==2.8.0+zmq==4.3.4 +commands: 0.48.0 +libflux-core: 0.48.0 +libflux-security: 0.8.0 +build-options: +ascii-only+hwloc==2.8.0+zmq==4.3.4 [day36@corona211:~]$ ``` LC clusters running TOSS 4 are tracking Flux releases closely, but on TOSS 3 or non-LC clusters you may want a newer version. You can install a local build of Flux using `spack` or build it from source. 
See [Appendix I](/flux/appendices/appendixI) for more details on those options. ### Starting Flux -If you're on an LC cluster such as corona or tioga where Flux is running as the system level scheduler, you can skip this step. You can just use the `flux mini alloc` command to get an interactive allocation or any of the batch commands described in [Section 3](/flux/section3). +If you're on an LC cluster such as corona or tioga where Flux is running as the system level scheduler, you can skip this step. You can just use the `flux alloc` command to get an interactive allocation or any of the batch commands described in [Section 3](/flux/section3). If you are on a cluster that is running another resource manager, such as Slurm or LSF, you can still use Flux to run your workload. You will need to get an allocation using the native resource managers commands (e.g. `salloc`), then start Flux on all of the nodes in that allocation with the `flux start` command. This will start `flux-broker` processes on all of the nodes that will gather information about the hardware resources available and communicate between each other to assign your workload to those resources. On a cluster running Slurm, this will look like: ```console [day36@rzalastor2:~]$ salloc -N2 --exclusive salloc: Granted job allocation 234174 sh-4.2$ srun -N2 -n2 --pty --mpibind=off flux start -sh-4.2$ flux mini run -n 2 hostname +sh-4.2$ flux run -n 2 hostname rzalastor6 rzalastor5 sh-4.2$ diff --git a/flux/section2.md b/flux/section2.md index 9ed26a7..db44e80 100644 --- a/flux/section2.md +++ b/flux/section2.md @@ -8,10 +8,10 @@ author: Ryan Day, Lawrence Livermore National Laboratory In the previous section, we learned how to find flux, get an allocation, and query the compute resources in that allocation. Now, we are ready to launch work on those compute resources and get some work done. When you launch work in Flux, that work takes the form of jobs that can be either blocking or non-blocking. 
Blocking jobs will run to completion before more work can be submitted, whereas non-blocking jobs are enqueued, allowing you to immediately submit more work in the allocation. Before we get into submitting and managing Flux jobs, we should also discuss Flux's jobids as they're a bit different than what you'll find in other resource management software. Remember that an allocation in Flux is a fully featured Flux instance. Rather than create sequential numeric ids within each instance, Flux combines the submit time, an id, and sequence number to create identifiers that are unique for each job in a Flux instance. This avoids requiring a central allocator for jobids which improves the scalability of job submission within instances. There are options to display these identifiers in a number of ways, but the default is an 8 character string prepended by an `f`, e.g. `fBsFXaow5` for the job submitted in the example below. For more details on Flux's identifiers, see the [FLUID documentation](https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/spec_19.html). -### Submit blocking Flux jobs with `flux mini run` -If you want your work to block until it completes, the `flux mini run` command will submit a job and then wait until it is complete before returning. For example, in a two node allocation, we can launch an mpi program with 4 tasks: +### Submit blocking Flux jobs with `flux run` +If you want your work to block until it completes, the `flux run` command will submit a job and then wait until it is complete before returning. 
For example, in a two node allocation, we can launch an mpi program with 4 tasks: ``` -sh-4.2$ flux mini run -n4 ./mpi_hellosleep +sh-4.2$ flux run -n4 ./mpi_hellosleep task 2 on rzalastor6 going to sleep MASTER: Number of MPI tasks is: 4 task 0 on rzalastor5 going to sleep @@ -23,10 +23,10 @@ task 3 on rzalastor6 woke up task 1 on rzalastor5 woke up sh-4.2$ ``` -### Submit non-blocking Flux jobs with `flux mini submit` -If you just want to queue up work in a Flux instance, the `flux mini submit` command will submit the job and return immediately. As in the example above, here we will submit a 4 task mpi program in our two node allocation: +### Submit non-blocking Flux jobs with `flux submit` +If you just want to queue up work in a Flux instance, the `flux submit` command will submit the job and return immediately. As in the example above, here we will submit a 4 task mpi program in our two node allocation: ``` -sh-4.2$ flux mini submit -n4 --output=job_{{id}}.out ./mpi_hellosleep +sh-4.2$ flux submit -n4 --output=job_{{id}}.out ./mpi_hellosleep fBsFXaow5 sh-4.2$ tail -f job_fBsFXaow5.out MASTER: Number of MPI tasks is: 4 @@ -42,7 +42,7 @@ task 3 on rzalastor6 woke up sh-4.2$ ``` ### Submit dependent jobs with `--dependency=` -If you want to submit a Flux job that won't start until another job has completed or reached some other state, you can add a `--dependency=` flag to any `flux mini` command. Flux currently supports five dependency conditions: +If you want to submit a Flux job that won't start until another job has completed or reached some other state, you can add a `--dependency=` flag to any `flux run` or other job submission command. 
Flux currently supports five dependency conditions: `--dependency=after:JOBID`  job will not start until JOBID has started @@ -55,26 +55,26 @@ If you want to submit a Flux job that won't start until another job has complete `--dependency=begin-time:TIMESTAMP`  job will not start until TIMESTAMP -These dependency conditions can be used with `flux mini run`, `flux mini submit`, and the `flux mini batch` and `flux mini alloc` commands described in the [Section 3](/flux/section3). +These dependency conditions can be used with `flux run`, `flux submit`, and the `flux batch` and `flux alloc` commands described in [Section 3](/flux/section3). ### Managing Flux jobs with `flux jobs` and `flux job` If you have multiple Flux jobs running and queued you can list those jobs with the `flux jobs` command, and manage them with `flux job`. For example, in two node instance with 20 cores per node, we can see the states of the job steps that we've submitted as: ``` -sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep +sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep f7AC3114K -sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep +sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep f7As1t7pB -sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep +sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep f7BK65VFu -sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep +sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep f7Bgjnzwy -sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep +sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep f7WUMj9qH -sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep +sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep f7WsybiDu -sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep +sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep f7XUqnpcF -sh-4.2$ flux mini submit -N1 -n10 ./mpi_hellosleep +sh-4.2$ flux submit -N1 -n10 ./mpi_hellosleep f7Y14ABhq sh-4.2$ flux jobs JOBID USER NAME ST NTASKS NNODES RUNTIME NODELIST diff --git a/flux/section3.md b/flux/section3.md index 8c73604..98f74f4 100644 --- 
a/flux/section3.md +++ b/flux/section3.md @@ -6,16 +6,16 @@ author: Ryan Day, Lawrence Livermore National Laboratory --- The reality of computing on shared resources is that nodes are rarely available when you're in front of the keyboard and you need to put your work into a script that can be run by the scheduler when resources become available. Your batch script may mix basic shell commands and functions that will be run serially on the first compute node in your allocation with parallel programs that are run as jobs as described in [Section 2](/flux/section2). -### Submitting a basic job script with `flux mini batch` -The `flux mini batch` command allows you to submit batch scripts to a queue for later execution once resources are available. In the `simplescript.sh` example below, we mix shell commands to log when the job starts and that it has completed with a `flux mini run` command to launch an MPI program. +### Submitting a basic job script with `flux batch` +The `flux batch` command allows you to submit batch scripts to a queue for later execution once resources are available. In the `simplescript.sh` example below, we mix shell commands to log when the job starts and that it has completed with a `flux run` command to launch an MPI program. ``` sh-4.2$ cat simplescript.sh #!/bin/sh date -flux mini run -n4 ./mpi_hellosleep +flux run -n4 ./mpi_hellosleep echo 'job complete' -sh-4.2$ flux mini batch -N2 -n4 ./simplescript.sh +sh-4.2$ flux batch -N2 -n4 ./simplescript.sh fDkrou8xo sh-4.2$ flux jobs JOBID USER NAME ST NTASKS NNODES RUNTIME NODELIST @@ -37,24 +37,20 @@ job complete ^C sh-4.2$ ``` -### Making your job script submission self documenting -Many resource managers allow you to put batch submission scripts in your script as comments. For example, in a Slurm sbatch script, you can use `#SBATCH -N 2` in your script to request two nodes in your allocation. Flux does not yet have directly analogous functionality. 
If you want to include your job submission flags in your script, you can use a [heredoc](https://en.wikipedia.org/wiki/Here_document) to include the `flux mini batch` command in your script and run it directly. +### Adding job submission directives to your batch script +Many resource managers allow you to put batch submission flags in your script as comments. In Flux, you can do this by prepending the flags with `#flux:` in your script. For example, the job script below will run 4 tasks on two nodes. ``` sh-4.2$ cat simplescript.sh #!/bin/sh -flux mini batch \ --N 2 \ --n 4 \ -<<- 'END_OF_SCRIPT' - #!/bin/sh +#flux: -N 2 +#flux: -n 4 - date - flux mini run -n4 ./mpi_hellosleep - echo 'job complete' +date +flux run -n4 ./mpi_hellosleep +echo 'job complete' -END_OF_SCRIPT -sh-4.2$ ./simplescript.sh +sh-4.2$ flux batch ./simplescript.sh f3YDA4qqR sh-4.2$ tail -f flux-f3YDA4qqR.out Mon Mar 15 13:45:01 PDT 2021 @@ -71,21 +67,21 @@ job complete ^C sh-4.2$ ``` -### Starting an interactive Flux instance with `flux mini alloc` -When you submit a job script with `flux mini batch`, you are actually starting a new Flux instance with its own resources and running your script in that instance. The `flux mini alloc` command will create a new Flux instance and start an interactive shell in it. +### Starting an interactive Flux instance with `flux alloc` +When you submit a job script with `flux batch`, you are actually starting a new Flux instance with its own resources and running your script in that instance. The `flux alloc` command will create a new Flux instance and start an interactive shell in it. 
``` -[day36@corona211:~]$ flux mini alloc -N2 -n2 -t 1h +[day36@corona211:~]$ flux alloc -N2 -n2 -t 1h [day36@corona177:~]$ flux resource list STATE NNODES NCORES NGPUS NODELIST free 2 96 16 corona[177-178] allocated 0 0 0 down 0 0 0 -[day36@corona177:~]$ flux mini run -N2 -n2 hostname +[day36@corona177:~]$ flux run -N2 -n2 hostname corona177 corona178 [day36@corona177:~]$ ``` -Alternatively, you can supply `flux mini alloc` with a command or script and it will run that in a new Flux instance. Unlike `flux mini batch`, `flux mini alloc` will block until the command or script returns and send the standard output and error to the terminal. +Alternatively, you can supply `flux alloc` with a command or script and it will run that in a new Flux instance. Unlike `flux batch`, `flux alloc` will block until the command or script returns and send the standard output and error to the terminal. ``` [day36@corona211:flux_test]$ cat test_batch.sh #!/bin/bash @@ -94,8 +90,8 @@ echo "resources" flux resource list echo "hosts" -flux mini run -N 2 -n 2 hostname -[day36@corona211:flux_test]$ flux mini alloc -N2 -n2 ./test_batch.sh +flux run -N 2 -n 2 hostname +[day36@corona211:flux_test]$ flux alloc -N2 -n2 ./test_batch.sh resources STATE NNODES NCORES NGPUS NODELIST free 2 96 16 corona[177-178] @@ -108,7 +104,7 @@ corona178 [day36@corona211:flux_test] ``` ### Submitting jobs to an existing instance with `flux proxy` -Some user workflows involve getting an allocation (Flux instance) and submitting work to it from outside of that allocation. Flux can accomodate these types of workflows using the `flux proxy` command. You could, for example, create a two node Flux instance with `flux mini alloc -N2 -n96 -t 1d sleep 1d`, then use a `flux proxy` command to submit work in that instance: +Some user workflows involve getting an allocation (Flux instance) and submitting work to it from outside of that allocation. Flux can accommodate these types of workflows using the `flux proxy` command. 
You could, for example, create a two node Flux instance with `flux alloc -N2 -n96 -t 1d --bg`, then use a `flux proxy` command to submit work in that instance: ``` [day36@corona212:~]$ flux jobs JOBID USER NAME ST NTASKS NNODES RUNTIME NODELIST @@ -119,7 +115,7 @@ corona178 [day36@corona212:~]$ ``` ### More user facing batch options -There are a number of things like job queues, qos, modifying jobs, holding jobs, etc that aren't in flux yet, but will be described here once they are. +The Flux job submission commands have many more options for doing things like running on specific nodes or queues, modifying your job environment, specifying task mappings, and more. See, for example, `man flux-run` for details on all of the options available. --- [Section 2](/flux/section2) | Section 3 | [Exercise 3](/flux/exercises/exercise3) | [Section 4](/flux/section4) diff --git a/flux/section4.md b/flux/section4.md index 4291598..2bda61c 100644 --- a/flux/section4.md +++ b/flux/section4.md @@ -9,7 +9,7 @@ The top level Flux system instance on LC systems is configured to store informat ### Viewing completed jobs with `flux jobs -a` By default, the `flux jobs` command only lists your running and pending jobs. You can see all of your jobs, including completed jobs, by adding a `-a` flag. You can also show more information about any jobs with a `-o "FORMAT"` flag. See `man flux-jobs` for a detailed description of how to construct a `"FORMAT"` string and what information may be included in it. ### Submitting jobs to a non-default bank -Some LC clusters have Flux's accounting modules enabled. This allows us to use a mutli-factor priority system that includes a hierarchical fairshare algorithm. Users of these systems may have access to multiple banks. One bank will be set as your default bank, but you can choose to submit jobs using an alternate bank by adding `-o setattr=system.bank=BANK` to your `flux mini` command. +Some LC clusters have Flux's accounting modules enabled. 
This allows us to use a multi-factor priority system that includes a hierarchical fairshare algorithm. Users of these systems may have access to multiple banks. One bank will be set as your default bank, but you can choose to submit jobs using an alternate bank by adding `-o setattr=system.bank=BANK` to your `flux run|submit|alloc|batch` command. ##### *More on accounting and usage coming soon* diff --git a/flux/section5.md b/flux/section5.md index abc0251..96c7fd4 100644 --- a/flux/section5.md +++ b/flux/section5.md @@ -5,9 +5,9 @@ release_number: LLNL-WEB-822959 author: Ryan Day, Lawrence Livermore National Laboratory --- -One of the key innovations of Flux is the ability to easily start flux instances within a parent Flux instances. This allows users to create separate allocations on different subsets of their allocated resources and assign different portions of their workflow to those resources. The basic command line interface for Flux has two commands that create new Flux instances, and you've already been using one of them. The `flux mini batch` command described in [section 3](/flux/section3) is actually creating a flux instance that the `flux mini run` commands are running in. Similarly, `flux mini alloc` can be used to create a new instance, but blocks until its work is complete. +One of the key innovations of Flux is the ability to easily start flux instances within a parent Flux instance. This allows users to create separate allocations on different subsets of their allocated resources and assign different portions of their workflow to those resources. The basic command line interface for Flux has two commands that create new Flux instances, and you've already been using one of them. The `flux batch` command described in [section 3](/flux/section3) is actually creating a flux instance that the `flux run` commands are running in. Similarly, `flux alloc` can be used to create a new instance, but blocks until its work is complete. 
### Creating allocations inside of an allocation -We can use the `flux resource list` and `flux jobs` commands discussed in [section 1](/flux/section1) and [section 2](/flux/section2) to demonstrate the differences between running in an allocation (`flux mini run` or `flux mini submit`) and creating a new allocation (`flux mini batch` or `flux mini alloc`). We will start with a two node allocation: +We can use the `flux resource list` and `flux jobs` commands discussed in [section 1](/flux/section1) and [section 2](/flux/section2) to demonstrate the differences between running in an allocation (`flux run` or `flux submit`) and creating a new allocation (`flux batch` or `flux alloc`). We will start with a two node allocation: ``` sh-4.2$ flux resource list STATE NNODES NCORES NGPUS NODELIST @@ -18,9 +18,9 @@ sh-4.2$ ``` We can submit work directly to this allocation as discussed previously and see that work with `flux jobs`. Note that the two `sleep` processes ended up on different nodes in the allocation: ``` -sh-4.2$ flux mini submit -n1 sleep 10m +sh-4.2$ flux submit -n1 sleep 10m f4M6c3TKd -sh-4.2$ flux mini submit -n1 sleep 10m +sh-4.2$ flux submit -n1 sleep 10m f4NGSibEo sh-4.2$ flux jobs JOBID USER NAME ST NTASKS NNODES RUNTIME NODELIST @@ -28,12 +28,12 @@ sh-4.2$ flux jobs f4M6c3TKd day36 sleep R 1 1 7.261s rzalastor6 sh-4.2$ ``` -We can also submit batch scripts with `flux mini batch`. These will create new flux instances with different hardware resources available. We will demonstrate this with two batch scripts. `script1.sh` creates an allocation with eight tasks spread across the two nodes of the parent allocation: +We can also submit batch scripts with `flux batch`. These will create new flux instances with different hardware resources available. We will demonstrate this with two batch scripts. 
`script1.sh` creates an allocation with eight tasks spread across the two nodes of the parent allocation: ``` sh-4.2$ cat script1.sh #!/bin/sh -flux mini batch \ +flux batch \ -N 2 \ -n 8 \ <<- 'END_OF_SCRIPT' @@ -41,8 +41,8 @@ flux mini batch \ date flux resource list - flux mini run -n4 ./mpi_hellosleep & - flux mini run -n4 ./mpi_hellosleep & + flux run -n4 ./mpi_hellosleep & + flux run -n4 ./mpi_hellosleep & sleep 3 flux jobs wait @@ -51,12 +51,12 @@ flux mini batch \ END_OF_SCRIPT sh-4.2$ ``` -In this script, the four tasks of each `flux mini run` will be spread across both nodes. In contrast, `script2.sh` creates an allocation with eight tasks on just one of the nodes of the parent allocation, so all of the tasks from both `flux mini run` commands will be on the same node: +In this script, the four tasks of each `flux run` will be spread across both nodes. In contrast, `script2.sh` creates an allocation with eight tasks on just one of the nodes of the parent allocation, so all of the tasks from both `flux run` commands will be on the same node: ``` sh-4.2$ cat script2.sh #!/bin/sh -flux mini batch \ +flux batch \ -N 1 \ -n 8 \ <<- 'END_OF_SCRIPT' @@ -64,8 +64,8 @@ flux mini batch \ date flux resource list - flux mini run -n4 ./mpi_hellosleep & - flux mini run -n4 ./mpi_hellosleep & + flux run -n4 ./mpi_hellosleep & + flux run -n4 ./mpi_hellosleep & sleep 3 flux jobs wait