Commit
incorporate feedback from Stephen and Dong
ryanday36 committed May 26, 2021
1 parent 02ca479 commit 3b6b693
Showing 3 changed files with 21 additions and 20 deletions.
2 changes: 1 addition & 1 deletion flux/exercises/exercise1.md
@@ -17,7 +17,7 @@ flux-resource: ERROR: [Errno 2] Unable to connect to Flux: ENOENT: No such file
```
2. See "Starting Flux" in [Section 1](/flux/section1).
3. See "Showing the resources in your Flux allocation" in [Section 1](/flux/section1).
4. The flux-hwloc man page gives the helpful command `flux hwloc topology | lstopo-no-graphics --if xml -i -` for displaying a detailed view of the hardware topology.
4. The flux-hwloc man page gives the helpful command `flux hwloc topology | lstopo-no-graphics --if xml -i -` for displaying a detailed view of the hardware topology (this command may not work with all versions of hwloc).

---
[Introduction](/flux/intro) | [Section 1](/flux/section1) | Exercise 1 | [Section 2](/flux/section2)
37 changes: 19 additions & 18 deletions flux/section1.md
@@ -8,43 +8,44 @@ author: Ryan Day, Lawrence Livermore National Laboratory
Regardless of what resource management software a cluster is running, the first step in running in a multi-user environment is to get an allocation of hardware resources. Once you have an allocation, you can use Flux to manage your workload on those resources. This section will tell you where to find Flux and how to start it in an allocation even if it is not the main resource manager on the cluster that you are running on.
### Finding Flux
Flux is included in the TOSS operating system on LC systems, so it should be available in your standard `PATH`. You can check this with:
```
[day36@fluke108:~]$ which flux
```console
[day36@rzalastor1:~]$ which flux
/usr/bin/flux
[day36@fluke108:~]$ flux --version
commands: 0.22.0
libflux-core: 0.22.0
[day36@rzalastor1:~]$ flux --version
commands: 0.26.0
libflux-core: 0.26.0
libflux-security: 0.4.0
build-options: +hwloc==1.11.0
[day36@fluke108:~]$
[day36@rzalastor1:~]$
```
Flux is under heavy development. At times you may want a version that is newer than the TOSS version, or just want to ensure that you stay on a consistent version. Builds of Flux are also installed in `/usr/global/tools/flux/` on LC clusters. You can use one of these versions by adding it to your `PATH`:
```
[day36@fluke108:~]$ export PATH=/usr/global/tools/flux/$SYS_TYPE/flux-c0.18.0-s0.10.0/bin:$PATH
[day36@fluke108:~]$ which flux
/usr/global/tools/flux/toss_3_x86_64_ib/flux-c0.18.0-s0.10.0/bin/flux
[day36@fluke108:~]$ flux --version
```console
[day36@rzalastor2:~]$ export PATH=/usr/global/tools/flux/$SYS_TYPE/default/bin:$PATH
[day36@rzalastor2:~]$ which flux
/usr/global/tools/flux/toss_3_x86_64_ib/default/bin/flux
[day36@rzalastor2:~]$ flux --version
commands: 0.18.0
libflux-core: 0.18.0
build-options: +hwloc==1.11.0
[day36@fluke108:~]$
[day36@rzalastor2:~]$
```
Note that the `default` and `new` links can change as new versions of Flux are released.

If you are not on an LC cluster and Flux is not already installed, or if you're just into that sort of thing, you can also install Flux using `spack` or build it from source. See [Appendix I](/flux/appendices/appendixI) for more details on those options.
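
If you drive Flux from a workflow script, the same `which flux` and `flux --version` check can be done programmatically. The following is a minimal sketch using only the Python standard library; the helper names `find_flux` and `flux_version` are hypothetical and are not part of Flux.

```python
import shutil
import subprocess


def find_flux(search_path=None):
    """Return the path of the first `flux` executable found, or None.

    search_path is an os.pathsep-separated string like PATH;
    None means "use the current PATH". (Hypothetical helper.)
    """
    return shutil.which("flux", path=search_path)


def flux_version(flux_path):
    """Return the raw text printed by `<flux_path> --version`."""
    result = subprocess.run(
        [flux_path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    flux_path = find_flux()
    if flux_path is None:
        print("flux not found on PATH")
    else:
        print(flux_path)
        print(flux_version(flux_path), end="")
```

Because the lookup honors the order of directories in `PATH`, prepending one of the `/usr/global/tools/flux/` builds as shown above changes which executable this sketch reports.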
### Starting Flux
Even if you are on a cluster that is running another resource manager, such as Slurm or LSF, you can still use Flux to run your workload. You will need to get an allocation, then start Flux on all of the nodes in that allocation with the `flux start` command. This will start `flux-broker` processes on all of the nodes. These processes gather information about the available hardware resources and communicate with each other to assign your workload to those resources. On a cluster running Slurm, this will look like:
```
```console
[day36@rzalastor2:~]$ salloc -N2 --exclusive
salloc: Granted job allocation 234174
sh-4.2$ srun -N2 -n2 --pty flux start
sh-4.2$ srun -N2 -n2 --pty --mpibind=off flux start
sh-4.2$ flux mini run -n 2 hostname
rzalastor6
rzalastor5
sh-4.2$
```
If you're on a cluster that is running a multi-user Flux instance, getting an allocation with `flux-broker` processes running is even easier. You can just use the `flux mini alloc` command:
```
fill this in when fluke works again
```
The `--mpibind=off` flag affects an LC-specific plugin, and should not be used on non-LC clusters.

If you're on a cluster that is running a multi-user Flux instance, getting an allocation with `flux-broker` processes running is even easier. You can just use the `flux mini alloc` command to get an interactive allocation or any of the batch commands described in [Section 3](/flux/section3).
### Showing the resources in your Flux instance
Flux uses [hwloc](http://manpages.org/hwloc/7) to build an internal model of the hardware available in a Flux instance. You can query this model with `flux hwloc`, or see a view of what resources are allocated and available with `flux resource list`. For example, in the Flux instance started in the previous section, we have two nodes with 20 cores each:
```
2 changes: 1 addition & 1 deletion flux/section2.md
@@ -7,7 +7,7 @@ author: Ryan Day, Lawrence Livermore National Laboratory

In the previous section, we learned how to find Flux, get an allocation, and query the compute resources in that allocation. Now we are ready to launch work on those resources. When you launch work in Flux, that work can be either blocking or non-blocking. Blocking steps will run to completion before more work can be submitted, whereas non-blocking steps are enqueued, allowing you to immediately submit more work in the allocation.

Before we get into submitting and managing job steps, we should also discuss Flux's jobids as they're a bit different than what you'll find in other resource management software. In the introduction to this tutorial, we mentioned that Flux is fully hierarchical. That is, users can launch full flux instances within allocations, then launch more job steps or flux instances within those instances. While this has has many benefits for taking advantage of modern HPC hardware and allowing complex workflows, it also means that the sequential numeric jobids used in traditional resource managers do not match Flux's job model. Flux instead uses hashes of the job parameters and environment, including submit time, to create effectively unique identifiers for each job and job step. There are options to display these identifiers in a number of ways, but the default is an 8 character string prepended by an `f`, e.g. `fBsFXaow5` for the job submitted in the example below. For more details on Flux's identifiers, see the [FLUID documentation](https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/spec_19.html).
Before we get into submitting and managing job steps, we should also discuss Flux's jobids, as they're a bit different from what you'll find in other resource management software. In the introduction to this tutorial, we mentioned that Flux is fully hierarchical. That is, users can launch full Flux instances within allocations, then launch more job steps or Flux instances within those instances. While this has many benefits for taking advantage of modern HPC hardware and allowing complex workflows, it also means that the sequential numeric jobids used in traditional resource managers do not match Flux's job model. Flux instead combines the submit time, an id, and a sequence number to create effectively unique identifiers for each job and job step. There are options to display these identifiers in a number of ways, but the default is an 8-character string prepended by an `f`, e.g. `fBsFXaow5` for the job submitted in the example below. For more details on Flux's identifiers, see the [FLUID documentation](https://flux-framework.readthedocs.io/projects/flux-rfc/en/latest/spec_19.html).
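
To make the idea of combining a time, an id, and a sequence number concrete, here is a rough Python sketch of packing three fields into one 64-bit integer and printing it in a base58 form with an `f` prefix. The field widths (40, 14, and 10 bits), the base58 alphabet, and the function names are assumptions made for illustration. The FLUID documentation linked above is the authoritative description, and this is not Flux's own implementation.

```python
# Assumed field widths, for illustration only; check the FLUID spec.
TS_BITS, GEN_BITS, SEQ_BITS = 40, 14, 10

# Assumed base58 alphabet (digits and letters minus 0, O, I, and l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def make_id(timestamp_ms, generator_id, sequence):
    """Pack the three fields into one 64-bit integer (hypothetical helper)."""
    fields = ((timestamp_ms, TS_BITS), (generator_id, GEN_BITS), (sequence, SEQ_BITS))
    for value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{value} does not fit in {bits} bits")
    return (timestamp_ms << (GEN_BITS + SEQ_BITS)) | (generator_id << SEQ_BITS) | sequence


def split_id(job_id):
    """Recover (timestamp_ms, generator_id, sequence) from a packed id."""
    sequence = job_id & ((1 << SEQ_BITS) - 1)
    generator_id = (job_id >> SEQ_BITS) & ((1 << GEN_BITS) - 1)
    timestamp_ms = job_id >> (GEN_BITS + SEQ_BITS)
    return timestamp_ms, generator_id, sequence


def to_f58(job_id):
    """Render a non-negative integer id as 'f' followed by base58 digits."""
    digits = ""
    while True:
        job_id, remainder = divmod(job_id, 58)
        digits = B58_ALPHABET[remainder] + digits
        if job_id == 0:
            break
    return "f" + digits


if __name__ == "__main__":
    job_id = make_id(timestamp_ms=123456, generator_id=3, sequence=7)
    print(job_id, to_f58(job_id), split_id(job_id))
```

Because the timestamp occupies the high bits, ids created later compare as larger integers, and the compact base58 rendering keeps them short enough to type on a command line.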
### Submit blocking job steps with `flux mini run`
If you want your work to block until it completes, the `flux mini run` command will submit a job step and then wait until the step is complete before returning. For example, in a two-node allocation, we can launch an MPI program with 4 tasks:
```