Skip to content

Commit

Permalink
rabbit: add documentation on rabbit config table
Browse files Browse the repository at this point in the history
Problem: there is no documentation on rabbit configuration with
a config file.

Add it.
  • Loading branch information
jameshcorbett committed Nov 5, 2024
1 parent 82a9cc6 commit 0605fb3
Show file tree
Hide file tree
Showing 3 changed files with 65 additions and 1 deletion.
Binary file modified auto_examples/auto_examples_jupyter.zip
Binary file not shown.
Binary file modified auto_examples/auto_examples_python.zip
Binary file not shown.
66 changes: 65 additions & 1 deletion tutorials/lab/rabbit_config.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ Configuring Flux with Rabbits

In order for a Flux system instance to be able to allocate
rabbit storage, the ``dws_jobtap.so`` plugin must be loaded.
The plugin can be loaded in a config file like so:
The plugin can be loaded in a config file like so:

.. code-block::
Expand Down Expand Up @@ -48,3 +48,67 @@ For example, in a config file:
[sched-fluxion-resource]
match-format = "rv1"
Rabbit Config Options
---------------------

The ``rabbit`` config table captures site-general policies and options for
Flux's interactions with the rabbits.


**kubeconfig** (string)
(optional) Path to kubeconfig file for Flux to use, ideally with restricted permissions.
This can be left undefined if the file is placed at the path `~flux/.kube/config`
(assuming the `flux` user is the instance owner).

**tc_timeout** (integer)
(optonal) Time in seconds to tolerate a workflow stuck in TransientCondition state
before killing the associated job. Defaults to 10 seconds.

**drain_compute_nodes** (boolean)
(optional) Whether to automatically drain compute nodes that lose PCIe connection
with their rabbit. Defaults to true.

**save_datamovements** (integer)
(optional) Number of `nnfdatamovement` resources to save to jobs' KVS, may be useful for
debugging but too many may degrade performance. Defaults to 0.

**restrict_persistent_creation** (boolean)
(optional) Restrict the creation of persistent file systems to the instance owner
(in most cases the `flux` user).

**policy.maximums** (table)
(optional) The maximum filesystem capacity per node, in GiB, that users may
request. Leave undefined for no limit. See below for an example.

**presets** (table)
(optional) Defines preset #DW strings. May potentially save users time and energy,
allowing them to run, for instance, `flux alloc -N1 -S dw=NAME` rather than
`flux alloc -N1 -S "dw=#DW jobdw ..."` See below for an example.


Example
~~~~~~~

.. code-block:: TOML
[rabbit]
kubeconfig = "/var/flux/.kube/config"
tc_timeout = 600
drain_compute_nodes = true
save_datamovements = 5
restrict_persistent_creation = true
# maximum filesystem capacity per node, in GiB
[rabbit.policy.maximums]
xfs = 1024
gfs2 = 2048
raw = 4096
lustre = 1024
# defines preset #DW strings
[rabbit.presets]
small_xfs = "#DW jobdw type=xfs capacity=100GiB name=smallxfs"
large_lustre = "#DW jobdw type=lustre capacity=50TiB name=largelustre"

0 comments on commit 0605fb3

Please sign in to comment.