v0.15.0

github-actions released this 17 Apr 19:54

HyperQueue 0.15.0

Breaking changes

  • NVIDIA GPUs are now automatically detected under the resource name gpus/nvidia instead of
    gpus. If your scripts use the gpus resource name for automatically detected NVIDIA GPUs,
    update them to use gpus/nvidia. See the details below.

New features

Resource management

  • You can now specify several alternative resource requests (resource variants) for a single task,
    e.g. 1 cpu and 1 gpu OR 4 cpus. The scheduler considers both configurations when planning tasks.
    For example, assume that there are many tasks with this configuration and a worker with 16 cpus
    and 4 gpus. The tasks will fully utilize the node: 4 tasks will run in the GPU configuration
    (using 4 cpus and 4 gpus) and 3 tasks will run in the CPU-only configuration (using the
    remaining 12 cpus).

  • A Job Definition File is a TOML file that describes a job. It lets you submit complex jobs
    (with dependencies, resource variants, ...) without using the Python API.

    $ hq job submit-file myfile.toml

  • You can now specify (indexed) resource values provided by workers as strings (previously, only
    integers were allowed). Notably, automatic detection of NVIDIA GPUs that are identified by
    string UUIDs now works.

    $ hq worker start --resource="res1=[foo, bar]"

  • HyperQueue now provides built-in support for AMD GPUs. For this reason, the default name of GPU
    resources that are automatically detected on a worker has been changed from gpus to gpus/nvidia
    for NVIDIA GPUs. AMD GPUs are now autodetected as gpus/amd. In the future, we intend to create a way
    to ask for any GPU resource (e.g. --resource=gpus=2), regardless of its type.

  • AMD GPUs are now automatically detected in workers from the environment variable ROCR_VISIBLE_DEVICES.

  • The set of allowed characters for resource names has changed. A name now has to begin with an
    ASCII letter, and it can only contain ASCII letters, ASCII digits and the slash (/) symbol. This
    restriction was introduced for better compatibility with shells, which typically do not support
    complicated variable names: HQ passes resource names to executed tasks through environment
    variables, so the names have to be usable there. Note that a / in a resource name is normalized
    to _ when the name is passed to a task.

  • hq task info now shows more information
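
To make the arithmetic of the resource-variant example concrete, here is a small Python sketch that packs identical tasks, each accepting either "1 cpu and 1 gpu" or "4 cpus", onto a worker with 16 cpus and 4 gpus. It is a hypothetical greedy illustration (variants are tried in the listed order, and `pack_tasks` is an invented name); it is not HyperQueue's actual scheduling algorithm.

```python
from collections import Counter


def pack_tasks(capacity: dict[str, int], variants: list[dict[str, int]]) -> Counter:
    """Greedily place identical tasks on one worker (hypothetical helper).

    Each task may use any one of `variants`; variants are tried in order and
    placement stops once no variant fits into the remaining capacity.
    Returns how many tasks run in each variant, keyed by variant index.
    """
    free = dict(capacity)
    placed: Counter = Counter()
    while True:
        for index, variant in enumerate(variants):
            if all(free.get(res, 0) >= amount for res, amount in variant.items()):
                # Reserve the resources of the first variant that fits.
                for res, amount in variant.items():
                    free[res] -= amount
                placed[index] += 1
                break
        else:
            # No variant fits any more: the worker is full for this task type.
            return placed


worker = {"cpus": 16, "gpus": 4}
gpu_first = [{"cpus": 1, "gpus": 1}, {"cpus": 4}]
print(pack_tasks(worker, gpu_first))  # Counter({0: 4, 1: 3})
```

With the GPU variant tried first, 4 tasks take 4 cpus and all 4 gpus, and 3 more tasks take the remaining 12 cpus, matching the example above. Trying the CPU-only variant first would place only 4 tasks and leave every GPU idle, which shows why the scheduler has to weigh both variants rather than just taking the first one that fits.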

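The resource naming rule and the / to _ normalization described above can be sketched in Python as follows. The helper names `is_valid_resource_name` and `normalize_for_task` are hypothetical and not part of HyperQueue, and the sketch deliberately does not guess which environment variable names HQ actually builds from a resource name.

```python
import re

# Rule from the release notes: an ASCII letter first, then only ASCII letters,
# ASCII digits and "/". [0-9] is used instead of \d, which also matches non-ASCII digits.
RESOURCE_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9/]*")


def is_valid_resource_name(name: str) -> bool:
    """Return True if `name` satisfies the resource naming rule."""
    return RESOURCE_NAME_RE.fullmatch(name) is not None


def normalize_for_task(name: str) -> str:
    """Replace "/" with "_", as described for names passed to tasks."""
    if not is_valid_resource_name(name):
        raise ValueError(f"invalid resource name: {name!r}")
    return name.replace("/", "_")


for name in ["gpus/nvidia", "gpus/amd", "res1", "1gpu", "my-res"]:
    if is_valid_resource_name(name):
        print(f"{name} -> {normalize_for_task(name)}")
    else:
        print(f"{name} is rejected")
```

Under the rule as stated, _ itself cannot appear in a resource name, so two different names never normalize to the same string.
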
Changes

Job submission

  • The default path for stdout and stderr files has been changed from %{SUBMIT_DIR}/job-%{JOB_ID}/%{TASK_ID}.[stdout/stderr]
    to %{CWD}/job-%{JOB_ID}/%{TASK_ID}.[stdout/stderr]. Note that the working directory (%{CWD}) defaults
    to the submission directory, so if you have been using the defaults, nothing changes for you. Stdout
    and stderr paths are now also resolved relative to the working directory of the given task, not
    relative to the submit directory.
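
The following Python sketch illustrates the new path rules: the default template, %{CWD} falling back to the submission directory, and relative paths being resolved against the task's working directory. `resolve_output_path` is a hypothetical helper that only handles the placeholders mentioned here; it is not HyperQueue's implementation, and HQ may support further placeholders.

```python
from pathlib import PurePosixPath
from typing import Optional

# Default stdout template from the release notes (stderr is analogous).
DEFAULT_STDOUT = "%{CWD}/job-%{JOB_ID}/%{TASK_ID}.stdout"


def resolve_output_path(
    template: str,
    *,
    submit_dir: str,
    job_id: int,
    task_id: int,
    cwd: Optional[str] = None,
) -> PurePosixPath:
    """Sketch of the path rules described above (hypothetical helper)."""
    # The working directory defaults to the submission directory.
    task_cwd = cwd if cwd is not None else submit_dir
    expanded = (
        template.replace("%{SUBMIT_DIR}", submit_dir)
        .replace("%{CWD}", task_cwd)
        .replace("%{JOB_ID}", str(job_id))
        .replace("%{TASK_ID}", str(task_id))
    )
    # A relative result is resolved against the task's working directory;
    # joining with an absolute path simply yields that absolute path.
    return PurePosixPath(task_cwd) / expanded


submit = "/home/user/project"
print(resolve_output_path(DEFAULT_STDOUT, submit_dir=submit, job_id=5, task_id=3))
# /home/user/project/job-5/3.stdout
print(resolve_output_path(DEFAULT_STDOUT, submit_dir=submit, cwd="/scratch/run1", job_id=5, task_id=3))
# /scratch/run1/job-5/3.stdout
print(resolve_output_path("logs/%{TASK_ID}.out", submit_dir=submit, cwd="/scratch/run1", job_id=5, task_id=3))
# /scratch/run1/logs/3.out
```

The last call shows the behavioral change: a relative path such as logs/%{TASK_ID}.out now lands under the task's working directory rather than under the submission directory.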

Artifact summary:

  • hq-v0.15.0-*: Main HyperQueue build containing the hq binary. Download this archive to
    use HyperQueue from the command line.
  • hyperqueue-0.15.0-*: Wheel containing the hyperqueue package with HyperQueue Python
    bindings.