Change NeMo SDK to NeMo Run (NVIDIA#187)
* Pin rapids versions

Signed-off-by: Ryan Wolf <[email protected]>

* Change sdk to run

Signed-off-by: Ryan Wolf <[email protected]>

* Fix sdk references

Signed-off-by: Ryan Wolf <[email protected]>

* Remove pinned version

Signed-off-by: Ryan Wolf <[email protected]>

* Update documentation

Signed-off-by: Ryan Wolf <[email protected]>

---------

Signed-off-by: Ryan Wolf <[email protected]>
ryantwolf authored Aug 7, 2024
1 parent 707089d commit 3d83434
Showing 5 changed files with 30 additions and 30 deletions.
6 changes: 3 additions & 3 deletions docs/user-guide/index.rst
@@ -30,8 +30,8 @@
 :ref:`NeMo Curator on Kubernetes <data-curator-kubernetes>`
 Demonstration of how to run the NeMo Curator on a Dask Cluster deployed on top of Kubernetes

-:ref:`NeMo Curator with NeMo SDK <data-curator-nemo-sdk>`
-Example of how to use NeMo Curator with NeMo SDK to run on various platforms
+:ref:`NeMo Curator with NeMo Run <data-curator-nemo-run>`
+Example of how to use NeMo Curator with NeMo Run to run on various platforms

 :ref:`Best Practices <data-curator-best-practices>`
 A collection of suggestions on how to best use NeMo Curator to curate your dataset
@@ -58,6 +58,6 @@
 personalidentifiableinformationidentificationandremoval.rst
 distributeddataclassification.rst
 kubernetescurator.rst
-nemosdk.rst
+nemorun.rst
 bestpractices.rst
 nextsteps.rst
32 changes: 16 additions & 16 deletions docs/user-guide/nemosdk.rst → docs/user-guide/nemorun.rst
@@ -1,33 +1,33 @@
-.. _data-curator-nemo-sdk:
+.. _data-curator-nemo-run:

 ======================================
-NeMo Curator with NeMo SDK
+NeMo Curator with NeMo Run
 ======================================
 -----------------------------------------
-NeMo SDK
+NeMo Run
 -----------------------------------------

-The NeMo SDK is a general purpose tool for configuring and executing Python functions and scripts acrosss various computing environments.
+The NeMo Run is a general purpose tool for configuring and executing Python functions and scripts acrosss various computing environments.
 It is used across the NeMo Framework for managing machine learning experiments.
-One of the key features of the NeMo SDK is the ability to run code locally or on platforms like SLURM with minimal changes.
+One of the key features of the NeMo Run is the ability to run code locally or on platforms like SLURM with minimal changes.

 -----------------------------------------
 Usage
 -----------------------------------------

-We recommend getting slightly familiar with NeMo SDK before jumping into this. The documentation can be found here.
+We recommend getting slightly familiar with NeMo Run before jumping into this. The documentation can be found here.

-Let's walk through the example usage for how you can launch a slurm job using `examples/launch_slurm.py <https://github.com/NVIDIA/NeMo-Curator/blob/main/examples/nemo_sdk/launch_slurm.py>`_.
+Let's walk through the example usage for how you can launch a slurm job using `examples/launch_slurm.py <https://github.com/NVIDIA/NeMo-Curator/blob/main/examples/nemo_run/launch_slurm.py>`_.

 .. code-block:: python
-    import nemo_sdk as sdk
-    from nemo_sdk.core.execution import SlurmExecutor
+    import nemo_run as run
+    from nemo_run.core.execution import SlurmExecutor
-    from nemo_curator.nemo_sdk import SlurmJobConfig
+    from nemo_curator.nemo_run import SlurmJobConfig
-    @sdk.factory
+    @run.factory
     def nemo_curator_slurm_executor() -> SlurmExecutor:
         """
         Configure the following function with the details of your SLURM cluster
@@ -43,7 +43,7 @@ Let's walk through the example usage for how you can launch a slurm job using `e
         )
 First, we need to define a factory that can produce a ``SlurmExecutor``.
-This exectuor is where you define all your cluster parameters. Note: NeMo SDK only supports running on SLURM clusters with `Pyxis <https://github.com/NVIDIA/pyxis>`_ right now.
+This exectuor is where you define all your cluster parameters. Note: NeMo Run only supports running on SLURM clusters with `Pyxis <https://github.com/NVIDIA/pyxis>`_ right now.
 After this, there is the main function

 .. code-block:: python
@@ -80,17 +80,17 @@ We'll highlight a couple of important ones:
 .. code-block:: python
-    executor = sdk.resolve(SlurmExecutor, "nemo_curator_slurm_executor")
-    with sdk.Experiment("example_nemo_curator_exp", executor=executor) as exp:
+    executor = run.resolve(SlurmExecutor, "nemo_curator_slurm_executor")
+    with run.Experiment("example_nemo_curator_exp", executor=executor) as exp:
         exp.add(curator_job.to_script(), tail_logs=True)
         exp.run(detach=False)
 After configuring the job, we can finally run it.
-First, we use the sdk to resolve our custom factory.
+First, we use the run to resolve our custom factory.
 Next, we use it to begin an experiment named "example_nemo_curator_exp" running on our Slurm exectuor.

 ``exp.add(curator_job.to_script(), tail_logs=True)`` adds the NeMo Curator script to be part of the experiment.
-It converts the ``SlurmJobConfig`` to a ``sdk.Script``.
+It converts the ``SlurmJobConfig`` to a ``run.Script``.
 This ``curator_job.to_script()`` has two important parameters.
 * ``add_scheduler_file=True``
 * ``add_device=True``
@@ -12,13 +12,13 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-import nemo_sdk as sdk
-from nemo_sdk.core.execution import SlurmExecutor
+import nemo_run as run
+from nemo_run.core.execution import SlurmExecutor

-from nemo_curator.nemo_sdk import SlurmJobConfig
+from nemo_curator.nemo_run import SlurmJobConfig


-@sdk.factory
+@run.factory
 def nemo_curator_slurm_executor() -> SlurmExecutor:
     """
     Configure the following function with the details of your SLURM cluster
@@ -46,8 +46,8 @@ def main():
         script_command=curator_command,
     )

-    executor = sdk.resolve(SlurmExecutor, "nemo_curator_slurm_executor")
-    with sdk.Experiment("example_nemo_curator_exp", executor=executor) as exp:
+    executor = run.resolve(SlurmExecutor, "nemo_curator_slurm_executor")
+    with run.Experiment("example_nemo_curator_exp", executor=executor) as exp:
         exp.add(curator_job.to_script(), tail_logs=True)
         exp.run(detach=False)

File renamed without changes.
@@ -17,14 +17,14 @@

 from nemo_curator.utils.import_utils import safe_import

-sdk = safe_import("nemo_sdk")
+run = safe_import("nemo_run")


 @dataclass
 class SlurmJobConfig:
     """
     Configuration for running a NeMo Curator script on a SLURM cluster using
-    NeMo SDK
+    NeMo Run
     Args:
         job_dir: The base directory where all the files related to setting up
@@ -69,7 +69,7 @@ class SlurmJobConfig:

     def to_script(self, add_scheduler_file: bool = True, add_device: bool = True):
         """
-        Converts to a script object executable by NeMo SDK
+        Converts to a script object executable by NeMo Run
         Args:
             add_scheduler_file: Automatically appends a '--scheduler-file' argument to the
                 script_command where the value is job_dir/logs/scheduler.json. All
@@ -79,7 +79,7 @@ def to_script(self, add_scheduler_file: bool = True, add_device: bool = True):
                 where the value is the member variable of device. All scripts included in
                 NeMo Curator accept and require this argument.
         Returns:
-            A NeMo SDK Script that will intialize a Dask cluster, and run the specified command.
+            A NeMo Run Script that will intialize a Dask cluster, and run the specified command.
             It is designed to be executed on a SLURM cluster
         """
         env_vars = self._build_env_vars()
@@ -94,7 +94,7 @@ def to_script(self, add_scheduler_file: bool = True, add_device: bool = True):
         # Surround the command in quotes so the variable gets set properly
         env_vars["SCRIPT_COMMAND"] = f"\"{env_vars['SCRIPT_COMMAND']}\""

-        return sdk.Script(path=self.container_entrypoint, env=env_vars)
+        return run.Script(path=self.container_entrypoint, env=env_vars)

     def _build_env_vars(self) -> Dict[str, str]:
         env_vars = vars(self)
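
Reviewer note: the command assembly described in the `to_script` docstring can be checked without installing NeMo Run. The following is a minimal standalone sketch, not the NeMo Curator implementation. `JobConfigSketch`, `build_env_vars`, and `to_env` are hypothetical names, and the upper-cased keys and the `--flag=value` spelling are assumptions. Only the scheduler path (`job_dir/logs/scheduler.json`), the use of the `device` member, and the final quoting step are taken from the diff.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class JobConfigSketch:
    """Hypothetical stand-in for SlurmJobConfig; not the NeMo Curator class."""

    job_dir: str
    script_command: str
    device: str = "cpu"

    def build_env_vars(self) -> Dict[str, str]:
        # Build a new dict so the dataclass fields are never mutated.
        # Upper-casing the keys is an assumption made for this sketch.
        env_vars = {key.upper(): str(value) for key, value in vars(self).items()}
        # Path taken from the docstring in the diff: job_dir/logs/scheduler.json
        env_vars["SCHEDULER_FILE"] = f"{self.job_dir}/logs/scheduler.json"
        return env_vars

    def to_env(
        self, add_scheduler_file: bool = True, add_device: bool = True
    ) -> Dict[str, str]:
        env_vars = self.build_env_vars()
        command = env_vars["SCRIPT_COMMAND"]
        if add_scheduler_file:
            # Assumed flag spelling; the docstring only names '--scheduler-file'.
            command += f" --scheduler-file={env_vars['SCHEDULER_FILE']}"
        if add_device:
            # Assumed flag spelling; the value is the `device` member.
            command += f" --device={env_vars['DEVICE']}"
        # Surround the command in quotes so the variable gets set properly.
        env_vars["SCRIPT_COMMAND"] = f'"{command}"'
        return env_vars


config = JobConfigSketch(
    job_dir="/tmp/job", script_command="add_id --input-data-dir=/data"
)
env = config.to_env()
print(env["SCRIPT_COMMAND"])
# prints "add_id --input-data-dir=/data --scheduler-file=/tmp/job/logs/scheduler.json --device=cpu"
```

In the real class the resulting dictionary is handed to `run.Script(path=self.container_entrypoint, env=env_vars)`, so the rename in this commit changes only the module that supplies `Script`, not how the command is assembled.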