rhods-kfp-modelmesh.md

Table of contents

  1. High-level architecture
  2. Getting started
  3. How-To
  4. Troubleshooting
  5. Folder structure
  6. References

High-level architecture

Architecture diagram

Getting started

Prerequisites

  • Red Hat OpenShift Data Science version 1.22 or above.
  • Data Science Pipelines (follow the deployment guide for reference).

Preparing the environment

Set up S3

Spin up a Minio instance by deploying the manifests/minio/minio.yaml manifest. Once it's running, create the models bucket in the Minio GUI, which is exposed via the Minio route in the minio project. The credentials are:

  • login / access key: minio
  • password / secret key: minio123
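
For orientation, here is a minimal sketch of the kind of resources such a Minio manifest typically contains; the image tag, ports, and args are assumptions, and manifests/minio/minio.yaml in the repository is the authoritative version.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: minio
      namespace: minio
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: minio
      template:
        metadata:
          labels:
            app: minio
        spec:
          containers:
            - name: minio
              image: quay.io/minio/minio:latest   # assumed image reference
              args: ["server", "/data", "--console-address", ":9090"]
              env:
                - name: MINIO_ROOT_USER
                  value: minio            # access key from above
                - name: MINIO_ROOT_PASSWORD
                  value: minio123         # secret key from above
              ports:
                - containerPort: 9000     # S3 API
                - containerPort: 9090     # web console (GUI)
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: minio
      namespace: minio
    spec:
      selector:
        app: minio
      ports:
        - name: api
          port: 9000
        - name: console
          port: 9090

The GUI mentioned above is reached through an OpenShift route pointing at the console port of this service.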

Update and deploy manifests/odh/demo-pipeline-secret.yaml:

  • s3_endpoint_url: your S3 endpoint URL, such as http://s3.openshift-storage.svc.cluster.local.
  • s3_accesskey: an S3 access key with bucket creation permissions, for example the value of AWS_ACCESS_KEY_ID in the noobaa-admin secret in project openshift-storage.
  • s3_secret_key: the corresponding S3 secret key, for example the value of AWS_SECRET_ACCESS_KEY in the noobaa-admin secret in project openshift-storage.
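
For orientation, a minimal sketch of what demo-pipeline-secret.yaml might look like once updated; the secret name and the Secret/stringData structure are assumptions based on the file name and the fields above, and the manifest in the repository is authoritative.

    apiVersion: v1
    kind: Secret
    metadata:
      name: demo-pipeline-secret        # assumed, derived from the file name
    type: Opaque
    stringData:
      s3_endpoint_url: http://s3.openshift-storage.svc.cluster.local
      s3_accesskey: <your-access-key>   # e.g. AWS_ACCESS_KEY_ID from noobaa-admin
      s3_secret_key: <your-secret-key>  # e.g. AWS_SECRET_ACCESS_KEY from noobaa-admin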

Run the pipeline

  • Launch the Elyra KFNBC notebook from the Jupyter spawner page, or enter a running instance.
  • Clone this repository.
    • Open git client (Git in left toolbar).
    • Select Clone a Repository.
    • Enter the repository URL https://github.com/mamurak/os-mlops.git and select Clone.
    • Authenticate if necessary.
  • Open notebooks/elyra-kfp-onnx-example/model-training.pipeline in the Kubeflow Pipeline Editor.
  • Select Run Pipeline in the top toolbar.
  • Select OK.
  • Monitor pipeline execution in the Kubeflow Pipelines user interface (ds-pipelines-ui route URL) under Runs.

How-To

  • Change the available notebook deployment sizes.

    • Find the odh-dashboard-config object of kind OdhDashboardConfig in project odh-applications.
    • Add or update the spec.notebookSizes property. Check manifests/odh/odh-dashboard-config.yaml for reference; a sketch follows this How-To list.
  • Clone git repositories with JupyterLab.

    • Open git client (Git in left toolbar).
    • Select Clone a Repository.
    • Enter the repository URL and select Clone.
    • Authenticate if necessary.
  • Build and add a custom notebook image (a sketch of the ImageStream and BuildConfig follows this How-To list).

    • Deploy manifests/odh/images/custom-notebook-is.yaml.
    • Deploy manifests/odh/images/custom-notebook-bc.yaml.
    • Trigger a build of the new BuildConfig and wait until the build finishes.
    • As an ODH admin user, open the Settings tab in the ODH dashboard.
    • Select Notebook Images and Import new image.
    • Add the new notebook image with repository URL custom-notebook:latest and appropriate metadata.
    • Verify custom notebook integration in the JupyterHub provisioning page. You should be able to provision an instance of the custom notebook that you have defined in the previous step.
  • Add packages with pinned versions to the custom notebook image.

    • Within a custom notebook instance, install the package through pip install {your-package}.
    • Note the installed version of the package.
    • Add a new entry in container-images/custom-notebook/requirements.txt with {your-package}=={installed-version}.
    • Trigger a new image build.
    • Once the build is finished, provision a new notebook instance using the custom notebook image. The new package is now available.
  • Create Elyra pipelines within JupyterLab.

    • Open the Launcher (blue plus symbol in the top left corner of the frame).
    • Select Kubeflow Pipeline Editor.
    • Drag and drop notebooks from the file browser into the editor.
    • Build a pipeline by connecting the notebooks: draw lines from the output port of one node to the input port of the next. Any directed acyclic graph is supported.
    • For each node, update the node properties (right-click the node and select Open Properties):
      • Runtime Image: Select the appropriate runtime image containing the runtime dependencies of the notebook.
      • File Dependencies: If the notebook expects a file to be present, add this file dependency here. It must be present in the file system of the notebook instance.
      • Environment Variables: If the notebook expects particular environment variables to be set, you can set them here.
      • Kubernetes Secrets: To populate environment variables from Kubernetes secrets instead of setting their values in the Elyra interface, reference the corresponding secrets in this field.
      • Output Files: If the notebook generates files that are needed by downstream pipeline nodes, reference these files here.
    • Save the pipeline (top toolbar).
  • Submit an Elyra pipeline to the Kubeflow Pipelines backend.

    • Open an existing pipeline within the Elyra pipeline editor.
    • Select Run Pipeline (top toolbar).
    • Select the runtime configuration you prepared earlier and click OK.
    • You can now monitor the pipeline execution within the Kubeflow Pipelines GUI under Runs.
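
As referenced in the notebook sizes item above, here is a rough sketch of the spec.notebookSizes property on the OdhDashboardConfig object; the size name and resource values are illustrative assumptions, and manifests/odh/odh-dashboard-config.yaml is authoritative.

    apiVersion: opendatahub.io/v1alpha   # may differ by ODH version
    kind: OdhDashboardConfig
    metadata:
      name: odh-dashboard-config
      namespace: odh-applications
    spec:
      notebookSizes:
        - name: Small                    # illustrative size definition
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              cpu: "2"
              memory: 4Gi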
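
As referenced in the custom notebook item above, a minimal sketch of the ImageStream and BuildConfig pair behind the custom notebook image; the build strategy, Git reference, and context directory are assumptions, and the manifests under manifests/odh/images/ are authoritative.

    apiVersion: image.openshift.io/v1
    kind: ImageStream
    metadata:
      name: custom-notebook
    ---
    apiVersion: build.openshift.io/v1
    kind: BuildConfig
    metadata:
      name: custom-notebook
    spec:
      source:
        git:
          uri: https://github.com/mamurak/os-mlops.git   # assumed source repository
        contextDir: container-images/custom-notebook     # assumed build context
      strategy:
        dockerStrategy: {}                               # assumed Docker build
      output:
        to:
          kind: ImageStreamTag
          name: custom-notebook:latest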

Troubleshooting

  • If you don't have cluster admin privileges and you're denied access to the ML Pipelines UI route, you may have to update the oauth proxy arguments in the ds-pipeline-ui deployment. Refer to manifests/odh/ds-pipeline-ui-deployment.yaml for details.
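
A rough sketch of the kind of change involved, assuming the UI is fronted by the OpenShift oauth-proxy sidecar; the upstream port, namespace, and subject access review values are assumptions, and manifests/odh/ds-pipeline-ui-deployment.yaml is authoritative.

    # Excerpt from the oauth-proxy container in the ds-pipeline-ui deployment.
    # Replacing a cluster-scoped authorization check with a namespace-scoped
    # one lets non-admin project members pass the proxy.
    containers:
      - name: oauth-proxy
        args:
          - --provider=openshift
          - --https-address=:8443
          - --upstream=http://localhost:3000   # assumed UI port
          - --openshift-sar={"namespace":"odh-applications","resource":"services","resourceName":"ds-pipeline-ui","verb":"get"}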

Folder structure

  • manifests: OpenShift deployment artifacts.
  • container-images: dependencies of container builds.
  • notebooks: scripts and sample procedures for ODH component integration.

Noteworthy notebook examples in the notebooks folder:

  • starburst-odf examples: Examples of how to interact with S3 buckets and Starburst Enterprise Platform from a Jupyter notebook.
  • elyra examples: A sample pipeline demonstrating how to create artifacts that can be visualized in Kubeflow Pipelines.
  • batch pipeline example: A sample pipeline demonstrating parallel batch processing with Seldon.
  • pipeline example: A sample end-to-end ML pipeline including feature extraction, model training and model deployment through GitOps.

References