Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust the README file in cater with users #2440

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

Electronic-Waste
Copy link
Member

What this PR does / why we need it:

  • Put a Warning description in the front of Overview section to notify users that the API is in alpha stage
  • Add quick-start installation commands for Kubeflow Trainer and Kubeflow Training Operator V1, so it will be more user-friendly.
  • Adjust the order of some expressions.

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

  • Docs included if any changes are user facing

Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Electronic-Waste
Copy link
Member Author

/cc @kubeflow/wg-training-leads @astefanutti @saileshd1402 @varodrig

@google-oss-prow google-oss-prow bot requested review from varodrig and a team February 16, 2025 14:03
Copy link

@Electronic-Waste: GitHub didn't allow me to request PR reviews from the following users: saileshd1402.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @kubeflow/wg-training-leads @astefanutti @saileshd1402 @varodrig

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Comment on lines +43 to +44
kubectl apply --server-side -k "https://github.com/kubeflow/trainer.git/manifests/overlays/manager?ref=master"
kubectl apply --server-side -k "https://github.com/kubeflow/trainer.git/manifests/overlays/runtimes?ref=master"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Electronic-Waste Previously, we've been discussing with @rimolive @varodrig and @kubeflow/wg-training-leads that we want to have single source of truth for the Kubeflow Trainer docs, since it is hard to keep all of these docs up-to-date.

Thus, we just redirect users to the Kubeflow website for the installation steps: https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/#installing-the-kubeflow-trainer-controller-manager

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the user's side, they want to quickly install Kubeflow Trainer. So, we'd better put the installation guide at the README file to get started quickly like so many other oss including Katib:

And they also have detailed guidance in the website. So, I think it probably a better choice to put some installation commands in the README.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, they need to click "the official Kubeflow documentation" -> "the installation guide" -> scroll down, to see the installation commands. I think the guide is too deep for users. They may prefer a straightforward way and search for the official documentation if the straightforward way does not work.

Copy link
Member

@andreyvelich andreyvelich Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, since we directly link the installation guide from the README: https://github.com/kubeflow/trainer?tab=readme-ov-file#getting-started.
Should we just provide another link to the operator guide from the README: https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/#installing-the-kubeflow-trainer-controller-manager ?

Copy link
Member Author

@Electronic-Waste Electronic-Waste Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just provide another link to the operator guide from the README: https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/#installing-the-kubeflow-trainer-controller-manager ?

It's a better choice compared to the current one. But from the user's perspective, they prefer install Trainer directly with some simple commands listed in the README, which is more straightforward, and search for the details if they want customized installation (like only installing manager)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we talk about it at the next Training WG call please ?
I want to see what is the right solution for us moving forward.
/hold

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we talk about it at the next Training WG call please ?
I want to see what is the right solution for us moving forward.

Yeah, of course.

README.md Outdated
@@ -11,6 +11,8 @@

## Overview

> **Warning**: Kubeflow Trainer is currently in **alpha** status, and APIs may change. If you want to use stable release of Kubeflow Training Operator V1, please check [this section](#kubeflow-training-operator-v1).
Copy link
Member

@tenzen-y tenzen-y Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> **Warning**: Kubeflow Trainer is currently in **alpha** status, and APIs may change. If you want to use stable release of Kubeflow Training Operator V1, please check [this section](#kubeflow-training-operator-v1).
> [!WARNING]
> Kubeflow Trainer is currently in **alpha** status, and APIs may change. If you want to use stable release of Kubeflow Training Operator V1, please check [this section](#kubeflow-training-operator-v1).

This is a GitHub Markdown way: https://github.com/orgs/community/discussions/16925
Anyway, do we really want to notice this to users? @kubeflow/wg-training-leads @astefanutti
If yes, when can we remove this warning? @Electronic-Waste Do you have a specific schedule for the removal this warning?

Copy link
Member Author

@Electronic-Waste Electronic-Waste Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we are implementing Trainer according to KEP-2170. It's unstable and has not been ready for production. So I think it's necessary to remind users of this. They can use the stable release of Training Operator V1 instead, at least before we release the first stable version of Trainer.

As for the removal timeline, I think probably we can remove this warning when the first stable release of Trainer is ready, maybe v2.0.0. It means that we are production-ready.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the removal timeline, I think probably we can remove this warning when the first stable release of Trainer is ready, maybe v2.0.0. It means that we are production-ready.

We will cut v2.0.0 after MPI impl. So I'm not sure the reason why we can say production ready in the v2.0.0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can turn the warning into a note. And also rephrase it in a way that says the v2 APIs are still subject to change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the v2.0.0. I mean we can remove this warning when we are production ready.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can turn the warning into a note. And also rephrase it in a way that says the v2 APIs are still subject to change.

+1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the v2.0.0. I mean we can remove this warning when we are production ready.

Here, what is production-ready is a matter. If we do not decide graduation criteria for production ready, we lose the timeline for that, and we continue "this is not production ready, so any kind of break should be fine". And then, users will leave from this projects.

At least, it should be better to define production readiness criteria.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can turn the warning into a note. And also rephrase it in a way that says the v2 APIs are still subject to change.

SGTM.

Copy link
Member Author

@Electronic-Waste Electronic-Waste Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, what is production-ready is a matter.

Yeah, that's true. We should discuss it further in another dedicated issue for graduation criteria. And maybe we could list it as an agenda for the next WG AutoML/Training community call.

/cc @andreyvelich

@Electronic-Waste
Copy link
Member Author

@kubeflow/wg-training-leads @astefanutti I've updated the README file. PTAL if you have time. Thanks!

The preview of new README: https://github.com/kubeflow/trainer/blob/0c44d15d0181fdad249555229171cea21cf298ca/README.md

@tenzen-y
Copy link
Member

@kubeflow/wg-training-leads @astefanutti I've updated the README file. PTAL if you have time. Thanks!

The preview of new README: https://github.com/kubeflow/trainer/blob/0c44d15d0181fdad249555229171cea21cf298ca/README.md

As I described in #2440 (comment), I would say to keep parking here until we decide on the production readiness criteria. The "production" term should be handled carefully.

@Electronic-Waste
Copy link
Member Author

As I described in #2440 (comment), I would say to keep parking here until we decide on the production readiness criteria. The "production" term should be handled carefully.

Thanks for pointing this out. Let's discuss it in the upcoming Training WG Call.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants