-
Notifications
You must be signed in to change notification settings - Fork 731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust the README file in cater with users #2440
base: master
Are you sure you want to change the base?
Adjust the README file in cater with users #2440
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
… chapter. Signed-off-by: Electronic-Waste <[email protected]>
Signed-off-by: Electronic-Waste <[email protected]>
Signed-off-by: Electronic-Waste <[email protected]>
e836f6d
to
707e7f7
Compare
/cc @kubeflow/wg-training-leads @astefanutti @saileshd1402 @varodrig |
@Electronic-Waste: GitHub didn't allow me to request PR reviews from the following users: saileshd1402. Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
kubectl apply --server-side -k "https://github.com/kubeflow/trainer.git/manifests/overlays/manager?ref=master" | ||
kubectl apply --server-side -k "https://github.com/kubeflow/trainer.git/manifests/overlays/runtimes?ref=master" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Electronic-Waste Previously, we've been discussing with @rimolive @varodrig and @kubeflow/wg-training-leads that we want to have single source of truth for the Kubeflow Trainer docs, since it is hard to keep all of these docs up-to-date.
Thus, we just redirect users to the Kubeflow website for the installation steps: https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/#installing-the-kubeflow-trainer-controller-manager
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the user's side, they want to quickly install Kubeflow Trainer. So, we'd better put the installation guide at the README file to get started quickly like so many other oss including Katib:
- https://github.com/pytorch/torchtune?tab=readme-ov-file#installation-%EF%B8%8F
- https://github.com/kubeflow/katib#installation
- https://github.com/kubernetes-sigs/kueue#installation
And they also have detailed guidance in the website. So, I think it probably a better choice to put some installation commands in the README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, they need to click "the official Kubeflow documentation" -> "the installation guide" -> scroll down, to see the installation commands. I think the guide is too deep for users. They may prefer a straightforward way and search for the official documentation if the straightforward way does not work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, since we directly link the installation guide from the README: https://github.com/kubeflow/trainer?tab=readme-ov-file#getting-started.
Should we just provide another link to the operator guide from the README: https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/#installing-the-kubeflow-trainer-controller-manager ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just provide another link to the operator guide from the README: https://www.kubeflow.org/docs/components/trainer/operator-guides/installation/#installing-the-kubeflow-trainer-controller-manager ?
It's a better choice compared to the current one. But from the user's perspective, they prefer install Trainer directly with some simple commands listed in the README, which is more straightforward, and search for the details if they want customized installation (like only installing manager)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we talk about it at the next Training WG call please ?
I want to see what is the right solution for us moving forward.
/hold
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we talk about it at the next Training WG call please ?
I want to see what is the right solution for us moving forward.
Yeah, of course.
README.md
Outdated
@@ -11,6 +11,8 @@ | |||
|
|||
## Overview | |||
|
|||
> **Warning**: Kubeflow Trainer is currently in **alpha** status, and APIs may change. If you want to use stable release of Kubeflow Training Operator V1, please check [this section](#kubeflow-training-operator-v1). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> **Warning**: Kubeflow Trainer is currently in **alpha** status, and APIs may change. If you want to use stable release of Kubeflow Training Operator V1, please check [this section](#kubeflow-training-operator-v1). | |
> [!WARNING] | |
> Kubeflow Trainer is currently in **alpha** status, and APIs may change. If you want to use stable release of Kubeflow Training Operator V1, please check [this section](#kubeflow-training-operator-v1). |
This is a GitHub Markdown way: https://github.com/orgs/community/discussions/16925
Anyway, do we really want to notice this to users? @kubeflow/wg-training-leads @astefanutti
If yes, when can we remove this warning? @Electronic-Waste Do you have a specific schedule for the removal this warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we are implementing Trainer according to KEP-2170. It's unstable and has not been ready for production. So I think it's necessary to remind users of this. They can use the stable release of Training Operator V1 instead, at least before we release the first stable version of Trainer.
As for the removal timeline, I think probably we can remove this warning when the first stable release of Trainer is ready, maybe v2.0.0
. It means that we are production-ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the removal timeline, I think probably we can remove this warning when the first stable release of Trainer is ready, maybe v2.0.0. It means that we are production-ready.
We will cut v2.0.0 after MPI impl. So I'm not sure the reason why we can say production ready in the v2.0.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can turn the warning into a note. And also rephrase it in a way that says the v2 APIs are still subject to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the v2.0.0. I mean we can remove this warning when we are production ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can turn the warning into a note. And also rephrase it in a way that says the v2 APIs are still subject to change.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the v2.0.0. I mean we can remove this warning when we are production ready.
Here, what is production-ready is a matter. If we do not decide graduation criteria for production ready, we lose the timeline for that, and we continue "this is not production ready, so any kind of break should be fine". And then, users will leave from this projects.
At least, it should be better to define production readiness criteria.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can turn the warning into a note. And also rephrase it in a way that says the v2 APIs are still subject to change.
SGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, what is production-ready is a matter.
Yeah, that's true. We should discuss it further in another dedicated issue for graduation criteria. And maybe we could list it as an agenda for the next WG AutoML/Training community call.
/cc @andreyvelich
Signed-off-by: Electronic-Waste <[email protected]>
Signed-off-by: Electronic-Waste <[email protected]>
Signed-off-by: Electronic-Waste <[email protected]>
Signed-off-by: Electronic-Waste <[email protected]>
Signed-off-by: Electronic-Waste <[email protected]>
@kubeflow/wg-training-leads @astefanutti I've updated the README file. PTAL if you have time. Thanks! The preview of new README: https://github.com/kubeflow/trainer/blob/0c44d15d0181fdad249555229171cea21cf298ca/README.md |
As I described in #2440 (comment), I would say to keep parking here until we decide on the production readiness criteria. The "production" term should be handled carefully. |
Thanks for pointing this out. Let's discuss it in the upcoming Training WG Call. |
What this PR does / why we need it:
Overview
section to notify users that the API is in alpha stageWhich issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...
format, will close the issue(s) when PR gets merged):Fixes #
Checklist: