Skip to content
This repository has been archived by the owner on Mar 16, 2024. It is now read-only.

Add resources field to computeclass #2380 #2384

Merged
merged 5 commits into from
Jan 3, 2024

Conversation

dciangot
Copy link
Contributor

The edited version of the computeclass will allow to specify any kind of resources allowed by K8s corev1.ResourceRequirements.
In principle that should be enough to enable admins to create computeclass with GPU and accelarators in general.

NOTES/QUESTIONS

  • The quota management of these resources is postponed for later times as per @cjellick suggestion, since it will take some more thoughts.
  • To avoid conflicts with the "first class" fields for memory and cpu scaling the client should rise an error if the admin tries to push a complute class where memory/cpu limits/requests are passed into the "resources" field. I have an integration test ready for this, but commented out since I'm failing to find the proper spot in the codebase where to implement such a check.

Checklist

  • The title of this PR would make a good line in Acorn's Release Note's Changelog
  • The title of this PR ends with a link to the main issue being address in parentheses, like: This is a title (#1216). Here's an example
  • All relevant issues are referenced in the PR description. NOTE: don't use GitHub keywords that auto-close issues
  • Commits follow contributing guidance
  • Automated tests added to cover the changes. If tests couldn't be added, an explanation is provided in the Verification and Testing section
  • Changes to user-facing functionality, API, CLI, and upgrade impacts are clearly called out in PR description
  • PR has at least two approvals before merging (or a reasonable exception, like it's just a docs change)

@cjellick
Copy link
Member

Context is this slack thread: https://acorn-users.slack.com/archives/C03R9ME0SKC/p1702567598185389

@ibuildthecloud effectively blessed the approach conceptually

Copy link
Contributor

@thedadams thedadams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution!

Additional to my comments, I believe you also need to check in the changes generated from running make generate. The CI would have caught that had it run.

Comment on lines 165 to 184
// // Raise error to avoid conflicts with
// // the "first class" fields for memory and cpu scaling
// {
// name: "invalid-custom-resources-limits",
// resources: corev1.ResourceRequirements{
// Limits: corev1.ResourceList{
// "cpu": resource.MustParse("1"),
// },
// },
// fail: true,
// },
// {
// name: "invalid-custom-resources-requests",
// resources: corev1.ResourceRequirements{
// Requests: corev1.ResourceList{
// "memory": resource.MustParse("1"),
// },
// },
// fail: true,
// },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, it was there because I tried to raise errors for the admin when specifying CPU and Memory in the resources field instead that on the Memory and CPU Scale fields. I don't see an easy/useful way to enforce it, since the definition of compute classes does not pass through the acorn client afaiu.

If you have any suggestion I'll give it a try, otherwise I'll go ahead removing this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed, and put into the docs the description of the behavior in case of memory/cpu passed in this field

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @dciangot. Back from the holiday break. Now that I am properly rested, I better understand what you are saying here. If you would like to add validation for this scenario such that the above (now removed) integration test passes, you would need to add some validation logic here and here.

Adding validation logic in both places would ensure that an admin wouldn't be able to set the memory and cpu on the resources field at the API level.

Would you like to take the time to add this validation now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure no problem. I'll add the validation then.

pkg/controller/scheduling/scheduling.go Outdated Show resolved Hide resolved
pkg/controller/scheduling/scheduling.go Outdated Show resolved Hide resolved
@dciangot
Copy link
Contributor Author

dciangot commented Dec 20, 2023

Thank you for your contribution!

Additional to my comments, I believe you also need to check in the changes generated from running make generate. The CI would have caught that had it run.

Thank you @thedadams , indeed I missed that make step, sorry about that.

Although there are two open comments, I pushed the changes that I found more in line with my understanding. Also, I added the update to the admin documentation, that I completely missed before.

Enabling admins to create computeclass with GPU and accelarators in general.

Signed-off-by: dciangot <[email protected]>
@dciangot dciangot force-pushed the compclass_resources branch from ac81366 to 29bb0a2 Compare December 20, 2023 23:36
@dciangot
Copy link
Contributor Author

the recent commit broke the new tests for computeclass, I have rebased on the current main, hope it is ok for you

resolve offerings introduced in acorn-io#2377

Signed-off-by: dciangot <[email protected]>
Copy link
Contributor

@tylerslaton tylerslaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution! Super excited to see this get in.

@thedadams thedadams self-requested a review January 2, 2024 13:09
Insert test and validation for computeclass `resources` field. Avoiding conflicts with spec.memory and spec.cpuScaler fields

Signed-off-by: Diego Ciangottini <[email protected]>
@cjellick
Copy link
Member

cjellick commented Jan 2, 2024

Looks like a new commit came in. WIll need rereviewed @thedadams @tylerslaton

Tyler, once this is good, can you own getting it merged and shepherding it along internally (whatever that may mean...actually dont think theres much to do there other than merge)

@dciangot
Copy link
Contributor Author

dciangot commented Jan 2, 2024

Yeah, it was on the latest @thedadams request. Now integration is failing, not related to the changes I made though. I can take a look later, meanwhile any hint is welcome.

@cjellick
Copy link
Member

cjellick commented Jan 2, 2024

we'll rekick the tests after it officially fails to see if it was a flake

Copy link
Contributor

@tylerslaton tylerslaton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjellick Yep, I'll own getting this merged.

@dciangot Just one small comment and I'll reapprove once it is addressed. I believe the failure you were seeing was flake and I've restarted CI to check. If it fails we can look further.

@thedadams thedadams requested a review from tylerslaton January 2, 2024 17:40
check limits and requests on one go

Signed-off-by: Diego Ciangottini <[email protected]>
@dciangot dciangot requested a review from thedadams January 2, 2024 21:21
@tylerslaton
Copy link
Contributor

Merging now, thanks for the contribution @dciangot!

@tylerslaton tylerslaton merged commit 57a5112 into acorn-io:main Jan 3, 2024
5 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants