Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin: track the resources used across all of an association's running jobs #561

Merged
merged 4 commits into from
Jan 13, 2025

Conversation

cmoussa1
Copy link
Member

@cmoussa1 cmoussa1 commented Jan 9, 2025

Problem

The priority plugin does not keep track of the resources used by an association across all of their running jobs, which will be needed when the plugin begins to enforce limits on how many resources they can have across all of their running jobs.


This PR looks to add the tracking component of an association's resource consumption across all of their running jobs. It does this by copying over the jj (along with a required streq function) code over from flux-core and utilizing the jj_get_counts_json () function to get a total nnodes and ncores count. When a job enters job.state.run, the association's current nnodes and ncores count will get incremented accordingly, and subsequently decremented when the job enters job.state.inactive.

I've also added some basic tests that check tracking resources across an association's set of jobs as they go through their lifecycle.

I probably need to hold this PR as [WIP] until I properly add the license for the code I copied over from ccan over in flux-core; for now, I just manually copied the streq () function to the jj.h file just to get a working solution, but should clean this up.

@cmoussa1 cmoussa1 added new feature new feature plugin related to the multi-factor priority plugin labels Jan 9, 2025
@cmoussa1 cmoussa1 force-pushed the plugin.track.resources branch 2 times, most recently from e42e60e to a4fbc4e Compare January 9, 2025 20:40
@cmoussa1 cmoussa1 changed the title [WIP] plugin: track the resources used across all of an association's running jobs plugin: track the resources used across all of an association's running jobs Jan 9, 2025
@cmoussa1 cmoussa1 marked this pull request as ready for review January 9, 2025 20:51
@cmoussa1 cmoussa1 requested a review from jameshcorbett January 9, 2025 20:51
Copy link
Member

@jameshcorbett jameshcorbett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍 two little questions.

b->cur_run_jobs--;
if (jobspec) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

under what conditions would jobspec be NULL? Just wondering because then a user might get stuck with the system permanently thinking they have active resources when in fact they don't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only case I can think of where jobspec might be NULL is if for some reason the jobtap plugin failed to unpack the jobspec in flux_plugin_arg_unpack (). Perhaps the plugin should raise an exception on the job if it fails to unpack it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure that can't happen - the "pack" operation would fail and not call this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you said unpack (sorry). You probably want to immediately fail if the unpack fails (like due to out of memory).

src/plugins/jj.hpp Show resolved Hide resolved
@garlick
Copy link
Member

garlick commented Jan 9, 2025

For running jobs, would it make more sense to grab R instead of jobspec? In jobspec, the node count can be ambiguous if the user only requested cores, for example. But R always contains the actual count.

Problem: flux-core has some useful code for counting the resources for
a job, but it needs to be manually copied over.

Copy the jj.* files from flux-core and place them in the same folder as
the priority plugin so that it can be compiled together and have its
methods able to be accessed by the plugin.

Manually copy over the streq() definition and place it in jj.hpp as
well as copy over its LICENSE from flux-core. Reference it in jj.hpp.
Problem: An Association object has no way to track its current node and
core count across all of its running jobs, which is needed in order to
enforce limits in case a submitted job would put them over their max.

Add two new members to the Association class, cur_nodes and cur_cores,
which will store the current number of nodes and cores allocated across
all of the association's running jobs, respectively.

Update the unit test to account for the addition of the two new members
added to the Association class.
Problem: The priority plugin does not keep track of the resources
allocated per-job when an association submits a job, which will be
needed when the plugin begins to enforce limits on how many resources
an association can use across all of their running jobs.

Add an increment/decrement of node+core counts in job.state.run and
job.state.inactive by unpacking the jobspec of the job and
calculating the number of nodes and cores assigned with the job.
Increment and decrement these values in the "cur_nodes" and "cur_cores"
attributes of the Association object.
Problem: There exists no tests for tracking the number of resources
used across an association's running jobs and ensuring that the counts
are as expected.

Add some tests.
@cmoussa1
Copy link
Member Author

cmoussa1 commented Jan 9, 2025

@garlick that's a good suggestion! I picked jobspec because I was able to use the jj_get_counts() function to get the resource counts, but I didn't know I could also do this with R. Is that the case?

Also, when the plugin ultimately needs to enforce a limit due to exceeding a certain max, I believe this check would need to happen in job.state.depend. I think the jobtap man page states that the R can only be accessed in RUN state or later. So, if I were to use R in one callback and jobspec in another, is there a chance I could end up with a discrepancy in the actual resource count? Apologies if I'm misunderstanding.

@cmoussa1 cmoussa1 force-pushed the plugin.track.resources branch from a4fbc4e to 4494d3c Compare January 9, 2025 23:12
@garlick
Copy link
Member

garlick commented Jan 9, 2025

Yes, R is only available at RUN state.

On node scheduled, homogeneous systems, jobspec should be fine. That's everything we have in production today AFAIK.

However for core scheduled and/or homogeneous systems (different node:core counts depending on the node), if the user requests N cores, you may not know how many nodes that is until the scheduler assigns them.

Maybe that's OK for now where we are now!

@cmoussa1
Copy link
Member Author

cmoussa1 commented Jan 9, 2025

Thank you for the clarification! Yes, the latter case is absolutely something to consider - I think at this point, just to make some progress on this front, maybe using jobspec (and/or making an estimation as best we can) should be okay.

@mergify mergify bot merged commit 0ffd06e into flux-framework:master Jan 13, 2025
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merge-when-passing new feature new feature plugin related to the multi-factor priority plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants