-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
policy: expose internals of resource module policy factory to allow custom policies #1268
base: master
Are you sure you want to change the base?
Conversation
And I know the testing and the code should be in separate commits... sigh (will fix) |
f7f1df3
to
cd87d59
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this @wihobbs! I have just some preliminary questions/comments on a first pass. Apologies in advance that I'm not very familiar with the sched code at all so I could definitely be misunderstanding how this works. :-)
resource_type_t node_rt ("node"); | ||
std::shared_ptr<dfu_match_cb_t> create_match_cb (const std::string &policy_requested) | ||
{ | ||
std::string policy = policies.find (policy_requested)->second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this .find ()
call have any sort of error checking after the lookup of policy_requested
? Maybe this is always guaranteed to find something successfully. If not, maybe you can check the contents of policy
before moving on.
if (policy == "")
// return an error?
// set matcher to something default?
// sorry that idk what the right behavior should be here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmm good catch. It should call known_match_policy
first to see if the policy exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, actually, known_match_policy
is called when the config is validated in resource/modules/resource_match_opts.cpp
so it's already been validated before it gets here.
This will need additional work to support custom
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, a comment making note that it is already validated by the time it gets to that lookup might be helpful here!
|
||
if (option_exists ("stop_on_k_matches", policy)) { | ||
ptr->set_stop_on_k_matches (parse_int_match_options ("stop_on_k_matches", policy)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this if-else
branch need an else
statement? Excuse my ignorance on this - I'm not sure if the addition of custom policies allows for something other than low
or high
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK low and high are the only options, unless locality
or variation
are chosen.
5f03ae3
to
ba80f7b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just did a quick look-through, it looks reasonable to me. I think the commits could be broken up better though (as you noted already)
bool known_match_policy (const std::string &policy); | ||
|
||
const std::map<std::string, std::string> policies = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do a deeper look over this, but first thing this should be split. Put the declaration here with extern
and the definition in the cpp file so this variable doesn't end up getting defined in all includers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be fair, that's what the old code did, but it shouldn't 😬
PR title seems truncated? |
Well, Note that this is still WIP and needs some config work and testing for the custom options. Happy to incorporate any feedback on the initial draft (and thanks to those who have already reviewed!), but it's going to stay in draft state until the code can:
|
Per discussion with Tom:
|
ba80f7b
to
2d176c4
Compare
147eebf
to
3484d1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good! Just noticed one thing.
I just asked @jameshcorbett to do a first pass to get me started. Note this still is in draft state until it has working sharness tests and additional reviews. |
3484d1c
to
2efc97e
Compare
2efc97e
to
bdff4d5
Compare
Once the automated tests pass I'll drop draft from this PR. |
This is starting to look really good @wihobbs! There's one tweak I'd like to see to this so we can keep extending it. I like the way the options are handled, this is exactly what I was hoping for. The one tweak I'd make is to have a |
5628d63
to
7e72499
Compare
7e72499
to
62b7cf4
Compare
I think I've incorporated this feedback @trws, though slightly differently than you laid out in your comment. Instead of a I haven't implemented allowing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good @wihobbs!! I've left some general feedback; I should probably defer to someone who knows the sched code better than I do to actually approve this but I think this looks pretty good to me!
Problem: policies in fluxion are sets of options that are specified in a shared pointer, and, when a new set of options (a new policy) is created, this requires at least a re-compilation of flux-sched. In other words, the internal options on which policies are built are not exposed to end users for creating new, custom policies. Solution: continue to support the existing policy options, but make them sets of specified options, and allow users to create their own sets and pass them to the scheduler as a string.
Problem: the changes to the policy factory do not have sharness testing. Solution: building from existing, validated outputs, provide the full string for certain policy options (rather than the short string) and check that the matches are the same.
Problem: the policy factory in the resource module has no unit testing. Solution: add a series of unit tests to validate the parsing and selection of policy options in the factory.
Problem: currently, the resource module sets a default policy (first) when an invalid policy is specified. With the expansion of the resource module to include custom policies, users might be less accepting of a default policy being forced on them. Emit an error when an invalid policy is specified in the match-policy key. Exit with EINVAL.
Problem: the current testsuite includes two separate tests that assert that the sched-fluxion-resource module will not emit an error and abort if an invalid policy is specified in its configuration. However, this behavior is undesirable, especially since users will soon be able to specify custom policies and should get an error if they specify an invalid one. Amend these tests to test for failure in the cases where an invalid policy is specified.
Problem: the example config files for tests in t/conf.d includes an example of invalid policy settings, both for match-policy in resource and queue-policy in fluxion. This configuration is only used in t1005, where qmanager is tested to ensure it tolerates an invalid queue policy and defaults to fcfs. Recently, the resource module was updated to not tolerate invalid policies, but this shouldn't change the behavior of qmanager. Use first in the sched-fluxion-resource toml file example for this test.
62b7cf4
to
1ced65b
Compare
Thanks @cmoussa1! Fixed and pushed. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1268 +/- ##
========================================
+ Coverage 75.3% 75.4% +0.1%
========================================
Files 111 112 +1
Lines 16042 16103 +61
========================================
+ Hits 12081 12146 +65
+ Misses 3961 3957 -4
|
*/ | ||
if (std::find (policy_options.begin (), policy_options.end (), settings.at (0)) | ||
== policy_options.end ()) { | ||
std::cerr << "invalid policy option: " << settings.at (0) << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, you should probably avoid writing directly to stdout/err from a broker module.
If there's a way to throw an exception and catch the error message at a higher level, which would then flux_log (LOG_ERR, ...)
the result, that might be good. Otherwise you'd have to pass down a flux_t *
handle to this function so it can log errors.
You could also return an error via a separate parameter when the result is false
. Some of the other Fluxion developers may have a preference here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had a chat about this yesterday after the meeting. Just above this in the call chain we're passing a std::string through by reference and appending messages to it as necessary, then returning it up to where we can handle it. The trick is we can't always have a handle here, because this is used in both the module and in resource_query
. Essentially the separate parameter option. Example of the plan on line 516 in resource_match_opts.
I have a separate issue about doing a better job of this in fluxion as a whole, but don't want it to block this PR, so plan is to pass that output string down through for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's a hole in this PR. Will fix with what we discussed. Sorry, I should have done that before marking it as "ready." I got excited [and maybe a little ahead of myself] when the testsuite finally passed 😄
Problem: the current fluxion policy matching factory doesn't allow for customizable policies.
Add a custom option and refactor existing policies to behave similarly. Add unit testing for the string parsing convenience functions.
Outstanding to-do: