policy: expose internals of resource module policy factory to allow custom policies #1268

wihobbs · 2024-08-13T01:05:28Z

Problem: the current fluxion policy matching factory doesn't allow for customizable policies.

Add a custom option and refactor existing policies to behave similarly. Add unit testing for the string parsing convenience functions.

Outstanding to-do:

add an option in the scheduler config to actually pass a custom string in, and probably some validation too

wihobbs · 2024-08-13T01:07:43Z

And I know the testing and the code should be in separate commits... sigh (will fix)

cmoussa1

Thanks for opening this @wihobbs! I have just some preliminary questions/comments on a first pass. Apologies in advance that I'm not very familiar with the sched code at all so I could definitely be misunderstanding how this works. :-)

resource/policies/dfu_match_policy_factory.cpp

cmoussa1 · 2024-08-13T15:36:48Z

resource/policies/dfu_match_policy_factory.cpp

-    resource_type_t node_rt ("node");
+std::shared_ptr<dfu_match_cb_t> create_match_cb (const std::string &policy_requested)
+{
+    std::string policy = policies.find (policy_requested)->second;


Should this .find () call have any sort of error checking after the lookup of policy_requested? Maybe this is always guaranteed to find something successfully. If not, maybe you can check the contents of policy before moving on.

if (policy == "")
    // return an error?
    // set matcher to something default?
    // sorry that idk what the right behavior should be here

Mmm good catch. It should call known_match_policy first to see if the policy exists.

Oh, actually, known_match_policy is called when the config is validated in resource/modules/resource_match_opts.cpp so it's already been validated before it gets here.

This will need additional work to support custom.

In that case, a comment making note that it is already validated by the time it gets to that lookup might be helpful here!
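For reference, a minimal sketch of a lookup that does not depend on earlier validation. The table contents and the name lookup_policy are hypothetical and are not taken from the PR. It assumes C++17 for std::optional. Note that dereferencing the iterator returned by find () for a missing key is undefined behavior rather than an empty string, so the check has to compare against end ().

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the factory's policy table; the entries
// are illustrative and are not the real flux-sched definitions.
static const std::map<std::string, std::string> example_policies = {
    {"first", "policy=high stop_on_k_matches=1"},
    {"low", "policy=low"},
    {"high", "policy=high"},
};

// Return the option string for a named policy, or std::nullopt when
// the name is unknown, so a missing key never dereferences end ().
std::optional<std::string> lookup_policy (const std::string &name)
{
    auto it = example_policies.find (name);
    if (it == example_policies.end ())
        return std::nullopt;
    return it->second;
}
```

A caller that relies on known_match_policy having run earlier could still use a lookup like this as a cheap second line of defense, which may matter once custom strings bypass the table.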

cmoussa1 · 2024-08-13T15:38:40Z

resource/policies/dfu_match_policy_factory.cpp

+
+            if (option_exists ("stop_on_k_matches", policy)) {
+                ptr->set_stop_on_k_matches (parse_int_match_options ("stop_on_k_matches", policy));
+            }


Does this if-else branch need an else statement? Excuse my ignorance on this; I'm not sure whether the addition of custom policies allows for something other than low or high.

AFAIK low and high are the only options, unless locality or variation are chosen.

resource/policies/base/test/matcher_policy_factory_test02.cpp

jameshcorbett

Just did a quick look-through, it looks reasonable to me. I think the commits could be broken up better though (as you noted already)

trws · 2024-08-19T20:01:55Z

resource/policies/dfu_match_policy_factory.hpp

 bool known_match_policy (const std::string &policy);

+const std::map<std::string, std::string> policies =


I'll do a deeper look over this, but first things first: this should be split. Put the declaration here with extern and the definition in the cpp file so this variable doesn't end up getting defined in every file that includes the header.

To be fair, that's what the old code did, but it shouldn't 😬
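A minimal sketch of the split described above, shown in a single file for brevity. The two map entries are placeholders and are not the PR's actual table.

```cpp
#include <map>
#include <string>

// Would live in the header (.hpp): a declaration only. Every includer
// sees the name, but no storage is created here.
extern const std::map<std::string, std::string> policies;

// Would live in exactly one source file (.cpp): the single definition.
// The entries are placeholders for illustration.
const std::map<std::string, std::string> policies = {
    {"low", "policy=low"},
    {"high", "policy=high"},
};
```

Without extern, a const object at namespace scope has internal linkage, so each translation unit that includes the header constructs its own copy of the map.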

garlick · 2024-08-20T15:40:05Z

PR title seems truncated?

wihobbs · 2024-08-20T17:18:53Z

Well, custom is a new type of policy that this PR is enabling. Or will enable eventually. Maybe "resource: refactor policy options to allow a user-defined policy" is more descriptive?

Note that this is still WIP and needs some config work and testing for the custom options. Happy to incorporate any feedback on the initial draft (and thanks to those who have already reviewed!), but it's going to stay in draft state until the code can:

- accept a custom string with policy options
- reject strings that request invalid custom options, such as set_stop_on_k_matches=-100
- test some custom options and make sure they behave as expected
- break out the commit history in a better way

wihobbs · 2024-09-05T04:02:31Z

Per discussion with Tom:

- To handle custom policies, "don't validate, parse" -- parse the input on a delimiter first, then verify that each key is valid and give an error message if it isn't (possibly demoted to a warning based on a config option)
- Fluxion should issue a fatal error if the named policy is not valid -- don't fall back on first (might already be done?)
- set_stop_on_k_matches probably only takes a value of 1; however, we should test this
- ~~we should also think about having "node_centric" have a score factor of 10000, maybe we don't want to allow users to configure this, but maybe that number doesn't cover all our bases~~
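A minimal sketch of the parse-then-verify idea, under stated assumptions: options are separated by whitespace, keys and values are separated by '=', and the list of recognized keys is hypothetical (the names are borrowed from this thread). The struct and function names are illustrative and do not come from the PR.

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical list of recognized keys; the real factory's list may differ.
static const std::vector<std::string> known_keys = {
    "policy", "node_centric", "stop_on_k_matches"};

struct parsed_policy {
    std::map<std::string, std::string> options;  // recognized key -> value
    std::vector<std::string> unknown_keys;       // keys that failed the check
};

parsed_policy parse_policy (const std::string &input)
{
    parsed_policy result;
    std::istringstream stream (input);
    std::string token;
    // Step 1: parse. Split on whitespace (assumed delimiter), then on
    // the first '=' within each token.
    while (stream >> token) {
        const auto eq = token.find ('=');
        const std::string key = token.substr (0, eq);
        const std::string value =
            (eq == std::string::npos) ? std::string () : token.substr (eq + 1);
        // Step 2: verify. Unknown keys are collected, not silently dropped.
        if (std::find (known_keys.begin (), known_keys.end (), key)
            == known_keys.end ())
            result.unknown_keys.push_back (key);
        else
            result.options[key] = value;
    }
    return result;
}
```

Collecting the unknown keys instead of failing on the first one leaves the caller free to treat them as a fatal error or as a warning, depending on configuration.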

jameshcorbett

Looks pretty good! Just noticed one thing.

resource/policies/dfu_match_policy_factory.hpp

grondo · 2025-01-16T19:11:08Z

Don't forget to address @garlick's comment about the PR title, which will end up as a line in the release notes.

wihobbs · 2025-01-16T19:51:47Z

I just asked @jameshcorbett to do a first pass to get me started. Note this still is in draft state until it has working sharness tests and additional reviews.

wihobbs · 2025-01-22T21:58:15Z

Once the automated tests pass I'll drop draft from this PR.

trws · 2025-01-23T20:12:46Z

This is starting to look really good @wihobbs! There's one tweak I'd like to see so we can keep extending it. I like the way the options are handled; this is exactly what I was hoping for. The tweak I'd make is to have a policy=* option rather than separate low= and high=, since the base policy object types are low, high, locality and variation. That way, we get the "you can't set both low and high at the same time" check for free, and can even collapse the low and high branches: create a matcher based on which one it is, then set options on the resulting object after the condition. We can also check that policy has been set, either with the bare name or with policy=whatever to allow options. Does that sound workable?
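A rough sketch of the collapsed branch structure suggested above. The matcher classes are hypothetical stand-ins and are not the flux-sched types; only the policy and stop_on_k_matches keys are handled, and locality and variation are left out.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for the low and high matcher classes.
struct matcher {
    virtual ~matcher () = default;
    virtual std::string name () const = 0;
    int stop_on_k = 0;  // 0 means the option was not set
};
struct low_matcher : matcher {
    std::string name () const override { return "low"; }
};
struct high_matcher : matcher {
    std::string name () const override { return "high"; }
};

// Build a matcher from already-parsed options. Returns nullptr when
// the policy key is missing or names an unsupported base policy.
std::unique_ptr<matcher> make_matcher (
    const std::map<std::string, std::string> &opts)
{
    auto it = opts.find ("policy");
    if (it == opts.end ())
        return nullptr;
    std::unique_ptr<matcher> m;
    if (it->second == "low")
        m = std::make_unique<low_matcher> ();
    else if (it->second == "high")
        m = std::make_unique<high_matcher> ();
    else
        return nullptr;
    // Shared option handling happens once, after the condition.
    // std::stoi throws on non-numeric input; a real version would
    // need to handle that.
    auto k = opts.find ("stop_on_k_matches");
    if (k != opts.end ())
        m->stop_on_k = std::stoi (k->second);
    return m;
}
```

Because a map holds one value per key, a single policy key cannot be both low and high, which is the "for free" check mentioned above.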

wihobbs · 2025-01-31T01:28:00Z

I think I've incorporated this feedback @trws, though slightly differently than you laid out in your comment. Instead of a policy key which accepts locality, variation, high, and low, I added a policy key that will accept low or high, and you can still set match-policy to locality, variation, high, and low, in which case the latter two would result in a "vanilla" high or low policy without the extra options. Does that suffice for what you laid out? It turned out to be not a huge lift.

I haven't implemented allowing set_stop_on_k_matches to accept an integer yet, but I'm going to go ahead and request a few reviews on this, to get feedback on the code and start getting the commits polished for approval. If you're okay with kicking that down the road to another PR (apologies, I forgot what we decided on that in our conversation yesterday), we could do that when the option itself supports integers other than 1.

cmoussa1

Looks pretty good @wihobbs!! I've left some general feedback; I should probably defer to someone who knows the sched code better than I do to actually approve this but I think this looks pretty good to me!

t/CMakeLists.txt

resource/policies/dfu_match_policy_factory.cpp

Problem: policies in fluxion are sets of options that are specified in a shared pointer, and, when a new set of options (a new policy) is created, this requires at least a re-compilation of flux-sched. In other words, the internal options on which policies are built are not exposed to end users for creating new, custom policies. Solution: continue to support the existing policy options, but make them sets of specified options, and allow users to create their own sets and pass them to the scheduler as a string.

Problem: the changes to the policy factory do not have sharness testing. Solution: building from existing, validated outputs, provide the full string for certain policy options (rather than the short string) and check that the matches are the same.

Problem: the policy factory in the resource module has no unit testing. Solution: add a series of unit tests to validate the parsing and selection of policy options in the factory.

Problem: currently, the resource module sets a default policy (first) when an invalid policy is specified. With the expansion of the resource module to include custom policies, users might be less accepting of a default policy being forced on them. Emit an error when an invalid policy is specified in the match-policy key. Exit with EINVAL.

Problem: the current testsuite includes two separate tests that assert that the sched-fluxion-resource module will not emit an error and abort if an invalid policy is specified in its configuration. However, this behavior is undesirable, especially since users will soon be able to specify custom policies and should get an error if they specify an invalid one. Amend these tests to test for failure in the cases where an invalid policy is specified.

Problem: the example config files for tests in t/conf.d include an example of invalid policy settings, both for match-policy in resource and queue-policy in fluxion. This configuration is only used in t1005, where qmanager is tested to ensure it tolerates an invalid queue policy and defaults to fcfs. Recently, the resource module was updated to not tolerate invalid policies, but this shouldn't change the behavior of qmanager. Use first in the sched-fluxion-resource toml file example for this test.

wihobbs · 2025-01-31T18:40:05Z

Thanks @cmoussa1! Fixed and pushed.

codecov · 2025-01-31T18:45:45Z

Codecov Report

Attention: Patch coverage is 96.73913% with 3 lines in your changes missing coverage. Please review.

Project coverage is 75.4%. Comparing base (471da50) to head (1ced65b).

Files with missing lines	Patch %	Lines
resource/policies/dfu_match_policy_factory.cpp	95.0%	3 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff            @@
##           master   #1268     +/-   ##
========================================
+ Coverage    75.3%   75.4%   +0.1%     
========================================
  Files         111     112      +1     
  Lines       16042   16103     +61     
========================================
+ Hits        12081   12146     +65     
+ Misses       3961    3957      -4

Files with missing lines	Coverage Δ
resource/modules/resource_match_opts.cpp	`84.4% <100.0%> (+0.3%)`	⬆️
...licies/base/test/matcher_policy_factory_test02.cpp	`100.0% <100.0%> (ø)`
resource/policies/dfu_match_policy_factory.cpp	`90.6% <95.0%> (+2.0%)`	⬆️

... and 2 files with indirect coverage changes

grondo · 2025-01-31T20:06:36Z

resource/policies/dfu_match_policy_factory.cpp

+         */
+        if (std::find (policy_options.begin (), policy_options.end (), settings.at (0))
+            == policy_options.end ()) {
+            std::cerr << "invalid policy option: " << settings.at (0) << std::endl;


Actually, you should probably avoid writing directly to stdout/err from a broker module.

If there's a way to throw an exception and catch the error message at a higher level, which would then flux_log (LOG_ERR, ...) the result, that might be good. Otherwise you'd have to pass down a flux_t * handle to this function so it can log errors.

You could also return an error via a separate parameter when the result is false. Some of the other Fluxion developers may have a preference here.

We had a chat about this yesterday after the meeting. Just above this in the call chain we're passing a std::string through by reference and appending messages to it as necessary, then returning it up to where we can handle it. The trick is we can't always have a handle here, because this is used in both the module and in resource_query. Essentially the separate parameter option. Example of the plan on line 516 in resource_match_opts.

I have a separate issue about doing a better job of this in fluxion as a whole, but don't want it to block this PR, so plan is to pass that output string down through for now.

Yep, that's a hole in this PR. Will fix with what we discussed. Sorry, I should have done that before marking it as "ready." I got excited [and maybe a little ahead of myself] when the testsuite finally passed 😄
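A minimal sketch of the "separate parameter" approach discussed above: the lower layers append to a caller-owned string and never write to stderr themselves. The function names and the accepted keys are hypothetical.

```cpp
#include <string>
#include <vector>

// Lowest layer: no flux_t handle is available here, so problems are
// reported by appending to a string owned by the caller.
bool check_option (const std::string &key, std::string &errors)
{
    if (key != "policy" && key != "stop_on_k_matches") {
        errors += "invalid policy option: " + key + "\n";
        return false;
    }
    return true;
}

// Middle layer: forwards the same string so messages accumulate.
// Every key is checked, even after the first failure.
bool check_options (const std::vector<std::string> &keys, std::string &errors)
{
    bool ok = true;
    for (const auto &key : keys)
        ok = check_option (key, errors) && ok;
    return ok;
}
```

The top-level caller then decides where the text goes: the broker module could hand it to flux_log (LOG_ERR, ...), while a command-line tool such as resource_query could print it.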

wihobbs requested review from jameshcorbett and cmoussa1 August 13, 2024 01:06

wihobbs force-pushed the new-policy-refactor branch from f7f1df3 to cd87d59 Compare August 13, 2024 01:09

cmoussa1 reviewed Aug 13, 2024

wihobbs force-pushed the new-policy-refactor branch 2 times, most recently from 5f03ae3 to ba80f7b Compare August 13, 2024 16:31

jameshcorbett reviewed Aug 13, 2024

trws reviewed Aug 19, 2024

wihobbs force-pushed the new-policy-refactor branch from ba80f7b to 2d176c4 Compare November 15, 2024 17:25

wihobbs force-pushed the new-policy-refactor branch 3 times, most recently from 147eebf to 3484d1c Compare December 23, 2024 18:07

wihobbs requested a review from jameshcorbett January 16, 2025 18:19

jameshcorbett reviewed Jan 16, 2025

wihobbs force-pushed the new-policy-refactor branch from 3484d1c to 2efc97e Compare January 22, 2025 21:54

wihobbs changed the title ~~policy: refactor options to allow custom~~ policy: expose internals of resource module policy factory to allow custom policies Jan 22, 2025

wihobbs force-pushed the new-policy-refactor branch from 2efc97e to bdff4d5 Compare January 22, 2025 21:57

wihobbs marked this pull request as ready for review January 22, 2025 22:48

wihobbs requested a review from jameshcorbett January 22, 2025 22:48

wihobbs force-pushed the new-policy-refactor branch 3 times, most recently from 5628d63 to 7e72499 Compare January 31, 2025 01:06

wihobbs force-pushed the new-policy-refactor branch from 7e72499 to 62b7cf4 Compare January 31, 2025 01:27

wihobbs requested review from trws and cmoussa1 January 31, 2025 01:28

cmoussa1 reviewed Jan 31, 2025

wihobbs added 6 commits January 31, 2025 10:38

t: add t3037-resource-custom-policy

cd45541

Problem: the changes to the policy factory do not have sharness testing. Solution: building from existing, validated outputs, provide the full string for certain policy options (rather than the short string) and check that the matches are the same.

test: add unit testing for policy factory

826b625

Problem: the policy factory in the resource module has no unit testing. Solution: add a series of unit tests to validate the parsing and selection of policy options in the factory.

wihobbs force-pushed the new-policy-refactor branch from 62b7cf4 to 1ced65b Compare January 31, 2025 18:39

grondo reviewed Jan 31, 2025
