kernel: Add support for stopping workqueues #82614

Mattemagikern · 2024-12-05T15:21:10Z

This patch adds support for stopping workqueues. This is useful for freeing resources from workqueues when subsystems/modules is deactivated or cleaning up the system between tests in ztest to reach a fully normalized state.

The patch adds a new function k_work_queue_stop() that releases the workqueues thread and stack when a workqueue is unwanted. k_work_queue_stop(...) should be viewed as a counterpart to k_work_queue_start(...).

This would allow to:
k_work_queue_start(...);

k_work_drain(..., true);
k_work_queue_stop(...);

cfriedt · 2024-12-06T13:16:12Z

LGTM, but I'll wait until others have reviewed too.

andyross

Feature in general looks clean and useful.

One bug in the test. Some criticism of IMHO an inappropriately aggressive implementation of "stop()".

And just as general hygiene, can you split this up into three patches:

Add the feature
Add tracing hooks
Add tests

kernel/work.c

tests/kernel/workq/work_queue/src/start_stop.c

tests/kernel/workq/work_queue/prj.conf

Mattemagikern · 2024-12-06T15:24:49Z

@andyross

And just as general hygiene, can you split this up into three patches:

Add the feature

Add tracing hooks

Add tests

I'll split them up if you want, but could I ask why you prefere it that way?
Just for my own curiosity.

cfriedt · 2024-12-06T15:42:35Z

@Mattemagikern - separate commits make it easier to review and maintain in many cases. As general rules, to ensure if a code change needs to be backported, there are fewer side effects with smaller commits. If a regression occurs, it's easier to narrow down the root cause. We usually insist on separate code and test commits because it makes it easier to test bug fixes.

Generally good hygiene

andyross · 2024-12-06T15:51:54Z

I'll split them up if you want, but could I ask why you prefere it that way? Just for my own curiosity.

Once upon a time on Discord:

"Help help my very important code stopped working!"

"Maybe it's the new feature that just got added. Can you try reverting this patch and see if that fixes it?"

"Help help! I tried git revert and now it's giving me a bunch of collisions in the tracing code! Who can help me fix this? My code is very important!"

Small patches that do one thing in an isolated way makes it easier to figure out what broke when things inevitably break. Maintainers are lazy and like to toss off one-line suggestions vs. addressing rebase failures in their spare time.

Mattemagikern · 2024-12-06T15:57:19Z

@cfriedt @andyross Thanks for the explination!

The patch has been updated to only contain the functionallity.
I've confirmed the changes with my testcases seperately.

Mattemagikern · 2024-12-06T16:38:57Z

Another newbi question:
When you say you want the patches in [feature, tracing, tests], are these separate pull-requests or should I just split it up in commits on this pull-request?

npitre · 2024-12-06T16:56:15Z

Separate commits within the same pull request please.

Mattemagikern · 2024-12-06T17:01:10Z

@npitre thanks! ✔️

andyross

A few more nitpicks that occur to me, but this looks good.

andyross · 2024-12-06T17:55:25Z

include/zephyr/kernel.h

+ * @retval -ENODEV if the work queue was not started
+ * @retval -EBUSY if the work queue is actively processing work items
+ */
+int k_work_queue_stop(struct k_work_q *queue);


Note that this is a blocking call (because of k_thread_join()) and so per "Standard Zephyr API Idiom" should probably take a k_timeout to allow the app to control wakeup behavior. In this particular case that's probably not a big deal because the only way this will get stuck is if a workq handler refuses to return, which is already a bug elsewhere.

I'm worried about side effects if an user sets a very short timeout, e.g. K_NO_WAIT and/or thread priorities.
Where we get a false negative.

The case I'm worried about is:

int k_work_queue_stop(struct k_work_q *queue, k_timeout_t timeout) { ... flag_set(&queue->flags, K_WORK_QUEUE_STOP_BIT); notify_queue_locked(queue); k_spin_unlock(&lock, key); if (k_thread_join(&queue->thread, timeout)) { <--- sleeping here key = k_spin_lock(&lock); flag_clear(&queue->flags, K_WORK_QUEUE_STOP_BIT); k_spin_unlock(&lock, key); return -ETIMEOUT; } return 0; ... static void work_queue_main(void *workq_ptr, void *p2, void *p3) { .... k_spinlock_key_t key = k_spin_lock(&lock); <-- past this point .... } k_work_queue_stop(queue, K_NO_WAIT); // returns -ETIMEOUT // Is the thread on it's way to die or is it alive?

It may still be ok?
Subsequent calls to the workqueue will be handled (-ENODEV) due to the thread resets the flags.
I may be overthinking this. What do you think?

If k_work_queue_stop() is guaranteed to set the STOP bit, then the work queue is guaranteed to stop in finite time (absent app bugs that stall the handlers). I think that's acceptable. Apps that care about synchronization can always call it again and take care to avoid racing with their own handlers until they get a success.

Again, I don't think this rises to a -1 given the specifics; the timeout isn't likely to be very useful in practice. Nonetheless timeout arguments for blocking calls are The Zephyr Way, so it should be pointed out.

include/zephyr/kernel.h

tests/kernel/workq/work_queue/src/start_stop.c

cfriedt · 2024-12-07T09:55:23Z

tests/kernel/workq/work_queue/src/start_stop.c

+	}
+}
+
+ZTEST_SUITE(workqueue_reuse, NULL, NULL, NULL, NULL, NULL);


Ideally, rather than adding a whole new testsuite effectively for a single ZTEST testing a single API call, this should be added to an existing testsuite that are more like traditional unit tests for the work queue API.

in the build project there exists the:

workqueue_triggered
workqueue_delayed

They seem more specifc than my test, should I rename my suit to workqueue_api for future use instead?

or should I try to integrate my test into one of them or another test suite?

cfriedt · 2024-12-07T09:59:55Z

tests/kernel/workq/work_queue/src/start_stop.c

+		      "Succeeded to submit work item to non-initialized work queue");
+}
+
+ZTEST(workqueue_reuse, test_submit_to_queue)


Let's narrow the scope of the code we are testing to k_work_queue_stop() and add it to some other testsuite, since it is unlikely that this test is adding new coverage for work queue execution of handling multiple work items.

It doesn't really make sense to create a new suite for effectively testing one API call.

If we don't have an existing unit-style testsuite (e.g. workqueue_api), maybe this could be it.

Suggested change

ZTEST(workqueue_reuse, test_submit_to_queue)

ZTEST(workqueue_api, test_k_work_queue_stop)

I'll wait with these changes until we've decided on the correct placement of the tests 👍

include/zephyr/kernel.h

This patch adds support for stopping workqueues. This is useful for freeing resources from workqueues when subsystems/modules is deactivated or cleaning up the system between tests in ztest to reach a fully normalized state. The patch adds a new function k_work_queue_stop() that releases the workqueues thread and stack when a workqueue is unwanted. k_work_queue_stop(...) should be viewed as a counterpart to k_work_queue_start(...). This would allow to: k_work_queue_start(...); k_work_drain(..., true); k_work_queue_stop(...); Signed-off-by: Måns Ansgariusson <[email protected]>

Adds tracing support for stopping workqueues. Signed-off-by: Måns Ansgariusson <[email protected]>

This patch adds tests for stopping workqueues. Signed-off-by: Måns Ansgariusson <[email protected]>

TaiJuWu · 2024-12-07T13:02:24Z

kernel/work.c

+		} else if (flag_test(&queue->flags, K_WORK_QUEUE_STOP_BIT)) {
+			/* User has requested that the queue stop. Clear the status flags and exit.
+			 */
+			flags_set(&queue->flags, 0);
+			k_spin_unlock(&lock, key);
+			return;


Pardon me.
If we allow task append to the queue continuously after setting K_WORK_QUEUE_STOP_BIT, the block will never be executed.
Is it excepted?

A pre-condition to the _stop(..) function is that the queue has already been plugged.

static inline int queue_submit_locked(struct k_work_q *queue, struct k_work *work) { * * -ENODEV if the queue isn't running. * * -EBUSY if draining and not chained * * -EBUSY if plugged and not draining * * otherwise OK

So the case you're thinking of is:

Thread a { while (true) { queue_submit_locked(....); } } Thread b { k_work_queue_drain(..., true); k_work_queue_stop(..., K_FOREVER); }

Would thread b. freeze indefenately?
Initially, maybe 👍

Question is if we should disallow that append of work if the K_WORK_QUEUE_STOP_BIT is set as well.
I'm not sure. How do you reason?

Oh, I missed we need to call k_work_queue_drain before stopping workqueue, but the design make me thinks about this scenario will happen?

Thread a { while (true) { queue_submit_locked(....); } } Thread b k_work_queue_drain(..., true); Thread c k_work_queue_drain(..., true); Thread b k_work_queue_stop(..., K_FOREVER); Thread c k_work_queue_unplug(...);

Base on doc

The workqueue API is designed to be safe when invoked from multiple threads and interrupts.

I think it is possible so we need to care about this problem. WDYT?

In general, I prefer

Making K_WORK_QUEUE_STOP_BIT being isolated instead depend on K_WORK_QUEUE_PLUGGED_BIT.

Disallow append if K_WORK_QUEUE_STOP_BIT is set.

We don't clear K_WORK_QUEUE_STOP_BIT forever even if timeout avoid appending work to workqueue and documented it.

What do you and others think about this idea?

Sounds like a great opportunity for additional test cases

@TaiJuWu

Another option is to forbid the unplug function if K_WORK_QUEUE_STOP_BIT is set.

Thread a { while (true) { queue_submit_locked(....); <- Error once K_WORK_QUEUE_PLUGGED_BIT is set & not draining } } Thread b k_work_queue_drain(..., true); Thread c k_work_queue_drain(..., true); Thread b k_work_queue_stop(..., K_FOREVER); <- Would exit in finite time Thread c k_work_queue_unplug(...); <- Would yield error & not unplug.

Either thread B or C would return an error, depending on which accessed the queue first.
I believe this is the smallest change we could make to implement this feature. It shouldn't affect current use cases, what do you think?

@cfriedt Agreed!

The case I'm making could still deadlock thread b, if the queue never drains the queue will continue to accept works.

So we probbably need the 2. You outlined earlier as well.

Disallow append if K_WORK_QUEUE_STOP_BIT is set.

But that changes the behaviour of the queue & drain system slightly, is that ok?

Another option is to forbid the unplug function if K_WORK_QUEUE_STOP_BIT is set.

Sound great.

But that changes the behaviour of the queue & drain system slightly, is that ok?

It is ok for me but we still need @andyross or other maintainers comfirm since the caller is time-sensitive.

By the way, I am not sure we need to call k_work_queue_drain before k_work_queue_stop by users is good api design or not.

By the way, I am not sure we need to call k_work_queue_drain before k_work_queue_stop by users is good api design or not.

Myself, being biased (haha), think it is neat :P The queue_drain(..., true) gives the user the knowledge that the works scheduled up to that point will be executed and in so doing letting free-routines kick in as to not casue memory leaks in the application.
The k_work_queue_stop(...) stopps the thread as soon as the last job has been processed.

I'll commit the changes that we outlined here today :)

I think we need to call queue_drain(..., true) in k_work_queue_stop internally and set K_WORK_QUEUE_STOP_BIT berfore queue_drain(..., true)
Is it possible in current design?

TA TB queue_drain(..., true) k_work_queue_stop but flag is not set queue_unplug flag is set

Additionally, System Workqueue  should never stop I guess.

Additionally, System Workqueue  should never stop I guess.

The patch specifically addresses scenarios where stopping a workqueue becomes interesting, like in a couple of tests I have in mind.
I believe the patch aligns with the broader system goals. If this approach conflicts with project guidelines or expectations, I expect that the maintainers wil let me know shortly haha 🥲

I think we need to call queue_drain(..., true) in k_work_queue_stop internally and set K_WORK_QUEUE_STOP_BIT berfore queue_drain(..., true)
Is it possible in current design?

I've just looked through the kernel/work.c file, it would be a neat addition, one less function call from the users to achive the same effect. What's stopping me is the k_work_queue_drain(...) since the current implementation is blocking ( K_FOREVER ) and to resolve that I would need to rework that function to include a timeout and then manage to make the timeout time be split between the draining & the stopping of the thread (the stopping of the queue would be, almost, instantanious at that point if we've passed the drain stage).
@cfriedt @andyross How would you like me to proceede with:

to sort out the_drain(..) to include a timeout and incorperate parts of that function in the _stop(...)

or

keep it as is with the addition of blocking _unplug(..) and stop new works to be scheduled if the stop bit is set?

Regarding the race-condition you're (@TaiJuWu) outlining I don't think it is a problem since the getting & setting of the flags are protected by the spin_lock, the winner of the spin_lock would dictate the outcome.

andyross · 2024-12-11T16:49:07Z

FWIW:

If we allow task append to the queue continuously after setting K_WORK_QUEUE_STOP_BIT, the block will never be executed.

I think that's more of a policy decision than a bug. I can see arguments either way: you could force the queue to reject new submissions when a STOP is pending, or you could allow it and document it as "the queue will stop processing the next time it is idle". Either one seems like a reasonable choice.

andyross

Forgot to refresh +1. I'm fine with this as-is. Most of the suggestions seem acceptable too.

cfriedt · 2024-12-11T22:15:20Z

tests/kernel/workq/work_queue/src/start_stop.c

Probably would be best to rename to workq/worq_api, but I'm not blocking.

TaiJuWu

I don’t want to block this feature too.

Mattemagikern force-pushed the k_work_queue_stop branch 3 times, most recently from 63fe953 to 761384d Compare December 6, 2024 13:02

Mattemagikern marked this pull request as ready for review December 6, 2024 13:08

zephyrbot added area: Kernel area: Tracing Tracing labels Dec 6, 2024

zephyrbot requested review from andyross, ceolin, cfriedt, dcpleung, nashif, npitre, peter-mitsis, TaiJuWu and teburd December 6, 2024 13:08

zephyrbot assigned andyross and peter-mitsis Dec 6, 2024

andyross requested changes Dec 6, 2024

View reviewed changes

kernel/work.c Outdated Show resolved Hide resolved

tests/kernel/workq/work_queue/src/start_stop.c Outdated Show resolved Hide resolved

tests/kernel/workq/work_queue/prj.conf Outdated Show resolved Hide resolved

Mattemagikern force-pushed the k_work_queue_stop branch from 761384d to 5032425 Compare December 6, 2024 15:46

Mattemagikern force-pushed the k_work_queue_stop branch 2 times, most recently from f079b61 to ecf9d13 Compare December 6, 2024 15:54

andyross previously approved these changes Dec 6, 2024

View reviewed changes

cfriedt requested changes Dec 7, 2024

View reviewed changes

cfriedt reviewed Dec 7, 2024

View reviewed changes

include/zephyr/kernel.h Outdated Show resolved Hide resolved

Mattemagikern added 3 commits December 7, 2024 11:28

tracing: kernel: Add support for stopping workqueues

510213b

Adds tracing support for stopping workqueues. Signed-off-by: Måns Ansgariusson <[email protected]>

tests: kernel: Add support for stopping workqueues

2c3bf28

This patch adds tests for stopping workqueues. Signed-off-by: Måns Ansgariusson <[email protected]>

Mattemagikern dismissed andyross’s stale review via 2c3bf28 December 7, 2024 10:57

Mattemagikern force-pushed the k_work_queue_stop branch from 726569b to 2c3bf28 Compare December 7, 2024 10:57

TaiJuWu reviewed Dec 7, 2024

View reviewed changes

andyross approved these changes Dec 11, 2024

View reviewed changes

cfriedt approved these changes Dec 11, 2024

View reviewed changes

tests/kernel/workq/work_queue/src/start_stop.c

Copy link

Member

cfriedt Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably would be best to rename to workq/worq_api, but I'm not blocking.

TaiJuWu approved these changes Dec 11, 2024

View reviewed changes

kartben merged commit 8934b81 into zephyrproject-rtos:main Dec 12, 2024
29 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kernel: Add support for stopping workqueues #82614

kernel: Add support for stopping workqueues #82614

Mattemagikern commented Dec 5, 2024

cfriedt commented Dec 6, 2024

andyross left a comment

Mattemagikern commented Dec 6, 2024 •

edited

Loading

cfriedt commented Dec 6, 2024

andyross commented Dec 6, 2024

Mattemagikern commented Dec 6, 2024

Mattemagikern commented Dec 6, 2024

npitre commented Dec 6, 2024 via email

Mattemagikern commented Dec 6, 2024

andyross left a comment

andyross Dec 6, 2024

Mattemagikern Dec 7, 2024 •

edited

Loading

andyross Dec 7, 2024

cfriedt Dec 7, 2024

Mattemagikern Dec 7, 2024 •

edited

Loading

cfriedt Dec 7, 2024

Mattemagikern Dec 7, 2024

TaiJuWu Dec 7, 2024 •

edited

Loading

Mattemagikern Dec 7, 2024 •

edited

Loading

TaiJuWu Dec 8, 2024 •

edited

Loading

cfriedt Dec 8, 2024

Mattemagikern Dec 8, 2024 •

edited

Loading

Mattemagikern Dec 8, 2024

TaiJuWu Dec 9, 2024 •

edited

Loading

Mattemagikern Dec 10, 2024

TaiJuWu Dec 10, 2024 •

edited

Loading

Mattemagikern Dec 10, 2024

andyross commented Dec 11, 2024

andyross left a comment

cfriedt Dec 11, 2024

TaiJuWu left a comment

	ZTEST(workqueue_reuse, test_submit_to_queue)
	ZTEST(workqueue_api, test_k_work_queue_stop)

kernel: Add support for stopping workqueues #82614

kernel: Add support for stopping workqueues #82614

Conversation

Mattemagikern commented Dec 5, 2024

cfriedt commented Dec 6, 2024

andyross left a comment

Choose a reason for hiding this comment

Mattemagikern commented Dec 6, 2024 • edited Loading

cfriedt commented Dec 6, 2024

andyross commented Dec 6, 2024

Mattemagikern commented Dec 6, 2024

Mattemagikern commented Dec 6, 2024

npitre commented Dec 6, 2024 via email

Mattemagikern commented Dec 6, 2024

andyross left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Mattemagikern Dec 7, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Mattemagikern Dec 7, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TaiJuWu Dec 7, 2024 • edited Loading

Choose a reason for hiding this comment

Mattemagikern Dec 7, 2024 • edited Loading

Choose a reason for hiding this comment

TaiJuWu Dec 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Mattemagikern Dec 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TaiJuWu Dec 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TaiJuWu Dec 10, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

andyross commented Dec 11, 2024

andyross left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TaiJuWu left a comment

Choose a reason for hiding this comment

Mattemagikern commented Dec 6, 2024 •

edited

Loading

Mattemagikern Dec 7, 2024 •

edited

Loading

Mattemagikern Dec 7, 2024 •

edited

Loading

TaiJuWu Dec 7, 2024 •

edited

Loading

Mattemagikern Dec 7, 2024 •

edited

Loading

TaiJuWu Dec 8, 2024 •

edited

Loading

Mattemagikern Dec 8, 2024 •

edited

Loading

TaiJuWu Dec 9, 2024 •

edited

Loading

TaiJuWu Dec 10, 2024 •

edited

Loading