[BugFix] abort compaction task properly during shutdown #55503
base: main
Conversation
Force-pushed from 0dac0f6 to 1f09a1b
}
}

void CompactionScheduler::compact(::google::protobuf::RpcController* controller, const CompactRequest* request,
                                  CompactResponse* response, ::google::protobuf::Closure* done) {
    if (_stopped) {
        brpc::ClosureGuard guard(done);
        auto st = Status::Aborted("Compaction shutdown in progress!");
BE/CN shutdown
done
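As a side note for readers of this thread, a minimal sketch of the brpc early-abort pattern the hunk above appears to follow. The `to_protobuf` call and the `status` field on CompactResponse are assumptions for illustration, not a claim about the actual diff:

```cpp
// Sketch only: the early-return arm taken when the scheduler is already stopped.
// brpc::ClosureGuard invokes done->Run() when it leaves scope, so the rpc still
// gets a response even though no compaction task is queued.
void CompactionScheduler::compact(::google::protobuf::RpcController* /*controller*/, const CompactRequest* /*request*/,
                                  CompactResponse* response, ::google::protobuf::Closure* done) {
    if (_stopped) {
        brpc::ClosureGuard guard(done);
        auto st = Status::Aborted("BE/CN shutdown in progress!");      // message wording per the review suggestion above
        st.to_protobuf(response->mutable_status());  // assumption: CompactResponse carries a StatusPB `status` field
        return;  // the guard runs done->Run() here, completing the rpc with the abort status
    }
    // ... normal path: build task contexts and push them onto the task queues ...
}
```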
@@ -197,6 +206,9 @@ void CompactionScheduler::compact(::google::protobuf::RpcController* controller,
    }
    // initialize last check time, compact request is received right after FE sends it, so consider it valid now
    cb->set_last_check_time(time(nullptr));
    // FIXME: note that even though `_stopped` is checked at the beginning, the scheduler may still switch to the
    // stopped state between that check and this point. If that race happens, the contexts_vec may never be
    // processed, which leaves the rpc hanging without a response.
should this be handled by abort_all?
so the comment can be adjusted
np. There is a gap between `abort_all()` and `compact()`. Consider threads A and B running in the following sequence:
1. thread-A: `compact()` executes up to line 183
2. thread-B: `stop()`
3. thread-A: runs on to line 212
So `abort_all()` has already finished, yet new contexts are still pushed into the task queue after abort_all.
I think it can be handled by adding a mutex in the compaction scheduler?
done
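Not the actual diff, but a minimal sketch of how a mutex can close the gap described in the sequence above: the stopped-check and the enqueue happen under the same lock that stop()/abort_all() takes, so stop() can no longer slip in between them. All names below (`_mutex`, `try_enqueue`, `stop_and_abort_all`) are illustrative:

```cpp
#include <deque>
#include <mutex>

// Sketch only: serialize the _stopped check with the enqueue so a concurrent
// stop() cannot land between steps 1 and 3 of the sequence above.
class CompactionSchedulerSketch {
public:
    // Returns false if shutdown has already started; the caller should then
    // abort the rpc instead of queuing a new context.
    bool try_enqueue(int ctx) {  // `int` stands in for a task context
        std::lock_guard<std::mutex> lock(_mutex);
        if (_stopped) {
            return false;  // refuse new work once shutdown has begun
        }
        _queue.push_back(ctx);  // enqueue while still holding the lock
        return true;
    }

    void stop_and_abort_all() {
        std::lock_guard<std::mutex> lock(_mutex);
        _stopped = true;
        // drain and abort every queued context under the same lock, so nothing
        // can be added after this point
        _queue.clear();
    }

private:
    std::mutex _mutex;
    bool _stopped = false;
    std::deque<int> _queue;
};
```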
@@ -432,6 +459,21 @@ Status CompactionScheduler::abort(int64_t txn_id) {
    return Status::NotFound(fmt::format("no compaction task with txn id {}", txn_id));
}

void CompactionScheduler::abort_all() {
    for (int i = 0; i < _task_queues.task_queue_size(); ++i) {
there might be a concurrency problem here: task_queue_size might change
Theoretically, `abort_all()` is expected to be called only during the shutdown procedure, after all services have already shut down. Nevertheless, the case is worth handling.
Taking a second look at the implementation, it is safe even if the task_queue_size changes, because the try_get() interface handles an out-of-range index `i` when the queue size shrinks. As for expansion of the queue, it should be fine as long as no tasks can be queued after the scheduler is stopped (a separate topic to take care of).
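For illustration, a minimal sketch of why a shrinking queue count is harmless under that reasoning: a per-index try_get() that simply returns nothing for an out-of-range index makes the abort_all() drain loop a no-op for stale indexes. The names and semantics here are assumptions based on this comment, not the actual StarRocks implementation:

```cpp
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

// Sketch only: try_get(idx) returns std::nullopt both when the queue at idx is
// empty and when idx is out of range, so a concurrently shrinking queue list
// cannot make the drain loop below misbehave.
struct TaskQueuesSketch {
    std::mutex mtx;
    std::vector<std::deque<int>> queues;  // `int` stands in for a task context

    size_t task_queue_size() {
        std::lock_guard<std::mutex> lock(mtx);
        return queues.size();
    }

    std::optional<int> try_get(size_t idx) {
        std::lock_guard<std::mutex> lock(mtx);
        if (idx >= queues.size() || queues[idx].empty()) {
            return std::nullopt;  // out-of-range or empty: nothing to hand out
        }
        int ctx = queues[idx].front();
        queues[idx].pop_front();
        return ctx;
    }
};

// Drain every queue; if task_queue_size() shrinks while this runs, try_get()
// on a stale index is simply a no-op. Expansion is only safe if nothing can be
// queued after the scheduler has stopped, as noted above.
inline void abort_all_sketch(TaskQueuesSketch& q) {
    for (size_t i = 0; i < q.task_queue_size(); ++i) {
        while (auto ctx = q.try_get(i)) {
            // mark *ctx as aborted / respond to its rpc here
            (void)ctx;
        }
    }
}
```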
Force-pushed from 1f09a1b to 367a64f
Signed-off-by: Kevin Xiaohua Cai <[email protected]>
Force-pushed from 367a64f to ff35b32
[Java-Extensions Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[FE Incremental Coverage Report] ✅ pass : 0 / 0 (0%)
[BE Incremental Coverage Report] ✅ pass : 44 / 46 (95.65%) file detail
Why I'm doing:
What I'm doing:
Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: