Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Move delta write open out of brpc worker #56517

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

banmoy
Copy link
Contributor

@banmoy banmoy commented Mar 4, 2025

Why I'm doing:

tablet_writer_open brpc request needs to initialize DeltaWriter (DeltaWriter::_init). The initialization can be heavy because it needs to hold many pthread locks, such as

  • Tablet _schema_lock, _migration_lock, _ingest_lock
  • TabletUpdates::_lock
  • TabletManager:tablet shard lock
  • TxnManager: txn shard lock

Waiting lock can block brpc worker and make the whole brpc service unavailable. We need move the initialization out of brpc worker so that it won't affect the brpc service.

What I'm doing:

when receiving the tablet_writer_open rpc request, submit it to a separate thread pool to process it

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 3.4
    • 3.3
    • 3.2
    • 3.1
    • 3.0

@banmoy banmoy requested review from a team as code owners March 4, 2025 01:30
@mergify mergify bot assigned banmoy Mar 4, 2025
@@ -679,11 +679,59 @@ Status LocalTabletsChannel::_open_all_writers(const PTabletWriterOpenRequest& pa
}

_is_replicated_storage = params.is_replicated_storage();
std::shared_ptr<DeltaWriterOpenContext> context = std::make_shared<DeltaWriterOpenContext>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can move the execution to thread pool in the brpc entrypoint in backend_service.cpp or internal_service.cpp just like other brpc interface, e.g. exec_plan_fragment ...

@banmoy banmoy force-pushed the delta_write_open branch from 66757a6 to 38d907f Compare March 4, 2025 06:51
@banmoy banmoy force-pushed the delta_write_open branch from 38d907f to a72f6b8 Compare March 4, 2025 11:57
Signed-off-by: PengFei Li <[email protected]>
@banmoy banmoy changed the title [Enhancement][WIP] Move delta write open out of brpc worker [Enhancement] Move delta write open out of brpc worker Mar 5, 2025
@@ -205,6 +205,7 @@ class StarRocksMetrics {
METRIC_DEFINE_INT_COUNTER(load_bytes_total, MetricUnit::BYTES);

// Metrics for LoadChannel
METRICS_DEFINE_THREAD_POOL(load_channel_mgr);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

load_channel_pool? a little confused a thread pool is named as *_mgr

@@ -71,6 +71,14 @@ namespace lake {
class TabletManager;
}

struct LoadChannelOpenRequest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is also the same as the RPC's controller, maybe name it LoadChannelOpenController or LoadChannelOpenContext

return;
}
auto task = [=]() { this->_open(open_request); };
Status status = _async_rpc_pool->submit_func(std::move(task));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the task can be submit successful but not executed at all when doing pool shutdown, hence lead the done never get called and resulting the brpc server exit hanging.

Signed-off-by: PengFei Li <[email protected]>
Copy link

github-actions bot commented Mar 5, 2025

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

Copy link

github-actions bot commented Mar 5, 2025

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants