
Plumb unified scheduler for BP minimally #4533

Merged

Conversation


@ryoqun ryoqun commented Jan 19, 2025

Finally (if still minimally), the unified scheduler enters banking stage land.

Many essential functionalities for production use are still missing, but it can now handle the most common case: transaction inflow from the sigverify stage, working as a banking stage.

Extracted from #3946.

@ryoqun ryoqun requested a review from apfitzge January 19, 2025 04:42
@ryoqun ryoqun force-pushed the unified-scheduler-minimal-bp-plumbing branch 5 times, most recently from 1a637e6 to a5190b4 on January 19, 2025 11:38
@ryoqun ryoqun force-pushed the unified-scheduler-minimal-bp-plumbing branch 3 times, most recently from e6e6119 to cf20e54 on January 22, 2025 14:39
@ryoqun ryoqun requested a review from apfitzge January 22, 2025 15:10
@ryoqun (Collaborator, Author) commented on Jan 22, 2025:

@apfitzge thanks for the initial code review.

Not sure how far you've read into this PR, but I largely managed to simplify the code at
71cb2ef ... after marking this PR as ready for review... ;)

I also added docs at cf20e54.

Hopefully, your second code review should be easier with these changes....

Comment on lines +19 to +21
//! 1. Translate the raw packet bytes into some structs
//! 2. Do various sanitization on them
//! 3. Calculate priorities
@ryoqun (Collaborator, Author):

Not in this PR. ;)

//! block production) as much as possible.
//!
//! Lastly, what the callback closure in this module does is roughly as follows:
//! 1. Translate the raw packet bytes into some structs
@ryoqun (Collaborator, Author):

Support for the recently landed TransactionView is out of scope for this PR as well.
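
For illustration, a minimal sketch of the three steps the quoted doc comment outlines. All names below are made up; the real callback closure uses agave's packet and transaction types, and the real sanitization and priority logic is elided.

```rust
// Hypothetical stand-ins for the real packet/transaction types.
struct RawPacket(Vec<u8>);

struct PreparedTx {
    payload: Vec<u8>,
    priority: u64,
}

fn prepare_packets(packets: Vec<RawPacket>) -> Vec<PreparedTx> {
    let mut txs: Vec<PreparedTx> = packets
        .into_iter()
        // Step 1: translate the raw packet bytes into some struct
        // (real deserialization elided).
        .map(|RawPacket(bytes)| bytes)
        // Step 2: do various sanitization (placeholder check only).
        .filter(|bytes| !bytes.is_empty())
        // Step 3: calculate a priority (placeholder: first byte as a fee stub).
        .map(|bytes| PreparedTx { priority: bytes[0] as u64, payload: bytes })
        .collect();
    // Highest-paying first.
    txs.sort_unstable_by(|a, b| b.priority.cmp(&a.priority));
    txs
}
```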

@@ -185,3 +203,97 @@ fn test_scheduler_waited_by_drop_bank_service() {
// the scheduler used by the pruned_bank have been returned now.
assert_eq!(pool_raw.pooled_scheduler_count(), 1);
}

#[test]
fn test_scheduler_producing_blocks() {
@ryoqun (Collaborator, Author):

This test is the only code that calls some of the newly added fns, so there are #[allow(dead_code)] attributes here and there across this PR.

@ryoqun ryoqun force-pushed the unified-scheduler-minimal-bp-plumbing branch from a3e78c2 to 46ac6ab on January 31, 2025 13:29
) {
let mut root_bank_cache = RootBankCache::new(bank_forks.clone());
let unified_receiver = channels.unified_receiver().clone();
let decision_maker = DecisionMaker::new(cluster_info.id(), poh_recorder.clone());
A reviewer commented:

I rugged your PR with #4724.

The decision maker now caches the decision internally for a short period (5 ms), to avoid taking PoH locks at exactly the times worker threads need them most.

I think you should be able to just make this mut and it'll work.
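
A rough sketch of the caching idea (the type and method names below are assumptions for illustration, not the actual DecisionMaker API):

```rust
use std::time::{Duration, Instant};

const DECISION_CACHE_TTL: Duration = Duration::from_millis(5);

// Hypothetical stand-in for the real decision enum.
#[derive(Clone)]
enum Decision {
    Consume,
    Forward,
    Hold,
}

struct CachingDecisionMaker {
    cached: Option<(Instant, Decision)>,
}

impl CachingDecisionMaker {
    // Takes `&mut self` (hence the suggestion to make the binding `mut`):
    // a fresh decision is memoized in place and reused for up to 5 ms,
    // so the poh lock is only taken once the cache has expired.
    fn make_decision(&mut self) -> Decision {
        if let Some((at, decision)) = &self.cached {
            if at.elapsed() < DECISION_CACHE_TTL {
                return decision.clone();
            }
        }
        let fresh = self.poll_poh(); // the lock-taking path, elided
        self.cached = Some((Instant::now(), fresh.clone()));
        fresh
    }

    fn poll_poh(&self) -> Decision {
        Decision::Hold // placeholder
    }
}
```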

let index = task_id_base + packet_index;

let task = helper.create_new_task(transaction, index);
helper.send_new_task(task);
The reviewer commented:

Trying to make sure I understand the flow of transactions and their lifetime, since it is different from replay, where txs should be 1:1 with a bank.

When we receive new packet batches from sigverify, this handler is called; it pops and deserializes packets into tasks, which get sent over a channel here.

  • These are going to the Scheduler.
  • That scheduler does not yet have a SchedulingContext.
  • The scheduler will be given a SchedulingContext.

Does the scheduler start scheduling unconflicted work immediately, or wait for a context to be given?
Scheduled/executed items may be lost in slot transitions, but unscheduled txs will be retained for processing in the next slot?
What's the plan for limiting the number of tasks outstanding in the banking stage scheduler?

@ryoqun (Collaborator, Author) commented on Feb 5, 2025:

Nice question! Maybe I'll accompany the next PR with a doc comment on the transaction lifetime, complete with a fancy chart... lol

Note that the transaction lifetime as of this PR doesn't yet match the following answers.

That said..., this is the planned impl:

> it is different from replay, where txs should be 1:1 with a bank.

Yeah, 1:1 doesn't hold anymore for the banking stage. FYI, Task was designed with this in mind from the start, so that it can be consumed by any recent bank.

> Does the scheduler start scheduling unconflicted work immediately, or wait for a context to be given?

Yes, by locking transactions' addresses internally. Then, if a higher-paying conflicting tx arrives, the scheduler re-locks the addresses with the new one. As soon as it gets a context, it burst-sends all successfully locked, highest-paying-at-the-moment tasks back to the handler threads, literally just with channel.send()s for the lowest latency possible.
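
A greatly simplified sketch of that idea (all names here are hypothetical; the real SchedulingStateMachine also handles read locks, partial conflicts, and blocked-task queues):

```rust
use std::collections::{HashMap, HashSet};
use std::sync::mpsc::Sender;

type Address = [u8; 32];

#[derive(Clone)]
struct Task {
    id: u64,
    priority: u64,
    write_addresses: Vec<Address>,
}

struct ToyStateMachine {
    // The highest-paying task currently holding each address.
    lock_holders: HashMap<Address, Task>,
}

impl ToyStateMachine {
    // Lock addresses eagerly on arrival; a higher-paying conflicting task
    // displaces the current holder (the "re-lock" described above).
    fn buffer_task(&mut self, task: Task) {
        for addr in &task.write_addresses {
            match self.lock_holders.get(addr) {
                Some(holder) if holder.priority >= task.priority => {}
                _ => {
                    self.lock_holders.insert(*addr, task.clone());
                }
            }
        }
    }

    // On receiving a context, burst-send every locked,
    // highest-paying-at-the-moment task to the handler threads:
    // literally just channel sends.
    fn on_new_context(&mut self, to_handlers: &Sender<Task>) {
        let mut sent = HashSet::new();
        for (_, task) in self.lock_holders.drain() {
            if sent.insert(task.id) {
                let _ = to_handlers.send(task);
            }
        }
    }
}
```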

> Scheduled/executed items may be lost in slot transitions, but unscheduled txs will be retained for processing in the next slot?

The unified scheduler won't lose any items in slot transitions, considering these could be the highest-paying transactions. At those times, the unified scheduler behaves just as it does before a context is given: it continuously reorders its per-address priority queues (UsageQueue) until the very moment of the next context.

The trick here is that SchedulingStateMachine can trivially be carried over to subsequent contexts without emptying its tasks. This behavior is unlike the currently-existing block verification mode, so you might need to wrap your head around it a bit to grok the next PR...
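
To illustrate the contrast, a toy sketch (names made up) of the two session-end behaviors:

```rust
// In block verification, tasks are 1:1 with a bank, so the machine is
// effectively reset at each session end; in block production the same
// instance, with its still-buffered tasks, rolls into the next context.
struct StateMachine {
    buffered_tasks: Vec<u64>, // task ids, as a stand-in
}

enum Mode {
    BlockVerification,
    BlockProduction,
}

fn end_session(machine: StateMachine, mode: Mode) -> StateMachine {
    match mode {
        // Verification: nothing survives the session.
        Mode::BlockVerification => StateMachine { buffered_tasks: Vec::new() },
        // Production: unscheduled tasks are retained for the next context.
        Mode::BlockProduction => machine,
    }
}
```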

> What's the plan for limiting the number of tasks outstanding in the banking stage scheduler?

So, given that SchedulingStateMachine is now stateful across banks, it just maintains the number of outstanding tasks and pops off any excess.
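
A sketch of the planned cap (the limit constant and names are made-up placeholders; the thread doesn't specify an actual limit):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// Hypothetical cap on outstanding tasks.
const MAX_OUTSTANDING_TASKS: usize = 100_000;

struct BoundedTaskBuffer {
    // Min-heap on priority, so the lowest-paying task is popped first.
    tasks: BinaryHeap<Reverse<(u64 /* priority */, u64 /* task id */)>>,
}

impl BoundedTaskBuffer {
    fn insert(&mut self, priority: u64, task_id: u64) {
        self.tasks.push(Reverse((priority, task_id)));
        // Maintain the outstanding count and pop off any excess,
        // dropping the cheapest tasks first.
        while self.tasks.len() > MAX_OUTSTANDING_TASKS {
            let Reverse((_priority, _task_id)) = self.tasks.pop().unwrap();
            // e.g. record the drop in metrics here
        }
    }
}
```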

Lastly, as of this PR, transactions are buffered at banking_packet_receiver until a context is given, and SchedulingStateMachine is emptied at session end. So it's quite sub-optimal. ;)

The reviewer replied:

> The trick here is that SchedulingStateMachine can trivially be carried over to subsequent contexts without emptying its tasks. This behavior is unlike the currently-existing block verification mode, so you might need to wrap your head around it a bit to grok the next PR...

Yeah, I think I understand; it makes sense after reading the comments on the SchedulingContext. I mainly wanted to clarify this point.
AFAICT we'd still lose transactions in the forking events I described, since they'd be on banks we abandoned and (at least for now) not re-inserted (we probably should do this). <-- out of scope for this PR, and probably even for the scheduler behavior imo

> Lastly, as of this PR, transactions are buffered at banking_packet_receiver until a context is given, and SchedulingStateMachine is emptied at session end. So it's quite sub-optimal. ;)

Thanks for the detailed explanations. I think the general path is correct so far, but obviously this PR is minimized and not ready for prod, as you stated. Just trying to lay out here (for myself) what still needs to happen:

  1. Need to ingest packets earlier, before being given a context; otherwise it's at risk of OOM. If too early, just drop them.
  2. Is there any concept of retryable transactions in unified-scheduler? I think natively these cannot occur, but if jito is running, they will steal locks your scheduler expects AND reserve CUs the scheduler thinks you might have.

@ryoqun (Collaborator, Author):

> AFAICT we'd still lose transactions in the forking events I described, since they'd be on banks we abandoned and (at least for now) not re-inserted (we probably should do this). <-- out of scope for this PR, and probably even for the scheduler behavior imo

I agree. Forking events aren't supported by the unified scheduler.

>   1. Need to ingest packets earlier, before being given a context; otherwise it's at risk of OOM. If too early, just drop them.

Here it is... :) #4949

>   2. Is there any concept of retryable transactions in unified-scheduler? I think natively these cannot occur, but if jito is running, they will steal locks your scheduler expects AND reserve CUs the scheduler thinks you might have.

Nice question. This will be addressed in yet another upcoming PR...

@ryoqun ryoqun force-pushed the unified-scheduler-minimal-bp-plumbing branch from 46ac6ab to 31500c7 on February 5, 2025 04:08
@ryoqun ryoqun force-pushed the unified-scheduler-minimal-bp-plumbing branch from 7889256 to 9c48731 on February 5, 2025 06:23
@diman-io commented on Feb 5, 2025:

@ryoqun Hi, sorry for pinging you here, but it looks like the other channels aren’t working. Just wanted to make sure you saw #4211

@ryoqun ryoqun requested a review from apfitzge February 5, 2025 14:25
@ryoqun ryoqun merged commit 5a7852a into anza-xyz:master Feb 6, 2025
77 checks passed