Idea: load balancing by implicit worker-chain locking of global account states #6

rphmeier opened this issue Nov 7, 2023 · 0 comments

rphmeier commented Nov 7, 2023

A follow-on from the excerpt below, from the discussion at https://github.com/thrumdev/sugondat/issues/4#issuecomment-1797062292.

A rough writeup of the idea, although in its current state I can't see us wanting to go with it.

> There is a possible set of approaches where each blob can actually land on any worker chain, but only a single one per relay-parent, and only if it has not been included on any other worker chain already... but I'm not sure exactly how this would look.

What this essentially asks for is:

  1. A global state of account balances and nonces
  2. A locking protocol which ensures any particular transaction is handled on only a single worker chain

Solutions of this type are possible, but with added latency, because worker chains do not run completely independently: they are all secured by the same relay chain and can receive state proofs from it.

We can keep a global state of accounts on the aggregator, along with a record of the last relay parent known for each worker. The aggregator updates a sender's nonce whenever a worker includes a blob from that sender. The worker chains then use state proofs of the aggregator state to implicitly acquire unique locks on the account state.
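
To make the moving parts concrete, here is a minimal sketch of the aggregator-side state this implies. All type and field names are illustrative and not taken from the codebase:

```rust
// Illustrative sketch only. `AccountId`, `WorkerId` and `RelayChainBlockNumber`
// stand in for whatever concrete types the runtime would use.
use std::collections::HashMap;

type AccountId = [u8; 32];
type WorkerId = u32;
type RelayChainBlockNumber = u32;

struct AccountState {
    balance: u128,
    nonce: u64,
}

#[derive(Default)]
struct AggregatorState {
    /// Balance and nonce per account, updated whenever any worker chain
    /// includes a blob from that account.
    accounts: HashMap<AccountId, AccountState>,
    /// The last relay parent known to have been built on by each worker.
    last_relay_parent: HashMap<WorkerId, RelayChainBlockNumber>,
}

impl AggregatorState {
    /// Called when the aggregator learns that `worker` included a blob
    /// from `sender` while building on `relay_parent`.
    fn note_blob_included(
        &mut self,
        sender: AccountId,
        worker: WorkerId,
        relay_parent: RelayChainBlockNumber,
    ) {
        if let Some(account) = self.accounts.get_mut(&sender) {
            account.nonce += 1;
        }
        self.last_relay_parent.insert(worker, relay_parent);
    }
}
```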

Each transaction's (sender, nonce) pair implies a random ordering of all worker chains such that, at any relay chain block number, there is a deterministic function `transaction_lock_description(RelayChainBlockNumber) -> { current: WorkerId, previous: WorkerId, prev_release: RelayChainBlockNumber }` which describes the current conditions of the transaction lock (a sketch follows the list):

  1. The current worker chain which may host the transaction
  2. The previous worker chain which could have posted the transaction
  3. The last relay parent at which the transaction could have been posted to the previous worker
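
A hedged sketch of how such a deterministic description could be computed. The handover period, the hash-derived shuffle and every constant here are assumptions for illustration, not a fixed design:

```rust
type WorkerId = u32;
type RelayChainBlockNumber = u32;

/// Assumed number of worker chains; a power of two so the toy shuffle below
/// is a proper permutation.
const NUM_WORKERS: u32 = 8;
/// How many relay chain blocks each worker holds the lock for. An assumed
/// tuning knob; it must be large relative to handover latency.
const HANDOVER_PERIOD: u32 = 100;

struct TransactionLockDescription {
    current: WorkerId,
    previous: WorkerId,
    /// Last relay parent at which the previous worker could have posted the tx.
    prev_release: RelayChainBlockNumber,
}

/// `seed` would be something like a hash of (sender, nonce); it drives both
/// the per-transaction ordering of workers and the changeover phase.
fn transaction_lock_description(
    seed: u64,
    now: RelayChainBlockNumber,
) -> TransactionLockDescription {
    // Offset the changeover time per transaction so handovers are spread out.
    let phase = (seed % HANDOVER_PERIOD as u64) as u32;
    let slot = (now + phase) / HANDOVER_PERIOD;

    // Toy stand-in for a "random ordering of all worker chains": an odd
    // stride is coprime with a power-of-two worker count, so stepping by it
    // visits every worker.
    let stride = (1 + 2 * (seed % (NUM_WORKERS as u64 / 2))) as u32;
    let offset = ((seed >> 32) % NUM_WORKERS as u64) as u32;
    let worker_at = |s: u32| (s.wrapping_mul(stride).wrapping_add(offset)) % NUM_WORKERS;

    TransactionLockDescription {
        current: worker_at(slot),
        previous: worker_at(slot.wrapping_sub(1)),
        // The current slot began at relay block `slot * HANDOVER_PERIOD - phase`,
        // so the previous worker's window closed one block earlier.
        prev_release: (slot * HANDOVER_PERIOD).saturating_sub(phase + 1),
    }
}
```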

Therefore, if `tx.nonce == nonce_from_aggregator_state(sender) + 1` and `next_possible_relay_parent(previous_worker) > transaction_lock_description(now).prev_release`, then the lock is owned by the current chain.
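
Reusing the types from the sketch above, the check a worker would run against a proven view of the aggregator could then look roughly like this (all names are placeholders):

```rust
/// Hypothetical ownership check, reusing `WorkerId`, `RelayChainBlockNumber`
/// and `TransactionLockDescription` from the sketch above. The two middle
/// inputs are values the worker reads out of an aggregator state proof
/// rather than computes locally.
fn owns_lock(
    my_worker_id: WorkerId,
    tx_nonce: u64,
    // Sender's nonce according to the proven aggregator state.
    nonce_from_aggregator_state: u64,
    // Earliest relay parent the previous worker could still build a block on,
    // derived from the aggregator's record of that worker's last relay parent.
    next_possible_relay_parent_of_previous: RelayChainBlockNumber,
    lock: &TransactionLockDescription,
) -> bool {
    // We are the designated holder for this (sender, nonce) right now...
    lock.current == my_worker_id
        // ...the transaction is the next one expected for the sender...
        && tx_nonce == nonce_from_aggregator_state + 1
        // ...and the previous worker can no longer post a block with a relay
        // parent old enough to have carried this transaction.
        && next_possible_relay_parent_of_previous > lock.prev_release
}
```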

However, there is an edge case when the `previous(previous)` worker chain has had a block stuck in availability: such a block may be included with a very old relay parent and may contain the transaction. So although the previous chain would not have acquired the lock under the conditions above, the current chain assumes it has exclusive access, and this results in a double spend when the old block from two hops before is finally included. This can be fixed by ensuring that the `transaction_lock_description` changes slowly enough that, by the time of `prev_release`, `next_possible_relay_parent(previous(previous))` is already past that chain's own release point.

Lock handovers are very time consuming. Each worker chain has a view of the aggregator that is 12-18 seconds out of date, due to asynchronous backing. The aggregator in turn has a view of each worker chain that is 12-18 seconds out of date, leading to a total expected-case latency of 24-36 seconds for a lock handover. This could be improved by having worker chains read each other's state directly rather than going through the aggregator's view of it, but that harms horizontal scaling.

The random ordering for each (sender, nonce) pair should vary in both the ordering itself and the changeover time (though not its duration), so that load stays balanced across the system.

Balance withdrawals from the global state also add some complexity, in that the aggregator chain would need to acquire an exclusive lock on the account state. One approach is to add an extra bit flag to each account and execute the withdrawal only at the point where all worker chains would have seen the flag and refused to execute any further transactions from the user.
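
A sketch of what that flag-based flow might look like on the aggregator side; the field names and the way "all workers have seen the flag" is judged are assumptions for illustration:

```rust
type RelayChainBlockNumber = u32;

/// Illustrative global account record with the extra withdrawal flag.
struct GlobalAccount {
    balance: u128,
    nonce: u64,
    /// Set when the user requests a withdrawal. Workers that see this flag in
    /// a proven aggregator state refuse further transactions from the account.
    withdrawal_pending: bool,
}

/// The aggregator executes the withdrawal only once every worker chain has
/// provably built on a relay parent at or after the point where the flag was
/// visible to all of them, so no worker can still include a blob paid for by
/// this account.
fn can_execute_withdrawal(
    account: &GlobalAccount,
    // Latest relay parent the aggregator knows each worker has built on.
    last_relay_parent_per_worker: &[RelayChainBlockNumber],
    // Relay block by which the flag is guaranteed visible to every worker
    // (an assumed bound derived from the state-proof latency discussed above).
    flag_visible_everywhere_at: RelayChainBlockNumber,
) -> bool {
    account.withdrawal_pending
        && last_relay_parent_per_worker
            .iter()
            .all(|&rp| rp >= flag_visible_everywhere_at)
}
```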
