Motivation:
The current implementation blocks the main processing flow during connection refresh operations. When a reconnect is needed, poll_recover blocks until the refresh operation completes, preventing the processing of any requests during this time. This blocking behavior:
Impacts system responsiveness
Handles temporary network issues inefficiently
Forces all requests to wait, even those targeting different addresses
Creates unnecessary latency for requests that could be routed to available nodes
By moving reconnect operations to background tasks, implementing exponential backoff, and enhancing request routing, we can:
Improve system throughput by eliminating blocking operations
Handle network issues more gracefully
Let requests wait for a bounded time on connections that are being refreshed
Maintain existing ordering guarantees and consistency
Better utilize available connections during refresh operations
Current implementation:
Core Structure:
Implements Sink trait with methods: start_send, poll_ready, and poll_flush
Connected to a socket output stream via a forward implementation.
poll_ready:
Always returns Poll::Ready
forward Implementation:
Calls start_send to send requests
Calls poll_flush to process requests
poll_flush Behavior:
Returns Poll::Ready when there are no requests in the pending_requests queue and none in the inflight_requests queue
Returns Poll::Pending when poll_complete returns Pending
Request Processing Flow:
poll_flush is called
poll_recover is called first:
Blocks on the recovery future if one exists
Ensures all recovery operations complete before proceeding
After recovery completes, poll_complete is called:
Moves all pending_requests to inflight_requests queue
Calls poll_next on inflight_requests to process them
Returns Pending if all current inflight requests are pending
Returns Ready when all requests are processed
Reordering Prevention:
All requests in inflight_requests see the same connection_map
Ensures consistent behavior for batches of requests
Key Points:
The system uses a two-stage queue (pending_requests and inflight_requests)
poll_recover acts as a synchronization point, ensuring all requests see a consistent state
Batches of requests are processed together, maintaining order and consistency
poll_next is called within poll_complete to process inflight requests
This structure ensures that:
Requests are processed in batches
Each batch sees a consistent view of the connection state
Recovery operations complete before new requests are processed
The processing of inflight requests is handled within the poll_complete method
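The flow above can be sketched as a small synchronous model. This is only an illustration under stated assumptions, not the actual code: `Pipeline`, the `u32` request type, the `recovery_polls_left` counter (standing in for the recovery future), and the `completed` list are all hypothetical, and every inflight request is assumed to finish immediately.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical stand-in for the sink state; requests are plain u32 ids.
struct Pipeline {
    pending_requests: VecDeque<u32>,
    inflight_requests: VecDeque<u32>,
    /// Assumption: number of polls for which the simulated recovery stays pending.
    recovery_polls_left: u32,
    /// Assumption: records the order in which requests were processed.
    completed: Vec<u32>,
}

impl Pipeline {
    fn new() -> Self {
        Pipeline {
            pending_requests: VecDeque::new(),
            inflight_requests: VecDeque::new(),
            recovery_polls_left: 0,
            completed: Vec::new(),
        }
    }

    /// Queues a request; nothing is processed until poll_flush runs.
    fn start_send(&mut self, request: u32) {
        self.pending_requests.push_back(request);
    }

    /// Models the blocking step: Pending while recovery is outstanding.
    fn poll_recover(&mut self) -> Poll<()> {
        if self.recovery_polls_left > 0 {
            self.recovery_polls_left -= 1;
            return Poll::Pending;
        }
        Poll::Ready(())
    }

    /// Moves the whole pending batch to inflight, then processes it in order.
    fn poll_complete(&mut self) -> Poll<()> {
        self.inflight_requests.extend(self.pending_requests.drain(..));
        while let Some(request) = self.inflight_requests.pop_front() {
            self.completed.push(request);
        }
        Poll::Ready(())
    }

    /// Recovery first, then the batch: no request moves while recovery is pending.
    fn poll_flush(&mut self) -> Poll<()> {
        if self.poll_recover().is_pending() {
            return Poll::Pending;
        }
        self.poll_complete()
    }
}
```

In this model, setting `recovery_polls_left` to a non-zero value before calling `poll_flush` shows the problem described in the motivation: queued requests stay in `pending_requests` until recovery is done, regardless of which address they target.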
PR 1 - Move connection refresh to the background
Core Changes:
Move refresh_connection to run as a tokio task in the background, eliminating blocking in poll_recover.
New State Management:
Introduce two new sets in ConnectionContainer:
'pending_refresh': Tracks addresses that need to be refreshed
'completed_refresh': Tracks addresses that have completed refresh
Refresh Task Behavior:
When refresh is needed:
Add address to 'pending_refresh' set
Start refresh logic in background tokio task
When refresh completes:
Add address to 'completed_refresh' set
Modified poll_flush Behavior:
Before calling poll_complete:
Remove connections for all addresses in 'pending_refresh' set
Add connections for all addresses in 'completed_refresh' set
Clear both sets after processing
Continue with normal poll_complete flow
Consistency Maintenance:
All connection_map modifications happen at poll_flush start
Each batch of requests sees consistent connection state
No blocking on refresh operations
Key Benefits:
Unblocks poll_recover, improving system responsiveness
Maintains consistency for request batches
Enables concurrent refresh operations
Preserves existing batch processing model
Maintains request ordering guarantees
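A minimal sketch of the bookkeeping PR 1 describes, assuming a single-threaded view of the container. The `Connection` placeholder, the `refreshed` side map that holds new connections until the next flush, and the helper method names are assumptions made for illustration. In the real design the sets would be written by background tokio tasks, so they would need some form of shared access that is not shown here.

```rust
use std::collections::{HashMap, HashSet};

/// Placeholder for a real connection handle (assumption).
#[derive(Debug, Clone, PartialEq)]
struct Connection {
    id: u32,
}

#[derive(Default)]
struct ConnectionContainer {
    connection_map: HashMap<String, Connection>,
    /// Addresses that need to be refreshed.
    pending_refresh: HashSet<String>,
    /// Addresses whose refresh has completed.
    completed_refresh: HashSet<String>,
    /// Assumption: where a finished refresh parks its new connection.
    refreshed: HashMap<String, Connection>,
}

impl ConnectionContainer {
    /// Called when a refresh is needed, before the background task starts.
    fn mark_refresh_needed(&mut self, addr: &str) {
        self.pending_refresh.insert(addr.to_string());
    }

    /// Called when a background refresh finishes.
    fn mark_refresh_completed(&mut self, addr: &str, conn: Connection) {
        self.refreshed.insert(addr.to_string(), conn);
        self.completed_refresh.insert(addr.to_string());
    }

    /// Called at the start of poll_flush, before poll_complete.
    /// Draining both sets also clears them, so each batch sees one fixed map.
    fn apply_refresh_results(&mut self) {
        for addr in self.pending_refresh.drain() {
            self.connection_map.remove(&addr);
        }
        for addr in self.completed_refresh.drain() {
            if let Some(conn) = self.refreshed.remove(&addr) {
                self.connection_map.insert(addr, conn);
            }
        }
    }
}
```

An address whose refresh is still running is simply absent from `connection_map` after the flush; it reappears on a later flush once `mark_refresh_completed` has been called for it.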
PR 2 - Enhanced get_connection with State-Based Notifier:
Core Changes:
Modify get_connection to wait for an in-progress refresh of the target address, up to a time threshold, while maintaining order guarantees.
New State Management:
Add refresh state tracking per address
Introduce notifier mechanism that's triggered by poll_flush
New state enum for refresh operations:
Refreshing
RefreshTimeout (when refresh operation exceeds time threshold)
Modified poll_flush Behavior:
At the beginning, before connection map updates:
Check duration of ongoing refresh operations
For operations running too long:
Update state to RefreshTimeout
Trigger associated notifier
Allow awaiting requests to proceed
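The timeout check at the start of poll_flush could look roughly like the sketch below. The two enum variants come from the proposal; `RefreshEntry`, `expire_slow_refreshes`, and the explicit `now` and `threshold` parameters are hypothetical. The notifier itself is left out: the function only returns the addresses whose waiters should be woken.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq)]
enum RefreshState {
    Refreshing,
    RefreshTimeout,
}

/// Hypothetical per-address record of an ongoing refresh.
struct RefreshEntry {
    state: RefreshState,
    started_at: Instant,
}

/// Marks refreshes that have run longer than `threshold` as RefreshTimeout
/// and returns their addresses (sorted), so the caller can trigger each notifier.
fn expire_slow_refreshes(
    entries: &mut HashMap<String, RefreshEntry>,
    now: Instant,
    threshold: Duration,
) -> Vec<String> {
    let mut to_notify = Vec::new();
    for (addr, entry) in entries.iter_mut() {
        let too_long = now.duration_since(entry.started_at) > threshold;
        if entry.state == RefreshState::Refreshing && too_long {
            entry.state = RefreshState::RefreshTimeout;
            to_notify.push(addr.clone());
        }
    }
    to_notify.sort();
    to_notify
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside, keeps the check deterministic and lets one flush use a single timestamp for all addresses.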
New Connection Retrieval Flow:
When no connection is found for a route, check whether the address has an active refresh operation.
If a refresh is active, await the address's notifier. When notified, use the connection if it now exists in the map; if the state is RefreshTimeout, proceed to a random node.
If no refresh is active, proceed with the current random node logic.
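The retrieval flow reduces to a small decision function. The sketch below is an assumption-laden illustration: `Route` and `route_for` are hypothetical names, the actual awaiting of the notifier is not modeled, and the caller is assumed to re-run the decision after being notified.

```rust
#[derive(Debug, PartialEq)]
enum RefreshState {
    Refreshing,
    RefreshTimeout,
}

/// Hypothetical outcome of a get_connection lookup.
#[derive(Debug, PartialEq)]
enum Route {
    UseConnection,
    WaitForNotifier,
    RandomNode,
}

/// Decides what to do for one address, given whether the connection map
/// holds a connection for it and the refresh state tracked for it, if any.
fn route_for(has_connection: bool, refresh: Option<&RefreshState>) -> Route {
    match (has_connection, refresh) {
        // A connection in the map always wins.
        (true, _) => Route::UseConnection,
        // Refresh in progress: wait on the address's notifier, then decide again.
        (false, Some(RefreshState::Refreshing)) => Route::WaitForNotifier,
        // Refresh took too long, or none is active: existing random node fallback.
        (false, Some(RefreshState::RefreshTimeout)) | (false, None) => Route::RandomNode,
    }
}
```

After a notification, calling `route_for` again with the updated map and state yields either `UseConnection` (refresh succeeded) or `RandomNode` (state moved to RefreshTimeout), which matches the two notified outcomes listed above.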
Key Benefits:
Maintains current behavior of falling back to random node
Replaces blocking recovery with controlled waiting period
Prevents request reordering by:
Using centralized state management in poll_flush
Ensuring consistent state views for request batches
Provides a natural transition from the current blocking behavior to a more efficient approach
Comparison to Current Behavior:
Current: Blocks entirely during recover task
Proposed: Waits briefly for refresh completion before random routing
Both ensure consistent handling of request batches
Both eventually route to random node when connection unavailable
PR 3 - Exponential Backoff for Reconnection: #473
Core Changes:
Enhance refresh task with exponential backoff retry mechanism for connection attempts.
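For illustration only, one common way to compute an exponential backoff delay is shown below. The issue does not specify the base delay, the cap, or the growth factor, so `backoff_delay`, its parameters, and the doubling factor are all assumptions rather than the proposed values.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
/// Saturating arithmetic keeps very large attempt counts from overflowing.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.saturating_mul(factor).min(max)
}
```

With a hypothetical base of 100 ms and a cap of 5 s, the delays grow as 100 ms, 200 ms, 400 ms, 800 ms and so on until they reach the cap, so a node that stays down is retried less and less often.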
Refresh Task Behavior:
Key Benefits:
Checklist