-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(networking): add bootstrap cache for peer discovery #2459
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
devskim found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
How is security addressed in this model? |
This is not a security related issue per se' It's a mechanism to reconnect to the network easily. It is cross any client/node on the machine. The security aspect would be who you connect to. If somebody broke into your machine and replaced the file with all bad nodes you could end up on the wrong network, but in that case I imagine you would quickly know. In terms of connecting to a wrong network clients would not find their data and does woudl likely not be paid, but it's unlikely to cause you to reveal any secret information unless you were using a badly written app, but that app woudl still be bad on a good nework. |
Add persistent bootstrap cache to maintain a list of previously known peers, improving network bootstrapping efficiency and reducing cold-start times. Enhance the bootstrap cache implementation with robust corruption detection and recovery mechanisms. This change ensures system resilience when the cache file becomes corrupted or invalid. Key changes: * Add explicit cache corruption detection and error reporting * Implement cache rebuilding from in-memory peers or endpoints * Use atomic file operations to prevent corruption during writes * Improve error handling with specific error variants * Add comprehensive test suite for corruption scenarios The system now handles corruption by: 1. Detecting invalid/corrupted JSON data during cache reads 2. Attempting recovery using in-memory peers if available 3. Falling back to endpoint discovery if needed 4. Using atomic operations for safe cache updates Testing: * Add tests for various corruption scenarios * Add concurrent access tests * Add file operation tests * Verify endpoint fallback behavior - Add smarter JSON format detection by checking content structure - Improve error handling with specific InvalidResponse variant - Reduce unnecessary warnings by only logging invalid multiaddrs - Simplify parsing logic to handle both JSON and plain text formats - Add better error context for failed parsing attempts All tests passing, including JSON endpoint and plain text format tests. feat(bootstrap_cache): implement circuit breaker with exponential backoff - Add CircuitBreakerConfig with configurable parameters for failures and timeouts - Implement circuit breaker states (closed, open, half-open) with state transitions - Add exponential backoff for failed request retries - Update InitialPeerDiscovery to support custom circuit breaker configuration - Add comprehensive test suite with shorter timeouts for faster testing This change improves system resilience by preventing cascading failures and reducing load on failing endpoints through intelligent retry mechanisms.
…ork isolation * Refactor CacheStore::from_args to handle peer sources more consistently * Ensure test network mode is properly isolated from cache system * Fix default behavior to use URL endpoint when no peers provided * Add proper handling for local and first node modes * Prevent cache operations when in test network mode This change ensures that: - Test network peers are isolated from cache operations - Default behavior (no args) correctly uses URL endpoints - Local and first node modes return empty stores - Explicit peers take precedence over default behavior - Cache operations only occur in non-test network mode The changes make the peer source handling more predictable and maintain proper isolation between different network modes (test, local, default).
* Fix test_safe_peers_env to verify env var peer inclusion - Assert presence of env var peer in total peer set - Remove incorrect assertion of exact peer count * Fix test_network_contacts_fallback isolation - Enable test_network mode to prevent interference from cache/endpoints - Verify exact peer count from mock server * Improve from_args implementation - Add environment variable peer handling before other sources - Use empty cache path in test network mode - Prevent cache file operations in test network mode These changes ensure proper test isolation and correct handling of peers from different sources (env vars, args, cache, endpoints) across different modes (normal, test network, local).
Looks like worth an RFC...? |
Could this cache be used also by clients? |
Superseded by #2487 |
Bootstrap Cache Implementation
Summary
This PR introduces a bootstrap cache mechanism to improve network bootstrapping efficiency. The cache persists information about previously known peers, enabling faster network rejoining and reducing cold-start latency.
Motivation
When nodes join the network, they need to discover and connect to existing peers. Without a persistent cache, nodes must rediscover the network from scratch each time they restart, which is inefficient and time-consuming. This implementation addresses that limitation.
Technical Details
Impact
Testing
The implementation includes:
Future Considerations