-
Notifications
You must be signed in to change notification settings - Fork 6.4k
WAL Recovery Modes
Every application is unique and requires a certain consistency guarantee from RocksDB. Every committed record is RocksDB is persisted. The uncommitted records are recorded in write-ahead-log (WAL). When RocksDB is shutdown cleanly, all uncommitted data is committed before shutdown and hence consistency is always guaranteed. When RocksDB is killed or the machine is restarted, on restart RocksDB need to restore itself to a consistent state.
One of the important recovery operation is replay uncommitted records in WAL. The different WAL recovery modes define the behavior of WAL replay.
In this mode, the WAL replay ignores any error discovered at the tail of the log. The rational is that, on unclean shutdown there can be incomplete writes at the tail of the log. This is a heuristic mode, the system cannot differentiate between corruption at the tail of the log and incomplete write. Any other IO error, will be considered as data corruption.
This mode is acceptable for most application since this provides the tradeoff between starting RocksDB after an unclean shutdown and consistency.
In this mode, any IO error during WAL replay is considered as data corruption. This mode is ideal for application that cannot afford to loose even a single record and have other means of recovering uncommitted data.
In this mode, the WAL replay is stopped on the after encountering an IO error. The system is recovered to a point where it is consistent. This is ideal for systems with replicas. Data from another replica can be used to replay past the "point-in-time" where the system is recovered to.
In this mode, any IO error while reading the log is ignored. The system tries to recover as much data as possible. This is ideal for disaster recovery.
Contents
- RocksDB Wiki
- Overview
- RocksDB FAQ
- Terminology
- Requirements
- Contributors' Guide
- Release Methodology
- RocksDB Users and Use Cases
- RocksDB Public Communication and Information Channels
-
Basic Operations
- Iterator
- Prefix seek
- SeekForPrev
- Tailing Iterator
- Compaction Filter
- Multi Column Family Iterator
- Read-Modify-Write (Merge) Operator
- Column Families
- Creating and Ingesting SST files
- Single Delete
- Low Priority Write
- Time to Live (TTL) Support
- Transactions
- Snapshot
- DeleteRange
- Atomic flush
- Read-only and Secondary instances
- Approximate Size
- User-defined Timestamp
- Wide Columns
- BlobDB
- Online Verification
- Options
- MemTable
- Journal
- Cache
- Write Buffer Manager
- Compaction
- SST File Formats
- IO
- Compression
- Full File Checksum and Checksum Handoff
- Background Error Handling
- Huge Page TLB Support
- Tiered Storage (Experimental)
- Logging and Monitoring
- Known Issues
- Troubleshooting Guide
- Tests
- Tools / Utilities
-
Implementation Details
- Delete Stale Files
- Partitioned Index/Filters
- WritePrepared-Transactions
- WriteUnprepared-Transactions
- How we keep track of live SST files
- How we index SST
- Merge Operator Implementation
- RocksDB Repairer
- Write Batch With Index
- Two Phase Commit
- Iterator's Implementation
- Simulation Cache
- [To Be Deprecated] Persistent Read Cache
- DeleteRange Implementation
- unordered_write
- Extending RocksDB
- RocksJava
- Lua
- Performance
- Projects Being Developed
- Misc