-
Notifications
You must be signed in to change notification settings - Fork 147
How OSC works
Above all, OSC works outside of replication, all the statements are issued under sql_log_bin=0. It can be run on either replica or master at your demand. This means you can issue a schema change on one replica first for prove of concept and then roll out to master when you're sure about the change, as long as the schema change meets the requirement described in MySQL manual
OSC algorithms can be divided into following phases:
In this phase, we will do some sanity check in order to make sure the schema will meet basic requirements. Then we will create an empty table(shadow table) with the desired schema, and another change capture table to record the changes happen between the async dump & load process. The change capture table is referred to as the deltas table. It has has all columns as original table plus two additional columns: an integer auto-increment column to track order of changes AND an integer column to track dml type (insert, update or delete).
- An insert trigger is created on the original table to capture all column values of row being inserted in deltas.
- A delete trigger is created on original table to capture only the PK columns of row being deleted in deltas.
- An update trigger is created on the original table so that if the update does not change the PK columns then it captures new values of all columns in deltas. If the update changes the PK columns, then the update trigger captures it as a delete followed by an insert. A possible optimization in the future is to log only changed columns.
A consistent snapshot will be open to give us a stale view of the data in the existing table. We then use “SELECT INTO OUTFILE” to dump data from existing table into on disk files by chunks. Notice that INSERT INTO ... SELECT FROM is not an option here, because it will hold gap lock when scanning rows and block concurrent writes. Also we avoid fetching from database into Python and insert back to avoid SQL regeneration and dealing with encoding problems.
We then load all those on disk files back to database filling the shadow table using LOAD DATA INFILE. Each file will be removed from disk after being load successfully. So the disk space required for each schema change is approximately the same as the original table. It may require more if the existing table is well compressed.
During dump & load phase, there are probably lots of DML happening against the existing table. We now need to use those change log recorded by the triggers we've created at the beginning to bring the shadow table forward to a more fresh state. This phase can repeat several times because during each round of replay there're more DMLs coming into MySQL for us to catch up. Again we will not fetch actual data from delta table, but only delta IDs and delta type. With both information available we can form a replay query depending on the change type.
After moving shadow table forward to a state which is close to existing table's current state, we now can compare the data consistency between two tables. We open a consistent snapshot at the beginning, and then do a quick replay to fill the gap between shadow and existing table. After this, in theory those two tables should be exactly the same inside current transaction. We then use crc32 to calculate checksums for each chunk from existing table. When this is finished, we will close the transaction to keep the history length small. This is possible because the shadow will not change without further replay. It's exactly the same no matter it's inside and outside of the transaction. We then generate the checksums for shadow table. In the end, we compare the checksum, and bail out immediately if there's a mismatch.
Now everything is ready, we can proceed to bring the shadow table online. The whole cut over process looks like below:
- Another round of replay is required to move shadow table forward, and as close to existing table as possible.
- Lock the existing table to block further changes from happening
- Final round of replay to fill the small gap that may exist.
- Swap the table name to bring shadow table online.
Notice that we are not able to do an atomic rename here because of the limitation in MySQL. We have to use two separate ALTER TABLE RENAME statements. Since alter table causes an implicit commit in innodb, innodb locks get released after the first alter table. So any transaction that sneaks in after the first alter table and before the second alter table gets a 'table not found' error. The second alter table is expected to be very fast though because shadow table is not visible to other transactions and so there is no need to wait.
Drop all the triggers and old table. The new schema is now ready to use!
Basic
Advanced
For Bug reporters & Contributors