Increase speed of incremental backup by using counting bloom filter #5149
Comments
Nice! Is this incremental backup feature going to be implemented in the automatic backup plugin, too?
@finid I am responsible for the storage engine, but I bet that if you create an issue it will be implemented :-).
Cool, I will, then!
I confirm it will be supported by the automatic backup component. To track it, I've created this issue: #5155
This means that at every update/delete/insert we would have to update and save the bloom filter somewhere. What's the cost of this? Also, this could be treated as an improvement. I think we could stay with the current solution, otherwise 2.2 will never reach the RC!
@lvca That is not true: we do not need to save the bloom filter anywhere. Also, the cost of updating the bloom filter is no more than the cost of an insertion into a hash map.
So do you want to keep the BF in RAM only?
Sure, we do not need it for the long term.
Removed from backlog in favor of #7686.
In the current implementation of incremental backup we iterate over all pages to find which of them were changed, but for huge databases this process may take several hours. In many cases that is not really a problem, because we do not block the database for writes, but sometimes the backup has to be completed within a very tight time window. To avoid this problem we are going to introduce a counting bloom filter. Just to clarify:
By counting bloom filter I do not mean the classic implementation of this data structure, which cannot be expanded once it reaches its capacity limit of registered items, but rather a family of modern algorithms with similar behaviour that allow the capacity of the "bloom filter" to be expanded dynamically.
In a nutshell, the algorithm is as follows:
Actually, that is close to the speed limit for incremental backup, because in 95% of cases we will only load pages that were actually changed, and loading those pages is work we have to do anyway.
Also, as a side effect, the part of the database journal which has to be "cut" and backed up will be really small, because only a short period of time is needed to perform the backup.
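Purely as an illustration of the idea (not the actual OrientDB implementation), here is a minimal sketch in Java of a fixed-size counting bloom filter keyed by page index. The class and method names (`PageChangeFilter`, `markChanged`, `possiblyChanged`, `unmark`) are hypothetical, and a production version would be one of the dynamically expandable variants mentioned above:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

/**
 * Minimal sketch of a fixed-size counting bloom filter over page indexes.
 * Kept in RAM only, as discussed in the comments above.
 */
public final class PageChangeFilter {
  private final AtomicIntegerArray counters;
  private final int size;
  private final int hashCount;

  public PageChangeFilter(int size, int hashCount) {
    this.counters = new AtomicIntegerArray(size);
    this.size = size;
    this.hashCount = hashCount;
  }

  /** Record that a page was modified: bump every counter the page hashes to. */
  public void markChanged(long pageIndex) {
    for (int i = 0; i < hashCount; i++)
      counters.incrementAndGet(slot(pageIndex, i));
  }

  /** Forget a page once it has been handled (possible because counters, not bits, are used). */
  public void unmark(long pageIndex) {
    for (int i = 0; i < hashCount; i++)
      counters.decrementAndGet(slot(pageIndex, i));
  }

  /**
   * Returns false only if the page was certainly NOT modified; true means
   * "possibly modified", so the page still has to be loaded and checked.
   */
  public boolean possiblyChanged(long pageIndex) {
    for (int i = 0; i < hashCount; i++)
      if (counters.get(slot(pageIndex, i)) == 0)
        return false;
    return true;
  }

  /** Map (pageIndex, hash seed) to a counter slot with simple multiplicative mixing. */
  private int slot(long pageIndex, int seed) {
    long h = pageIndex * 0x9E3779B97F4A7C15L + (long) seed * 0xC2B2AE3D27D4EB4FL;
    h ^= (h >>> 33);
    return (int) ((h & 0x7FFFFFFFFFFFFFFFL) % size);
  }
}
```

With a structure like this, the backup only has to load and compare pages for which `possiblyChanged` returns true; the counters (rather than single bits) are what make it possible to remove an entry again after a page has been backed up, which a classic bloom filter does not allow.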