Increase speed of incremental backup by using counting bloom filter #5149
Comments
Nice! Is this incremental backup feature going to be implemented in the automatic backup plugin, too?
@finid I am responsible for the storage engine, but I bet that if you create an issue it will be implemented :-).
Cool, I will, then!
I confirm it will be supported by the automatic backup component. To track it, I've created this issue: #5155
This means that at every update/delete/insert we would have to update and save the bloom filter somewhere. What's the cost of this? Also, this could be treated as an improvement. I think we could stay with the current solution, otherwise 2.2 will never reach the RC!
@lvca That is not true: we do not need to save the bloom filter anywhere. Also, the cost of updating the bloom filter is no more than the cost of an insertion into a hash map.
So do you want to keep the BF in RAM only?
Sure, we do not need it for the long term.
Removed from backlog in favor of #7686.
In the current implementation of incremental backup we iterate over all pages to find which of them were changed, but for huge databases this process may take several hours. In many cases that is not really a problem, because we do not block the database for writes, but sometimes the backup has to be completed within a very tight time window. To avoid this problem we are going to introduce a counting bloom filter. Just to clarify:
By counting bloom filter I do not mean the classic implementation of this data structure, which cannot be expanded once it reaches its capacity limit of registered items, but rather a family of modern algorithms with similar behaviour that allow the capacity of the "bloom filter" to be expanded dynamically.
In a nutshell, the algorithm is as follows:
Actually, that is close to the speed limit for incremental backup, because in 95% of cases we will only load pages that were actually changed, and loading those pages is work we have to do anyway.
Also, as a side effect, the part of the database journal which has to be "cut" and backed up will be really small, because only a short period of time is needed to perform the backup.
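Purely as an illustration of the idea (not the actual OrientDB implementation), here is a minimal sketch in Java of a fixed-size counting bloom filter keyed by page index. The class and method names (`PageChangeFilter`, `markChanged`, `possiblyChanged`, `unmark`) are hypothetical, and a production version would be one of the dynamically expandable variants mentioned above:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

/**
 * Minimal sketch of a fixed-size counting bloom filter over page indexes.
 * Kept in RAM only, as discussed in the comments above.
 */
public final class PageChangeFilter {
  private final AtomicIntegerArray counters;
  private final int size;
  private final int hashCount;

  public PageChangeFilter(int size, int hashCount) {
    this.counters = new AtomicIntegerArray(size);
    this.size = size;
    this.hashCount = hashCount;
  }

  /** Record that a page was modified: bump every counter the page hashes to. */
  public void markChanged(long pageIndex) {
    for (int i = 0; i < hashCount; i++)
      counters.incrementAndGet(slot(pageIndex, i));
  }

  /** Forget a page once it has been handled (possible because counters, not bits, are used). */
  public void unmark(long pageIndex) {
    for (int i = 0; i < hashCount; i++)
      counters.decrementAndGet(slot(pageIndex, i));
  }

  /**
   * Returns false only if the page was certainly NOT modified; true means
   * "possibly modified", so the page still has to be loaded and checked.
   */
  public boolean possiblyChanged(long pageIndex) {
    for (int i = 0; i < hashCount; i++)
      if (counters.get(slot(pageIndex, i)) == 0)
        return false;
    return true;
  }

  /** Map (pageIndex, hash seed) to a counter slot with simple multiplicative mixing. */
  private int slot(long pageIndex, int seed) {
    long h = pageIndex * 0x9E3779B97F4A7C15L + (long) seed * 0xC2B2AE3D27D4EB4FL;
    h ^= (h >>> 33);
    return (int) ((h & 0x7FFFFFFFFFFFFFFFL) % size);
  }
}
```

With a structure like this, the backup only has to load and compare pages for which `possiblyChanged` returns true; the counters (rather than single bits) are what make it possible to remove an entry again after a page has been backed up, which a classic bloom filter does not allow.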