Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple issues blocking our use case #110

Open
fazo96 opened this issue May 16, 2022 · 2 comments
Open

Multiple issues blocking our use case #110

fazo96 opened this issue May 16, 2022 · 2 comments
Labels
bug Something isn't working enhancement New feature or request question Further information is requested

Comments

@fazo96
Copy link

fazo96 commented May 16, 2022

Hello, thanks for helping us successfully create a DB dump with replibyte by fixing #94

Unfortunately we have realised there are some limitations in replibyte preventing it from being used to replace our existing combination of application code and custom scripts we use to automatically copy production data to our staging environment while editing it a bit to change values that should be changed such as payment information and passwords.

I thought best to create a single issue for now to centralize the discussion but if preferred by you I can create an issue for each separate problem we have with the tool

Here are the problems we have encounted:

1. Slow dump/restore

This is a minor issue, but we noticed the replibyte dump and restore are much slower than PostgreSQL's pg_dump and pg_restore using the -c flag for the compressed format. It seems that the restore works by doing INSERT instructions, it would probably be a lot faster by using COPY.

2. INSERT failures during restore operation

while restoring a dump, replibyte logs many errors on INSERT instructions where it seems the values don't match the columns being inserted, causing the INSERT to fail.

I will provide a dump that can be used to replicate the issue, if I manage to remove sensitive company data from it.

For now here is a sample of the errors:

ERROR:  invalid input syntax for type bigint: "London, Greater London, England, United Kingdom"
LINE 14: ...0, NULL, 0.1275, NULL, 'London', 'gb', 'England', 'London, G...
                                                              ^
ERROR:  invalid input syntax for type bigint: "London, Greater London, England, United Kingdom"
LINE 14: ...0, NULL, 0.1275, NULL, 'London', 'gb', 'England', 'London, G...
                                                              ^
ERROR:  invalid input syntax for type bigint: "Atlanta, Georgia, United States"
LINE 6: ... NULL, 84.3902, NULL, 'Atlanta', 'us', 'Georgia', 'Atlanta, ...
                                                             ^
ERROR:  invalid input syntax for type bigint: "London, Greater London, England, United Kingdom"
LINE 14: ...2, NULL, 0.1275, NULL, 'London', 'gb', 'England', 'London, G...

3. Downtime during the restore operation

At the start of a restore operation, replibyte logs a lot of lines about drop being cascaded. It seems the tool works by dropping the tables in the destination DB. This causes the application to not be usable while the tool is running, also due to a combination with problem 1.

Our current custom solution will restore and transform the data to a temporary DB, then swap it with the real one by terminating connections and renaming. This isn't a good solution and has multiple issues, at least for our custom scripts.

It would be interesting to find a way to atomically apply the replibyte restore at the end, instead of running it directly on the destination database.

4. Missing general use transformers

We have many numeric columns (float, bigint, etc) that contain payment and financial information. It would be ideal to be able to replace all of them with a fixed value using a transformer, but it seems that built in transformers only work with string columns.

We would also benefit from being able to replace a string column with a fixed value.

In general, it would be useful to have the string transforms such as randomize and redacted work on numeric values as well.

We would like your help in finding solutions to these issues so we can replace our custom solution for seeding the staging environment with replibyte.

Let me know if there is anything I can do to help!

@evoxmusic evoxmusic added enhancement New feature or request question Further information is requested bug Something isn't working labels May 16, 2022
@evoxmusic
Copy link
Contributor

evoxmusic commented May 17, 2022

Hi @fazo96 , I reply inline

1. Slow dump/restore

This is a minor issue, but we noticed the replibyte dump and restore are much slower than PostgreSQL's pg_dump and pg_restore using the -c flag for the compressed format. It seems that the restore works by doing INSERT instructions, it would probably be a lot faster by using COPY.

Indeed, it's an improvement that we need to do. We have an issue opened here. INSERT INTO ... is super slow compared to COPY. I will try to take a look this weekend.

2. INSERT failures during restore operation

I will provide a dump that can be used to replicate the issue if I manage to remove sensitive company data from it.

Can you:

  1. open a new issue
  2. provide a dump leading to this error?

🙏🏽

3. Downtime during the restore operation

It would be interesting to find a way to atomically apply the replibyte restore at the end, instead of running it directly on the destination database.

It makes sense to me. We can provide such a feature that will ease the use of Replibyte with a very large database. WDYT @benny-n @fabriceclementz ?

4. Missing general use transformers

Can you open a feature request for this one?


Thank you @fazo96 🙏🏽

@fabriceclementz
Copy link
Contributor

3. Downtime during the restore operation

It would be interesting to find a way to atomically apply the replibyte restore at the end, instead of running it directly on the destination database.

It makes sense to me. We can provide such a feature that will ease the use of Replibyte with a very large database. WDYT @benny-n @fabriceclementz ?

Hi,
Makes sense to me too, maybe we could restore in a table names {original_table_name}_replibyte and rename the table at the end. Maybe I haven't think of some edge cases.
We should look at the Percona Toolkit too which provides some tools to help us https://www.percona.com/doc/percona-toolkit/LATEST/index.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request question Further information is requested
Projects
None yet
Development

No branches or pull requests

3 participants