K8SPG-493: fix scheduled backups #634
Merged
…stop trying to run backups in parallel
egegunes reviewed Jan 15, 2024
egegunes reviewed Jan 16, 2024
egegunes previously approved these changes Jan 16, 2024
inelpandzic previously approved these changes Jan 16, 2024
egegunes previously approved these changes Jan 16, 2024
tplavcic previously approved these changes Jan 16, 2024
hors previously approved these changes Jan 16, 2024
commit: b31fb3a
hors reviewed Jan 16, 2024
@@ -26,9 +25,6 @@ status:
updatedReplicas: 3
observedGeneration: 3
pgbackrest:
manualBackup:
Why was it removed?
egegunes approved these changes Jan 17, 2024
inelpandzic approved these changes Jan 17, 2024
tplavcic pushed a commit that referenced this pull request Jan 17, 2024
* K8SPG-493: fix scheduled backups https://jira.percona.com/browse/K8SPG-493
* fix tests
* remove deletion of backup job after creating a new manual backup and stop trying to run backups in parallel
* fix tests
* add `AnnotationBackupInProgress` and simplify code
* `golangci-lint`
* cron fix
* there should be at least 4 pg-backups
* fix `scheduled-backup`
* wrap errors and add `RetryOnConflict`
* fix scary error

---------

Co-authored-by: Viacheslav Sarzhan <[email protected]>
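The commit list mentions wrapping errors and adding `RetryOnConflict`. The sketch below illustrates that general pattern only; it is not the operator's code. It uses a hand-written `retryOnConflict` helper and an in-memory `fakeAPI` (both hypothetical stand-ins, since the real helper lives in client-go and needs a live API server) to show why the object is re-read on every attempt and how the final error is wrapped with `%w`.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's "409 Conflict" response.
var errConflict = errors.New("conflict: object has been modified")

// retryOnConflict re-runs fn while it fails with errConflict.
// Hypothetical stand-in for a client-side retry helper.
func retryOnConflict(maxAttempts int, fn func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err = fn()
		if !errors.Is(err, errConflict) {
			return err // success, or an error that retrying will not fix
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxAttempts, err)
}

// cluster is a minimal stand-in for a versioned Kubernetes object.
type cluster struct {
	resourceVersion int
	annotations     map[string]string
}

// fakeAPI simulates optimistic concurrency: an update is rejected when
// the caller's copy is older than the stored one.
type fakeAPI struct {
	stored        cluster
	conflictsLeft int // how many times a concurrent writer gets in first
}

// get returns a deep copy of the stored object.
func (a *fakeAPI) get() cluster {
	c := cluster{resourceVersion: a.stored.resourceVersion, annotations: map[string]string{}}
	for k, v := range a.stored.annotations {
		c.annotations[k] = v
	}
	return c
}

// update stores c only if it was based on the latest version.
func (a *fakeAPI) update(c cluster) error {
	if a.conflictsLeft > 0 {
		a.conflictsLeft--
		a.stored.resourceVersion++ // simulated concurrent write
	}
	if c.resourceVersion != a.stored.resourceVersion {
		return errConflict
	}
	c.resourceVersion++
	a.stored = c
	return nil
}

// setAnnotation re-reads the object on every attempt and wraps the final error.
func setAnnotation(api *fakeAPI, key, value string) error {
	err := retryOnConflict(5, func() error {
		c := api.get()
		c.annotations[key] = value
		return api.update(c)
	})
	if err != nil {
		return fmt.Errorf("set annotation %s: %w", key, err)
	}
	return nil
}

func main() {
	api := &fakeAPI{stored: cluster{annotations: map[string]string{}}, conflictsLeft: 2}
	err := setAnnotation(api, "pgv2.percona.com/backup-in-progress", "backup-1")
	fmt.Println("error:", err, "annotations:", api.stored.annotations)
}
```

Re-reading inside the retried function matters: retrying with the stale copy would fail with the same conflict every time.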
https://jira.percona.com/browse/K8SPG-493
DESCRIPTION
Problem:
The operator doesn't create more than one scheduled backup.
Cause:
After K8SPG-410 was implemented, the operator started creating a pg-backup for each backup job. K8SPG-432 then assigned owner references to the pg-backup resources, which changed the owner references of the jobs created by the cronjob. If the owner references of a job created by a cronjob point to a different resource, the cronjob can't tell whether the job has finished. As a result, only one scheduled backup job can be created.
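To make the cause concrete, here is a minimal sketch of how a controller that tracks jobs through their controller owner reference loses sight of a job once that reference is rewritten. The types are trimmed-down stand-ins for the Kubernetes ones, and the object names and the `PerconaPGBackup` kind are assumptions for illustration, not taken from this PR.

```go
package main

import "fmt"

// ownerRef is a simplified stand-in for a Kubernetes owner reference.
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool
}

// job is a simplified stand-in for a batch Job.
type job struct {
	Name     string
	Owners   []ownerRef
	Finished bool
}

// controllerOf returns the owner reference flagged as the controller, if any.
func controllerOf(j job) (ownerRef, bool) {
	for _, ref := range j.Owners {
		if ref.Controller {
			return ref, true
		}
	}
	return ownerRef{}, false
}

// jobsTrackedBy returns the jobs whose controller is the named CronJob.
// Jobs controlled by anything else are invisible to that CronJob.
func jobsTrackedBy(cronJobName string, jobs []job) []job {
	var tracked []job
	for _, j := range jobs {
		ref, ok := controllerOf(j)
		if ok && ref.Kind == "CronJob" && ref.Name == cronJobName {
			tracked = append(tracked, j)
		}
	}
	return tracked
}

func main() {
	// Hypothetical names throughout.
	original := job{
		Name:     "backup-job-1",
		Owners:   []ownerRef{{Kind: "CronJob", Name: "scheduled-backup", Controller: true}},
		Finished: true,
	}
	// The same job after its owner reference was pointed at a pg-backup.
	rewritten := original
	rewritten.Owners = []ownerRef{{Kind: "PerconaPGBackup", Name: "backup-1", Controller: true}}

	fmt.Println("tracked before:", len(jobsTrackedBy("scheduled-backup", []job{original})))
	fmt.Println("tracked after: ", len(jobsTrackedBy("scheduled-backup", []job{rewritten})))
}
```

In this model the rewritten job is finished, but the cronjob never sees it, so it cannot record the completion.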
Solution:
We should move to our own cronjob implementation inside the operator, which will create pg-backups on schedule. The operator shouldn't set schedules on crunchy's `PostgresCluster`.

There were also problems with multiple existing pg-backups trying to start backing up in parallel. To prevent this, the `pgv2.percona.com/backup-in-progress` annotation has been introduced. It is set on the `PerconaPGCluster` resource at the start of the backup and removed after the backup job has finished or failed. This annotation is used to check whether a backup is in progress.

The Crunchy operator deletes manual backup jobs after creating a new one. To prevent this and to keep completed/failed jobs as in the cronjob implementation, it was decided to delete the `postgres-operator.crunchydata.com/cluster` and `postgres-operator.crunchydata.com/pgbackrest` labels from the completed or failed job. This allows keeping these jobs without touching crunchy code. After the backup job is completed, before deleting the job labels, the operator should remove the `postgres-operator.crunchydata.com/pgbackrest-backup` annotation from crunchy's `PostgresCluster` resource to stop the backup job management that deletes the completed jobs when new ones are created.

CHECKLIST
Jira
Needs Doc
) and QA (Needs QA
)?Tests
Config/Logging/Testability
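The annotation and label handling described in the Solution section can be sketched as follows. This is a simplified model, not the operator's code: `object`, `tryStartBackup`, and `finishBackup` are hypothetical names, the objects are plain maps rather than Kubernetes resources, and storing the backup name as the annotation value is an assumption. Only the annotation and label keys come from the description.

```go
package main

import "fmt"

const (
	annotationBackupInProgress = "pgv2.percona.com/backup-in-progress"
	annotationPGBackrestBackup = "postgres-operator.crunchydata.com/pgbackrest-backup"
	labelCluster               = "postgres-operator.crunchydata.com/cluster"
	labelPGBackrest            = "postgres-operator.crunchydata.com/pgbackrest"
)

// object is a stand-in for the metadata of a Kubernetes object.
type object struct {
	Annotations map[string]string
	Labels      map[string]string
}

// tryStartBackup marks the PerconaPGCluster as busy with backupName.
// It returns false if another backup already holds the annotation.
func tryStartBackup(perconaCluster *object, backupName string) bool {
	if _, busy := perconaCluster.Annotations[annotationBackupInProgress]; busy {
		return false
	}
	if perconaCluster.Annotations == nil {
		perconaCluster.Annotations = map[string]string{}
	}
	// Assumption: the value is the backup name; the description does not say.
	perconaCluster.Annotations[annotationBackupInProgress] = backupName
	return true
}

// finishBackup runs once the backup job has completed or failed.
func finishBackup(perconaCluster, postgresCluster, backupJob *object) {
	// 1. Stop crunchy's backup job management before touching the job labels.
	delete(postgresCluster.Annotations, annotationPGBackrestBackup)
	// 2. Drop the labels so the finished job is kept
	//    when a new backup job is created.
	delete(backupJob.Labels, labelCluster)
	delete(backupJob.Labels, labelPGBackrest)
	// 3. Release the gate so the next pg-backup can start.
	delete(perconaCluster.Annotations, annotationBackupInProgress)
}

func main() {
	perconaCluster := &object{}
	postgresCluster := &object{Annotations: map[string]string{annotationPGBackrestBackup: "backup-1"}}
	backupJob := &object{Labels: map[string]string{
		labelCluster:    "cluster1",
		labelPGBackrest: "",
		"example/keep":  "yes", // hypothetical unrelated label
	}}

	fmt.Println("backup-1 starts:", tryStartBackup(perconaCluster, "backup-1"))
	fmt.Println("backup-2 starts:", tryStartBackup(perconaCluster, "backup-2"))
	finishBackup(perconaCluster, postgresCluster, backupJob)
	fmt.Println("backup-2 starts:", tryStartBackup(perconaCluster, "backup-2"))
	fmt.Println("job labels left:", backupJob.Labels)
}
```

The order in `finishBackup` follows the description: the `PostgresCluster` annotation is removed first, then the job labels. In a real cluster each step is a separate API update that can fail or conflict, which this in-memory sketch does not model.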