Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

K8SPG-493: fix scheduled backups #634

Merged
merged 12 commits into from
Jan 17, 2024
Merged

K8SPG-493: fix scheduled backups #634

merged 12 commits into from
Jan 17, 2024

Conversation

pooknull
Copy link
Contributor

@pooknull pooknull commented Jan 9, 2024

K8SPG-493 Powered by Pull Request Badge

https://jira.percona.com/browse/K8SPG-493

DESCRIPTION

Problem:
Operator doesn't create more than one scheduled backup.

Cause:
After implementing K8SPG-410 we started to create pg-backup for each backup job. The K8SPG-432 was about assigning owner references to the pg-backup resources. This resulted in changing owner references for cronjob jobs. If the owner references of a job created by cronjob belong to a different resource, cronjob can't tell if the job has finished. In this way, only one scheduled backup job can be created.

Solution:
We should move to our own cronjob implementation inside the operator, which will create pg-backups on schedule. The operator shouldn't set schedules for the crunchy's PostgresCluster.

There were also problems with existing multiple pg-backups trying to start backing up in parallel. To prevent this, the pgv2.percona.com/backup-in-progress annotation has been introduced. It is set on the PerconaPGCluster resource at the start of the backup and removed after the backup job has finished or failed. This annotation is used to check if there is a backup in progress.

Crunchy operator deletes manual backup jobs after creating a new one. To prevent this and to keep completed/failed jobs as in the cronjob implementation, it was decided to delete the postgres-operator.crunchydata.com/cluster and postgres-operator.crunchydata.com/pgbackrest labels from the completed or failed job. This allows keeping these jobs without touching crunchy code. After the backup job is completed, before deleting the job labels, the operator should remove the postgres-operator.crunchydata.com/pgbackrest-backup annotation from crunchy's PostgresCluster resource to stop the backup job management that deletes the completed jobs when new ones are created.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are the manifests (crd/bundle) regenerated if needed?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PG version?
  • Does the change support oldest and newest supported Kubernetes version?

percona/controller/pgbackup/controller.go Outdated Show resolved Hide resolved
percona/controller/pgbackup/controller.go Outdated Show resolved Hide resolved
percona/controller/pgbackup/controller.go Show resolved Hide resolved
percona/controller/pgbackup/controller.go Outdated Show resolved Hide resolved
percona/controller/pgbackup/controller.go Outdated Show resolved Hide resolved
percona/controller/pgcluster/cron.go Outdated Show resolved Hide resolved
pkg/apis/pgv2.percona.com/v2/perconapgcluster_types.go Outdated Show resolved Hide resolved
percona/controller/pgcluster/schedule.go Outdated Show resolved Hide resolved
@pooknull pooknull marked this pull request as ready for review January 16, 2024 01:41
@pooknull pooknull requested a review from egegunes January 16, 2024 01:42
egegunes
egegunes previously approved these changes Jan 16, 2024
inelpandzic
inelpandzic previously approved these changes Jan 16, 2024
@pooknull pooknull dismissed stale reviews from inelpandzic and egegunes via 53ef125 January 16, 2024 08:02
egegunes
egegunes previously approved these changes Jan 16, 2024
tplavcic
tplavcic previously approved these changes Jan 16, 2024
hors
hors previously approved these changes Jan 16, 2024
@pooknull pooknull dismissed stale reviews from hors, tplavcic, and egegunes via b47a09b January 16, 2024 11:41
@JNKPercona
Copy link
Collaborator

Test name Status
custom-extensions passed
demand-backup passed
init-deploy passed
monitoring passed
operator-self-healing passed
scaling passed
scheduled-backup passed
self-healing passed
start-from-backup passed
telemetry-transfer passed
upgrade-minor passed
users passed
We run 12 out of 12

commit: b31fb3a
image: perconalab/percona-postgresql-operator:PR-634-b31fb3a17

@hors hors self-requested a review January 16, 2024 20:29
@@ -26,9 +25,6 @@ status:
updatedReplicas: 3
observedGeneration: 3
pgbackrest:
manualBackup:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was it removed?

@hors hors merged commit 8c8c59f into main Jan 17, 2024
14 checks passed
@hors hors deleted the dev/K8SPG-493 branch January 17, 2024 07:00
tplavcic pushed a commit that referenced this pull request Jan 17, 2024
* K8SPG-493: fix scheduled backups

https://jira.percona.com/browse/K8SPG-493

* fix tests

* remove deletion of backup job after creating a new manual backup and stop trying to run backups in parallel

* fix tests

* add `AnnotationBackupInProgress` and simplify code

* `golangci-lint`

* cron fix

* there should be at least 4 pg-backups

* fix `scheduled-backup`

* wrap errors and add `RetryOnConflict`

* fix scary error

---------

Co-authored-by: Viacheslav Sarzhan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants