Commit
Implement Cassandra backup and restore. (#418)
* First draft of icarus sidecar container usage.
* Refactor auth logic to use a separate struct that holds all auth data.
* Implement TLS support for icarus container. Pass the proper JMX credentials for icarus.
* Create skeleton for cassandrabackup controller.
* Fix auth logic. A fresh cluster couldn't init admin user.
* fix unit tests
* Create a separate icarus client. The generated one had bugs and was inconvenient to work with.
* Implement first draft of the backup controller. Tested and works only with the file storage type.
* Track backup progress. Shows as an integer value from 0-100 in status. Do not use float as it's not recommended by the controller-gen tool. Requeue until the backup is finished to keep the status updated.
* Create the skeleton for restore controller.
* Regenerate manifests.
* Move icarus backup related methods into a separate file.
* Add restore methods for the icarus client.
* Get the list of restores to later check if there's one existing already.
* Fix the check if the backup with the requested snapshot name exists already. Fix typo.
* Implement restore logic. Tested with file storage type only.
* Create a serviceaccount with necessary roles for cassandra pods. Needed for icarus to allow reading k8s secrets. Expose secret name arg option for icarus to support storage providers other than file.
* Add description to the backup CR fields.
* Add backup duration option.
* Add bandwidth option.
* Add concurrentConnections option.
* Add dc option.
* Add entities, timeout and metadataDirective options.
* Add the rest of the backup options.
* Generate assets.
* Add most of the fields for restore CRD.
* Implement failed backup process restart if the user changed the config and a failed backup exists in icarus. If the backup request is absent in icarus, tell the user to recreate the CR.
* Implement failed restore process restart if the user changed the config and a failed restore exists in icarus. If the restore request is absent in icarus, tell the user to recreate the CR.
* Implement validating webhook for cassandrabackup.
* Implement validating webhook for cassandrarestore.
* Validate storage location in both controllers.
* validate duration
* Add more CRD fields validations.
* Fix docs.
* Move related backup search and failed backup reconcile logic into separate functions.
* Move status reconcile into separate func.
* Break up main func into smaller ones.
* Split controller into smaller functions.
* Move code around, rename vars and move icarus related funcs into separate file.
* Move main restore logic into separate file.
* Refactor restore logic.
* Track cluster readiness in the CassandraCluster status.
* Use CassandraCluster readiness status field in backup and restore controllers to block execution before the cluster becomes ready.
* Remove restorationStrategyType as only HARDLINKS can be supported. IMPORT is available only on Cassandra 4 and IN_PLACE can be used only on a node that's down. We support only alive clusters (at least for now). Remove singlePhase field as we don't plan to support (at least not yet) single phase restores. For that reason the restorationPhase is also removed. Only INIT can be supported if singlePhase is false. Remove the actualSnapshotTag status field from backup since icarus supports specifying only the tag name without the appended schema version and timestamp.
* Add schemaVersion and exactSchemaVersion fields.
* Fix updating the active admin secret with the wrong role and password.
* Drop support for file storage.
* Fix tests and add checks for icarus container.
* Fix lint issues.
* Make the backup and restore controllers more testable. Implement new controllers initialization for test manager. Create icarus mock. Create a simple test for backup logic.
* Cover the failure scenario with tests.
* Add restore tests. Hardcode downloaded sstables location on restore. Fix CassandraRestore cleanup in tests.
* Add docs.
* Implement storage secret validation.
* make manifests
* Fix a few bugs and descriptions.
* Fix tests and lint issues.
* Fix not using the duration field. Remove unused field.
* Allow to override the snapshotTag name.
* Build and push icarus image in CI.
* Fix trivy vulnerability issue.
* fix Dockerfile for icarus
* Run tests against k8s 1.24.2.
* Don't run against old k8s versions.
* Fix CRD cleanup in CI script.
* Choose the container in `execPod`. It stopped working since we have 2 containers now; a container must be chosen for the request to succeed. Make `utils.MergeMap` resilient to nil maps.
* Allow more processes during e2e tests.
* Rename vars to avoid struct name shadowing. Don't mix value and pointer receiver methods declaration.
* Fix misuse of util.MergeMap. It relied on a side effect of a bug that populated the map passed as the first argument, but only the resulting map should have the merged elements. The args should not change.
* Return a nil map if the inputs are nil in `MergeMap`.
* Use .Before instead of comparing timestamps.
* Fix compile errors after main merge.
* Revert to running integration tests against 1.20.2.
* Don't output debug logs into stdout on failed e2e tests since it became very verbose and hard to read. The user should download the logs from the artifacts on Github actions or look at the /tmp/debug-logs folder if running tests locally.
* Use constants to identify storage providers.
* Remove commented code.
* Don't parse time twice.
* Mark the network policies test as Serial since it uses host ports.
* Fix network policies e2e test. Set the correct container name.
* Fix circular dependencies.
* Fix networkpolicy for icarus and test them in the networkpolicy test.
* Fix deprecated io/ioutil package usage.
* Fix networkpolicy integration test.
* Upgrade icarus and re-enable trivy scanner for the image.
* Fix proxy registry URL.
* Replace string literals with constants.
* Apply suggestions from code review

Co-authored-by: Craig Ingram <[email protected]>

* Replace string literals with constants.
* Apply suggestions from code review

Co-authored-by: Craig Ingram <[email protected]>

* Apply suggestions from code review

Co-authored-by: Craig Ingram <[email protected]>

Co-authored-by: Craig Ingram <[email protected]>
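
The `MergeMap` items above describe three properties: the helper tolerates nil maps, leaves its arguments unchanged, and returns a nil map when the inputs are nil. A minimal sketch of a helper with those properties follows. The signature (variadic, `map[string]string`) and the "later map wins" conflict rule are assumptions made for illustration, not the repository's actual code.

```go
package main

import "fmt"

// MergeMap is a hypothetical stand-in for the helper named in the commit message.
// It returns a new map holding the entries of all inputs; on key conflicts the
// later map wins (an assumption). It never writes to its arguments and returns
// nil when every input is nil.
func MergeMap(maps ...map[string]string) map[string]string {
	var merged map[string]string
	for _, m := range maps {
		if m == nil {
			continue // nil inputs are skipped instead of causing a panic
		}
		if merged == nil {
			// Allocate the result lazily so that all-nil input yields nil.
			merged = make(map[string]string, len(m))
		}
		for k, v := range m {
			merged[k] = v
		}
	}
	return merged
}

func main() {
	base := map[string]string{"app": "cassandra"}
	extra := map[string]string{"component": "icarus"}

	merged := MergeMap(base, extra)
	fmt.Println(len(merged), len(base), len(extra)) // 2 1 1
	fmt.Println(MergeMap(nil, nil) == nil)          // true
}
```

Because the result is always a fresh map, callers that relied on the first argument being populated have to use the return value instead, which is the misuse the commit message says was fixed.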