check: add spot check functionality #1389

Open
simonsan opened this issue Dec 11, 2024 · 1 comment
Labels
S-triage Status: Waiting for a maintainer to triage this issue/PR

Comments

@simonsan
Contributor

simonsan commented Dec 11, 2024

Summary

Implement a "spot check" feature for rustic to enhance consistency checks by comparing source file counts and data against the latest backup archive. This feature is intended to detect discrepancies like incorrect excludes, inadvertent deletes, or malicious file modifications.

Background

Most existing consistency checks do not validate against the original source files, leaving a potential blind spot if the source files are altered or deleted. The spot check aims to address this by performing a targeted comparison, ensuring backup integrity while balancing performance.

Key Features

  1. File Count Comparison with Tolerance

    • Allow a configurable percentage tolerance for differences between source file counts and the backup archive.
    • Example: With 100 source files and 105 files in the archive, a 10% tolerance passes the check; an archive containing 200 files or only 50 files fails it.
  2. Sampling for Efficiency

    • Enable sampling a configurable percentage of files from the source for comparison to optimize speed.
    • Example: For a 1% sampling rate of 1,000 source files, only 10 files would be compared.
  3. Hash-Based Data Comparison

    • Use efficient hashing (e.g., xxhash/xxh64sum) to compare sampled files with their counterparts in the archive.
  4. Tolerance for Data Mismatches

    • Allow a configurable percentage of sampled files to fail the hash comparison without failing the entire check.
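
The count and sampling arithmetic from the examples above can be sketched as follows. This is only an illustration, not rustic code: the function names are hypothetical, the difference is assumed to be measured relative to the source count, and the sample size is assumed to round up.

```rust
/// Returns true when `archive_count` differs from `source_count` by at most
/// `tolerance_pct` percent. The percentage is taken relative to the source
/// count, which is an assumption; the proposal does not fix the base.
fn count_within_tolerance(source_count: u64, archive_count: u64, tolerance_pct: f64) -> bool {
    if source_count == 0 {
        // Nothing to compare against: only an empty archive matches.
        return archive_count == 0;
    }
    let diff = source_count.abs_diff(archive_count) as f64;
    // Equivalent to diff / source_count * 100 <= tolerance_pct, without the division.
    diff * 100.0 <= tolerance_pct * source_count as f64
}

/// Number of files to sample for a given percentage. Rounds up (assumption),
/// so any non-zero rate samples at least one file from a non-empty source.
fn sample_size(total_files: u64, sample_pct: f64) -> u64 {
    let exact = total_files as f64 * sample_pct / 100.0;
    (exact.ceil() as u64).min(total_files)
}
```

With the numbers from the list, `count_within_tolerance(100, 105, 10.0)` passes, 200 or 50 archive files fail, and `sample_size(1000, 1.0)` yields 10.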

Proposed Configuration Options

[[checks]]
name = "spot"
count_tolerance_percentage = 10
data_sample_percentage = 1
data_tolerance_percentage = 0.5
only_run_on = ["Friday", "weekend"]
  • count_tolerance_percentage: Max allowed percentage difference in file count between source and archive.
  • data_sample_percentage: Percentage of source files to sample for comparison.
  • data_tolerance_percentage: Max allowed percentage of sampled files with mismatches before failing the check.
  • only_run_on: Days on which the spot check may run; the example restricts it to Fridays and weekends.
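
To make the interaction between the sampling and tolerance options concrete, here is a minimal sketch of how the proposed values might be held and evaluated. The struct and method names are hypothetical, and rounding the allowed mismatch count down is an assumption the proposal does not state.

```rust
/// Hypothetical in-memory form of the proposed `[[checks]]` entry.
#[derive(Debug, Clone, PartialEq)]
struct SpotCheckConfig {
    count_tolerance_percentage: f64,
    data_sample_percentage: f64,
    data_tolerance_percentage: f64,
}

impl SpotCheckConfig {
    /// Largest number of mismatching sampled files that still passes.
    /// Rounds down (assumption), so a fractional file is never tolerated.
    fn max_allowed_mismatches(&self, sampled_files: u64) -> u64 {
        (sampled_files as f64 * self.data_tolerance_percentage / 100.0).floor() as u64
    }

    /// Returns true when the number of mismatches stays within the tolerance.
    fn data_check_passes(&self, sampled_files: u64, mismatched_files: u64) -> bool {
        mismatched_files <= self.max_allowed_mismatches(sampled_files)
    }
}
```

Under these assumptions, a tolerance of 0.5 allows no mismatches at all when only 100 files are sampled (0.5% of 100 is half a file) and 5 mismatches when 1,000 files are sampled. Small samples therefore make the data tolerance effectively zero.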

Usage Notes

  • Spot checks rely on tolerances, requiring careful tuning to minimize false positives or negatives.
  • May not suit workloads with frequent, large changes to source files.
  • Should be scheduled separately from backup creation to avoid immediate false positives after changes.
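
One way to keep spot checks away from backup runs is the `only_run_on` option from the proposed configuration. The sketch below shows how such a list might be matched against the current day. The `Weekday` enum, the `should_run` function, the `weekday` keyword, and the "empty list means no restriction" rule are all assumptions for illustration; only the values "Friday" and "weekend" appear in the proposal.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Weekday {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday,
    Sunday,
}

impl Weekday {
    fn name(self) -> &'static str {
        match self {
            Weekday::Monday => "Monday",
            Weekday::Tuesday => "Tuesday",
            Weekday::Wednesday => "Wednesday",
            Weekday::Thursday => "Thursday",
            Weekday::Friday => "Friday",
            Weekday::Saturday => "Saturday",
            Weekday::Sunday => "Sunday",
        }
    }

    fn is_weekend(self) -> bool {
        matches!(self, Weekday::Saturday | Weekday::Sunday)
    }
}

/// Returns true when `today` matches any entry of `only_run_on`.
/// Entries are compared case-insensitively; an empty list is treated
/// as "no restriction" (assumption).
fn should_run(today: Weekday, only_run_on: &[&str]) -> bool {
    if only_run_on.is_empty() {
        return true;
    }
    only_run_on.iter().any(|entry| {
        if entry.eq_ignore_ascii_case("weekend") {
            today.is_weekend()
        } else if entry.eq_ignore_ascii_case("weekday") {
            !today.is_weekend()
        } else {
            entry.eq_ignore_ascii_case(today.name())
        }
    })
}
```

With `only_run_on = ["Friday", "weekend"]`, this sketch runs the check on Friday, Saturday, and Sunday and skips it on the other days.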

Additional Considerations

  • Recommend using xxhash for efficient file hashing; users should ensure it is installed.
  • Logs or metrics should clearly report any discrepancies detected.
  • Provide documentation to help users configure and troubleshoot spot checks effectively.

Benefits

  • Increases the likelihood of detecting issues with source files or backup configurations.
  • Offers a balance between thoroughness and performance.

Potential Challenges

  • Requires additional processing overhead, particularly with large source sets.
  • May need retries or fallbacks for mismatches caused by temporary issues (e.g., file locks).
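
The retry idea for transient mismatches could look roughly like this. It is a sketch only: `matches_with_retries` is a hypothetical helper, and the comparison itself is passed in as a closure because the proposal does not define how a single file is compared.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `compare` up to `max_attempts` times, pausing `delay` between tries.
/// Returns true as soon as one attempt reports a match, so a file that is
/// only temporarily unreadable (for example, locked) is not counted as a
/// mismatch. A persistent mismatch still returns false.
fn matches_with_retries(
    max_attempts: u32,
    delay: Duration,
    mut compare: impl FnMut() -> bool,
) -> bool {
    for attempt in 1..=max_attempts {
        if compare() {
            return true;
        }
        // Do not sleep after the final attempt.
        if attempt < max_attempts {
            sleep(delay);
        }
    }
    false
}
```

Only files that still mismatch after the last attempt would then count toward the data tolerance.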

(needs refinement, created by ChatGPT from https://torsion.org/borgmatic/docs/how-to/deal-with-very-large-backups/#spot-check )

CC: Addresses spot check idea from rustic-rs/docs#111

@github-actions github-actions bot added the "S-triage" label (Status: Waiting for a maintainer to triage this issue/PR) on Dec 11, 2024
@aawsome
Member

aawsome commented Jan 19, 2025

We should remove the comments about xxhash. We already have the SHA-256 of the chunks, so there is no need to compute another hash.
