Skip to content

neapsix/cloudbackup-benchmarks

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cloudbackup-benchmarks

Procedure and scripts for benchmarking some popular cloud backup utilities for use with Backblaze B2.

This repo is a companion to a series of articles on neap.space:

The results of each test are collected in the data/ directory. For a more legible version, refer to the graphsheet/ directory and to the accompanying articles.

The following sections, as well as the other README files in the materials directories describe how to re-run the tests and collect your own results.

Run Backup Tests for Each Client and Test Case

Prerequisites: a test machine with a dataset containing some files mounted as /tank and a remote target, such as a b2 bucket, ready to go.

To run the backup tests for a certain backup client, complete the following tasks:

  • Copy the directory containing materials for one of the clients to the test machine.
  • Perform client-specific setup, such as installing the client, initializing a backup repository, and setting up encryption keys. This setup is described in README files in the materials directory.
  • Perform the test procedure described below. After doing the file operations for each test case, run the backup script in the working directory provided for that test case.

To compare multiple backup clients, repeat these tasks for each backup client. For example, you might set up four virtual machines, and repeat the tasks for a different backup client in each virtual machine.

The test procedure is:

  1. Perform an initial backup of a dataset.
  2. Add some files and perform an incremental backup.
  3. Modify some files and perform an incremental backup.
  4. Touch just one file and perform an incremental backup.
  5. Remove some files and perform an incremental backup.
  6. Delete everything and restore from the backup.

Here's an example of how you would run a backup test for restic:

  1. Copy the restic materials to your test machine:
$ rsync -hav ./restic-tests user@testmachine:/home/user/
  1. Perform client-specific setup. For example:
$ wget https://github.com/restic/restic/releases/download/v0.12.1/restic_0.12.1_freebsd_amd64.bz2
$ bzip2 -d restic_0.12.1_freebsd_amd64.bz2
$ ./restic -r b2:name-of-your-b2-bucket init
$ vi ./restic-pw # write the restic passowrd in a file the script can read
$ vi ~/restic-tests/run-restic-test # specify b2 and restic credentials
$ vi ~/restic-tests/run-restic-restore-test # enter b2 and restic credentials

Note: if you're using ZFS or btrfs, it's helpful to create a filesystem snapshot before performing each test. For example, run # zfs snapshot -r tank@initial before the first test.

  1. Perform the first test. To capture the output of time and to avoid having to babysit these tests, I recommend running tests overnight in a logged and detached screen or tmux session.
$ screen -X
$ cd ~/restic-tests/0-initial
$ ../run-restic-test
  1. For each test case after the first, modify the files and run the script again:
$ rsync -hav user@remote:/path/to/files/for/test/1/ /tank/
$ screen -X
$ cd ~/restic-tests/1-add_files
$ ../run-restic-test
[...]
$ rm -rf /tank/*
$ screen -X
$ cd ~/restic-tests/5-restore
$ ../run-restic-restore-test

Collect Results and Fix File Formatting

Prerequisites: POSIX shell, GNU sed (see note below), and Rscript.

After performing all the tests for all the backup clients you want to compare, run a few scripts to collect all the data in one place and to create tidy CSV flat files measurements during each test. For more information about these scripts, refer to the comments and help text in each script.

Run the following scripts with ./data as the working directory:

$ cd ./data

Collect all the test output files in the data directory:

$ ls ../materials/*/*/*.txt | xargs -n 1 ../scripts/collect_files.sh

Fix XML output from netstat and vmstat:

Note: the next two scripts assume you're using GNU sed. To run them on macOS or BSD, install GNU sed, and edit the script to point to that, such as by replacing sed with gsed.

$ ls ./*netstat.txt | xargs -n 1 ../scripts/fix_netstat_output.sh
$ ls ./*vmstat.txt | xargs -n 1 ../scripts/fix_vmstat_output.sh

Generate CSV files from the XML:

Note: the output_generic_csv.R script creates a CSV from the netstat ouptut without needing to parse each XML node, but the vmstat output needs some special handling. Expect the second script to take much longer.

$ ls *netstat.xml | xargs -n 1 -I{} Rscript ../scripts/output_generic_csv.R {}
$ ls *vmstat.xml | xargs -n 1 -I{} Rscript ../scripts/output_vmstat_csv.R {}

After you run these scripts, the data from each client and each test case is available for you to review as a CSV in the ./data directory.

(Optional) Create Plots to Compare Results for Each Test

Prerequisites: Rscript or an R workspace

You can make line graphs like the ones in my test data by using an R script. The provided script covers all the potential graphs in one file with instructions to comment out or uncomment different sections, depending on which variable you want to plot during which test.

You need to modify the script each time you run it to generate the plots. There are four potential variables:

  • Sent bytes (from netstat when backing up)
  • Received bytes (from netstat when restoring)
  • Memory usage stats (from vmstat during all tests)
  • CPU usage stats (from vmstat during all tests)

Because the first two are mutually exclusive, you can create up to three graphs of different measurements for each of six tests, modifying and running the script file a total of 18 times.

Repeat the following steps for each graph you want to make:

  1. Open the file ./scripts/make_plots.R in a text editor.
  2. Search for the text ***, and follow the instructions at each marked comment to make the required edits. A comment near the end of the file lets you know when you're done.
  3. Run the script with ./data as the working directory:
$ cd ./data
$ Rscript ../scripts/make_plots.R

Note: If you prefer to work in a traditional R workspace, you can instead copy and paste the script at the R prompt.

Your plot is saved as an SVG file in the ./data directory with the test and the variable measured as the file name, such as 1-add_files_vmstat_cpu.svg.

(Optional) Create a PDF Containing All Graphs

Prerequisites: pandoc, rsvg-convert, and a supported PDF creation engine, such as the LaTeX compilers included in TeX Live.

The ./graphsheet directory contains a markdown file that presents all the graphs. After creating graphs, you can use pandoc to generate a PDF version by running pandoc -o graphsheet.pdf graphsheet.md.

Someday Maybe

Here are some changes I'd like to make if I were to run the tests again:

  • Use tuning options where available to try to account for some of the differences in the results.
  • Improve backup scripts to better model safe, real-world use. Point to files for credentials and include file permission steps in client-specific setup.
  • Use git-secrets for credentials to make it safe for you to fork this repository and post your own results without needing to manually set, then strip out keys and passwords.
  • Add more backup clients (such as kopia), and use measurement tools that are easier to parse than FreeBSD libxo XML output.
  • Improve plot script to suck in all the test cases at once without needing tweaks for each one.

About

Procedures and data comparing cloud backup clients

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published