Automated tests
- Generally speaking, the automated tests are more stringent than production use
requires, so if you just want to do some backups and restores, that's fine.
- To run the automated tests, you need at least /usr/local/cpython-2.7 and /usr/local/cpython-3.2,
unless you change some of the automated tests and/or this-interpreter.
- After installing /usr/local/cpython-3.2:
- cd /usr/local/cpython-3.2/bin; ln -s python3.2 python
Production use
- For file-count-based progress, you need a find that supports -print0; this includes GNU find and
at least one of the BSDs.
- For file-size-based progress, you need a find that supports -printf; this includes GNU find.
- Generally speaking, file-size-based progress is more accurate than file-count-based progress.
But file-size-based progress requires more out of your "find" command.
- Generally works over sshfs, CIFS and NFS, but see the backshift notes for your OS, if any.
- Backshift requires an xz binary to do its compression, at least until there's a Python
module that directly supports xz.
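The two progress modes above consume different NUL-delimited record formats from find. As a rough illustration only (this is not backshift's actual parser; the function name, buffer size, and record layout handling are assumptions), a Python sketch of reading both formats from a stream:

```python
import io

def read_records(stream, with_sizes):
    """Yield (size, path) tuples from NUL-delimited find output.

    with_sizes=False parses plain -print0 records ("path\\0");
    with_sizes=True parses -printf '%s %p\\0' records ("size path\\0").
    """
    buffered = b''
    while True:
        chunk = stream.read(65536)
        if not chunk:
            break
        buffered += chunk
        records = buffered.split(b'\0')
        buffered = records.pop()  # keep the possibly-incomplete trailer
        for record in records:
            if with_sizes:
                # Split on the first space only: paths may contain spaces
                size, path = record.split(b' ', 1)
                yield int(size.decode('ascii')), path
            else:
                yield None, record

# NUL delimiters keep paths containing spaces or newlines unambiguous:
demo = io.BytesIO(b'1234 /tmp/with space\x00')
assert list(read_records(demo, with_sizes=True)) == [(1234, b'/tmp/with space')]
```

This is also why -print0/-printf '%s %p\0' matter: newline-delimited filenames cannot represent paths that themselves contain newlines.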
Features include:
- Ability to deduplicate data using variable-sized, content-based blocking
- Just renaming a 1.5 gigabyte file doesn't cause a second copy of that same file to be
stored, unlike rsync-based schemes
- Storing 3 copies of your family JPEGs on 3 different computers results in a single copy of the
pictures being stored in the repository
- Changing one byte in the middle of a 6 gigabyte file doesn't result in a distinct copy of the whole
file in the repository
- Ability to compress deduplicated chunks with xz compression (planned, not yet fully implemented)
- Ability to expire old data for the repo as a whole (planned, not yet implemented - not down to a host
or filesystem granularity, just repo-granularity)
- Safe, concurrent operation over local or remote filesystems, including but not limited to: NFS, CIFS
and sshfs (could use a little improvement with temp files and renames though). The only operation
that isn't (designed to be) concurrent-safe, is expiring old files.
- No big filelist inhale at the outset, unless you request a progress report during a backup - similar
to rsync in this regard
- Hybrid fullsaves/incrementals, much like what one gets with an rsync --link-dest backup script - so an
interrupted backup can in a significant sense be subsequently resumed
- Ability to not mess up your buffer cache during a backup (planned, not yet fully implemented)
- A far smaller number of directory entries than a year's worth of daily snapshots with an rsync-based
backup script would give
- Copying a backup repository with 1 year of daily snapshots from one host to another is far more
practical with backshift than rsync --link-dest
- Input files are selected in a manner similar to cpio, using GNU find with -print0
- Output is created in GNU tar format; a restore is a matter of piping tar output into a tar process for
extraction. This means there's no restore application to worry about race conditions in other than
tar itself
- No temporary files are necessary on the client system for backups or restores; even a system
with (nearly?) full disks can be backed up
- Runs on a wide assortment of Python interpreters, including:
- CPython 2.x (with or without Cython, with or without Psyco)
- CPython 3.x (with or without Cython)
- PyPy 1.4.x and 1.5
- Jython 2.5.2 -r 7288, but not the Jython 2.5.2 release; IOW, you would need to check out
Jython and build it yourself.
- Backshift is known not to work on IronPython, due to IronPython's lack of a proper standard
library.
- The backup process is cautious about symlink races, at least if the Python interpreter has
an os.fstat (notably, Jython does not have an os.fstat; CPython 2 and 3, and PyPy, do have
os.fstat)
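The variable-sized, content-based blocking listed above can be sketched with a windowed rolling hash. This is an illustration of the general technique, not backshift's actual algorithm; the window size, hash function, and chunk-size parameters below are all invented for the example:

```python
import hashlib

# All parameters here are illustrative, not backshift's real values.
WINDOW = 48                      # bytes of context that determine a boundary
BASE = 257
MOD = 1 << 32
POW_W = pow(BASE, WINDOW, MOD)   # BASE**WINDOW, for removing the oldest byte
BOUNDARY_MASK = (1 << 13) - 1    # ~8 KiB average chunk size
MIN_CHUNK = 2048
MAX_CHUNK = 65536

def chunk_boundaries(data):
    """Yield (start, end) offsets of content-defined chunks.

    A position ends a chunk when the rolling hash of the last WINDOW bytes
    has all its low bits set.  Because a boundary depends only on nearby
    bytes, renaming a file or editing one byte in its middle only perturbs
    the chunks around the edit; everything else re-chunks (and therefore
    deduplicates) identically.
    """
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            # Slide the window: drop the byte that just left it
            h = (h - data[i - WINDOW] * POW_W) % MOD
        length = i + 1 - start
        if (length >= MIN_CHUNK and (h & BOUNDARY_MASK) == BOUNDARY_MASK) \
                or length >= MAX_CHUNK:
            yield start, i + 1
            start = i + 1
    if start < len(data):
        yield start, len(data)

def chunk_digests(data):
    """Hash each chunk; chunks with equal digests are stored only once."""
    return [hashlib.sha256(data[s:e]).hexdigest()
            for s, e in chunk_boundaries(data)]
```

With fixed-size blocks (as in naive block-level dedup), inserting one byte near the start of a file shifts every later block and defeats dedup; content-defined boundaries resynchronize, which is what makes the 6-gigabyte-file example above work.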
Misfeatures:
- There's currently no way for users to restore their own files without placing excessive
trust in them; the administrator needs to get involved.
- During a backup, users can see each other's files; data is not saved in an encrypted format
- It could conceivably be nice to have host- or filesystem- granularity on expires, but this would require
quite a bit more metadata to be saved
- Disk-to-disk only - Disk-to-tape is not supported
The gist of how it works
- Backshift works a bit like an rsync-based backup script, but it's intended to be used solely
for backups.
- Selecting files
- The selection of files to backup is specified in a manner similar to using cpio: by using
the find command.
- See the example-finds directory for examples of find commands for various OS's
- It does not operate over ssh directly, but works well over network filesystems
like sshfs, CIFS or NFS.
- For each filename read from stdin, the program will chop the file into variable-length blocks
and compress them, before writing them to a repository of backed up files.
- Metadata is stored anew on each backup. For this reason, there is no need to sort directories.
- Your first backup with backshift for a given filesystem will probably be a bit slow. Subsequent
backups should be pretty fast unless there have been a lot of file changes.
- You never need to do another fullsave after your first one, for a given set of files.
- The author has done a fullsave over wifi - it worked well. Between the xz compression and the
deduplication before the data hits the network, the network use was relatively low.
- Incremental behavior
- rsync --link-dest incrementals are normally done relative to the single most recent "similar"
backup by one's rsync wrapper
- Backshift's incrementals are done relative to up to three previous backups, simultaneously:
- The most recent backup found for the (hostname, subset) pair
- The most recent completed backup for the (hostname, subset) pair
- The backup with the most files in it, for the (hostname, subset) pair
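The three-way reference selection above can be sketched as follows (the Backup record and its field names are invented for illustration; backshift's real metadata layout may differ):

```python
from collections import namedtuple

# Illustrative record; field names are assumptions, not backshift's schema.
# finish_time is None for a backup that never completed.
Backup = namedtuple('Backup', ['backup_id', 'start_time', 'finish_time',
                               'file_count'])

def reference_backups(backups):
    """Pick up to three prior backups of a (hostname, subset) pair to
    difference against: the most recent backup, the most recent
    *completed* backup, and the backup holding the most files.
    Duplicates collapse, so the result has one to three entries."""
    if not backups:
        return []
    picks = [max(backups, key=lambda b: b.start_time)]
    completed = [b for b in backups if b.finish_time is not None]
    if completed:
        picks.append(max(completed, key=lambda b: b.start_time))
    picks.append(max(backups, key=lambda b: b.file_count))
    seen, result = set(), []
    for b in picks:
        if b.backup_id not in seen:
            seen.add(b.backup_id)
            result.append(b)
    return result
```

Including the largest and the most recent completed backups alongside the newest one is what lets an interrupted backup be resumed: the interrupted run is the most recent, but the earlier complete runs still contribute unchanged files.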
Example use
- Backing up
- Note that the first time you use a save directory (repository), you'll need --init-savedir.
- Back up your root filesystem (absent ZFS, which breaks -xdev), with file-count-based progress, creating
the repository if it does not yet exist:
- find / -xdev -print0 | backshift --save-directory /where/ever/save-directory --backup --subset slash --init-savedir
- To pull from an sshfs (which flattens filesystems into a single filesystem) to a local
filesystem, without creating the savedir; note that writing to a remote filesystem is faster
than reading from one, and that backing up a ZFS could be done analogously:
- cd /ssh/fs/base
- find . -xdev \( \( \
- -path ./sys -o \
- -path ./dev -o \
- -path ./var/run -o \
- -path ./var/lock -o \
- -name .gvfs \) -prune -o -print0 \) | \
- backshift --save-directory /where/ever/save-directory --backup --subset fullsave
- To back up / with a more accurate progress report (assuming your find
supports -printf). This one is based on the lengths of files; the above two are just
based on file counts:
- find / -xdev -printf '%s %p\0' | backshift --save-directory /where/ever/save-directory --backup --subset slash --init-savedir --progress-report full
- To back up / with a minimal progress report - this one does not do a big inhale of filenames at the beginning:
- find / -xdev -printf '%s %p\0' | backshift --save-directory /where/ever/save-directory --backup --subset slash --init-savedir --progress-report minimal
- To back up / with no progress report - this one is usually best in cron jobs:
- find / -xdev -printf '%s %p\0' | backshift --save-directory /where/ever/save-directory --backup --subset slash --init-savedir --progress-report none
- This one backs up /movie with a progress report, keeping the progress report pretty accurate despite a previous incomplete backup by using randomize --preserve-directories. Note that this example splits one logical line into multiple physical lines in the manner of POSIX shells and *csh, by using backslashes on all but the last line:
- find /movie -xdev -printf '%s %p\0' | \
- ~/src/home-svn/backshift/trunk/randomize -0 -v --preserve-directories --skip-size | \
- /usr/local/pypy-1.4.1/bin/pypy ~/src/home-svn/backshift/tags/0.94/backshift \
- --backup \
- --save-directory /mnt/backshift-incremental-test/save-directory \
- --subset movie \
- --progress-report full
- Restoring
- Overview of process
- First, locate which backups are available to restore from, using --list-backups, and select the best one, for some definition of "best" ^_^
- Second, locate the files within that backup you require using --list-backup --backup-id
- Third, use "--produce-tar --starting-directory | tar xvfp -" to extract the files
- Strictly speaking, you can use --produce-tar with a pipe to "tar tvf -" in the second step
too, but it's much slower.
- Example restore:
- First we list all backups that finished (the last column is None for an unfinished backup). For the sake of discussion, assume the last backup id listed, the export backup from Mon-May-16-22:13:25, is the "best" one:
- # ~/src/home-svn/backshift/trunk/backshift --save-directory /mnt/backshift-incremental-test/save-directory --list-backups | awk ' { if ($4 != "None") print }' | sort
- 1305581966.39_openindiana_export_mon-may-16-14-39-26-2011_6244d94b726da6c6 Mon-May-16-14:39:26-2011 2 Mon-May-16-14:39:26-2011
- 1305583872.56_openindiana_export_mon-may-16-15-11-12-2011_8cfbd6e4f5d87142 Mon-May-16-15:11:12-2011 2 Mon-May-16-15:11:13-2011
- 1305609181.37_openindiana_slash_mon-may-16-22-13-01-2011_04be24c2e608ec32 Mon-May-16-22:13:01-2011 160326 Tue-May-17-13:18:34-2011
- 1305609205.38_openindiana_export_mon-may-16-22-13-25-2011_20abd67bf8d07db3 Mon-May-16-22:13:25-2011 17177 Tue-May-17-04:12:41-2011
- Next we identify what file we need (export/home/strombrg/src/xz is its directory):
- # ~/src/home-svn/backshift/trunk/backshift --save-directory /mnt/backshift-incremental-test/save-directory --list-backup --backup-id 1305609205.38_openindiana_export_mon-may-16-22-13-25-2011_20abd67bf8d07db3 2>&1 | egrep -i 'xz.*local-script'
- -rw-r--r-- strombrg/staff 249 2011-05-16 10:32 export/home/strombrg/src/xz/local-script
- Note that in the preceding step, if we already knew the directory, but not the filename, we
could've used the following much more rapidly:
- # ~/src/home-svn/backshift/trunk/backshift --save-directory /mnt/backshift-incremental-test/save-directory --list-backup --backup-id 1305609205.38_openindiana_export_mon-may-16-22-13-25-2011_20abd67bf8d07db3 --starting-directory /export/home/strombrg/src/xz 2>&1
- -rw-r--r-- strombrg/staff 216 2011-05-16 10:07 export/home/strombrg/src/xz/Notes
- -rw-r--r-- strombrg/staff 626 2011-05-16 10:41 export/home/strombrg/src/xz/last-archives
- -rw-r--r-- strombrg/staff 249 2011-05-16 10:32 export/home/strombrg/src/xz/local-script
- drwxr-xr-x strombrg/staff 0 2011-05-16 10:11 export/home/strombrg/src/xz/old/
- -rw-r--r-- strombrg/staff 1023720 2011-04-01 03:11 export/home/strombrg/src/xz/xz-5.0.2.tar.bz2
- -rw-r--r-- strombrg/staff 1270541 2011-04-12 03:49 export/home/strombrg/src/xz/old/xz-5.1.1alpha.tar.gz
- Finally, we extract the file we want:
- # ~/src/home-svn/backshift/trunk/backshift --save-directory /mnt/backshift-incremental-test/save-directory --backup-id 1305609205.38_openindiana_export_mon-may-16-22-13-25-2011_20abd67bf8d07db3 --starting-directory /export/home/strombrg/src/xz --produce-tar | tar xvf - export/home/strombrg/src/xz/local-script
- export/home/strombrg/src/xz/local-script
- Note that during the restore, backshift didn't write to your filesystem; tar did.