• First off, note that backshift is not fast (though it's not super-slow if you run it on PyPy or use the Cython chunking). Backshift is more about being frugal - that is, a modestly sized backup disk (or RAID) can go a long way.

  • Backshift works a bit like an rsync-based backup script (e.g. Backup.rsync), but it's intended to be used solely for backups.
  • Selecting files for backup is done by listing filenames on stdin.
  • It does not operate over ssh directly, but works well over network filesystems like sshfs, CIFS or NFS.
  • For each filename read from stdin, the program will:
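Concretely, the skip-or-save decision for each file can be sketched like this. This is a minimal illustration with made-up names (`needs_rebackup`, the `prev` mapping), not backshift's actual code or storage format; real backshift does more than compare size and mtime.

```python
import os
import tempfile

# Hypothetical sketch: decide, per filename, whether a file must be stored
# anew or whether the previous backup's chunks can be reused, by comparing
# size and mtime against what the previous backup recorded.

def needs_rebackup(path, prev):
    """prev maps path -> (size, mtime) recorded by the previous backup."""
    st = os.stat(path)
    return prev.get(path) != (st.st_size, int(st.st_mtime))

# Demo: an unchanged file can reuse old chunks; an unknown file is stored anew.
d = tempfile.mkdtemp()
f = os.path.join(d, "a.txt")
with open(f, "w") as fh:
    fh.write("hello")
st = os.stat(f)
prev = {f: (st.st_size, int(st.st_mtime))}
assert not needs_rebackup(f, prev)   # unchanged -> reuse chunks
assert needs_rebackup(f, {})         # no prior record -> back up anew
```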
  • Metadata is stored anew on each backup.
  • Metadata is stored compressed - directory entries are only partially compressed, but the filenames and attributes within them are compressed.
  • Each directory is compressed separately, minimizing storage requirements while still allowing rapid partial restores.
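To illustrate why per-directory compression permits partial restores, here is a minimal sketch. The file layout and function names are hypothetical, not backshift's real on-disk format: each directory's metadata is written as its own xz-compressed file, so one directory can be read back without decompressing anything else.

```python
import json
import lzma
import os
import tempfile

# Hypothetical sketch: one xz-compressed metadata file per directory.

def save_dir_metadata(repo, dirpath, entries):
    """entries: {filename: {"mode": ..., "mtime": ..., "size": ...}}"""
    name = dirpath.strip("/").replace("/", "_") or "root"
    target = os.path.join(repo, name + ".xz")
    with open(target, "wb") as f:
        f.write(lzma.compress(json.dumps(entries).encode()))
    return target

def load_dir_metadata(path):
    # A partial restore only touches the one directory's file.
    with open(path, "rb") as f:
        return json.loads(lzma.decompress(f.read()).decode())

repo = tempfile.mkdtemp()
meta = {"notes.txt": {"mode": 0o644, "mtime": 1700000000, "size": 12}}
p = save_dir_metadata(repo, "/home/user", meta)
assert load_dir_metadata(p) == meta  # round-trips one directory independently
```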
  • Your first backup with backshift for a given filesystem will probably be a bit slow. Subsequent backups should be pretty fast unless there have been a lot of file changes.
  • You never need to do another fullsave after your first one, for a given set of files.
  • The author has done fullsaves over wifi (802.11g) - it worked well. Between the xz compression and the deduplication before the data hits the network, the network use was relatively low.
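The dedup-then-compress idea described above can be sketched as follows. The names and directory layout are illustrative only - backshift's real chunking, hashing, and repository format differ - but the shape is the same: a chunk is keyed by its hash, xz-compressed, and written only if an identical chunk is not already stored, so duplicate data never has to cross the network.

```python
import hashlib
import lzma
import os
import tempfile

# Hypothetical sketch of content-addressed, deduplicated chunk storage.

def store_chunk(repo, chunk):
    digest = hashlib.sha256(chunk).hexdigest()
    path = os.path.join(repo, digest[:2], digest[2:])
    if not os.path.exists(path):           # dedup: identical data stored once
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(lzma.compress(chunk))  # xz before it hits disk/network
    return digest

repo = tempfile.mkdtemp()
d1 = store_chunk(repo, b"same data")
d2 = store_chunk(repo, b"same data")
assert d1 == d2
# Only one object exists despite two stores of the same chunk.
count = sum(len(files) for _, _, files in os.walk(repo))
assert count == 1
```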
  • Incremental behavior
  • Expiration allows you to remove old data you no longer care about, to clear up space for new data.
  • You can set a retention interval for the repo as a whole. You cannot set different retention intervals for different hosts or different filesystems.
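The retention test itself is simple to picture. This is a hypothetical sketch, not backshift's code: given (name, mtime) pairs and the repo-wide retention interval, anything older than the interval is selected for removal.

```python
import time

# Hypothetical sketch of retention-based expiry with a single repo-wide
# retention interval (in seconds).

def expire(items, retention_seconds, now=None):
    """items: iterable of (name, mtime). Returns (kept, removed) name lists."""
    now = time.time() if now is None else now
    kept, removed = [], []
    for name, mtime in items:
        (removed if now - mtime > retention_seconds else kept).append(name)
    return kept, removed

# Demo with a fixed clock: 90-day-old data expires, 5-day-old data survives
# a 30-day retention interval.
now = 1_000_000_000
items = [("old-chunk", now - 90 * 86400), ("recent-chunk", now - 5 * 86400)]
kept, removed = expire(items, 30 * 86400, now=now)
assert kept == ["recent-chunk"]
assert removed == ["old-chunk"]
```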
  • The expiration process will go through each of the following, removing any that are too old: