- Ability to deduplicate data using variable-sized, content-based blocking
  - Just renaming a 1.5 gigabyte file doesn't cause a second copy of that same file to be
    stored, unlike rsync-based schemes
  - Storing 3 identical copies of your family JPEGs on 3 different computers results in only a single copy of the
    pictures being stored in the repository
  - Changing one byte in the middle of a 6 gigabyte file doesn't result in a distinct copy of the whole
    file in the repository - only the changed block is stored again
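
The deduplication behaviour described above can be illustrated with a minimal sketch of content-defined chunking. This is not backshift's actual code: the rolling hash, the window and chunk sizes, the use of SHA-256, and the names `chunks` and `save` are all assumptions chosen for illustration.

```python
import hashlib

# All constants below are illustrative assumptions, not backshift's real parameters.
WINDOW = 48             # bytes of trailing content that decide a boundary
MASK = (1 << 11) - 1    # a boundary roughly every 2 KiB on average
MIN_CHUNK = 256         # must be >= WINDOW so the window is full at every check
MAX_CHUNK = 16384       # forced cut so chunks cannot grow without bound
BASE = 257
MOD = (1 << 61) - 1


def chunks(data):
    """Yield variable-sized chunks whose boundaries depend on nearby content."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Drop the oldest byte so the hash covers only the last WINDOW bytes.
            rolling = (rolling - data[i - WINDOW] * top) % MOD
        rolling = (rolling * BASE + byte) % MOD
        length = i - start + 1
        at_boundary = length >= MIN_CHUNK and (rolling & MASK) == MASK
        if at_boundary or length >= MAX_CHUNK:
            yield data[start:i + 1]
            start = i + 1
            rolling = 0
    if start < len(data):
        yield data[start:]  # trailing partial chunk


def save(store, data):
    """Add data's chunks to store (digest -> chunk); return (recipe, newly_added)."""
    recipe = []
    added = 0
    for chunk in chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk
            added += 1
        recipe.append(digest)
    return recipe, added
```

Because chunks are keyed by their content rather than by file name or host, saving the same bytes a second time adds nothing to the store. After a one-byte change, chunks that end before the change are reused, and boundaries typically line up again shortly after it, so usually only one or a few chunks are stored anew.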
- Compresses deduplicated chunks with xz compression (falling back on bzip2 if necessary)
- Compresses almost all metadata, also with xz (again falling back on bzip2 if necessary)
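
One way an xz-with-bzip2-fallback scheme could look is sketched below. The source does not say what "if necessary" means; this sketch assumes it means the interpreter has no `lzma` module. The function names `compress` and `decompress` are hypothetical, not backshift's API.

```python
import bz2

try:
    import lzma  # in the standard library since Python 3.3
except ImportError:  # assumption: fall back when xz support is missing
    lzma = None

XZ_MAGIC = b"\xfd7zXZ\x00"  # the six bytes that start every .xz stream


def compress(data):
    """Compress with xz when available, otherwise with bzip2."""
    if lzma is not None:
        return lzma.compress(data)
    return bz2.compress(data)


def decompress(blob):
    """Pick the decompressor by inspecting the stream's leading bytes."""
    if blob.startswith(XZ_MAGIC):
        if lzma is None:
            raise RuntimeError("chunk is xz-compressed but lzma is unavailable")
        return lzma.decompress(blob)
    return bz2.decompress(blob)
```

Detecting the format from the stream header means a repository written partly with xz and partly with bzip2 can still be read back without recording the choice separately.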
- Few to no arbitrary limits on how big files can be - even if you're backing up to a
- Ability to expire old data for the repo as a whole.
- Safe, concurrent operation over local or remote filesystems, including but not limited to NFS, CIFS
and sshfs. The only operation that isn't (designed to be) concurrent-safe is expiring old files.
- No up-front read of the whole file list is necessary, but if you allow one, you'll get a nice progress report
as a result.
- Hybrid fullsaves/incrementals, much like what one gets with an rsync --link-dest backup script - so an
interrupted backup can, in a significant sense, be resumed later
- Ability to not mess up your buffer cache during a backup (planned, not yet fully implemented)
- A far smaller number of directory entries than a year's worth of daily snapshots with an rsync-based
backup script would give
- Copying a backup repository with 1 year of daily snapshots from one host to another is far more
practical with backshift than with rsync --link-dest
- Input files are selected in a manner similar to cpio, using GNU find with -print0 or -printf
- Output is created in GNU tar format; a restore is a matter of piping tar output into a tar process for
extraction. This means there's no restore application to worry about race conditions in other than
- No temporary files are necessary on the client system for backups or restores; even a system
with (nearly?) full disks can be backed up (except on Cygwin, where a large number of
small temporary files are written and read, but there's only one on disk at a given time).
- Easy, no-temp-files (except on Cygwin) backup verification using a pipe to GNU tar's --diff
- Runs on a wide assortment of Python interpreters, including:
  - CPython 2. (with or without Cython, with or without Psyco; Psyco is not available for 2.7 or x86-64, though)
  - CPython 3. (with or without Cython)
  - PyPy 1.4.x, 1.5, 1.7, 1.8, 1.9, 2.0, 2.1 and 2.2 (1.6 had a problem)
  - Jython 2.5.3 or 2.7b1, but not Jython 2.5.2
- Backshift is known not to work on IronPython, due to IronPython's lack of a python standard
- The backup process is cautious about symlink races, at least if the Python interpreter has
an os.fstat (notably, Jython does not have an os.fstat. CPython 2 and 3, and PyPy do have
- Backshift compresses data pretty hard through its deduplication and use of xz. E.g., on 2014-10-05, I calculated that I have
a little over 2.3 terabytes of data in use that I'm backing up at home, and 1 year of backshift snapshots of that data came
to only 2.4 terabytes, including metadata.
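
To put the figures above in perspective, here is a small back-of-the-envelope calculation. It assumes one snapshot per day (365 in the year) and compares against a hypothetical scheme that keeps an independent full copy per snapshot; neither assumption comes from the measurement itself, and the function names are illustrative.

```python
def naive_full_copies_tb(live_tb, snapshots):
    """Space needed if every snapshot were stored as a separate full copy."""
    return live_tb * snapshots


def savings_ratio(live_tb, repo_tb, snapshots):
    """How many times smaller the repository is than the naive scheme."""
    return naive_full_copies_tb(live_tb, snapshots) / repo_tb


LIVE_TB = 2.3     # data in use, from the text above (rounded down slightly)
REPO_TB = 2.4     # repository size after one year, including metadata
SNAPSHOTS = 365   # assumption: one snapshot per day

print("naive full copies: %.1f TB" % naive_full_copies_tb(LIVE_TB, SNAPSHOTS))
print("savings ratio: %.0fx" % savings_ratio(LIVE_TB, REPO_TB, SNAPSHOTS))
print("repository / live data: %.2f" % (REPO_TB / LIVE_TB))
```

Under those assumptions, a year of independent full copies would need roughly 840 terabytes, so the repository is on the order of 350 times smaller, and only about 4% larger than a single copy of the live data.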
- There's currently no way for a user to restore their own files without requiring excessive
trust in users; the administrator needs to get involved.
- During a backup, users can see each other's files; data is not saved in an encrypted format
(but note that sshfs restricts who can see a mount)
- It could conceivably be nice to have host-level or filesystem-level granularity on expires, but this would require
that quite a bit more metadata be saved
- Disk-to-disk only; disk-to-tape is not supported
- It's not super fast - especially the first time you back up a system.