Backshift works a bit like an rsync-based backup script, but unlike rsync, it is intended
solely for backups.
Selecting files for backup
- The selection of files to back up is specified in a manner similar to using cpio: by using
the find command.
- See the example-finds directory for examples of find commands for various operating systems
- I'll likely add a more tar-like mode for conducting backups at some point, but for now the cpio-like method works fine
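As a rough illustration of the cpio-like model, a consumer of a find-generated list might read newline-separated filenames from stdin as sketched below. This is a hypothetical sketch, not Backshift's actual code, and the function name is made up:

```python
import sys

def read_file_list(stream):
    """Yield newline-separated filenames from a stream, skipping blank
    lines -- the cpio-style "list of files on stdin" model."""
    for line in stream:
        name = line.rstrip("\n")
        if name:
            yield name

# A typical pipeline would resemble:
#   find / -xdev -print | <backup program reading the list on stdin>
```

Note that newline-separated lists cannot represent filenames containing newlines; find's -print0 exists for that case, though whether a given tool accepts NUL-separated input varies.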
It does not operate over ssh directly, but works well over network filesystems
like sshfs, CIFS or NFS.
For each filename read from stdin, the program will:
- ...chop the file into variable-length blocks averaging about 2 mebibytes in size
- For each such block:
  - ...compute a cryptographic digest representing the block
- ...compress the block using xz
- ...save the block to a repository of backed up files, under its cryptographic digest - but only if the repo
doesn't already have a copy of that particular block (digest)
- ...save file metadata to the repository, again compressed with xz
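The per-block steps above can be sketched in miniature: split data into variable-length blocks, digest each block, and store it xz-compressed under its digest only if not already present. This is a toy sketch under stated assumptions -- the rolling-sum chunker and SHA-256 here stand in for Backshift's actual chunking and digest algorithms, which may differ:

```python
import hashlib
import lzma
import os

AVG_BLOCK = 2 * 1024 * 1024  # target average block size, ~2 MiB

def split_blocks(data, avg=AVG_BLOCK):
    """Very simplified content-defined chunking: cut wherever a weak
    rolling sum hits a boundary condition, so block boundaries follow
    content rather than fixed offsets (avg should be a power of two)."""
    mask = avg - 1
    blocks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        # Boundary fires with probability ~1/avg per byte; enforce a
        # minimum block size of avg/4 to avoid tiny fragments.
        if (rolling & mask) == mask and i + 1 - start >= avg // 4:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data) or not blocks:
        blocks.append(data[start:])
    return blocks

def store_block(repo_dir, block):
    """Save one block under its digest, compressed with xz (lzma),
    skipping blocks the repository already has -- the dedup step."""
    digest = hashlib.sha256(block).hexdigest()
    path = os.path.join(repo_dir, digest + ".xz")
    if not os.path.exists(path):
        with open(path, "wb") as handle:
            handle.write(lzma.compress(block))
    return digest
```

Because blocks are addressed by digest, a second backup of unchanged data finds every block already present and writes nothing new to the repository.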
Metadata is stored anew on each backup, so there is no need to sort directories.
Metadata is stored compressed: directories themselves are only partially compressed, but their content is.
Your first backup with backshift for a given filesystem will probably be a bit slow. Subsequent
backups should be pretty fast unless many files have changed.
You never need to do another fullsave after your first one, for a given set of files.
The author has done fullsaves over wifi (802.11g), and it worked well: between the xz compression and the
deduplication applied before the data hits the network, network use was relatively low.
Incremental behavior
- rsync --link-dest incrementals are normally done relative to the single most recent "similar"
backup, as selected by one's rsync wrapper
- Backshift's incrementals are done relative to up to three previous backups, simultaneously:
- The most recent backup found for the (hostname, subset) pair
- The most recent completed backup for the (hostname, subset) pair
- The backup with the most files in it, for the (hostname, subset) pair
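The three-way reference selection above can be sketched as follows. The record shape (dicts with hostname, subset, start_time, finished, file_count) is an assumption for illustration, not Backshift's actual data structures:

```python
def pick_reference_backups(backups, hostname, subset):
    """Pick up to three reference backups for a (hostname, subset) pair:
    the most recent, the most recent completed, and the one holding the
    most files. Duplicates collapse, so fewer than three may return."""
    relevant = [b for b in backups
                if b["hostname"] == hostname and b["subset"] == subset]
    if not relevant:
        return []
    refs = [max(relevant, key=lambda b: b["start_time"])]
    completed = [b for b in relevant if b["finished"]]
    if completed:
        refs.append(max(completed, key=lambda b: b["start_time"]))
    refs.append(max(relevant, key=lambda b: b["file_count"]))
    # The same backup may qualify under more than one criterion.
    unique, seen = [], set()
    for b in refs:
        if id(b) not in seen:
            seen.add(id(b))
            unique.append(b)
    return unique
```

Using several references at once means a file unchanged since any of them can be deduplicated, even if the single most recent backup was interrupted and missed it.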
Expiration will go through each of the following, removing any that are too old:
- Individual chunk files (based on file timestamps)
- Individual "files" files (again, based on timestamps)
- Individual saveset summary files, based on a time-of-completion timestamp stored within the file
(or a last-touch timestamp, in the case of a system crash or early backup termination)
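The timestamp-based portion of expiration can be sketched as below. This is a simplified stand-in for expiring chunk and "files" files by filesystem mtime; saveset summaries would instead parse the completion timestamp stored inside each file, which this sketch does not attempt:

```python
import os
import time

def expire_old_files(directory, max_age_seconds, now=None):
    """Remove regular files in `directory` whose modification time is
    more than `max_age_seconds` in the past; return the names removed."""
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed
```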