Backshift works a bit like an rsync-based backup script, but it's intended to be used solely
for backups.
Selecting files
- The selection of files to back up is specified in a manner similar to using cpio: by using
the find command.
- See the example-finds directory for examples of find commands for various OSes.
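The find-based selection can be demonstrated with a tiny tree. This is an illustrative sketch, not one of the commands from example-finds: it only counts the NUL-terminated names the find command emits, rather than piping them into backshift.

```shell
set -eu
# Build a tiny demo tree, then run the kind of find command backshift
# reads its file list from: -print0 NUL-terminates names so filenames
# with spaces or newlines survive the pipe, and -xdev keeps find on
# one filesystem.
tmp=$(mktemp -d)
mkdir -p "$tmp/project"
touch "$tmp/project/a.txt" "$tmp/project/b.txt"
# Count the NUL-terminated names the find command emits.
count=$(find "$tmp" -xdev -print0 | tr -cd '\0' | wc -c)
count=$((count))   # strip any padding wc adds
echo "find selected $count paths"
rm -rf "$tmp"
```

A real run would pipe the find output into backshift on stdin instead of counting it; the example-finds directory shows per-OS variants of the find command itself.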
It does not operate over ssh directly, but works well over network filesystems
like sshfs, CIFS, or NFS.
For each filename read from stdin, the program will:
- ...chop the file into variable-length blocks averaging about 2 mebibytes in size
- For each such block:
- ...compute a cryptographic digest representing the block
- ...compress the block using xz
- ...save the block to a repository of backed up files, under its cryptographic digest - but only if the repo
doesn't already have a copy of that particular block (digest)
- ...save file metadata to the repository, again compressed with xz
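The per-file steps above can be sketched in Python. This is a simplified illustration, not backshift's actual chunking algorithm or repository layout: the rolling hash is a toy one, the average block size is tiny so the example runs on a few KiB (backshift targets roughly 2 MiB), and the repository is just a dict keyed by digest.

```python
import hashlib
import lzma
import os

AVG_BITS = 6          # aim for ~2**6 = 64-byte average chunks (demo only)
MASK = (1 << AVG_BITS) - 1

def chunk(data: bytes):
    """Content-defined chunking with a trivial rolling hash (illustrative)."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling = (rolling * 31 + byte) & 0xFFFFFFFF
        if (rolling & MASK) == MASK:      # boundary found: emit a block
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]

def store_blocks(data: bytes, repo: dict) -> list:
    """Digest, compress, and deduplicate each block into the repo."""
    digests = []
    for block in chunk(data):
        digest = hashlib.sha256(block).hexdigest()
        if digest not in repo:                   # dedup: store each block once
            repo[digest] = lzma.compress(block)  # xz compression
        digests.append(digest)
    return digests

repo = {}
data = os.urandom(4096)
first = store_blocks(data, repo)
size_after_first = len(repo)
second = store_blocks(data, repo)   # backing up the same file again
```

Storing the same data a second time adds nothing to the repo, which is why later fullsaves of mostly-unchanged filesystems are cheap.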
Metadata is stored anew on each backup. For this reason, there is no need to sort directories.
Metadata is stored compressed too: directories themselves are only partially compressed, but their content is.
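The idea of writing a fresh, xz-compressed metadata record on each backup can be sketched as follows. The field names and on-disk layout here are illustrative assumptions, not backshift's actual format.

```python
import json
import lzma
import os
import tempfile

def save_metadata(entries, path):
    """Write one backup's file metadata, xz-compressed, line per file."""
    with lzma.open(path, "wt") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")

def load_metadata(path):
    """Read the compressed metadata back into a list of records."""
    with lzma.open(path, "rt") as f:
        return [json.loads(line) for line in f]

# Hypothetical per-file records: name, mode, size, and the digests of
# the blocks holding the file's content.
entries = [
    {"name": "etc/hostname", "mode": 0o644, "size": 10, "blocks": ["d1"]},
    {"name": "etc/hosts", "mode": 0o644, "size": 220, "blocks": ["d2"]},
]
tmpdir = tempfile.mkdtemp()
meta_path = os.path.join(tmpdir, "files.xz")
save_metadata(entries, meta_path)
restored = load_metadata(meta_path)
```

Because the metadata is rewritten in full each time, each backup stands alone; only the content blocks are shared between backups.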
Your first backup with backshift for a given filesystem will probably be a bit slow. Subsequent
backups should be pretty fast unless there have been a lot of file changes.
You never need to do another fullsave after your first one, for a given set of files.
The author has done fullsaves over wifi (802.11g), and it worked well. Between the xz compression and the
deduplication happening before the data hits the network, network use was relatively low.
Incremental behavior
- rsync --link-dest incrementals are normally done relative to the single most recent "similar"
backup, as chosen by one's rsync wrapper
- Backshift's incrementals are done relative to up to three previous backups, simultaneously:
- The most recent backup found for the (hostname, subset) pair
- The most recent completed backup for the (hostname, subset) pair
- The backup with the most files in it, for the (hostname, subset) pair
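The selection of those three reference backups can be sketched like this. The field names and the dataclass are illustrative assumptions, not backshift's actual metadata schema; the point is that the three criteria can pick overlapping backups, so the result is deduplicated.

```python
from dataclasses import dataclass

@dataclass
class Backup:
    start_time: float   # when the backup began
    finished: bool      # did the run complete?
    num_files: int      # files recorded in its metadata

def reference_backups(backups):
    """Pick up to three prior backups to do an incremental against:
    most recent, most recent completed, and largest by file count."""
    if not backups:
        return []
    most_recent = max(backups, key=lambda b: b.start_time)
    completed = [b for b in backups if b.finished]
    most_recent_done = (
        max(completed, key=lambda b: b.start_time) if completed else None
    )
    biggest = max(backups, key=lambda b: b.num_files)
    refs = []
    for candidate in (most_recent, most_recent_done, biggest):
        if candidate is not None and candidate not in refs:
            refs.append(candidate)
    return refs

history = [
    Backup(100.0, True, 5000),
    Backup(200.0, True, 5200),   # most recent completed, and biggest
    Backup(300.0, False, 1200),  # interrupted run, but most recent
]
refs = reference_backups(history)
```

Here the interrupted run is still consulted (it may already hold many unchanged files), while the most-recent-completed and biggest criteria both resolve to the same backup, so only two references come back.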