At http://burp.grke.org/burp2/08results1.html you
can find a performance comparison between some backup applications.
The comparison did not include backshift, because backshift's deduplication was
believed to be prohibitively slow.
Backshift is admittedly not a speed demon. It is designed to:
- minimize storage requirements
- minimize bandwidth requirements
- emphasize, to some extent, parallel performance (concurrent backups of different computers)
- allow expiration of old data that is no longer needed
If you have a need for speed, you might check into an rsync wrapper
like Backup.rsync.
It requires considerably more storage, but it is very fast.
Also, it was almost certainly not backshift's deduplication that was slow; rather, it was:
- backshift's variable-length, content-based blocking algorithm. This requires Python to inspect every byte
of the backup, one byte at a time (sketched after this list).
- backshift's use of xz compression. xz compresses files very tightly,
reducing storage and bandwidth requirements, but it is known to be slower than something like
gzip that doesn't compress as well (also illustrated after this list).
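Here is a minimal sketch of what variable-length, content-based blocking looks like. The rolling
checksum, window size, and boundary mask below are illustrative assumptions, not backshift's
actual algorithm or parameters:

    # Minimal sketch of variable-length, content-based blocking.
    # WINDOW and BOUNDARY_MASK are illustrative assumptions.

    WINDOW = 48             # bytes in the rolling window (assumed)
    BOUNDARY_MASK = 0xFFF   # gives roughly 4 KiB average blocks (assumed)

    def content_defined_blocks(data):
        """Split data into variable-length blocks wherever a rolling sum of
        the last WINDOW bytes hits a chosen bit pattern, so identical content
        produces identical blocks that can be deduplicated by hash."""
        blocks = []
        start = 0
        rolling = 0
        for index, byte in enumerate(data):
            rolling += byte
            if index >= WINDOW:
                rolling -= data[index - WINDOW]    # slide the window forward
            at_boundary = (rolling & BOUNDARY_MASK) == 0
            if at_boundary and index + 1 - start >= WINDOW:
                blocks.append(data[start:index + 1])
                start = index + 1
        if start < len(data):
            blocks.append(data[start:])            # final, possibly short, block
        return blocks

Because the loop above examines every byte in pure Python, it is the main reason the initial
backup is CPU-bound.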
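And a rough illustration of the xz-versus-gzip tradeoff, using only the standard library; the
sample data and compression levels are assumptions, not backshift's settings:

    # Rough illustration of the xz-vs-gzip tradeoff; the sample data
    # and compression levels are assumptions.
    import lzma
    import time
    import zlib

    data = b'backup data with some mild redundancy ' * 100000

    start = time.time()
    xz_output = lzma.compress(data, preset=9)      # xz: smallest output, slowest
    xz_seconds = time.time() - start

    start = time.time()
    gz_output = zlib.compress(data, 6)             # gzip-level: larger output, faster
    gz_seconds = time.time() - start

    print('xz  : %9d bytes, %.3f seconds' % (len(xz_output), xz_seconds))
    print('gzip: %9d bytes, %.3f seconds' % (len(gz_output), gz_seconds))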
Also, while the initial full backup is slow, subsequent backups are much faster, because they
do not reblock or recompress any file that still has the same mtime and size as recorded in one of (up to) three
previous backups (see the sketch below).
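A sketch of that skip check, under the assumption that each previous backup can be represented
as a dict mapping path to (mtime, size); backshift's real metadata format is different:

    import os

    def needs_reblocking(path, previous_backups):
        """Return True if path must be reblocked and recompressed.

        previous_backups is assumed to be a list of up to 3 dicts mapping
        file path -> (mtime, size) as recorded at backup time; this layout
        is hypothetical, not backshift's actual metadata format."""
        stat_result = os.lstat(path)
        current = (int(stat_result.st_mtime), stat_result.st_size)
        for previous in previous_backups:
            if previous.get(path) == current:
                return False    # unchanged: reuse the blocks already stored
        return True             # new or modified: reblock and recompress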
Also, if you run backshift on PyPy, its variable-length, content-based blocking algorithm is
many times faster than if you run it on CPython. PyPy is not only faster than CPython for this
workload; it is also much faster than CPython augmented with Cython.
I received an e-mail from G. P. E. Keeling in February 2016; he intends to spend a little more time on a similar comparison
in the future.