Backshift uses a "file count" or "file count estimate" to size a bloom filter. The bloom filter is responsible for backshift's
strong ability to detect hard links. It is better for the estimate to be too high than too low, though an excessively large
estimate wastes memory.
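To illustrate why the file count matters, here is a minimal sketch of the textbook bloom filter sizing formulas. It is not backshift's actual code: the function name bloom_parameters and the 1% target false-positive rate are assumptions chosen for illustration.

```python
import math


def bloom_parameters(file_count, false_positive_rate=0.01):
    """Return (bits, hash_count) from the standard bloom filter formulas.

    Hypothetical helper for illustration; not backshift's implementation.
    """
    if file_count <= 0:
        raise ValueError("file_count must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    # Optimal number of bits: m = -n * ln(p) / (ln 2)^2
    bits = math.ceil(-file_count * math.log(false_positive_rate) / math.log(2) ** 2)
    # Optimal number of hash functions: k = (m / n) * ln 2
    hash_count = max(1, round(bits / file_count * math.log(2)))
    return bits, hash_count


if __name__ == "__main__":
    for estimate in (10_000_000, 20_000_000):
        bits, hash_count = bloom_parameters(estimate)
        print(estimate, "files:", bits // 8, "bytes,", hash_count, "hashes")
```

Under these assumptions the filter size grows linearly with the file count, so doubling the estimate doubles the memory the filter needs.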
How the file count is produced:
- If you use one of the more verbose progress report modes (including the default mode), backshift will use an exact file count
by counting the list of files to back up from stdin.
- If you use an especially terse progress report mode like "none", then backshift will:
  - Look up the largest prior backup (measured by file count, not storage space) for the (host, subset) pair. If that count
    is greater than 5 million, backshift doubles it and uses the result.
  - If no prior backup for the (host, subset) pair is available, estimate 10 million files, which is adequate for most filesystems today.
    Estimating a little too high is not a problem.
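The selection rules above can be sketched as follows. This is a hedged illustration, not backshift's code: the function names, the NUL-separated input format, and the fallback to 10 million when a prior count is 5 million or less are all assumptions, since the text does not spell out that last case.

```python
import io

DEFAULT_ESTIMATE = 10_000_000   # used when no prior backup is available
DOUBLING_THRESHOLD = 5_000_000  # prior counts above this are doubled


def read_filenames(stream):
    """Read filenames from a binary stream.

    Assumption: names are NUL-separated, as produced by find -print0.
    """
    return [name for name in stream.read().split(b"\0") if name]


def file_count_for_bloom_filter(filenames=None, largest_prior_count=None):
    """Pick the file count used to size the bloom filter (hypothetical helper)."""
    if filenames is not None:
        # Verbose progress modes: the file list is known, so use an exact count.
        return len(filenames)
    if largest_prior_count is not None and largest_prior_count > DOUBLING_THRESHOLD:
        # Terse modes with a large prior backup: double the prior file count.
        return 2 * largest_prior_count
    # No prior backup. Assumption: a prior count at or below the threshold
    # is treated the same way, because the text does not cover that case.
    return DEFAULT_ESTIMATE


if __name__ == "__main__":
    names = read_filenames(io.BytesIO(b"etc/passwd\0etc/hosts\0"))
    print(file_count_for_bloom_filter(filenames=names))
    print(file_count_for_bloom_filter(largest_prior_count=6_000_000))
    print(file_count_for_bloom_filter())
```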
Questions
- What if the file count (estimate) is too low?
  - Then the undersized bloom filter will produce more false positives, so backshift will over-detect possible hard links. It won't
    back up incorrectly; it will just use somewhat more memory than necessary during restores of the save(s) created.
- What if the file count (estimate) is too high?
  - Then backshift will allocate a bit more memory than necessary in its bloom filter. It won't back up incorrectly; it will just use
    somewhat more memory than necessary during backup of the save(s) created.
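The trade-off in these two answers can be made concrete with the standard approximation for a bloom filter's false-positive rate. The sketch below is illustrative only: the filter size of 96 million bits and 7 hash functions are assumed figures (roughly what a 1% target would give for 10 million files), not values taken from backshift.

```python
import math


def false_positive_rate(bits, hash_count, actual_files):
    """Approximate false-positive rate after inserting actual_files items.

    Standard approximation: (1 - e^(-k * n / m)) ** k
    """
    return (1.0 - math.exp(-hash_count * actual_files / bits)) ** hash_count


if __name__ == "__main__":
    bits = 96_000_000  # assumed filter size, sized for about 10 million files
    hash_count = 7     # assumed number of hash functions
    for actual in (5_000_000, 10_000_000, 20_000_000):
        rate = false_positive_rate(bits, hash_count, actual)
        print(f"{actual:>10} files: {rate:.4%} false positives")
```

With these assumed figures, a filter sized for 10 million files stays near 1% false positives at 10 million files, but rises to roughly 16% at 20 million. That rise is the over-detection of possible hard links described above, while an oversized filter only costs extra memory.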