
Backer upper design


A modern backup system design :)

Should have:

1) A database of hash -> backend storage filename.  In fact, we might
just name the files by their hashes - but compressed with whatever
compression algorithm works well.  We might even be able to use file(1)
on the input to decide which compression algorithm to use
2) A database mapping each backed-up machine's filenames to hashes.
This one'll get pretty big.  Or should it be one database per host?
3) On the machines being backed up, we should go through the files and
make sure their hashes haven't changed unless their mtimes have too -
otherwise, we probably have disk corruption (see the first sketch after
this list)
4) A means of using O_DIRECT, if the OS supports it, while computing
the hashes on the client machines (see the second sketch after this
list)
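
A rough sketch of the corruption check in item 3, in Python - sha256
and the recorded-fields layout are just placeholder choices:

    import hashlib
    import os

    def classify(path, recorded_mtime, recorded_hash):
        # Re-hash the file and compare against what we recorded at
        # the last backup.
        digest = hashlib.sha256()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(1 << 20), b''):
                digest.update(chunk)
        current_hash = digest.hexdigest()
        current_mtime = os.stat(path).st_mtime

        if current_hash == recorded_hash:
            return 'unchanged'
        if current_mtime == recorded_mtime:
            # Content changed but the mtime didn't - the filesystem
            # never saw a write, so suspect disk corruption.
            return 'suspect corruption'
        return 'modified'   # normal edit: content and mtime changed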
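
And a sketch of item 4's O_DIRECT hashing.  This is Linux-flavored:
O_DIRECT wants aligned buffers, which an anonymous mmap provides, and
where the flag doesn't exist it quietly falls back to buffered reads:

    import hashlib
    import mmap
    import os

    def hash_file_direct(path, block_size=1 << 20):
        # O_DIRECT doesn't exist everywhere (and some filesystems
        # refuse it); getattr makes it a no-op flag on those systems.
        fd = os.open(path, os.O_RDONLY | getattr(os, 'O_DIRECT', 0))
        try:
            # An anonymous mmap is page-aligned, as O_DIRECT requires
            # of its buffers; 1 MiB is a multiple of any sane sector
            # size, so full reads keep the file offset aligned too.
            buf = mmap.mmap(-1, block_size)
            digest = hashlib.sha256()
            while True:
                count = os.readv(fd, [buf])
                if count == 0:
                    break
                digest.update(buf[:count])
            return digest.hexdigest()
        finally:
            os.close(fd)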

If we do use hashes as filenames, we'll need a directory hierarchy
named by parts of the hash - say, two, three, or four hex digits per
directory level.
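
Something like this for the hash-to-path mapping - two hex digits per
level here, but levels and digits are both knobs:

    import os

    def blob_path(store_root, hash_hex, levels=2, digits=2):
        # e.g. 'abcdef...' -> store_root/ab/cd/abcdef...
        # Two hex digits = at most 256 entries per directory level.
        parts = [hash_hex[i * digits:(i + 1) * digits]
                 for i in range(levels)]
        return os.path.join(store_root, *parts, hash_hex)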

Should there be a mapping from:
1) hostname to base255
2) base255 to hostname
3) hash to base255
4) base255 to hash
...to save space?  Would it really save that much space?  Yeah, it
probably would.
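
Assuming "base255" means giving each hostname (or hash) a small
integer id and packing the id into base-255 bytes - leaving one byte
value free as a record delimiter - the four mappings boil down to two
instances of one two-way table (one for hostnames, one for hashes)
plus an encoder.  Names here are hypothetical:

    class Interner:
        # Two-way map between strings and small integer ids.
        def __init__(self):
            self.ids = {}       # hostname (or hash) -> id
            self.strings = []   # id -> hostname (or hash)

        def intern(self, s):
            if s not in self.ids:
                self.ids[s] = len(self.strings)
                self.strings.append(s)
            return self.ids[s]

        def lookup(self, i):
            return self.strings[i]

    def to_base255(n):
        # base-255 digits, leaving 0xFF unused as a delimiter byte
        digits = bytearray()
        while True:
            n, r = divmod(n, 255)
            digits.append(r)
            if n == 0:
                return bytes(reversed(digits))

A two-byte id covers 255*255 = 65025 hosts, versus repeating a
twenty-odd character hostname in every record - so yes, it adds up.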

In Python, naturally.  Is there any other choice?  :)

We'll need to store, somewhere, which compression algorithm was used
for each file...

We'll perhaps want some way of managing the tradeoff between hard
compression and fast compression as it relates to bandwidth...  E.g.,
rzip packs pretty hard, but takes forever...

Should be able to compress with gzip, bzip2, rzip, or cat (noop)...
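
A sketch of that dispatch - gzip and bzip2 ship with Python; rzip
wants seekable files rather than pipes (if memory serves), so it'd
mean a subprocess and a temp file, omitted here:

    import bz2
    import gzip

    COMPRESSORS = {
        'gzip':  gzip.compress,
        'bzip2': bz2.compress,
        'cat':   lambda data: data,   # noop passthrough
    }

    def compress(data, algorithm='gzip'):
        # Hand back the algorithm name along with the bytes, so the
        # caller can record which compressor was used (see above).
        return COMPRESSORS[algorithm](data), algorithm

Storing that name as a filename suffix (.gz, .bz2, ...) would cover
the "store it somewhere" part for free.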
 

