/ backups
    - rather than files/dir-whatever/files, it probably should be files/dir-whatever/entries.
    - backup id's have colons in them...
    - saveset is not written until the 1000th file is processed? That could be pesky
    - constants_mod
    - incrementals!
        + saveset_mod
        - for an incremental save, it might work well to pick three prior savesets to examine for previous results: the most recent, the one with the most files in it, and the most recent complete save.
        - create Backshift_file.close_enough()
        - create a backshift_file for the 4 savesets of relevance, and compare with a Backshift_file.close_enough()
        - Put the pieces together
    - revisit __exit__'s and make sure they deal with exceptions well
    - when writing a content .time, the preceding directories are chdir'd individually. The initial open already uses a full path though.
    - temp file and rename for content files, to achieve greater concurrency safety on hashed files
    - Fix the chdir situation. cd once and stay there, don't hop all over.
    - Fix the list vs str path stuff. We should probably just use paths with os.path.dirname as needed
    - Get the b'' stuff out of the progress report on 3.x
    - Unix domain sockets are skipped correctly, but doing so does strange things to the tty stats output
    - Try stat'ing the content directory first - only mkdir if this fails
    - make sure we don't traceback if two things try to mkdir at the same time
    - eliminate content .time files, replace with .metadata files. Modify to include how, if at all, the file was compressed.
    - if files, contents and savesets don't exist, ask before creating them - unless a magic option is given
    / exotic file attributes
        + setuid
        + setgid
        + sticky
        - POSIX ACL's
        - Linux capabilities (?)
        - Linux xattr metadata?
        - Windows streams?
        - MacOS resource forks?
        - MacOS xattr metadata?
    + Save the number of files actually in the saveset - as distinct from the number of files intended to go into the saveset
    + add a --subset for backup id's, in order to get a better idea of what to start from on a resumed save
    + Figure out why an interrupted save would have both a start and finish time in savesets
    + if hostname is localhost.localdomain, error out and tell the user to specify a true hostname somehow
    + if you get a permission denied when lstat'ing a file during a backup, don't traceback
    + file types
        + Directories
        + Symlinks
        + Character and block device files
        + Sockets - ignored
        + fifos
    + We won't really know how well this is working, until we have restores and some automated tests therewith
    + Hostnames in backup id
    - MacOS xattr metadata?
    + Save username and group, not just uid and gid
    + Completed save marker
    + last time of presence in contents hierarchy, not a separate hierarchy, with -'d prefixes
    + Directory prefixes (to allow a directory named "files" or "files.db" or whatever)
- listing all available backups
    + basic implementation
    + unit test
    - sorting
/ Directory listing
    + listing the files in a backup
        + regular files
        + Directories
        + Symlinks
        + Character and block device files
        + Sockets - ignored
        + fifos
    + starting from an arbitrary directory within a backup
    - hardlinks
/ Restores!
    + from an arbitrary starting directory
    / tar output
    - renaming on the fly
    / file types
        + directories
        / regular files
            + plain
            - hardlinked
        + symlinks
        + fifo's
        + device files: character, block
- Expiration
    - of old data
    - of old metadata
    - might be nice to make these separate, since metadata takes much less room
- Incrementals
    - Incrementals that don't read every file; we already don't rewrite data we already have, but the reading and chunking takes quite a while
    - Finding the last n backups to perform the incremental relative to (subset string?)
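Note on the incrementals items: the "pick three prior savesets" idea (most recent, most files, most recent complete) could look roughly like the sketch below. SavesetSummary and pick_reference_savesets are hypothetical names made up for illustration; the real saveset_mod records may carry different fields. The three picks can coincide, so duplicates are dropped.

```python
from collections import namedtuple

# Hypothetical summary record, for illustration only; the real
# saveset_mod may store different fields.
SavesetSummary = namedtuple(
    'SavesetSummary', 'backup_id start_time file_count complete')


def pick_reference_savesets(summaries):
    """Return up to three distinct prior savesets to compare against:
    the most recent, the one with the most files, and the most recent
    complete one, in that order."""
    if not summaries:
        return []
    candidates = [
        max(summaries, key=lambda s: s.start_time),
        max(summaries, key=lambda s: s.file_count),
    ]
    complete = [s for s in summaries if s.complete]
    if complete:
        candidates.append(max(complete, key=lambda s: s.start_time))
    # The same saveset may win more than one category; keep it once.
    chosen = []
    for candidate in candidates:
        if candidate not in chosen:
            chosen.append(candidate)
    return chosen
```

Each chosen saveset would then be probed for a matching entry, presumably via something like the planned Backshift_file.close_enough().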
- Compression
    - xz_mod
    - Compressing data chunks in content hierarchy
    - Skipping compression of chunks that don't compress well
    - Uncompressing files and recompressing as chunks
        - Dealing with files that don't uncompress!
    - Compression of metadata too?
    - A compression shell script to run on the server asynchronously? That'd mean we'd only uncompress on the client during a backup, not recompress... I kind of like that.
- optional libodirect
    - support in CPython 2.x should be straightforward
    - Might need some tweaking for 3.x; libodirect never tested here
    - Pypy
        - Does Pypy cooperate with swig?
        - Maybe a ctypes-based interface to libodirect for pypy?
- Client/server operation
    - Authentication
        - Host
        - User
    - Concurrency
    - Encryption for transfers
- Renaming a host
- Users
- misc internal
    + remove my_split.py
    + Split Repo out of backshift_file_mod into its own file
    + add treap.py to documentation list
    + Not going to do this due to Jython's use of unicode in 2.x: jython via ctypes fstat (or finding that java's open is fstat'ing for us)
    + Not going to do this until Jython has better ctypes support
    + gdbm via ctypes (for pypy, and maybe jython too)
    + Skipped for dohdbm: sort out why gdbm_ctypes is giving gibberish filenames in pypy but not cpython 2.x or 3.x
    + figure out why there's file content in files/1289715016.78-benchbox-Sat_Nov_13_22_10_16_2010-b1bb981f35a41bd0/usr/src/linux-headers-2.6.35-22/include/linux/sunrpc/files.db and fix
        + This was apparently dbm.py-related - with my gdbm.py module, it doesn't happen.
    - Performance tuning
    + Get the b'' stuff out of the directory prefixes on 3.x
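Note on the compression items: "skipping compression of chunks that don't compress well" might be handled along these lines. This is only a sketch using the standard-library lzma module (Python 3.3+); maybe_compress, restore_chunk and the 5% threshold are assumptions for illustration, not what xz_mod does. The returned method tag is the kind of thing the planned .metadata files could record.

```python
import lzma


def maybe_compress(chunk, min_saving=0.05):
    """Return (method, payload).

    The chunk is stored xz-compressed only if that saves at least
    min_saving (a fraction of the original size); otherwise it is
    stored raw.  The 5% default is an arbitrary illustrative value.
    """
    compressed = lzma.compress(chunk)
    if len(compressed) <= len(chunk) * (1.0 - min_saving):
        return 'xz', compressed
    return 'none', chunk


def restore_chunk(method, payload):
    """Undo maybe_compress() using the recorded method tag."""
    if method == 'xz':
        return lzma.decompress(payload)
    return payload
```

Already-compressed input (jpegs, tarballs of .gz files and so on) falls through to 'none', so a restore never pays for a pointless decompress.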