+ Interpreters
    + CPython 2.x
    + CPython 3.x
    + PyPy
    + Jython
    - FePy? Close to IronPython, but reportedly has a better standard library.
- OS's
    - developed on Linux (Ubuntu) mostly
    - port to OpenIndiana (OpenSolaris continued)
    - port to PureDarwin
    - port to ReactOS
    - port to Haiku
    - port to FreeDOS?
    - port to Solaris Express?
    - port to MacOS X?
    - port to Windows 7?
/ backups
    - backup id's probably should have their starting time since the epoch zero extended to two decimal places
    - It'd be nice to have a list of what backup_id's are being used as priors in an incremental
    - It'd be nice to have a summary, at the end, of how many were stat's, fast's and slow's
    - A --compare mode would be nice, so that you can get a list of files that've changed since a particular backup
    - rolling_checksum_pyx_mod could benefit from quite a bit more optimization, in light of the html report; CPython might even overtake PyPy in performance with some Cython tuning. In short, there's too much yellow in the report.
    - Some sort of heartbeat might be nice during a backup of a large file, particularly in light of NFS and sshfs's tendency to silently hang
    - constants_mod
    - temp file and rename for content files, to achieve greater concurrency safety on hashed files
    - Fix the chdir situation. cd once and stay there, don't hop all over.
    - Fix the list vs str path stuff. We should probably just use paths with os.path.dirname as needed
    - compression
        - Put compression info at the beginning of the .data files; leave the .time file alone.
        - The .time files should be single-purpose, because they are the only part that needs to change during an incremental.
        - use xz_mod for now, later use the standard library's xz module.
    / incrementals!
        + saveset_mod
        + saveset_mod -> saveset_summary_mod
        + chdir once, not all over
        + split out "files" operations into their own "saveset_files_mod.py"
        + for an incremental save, it might work well to pick three prior saves to examine for previous results within the specified subset: The most recent, the one with the most files in it, and the most recent complete save.
        + create Backshift_file.close_enough()
            + decided against this
        + create a backshift_file for the 4 savesets of relevance, and compare with a Backshift_file.close_enough()
        + Put the pieces together
        - test!
    / exotic file attributes
        + setuid
        + setgid
        + sticky
        - POSIX ACL's
        - Linux capabilities (?)
        - Linux xattr metadata?
        - Windows streams?
        - MacOS resource forks?
        - MacOS xattr metadata?
    + rename savesets to summaries
    + 66-rcm-perf needs a proportion for the size of the test and the threshold for "too long"
    + Get the b'' stuff out of the progress report on 3.x
    + Unix domain sockets are skipped correctly, but doing so does strange things to the tty stats output
    + saveset is not written until the 1000th file is processed? That could be pesky on backups with large files.
    + Improve TRY_FSTAT logic so we only hasattr once
    + Need to test some UTF-8 pathnames, and perhaps other encodings
    + revisit __exit__'s and make sure they deal with exceptions well
        + commented out
    + rather than files/dir-whatever/files, it probably should be files/dir-whatever/entries.
    + backup id's have colons in them...
        + No, backup id's don't, but timestamps in the --list-backups report do.
    + when writing a content .time, the preceding directories are chdir'd individually. The initial open already uses a full path though.
        + Try stat'ing to the content directory first - only mkdir if this fails
        + make sure we don't traceback if two things try to mkdir at the same time
    + if files, contents and savesets don't exist, ask before creating them - unless a magic option is given
    + Save the number of files actually in the saveset - as distinct from the number of files intended to go into the saveset
    + add a --subset for backup id's, in order to get a better idea of what to start from on a resumed save
    + Figure out why an interrupted save would have both a start and finish time in savesets
    + if hostname is localhost.localdomain, error out and tell the user to specify a true hostname somehow
    + if you get a permission denied when lstat'ing a file during a backup, don't traceback
    + file types
        + Directories
        + Symlinks
        + Character and block device files
        + Sockets - ignored
        + fifos
        + We won't really know how well this is working, until we have restores and some automated tests therewith
    + Hostnames in backup id
    - MacOS xattr metadata?
    + Save username and group, not just uid and gid
    + Completed save marker
    + last time of presence in contents hierarchy, not a separate hierarchy, with -'d prefixes
    + Directory prefixes (to allow a directory named "files" or "files.db" or whatever)
+ listing all available backups
    + basic implementation
    + unit test
    + sorting
        + If the user wants it sorted, they can sort it
/ Directory listing
    + listing the files in a backup
        + regular files
        + Directories
        + Symlinks
        + Character and block device files
        + Sockets - ignored
        + fifos
    + starting from an arbitrary directory within a backup
    - hardlinks
/ Restores!
    + from an arbitrary starting directory
    / tar output
    - renaming on the fly
    / file types
        + directories
        / regular files
            + plain
            - hardlinked
        + symlinks
        + fifo's
        + device files: character, block
- Expiration
    - of old data
    - of old metadata
    - might be nice to make these separate, since metadata takes much less room
- Incrementals
    - Incrementals that don't read every file; we already don't rewrite data we already have, but the reading and chunking takes quite a while
    - Finding the last n backups to perform the incremental relative to (subset string?)
- Compression
    + xz_mod
    - Compressing data chunks in content hierarchy
    - Skipping compression of chunks that don't compress well
    - Uncompressing files and recompressing as chunks
    - Dealing with files that don't uncompress!
    - Compression of metadata too?
    + A compression shell script to run on the server asynchronously? That'd mean we'd only uncompress on the client during a backup, not recompress... I kind of like that.
        + decided against this
- optional libodirect
    - support in CPython 2.x should be straightforward
    - Might need some tweaking for 3.x; libodirect never tested here
    - PyPy
        - Does PyPy cooperate with swig?
        - Maybe a ctypes-based interface to libodirect for PyPy?
- Client/server operation
    - Authentication
        - Host
        - User
    - Concurrency
    - Encryption for transfers
- Renaming a host
- Users
- misc internal
    + remove my_split.py
    + Split Repo out of backshift_file_mod into its own file
    + add treap.py to documentation list
    + Not going to do this due to Jython's use of unicode in 2.x: jython via ctypes fstat (or finding that java's open is fstat'ing for us)
    + Not going to do this until Jython has better ctypes support
    + gdbm via ctypes (for pypy, and maybe jython too)
    + Skipped for dohdbm: sort out why gdbm_ctypes is giving gibberish filenames in pypy but not cpython 2.x or 3.x
    + figure out why there's file content in files/1289715016.78-benchbox-Sat_Nov_13_22_10_16_2010-b1bb981f35a41bd0/usr/src/linux-headers-2.6.35-22/include/linux/sunrpc/files.db and fix
        + This was apparently dbm.py-related - with my gdbm.py module, it doesn't happen.
- Performance tuning
    + Get the b'' stuff out of the directory prefixes on 3.x
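
Rough sketch for the "temp file and rename for content files" and "don't traceback if two things try to mkdir at the same time" items. This is not the project's code: ensure_directory() and write_atomically() are made-up names, and the sketch assumes a POSIX filesystem where os.rename() replaces the destination atomically.

```python
import errno
import os
import tempfile


def ensure_directory(path):
    """Create path (and parents) if missing, tolerating a concurrent creator."""
    if os.path.isdir(path):
        # The stat succeeded, so there is nothing to create
        return
    try:
        os.makedirs(path)
    except OSError as exc:
        # Another process may have created it between our check and makedirs
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise


def write_atomically(final_path, data):
    """Write data to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(final_path) or '.'
    ensure_directory(directory)
    # Same directory as the target, so the rename never crosses filesystems
    file_descriptor, temp_path = tempfile.mkstemp(dir=directory, prefix='.tmp-')
    try:
        with os.fdopen(file_descriptor, 'wb') as file_:
            file_.write(data)
        # Readers see either the old file or the complete new one, never a partial write
        os.rename(temp_path, final_path)
    except BaseException:
        # Clean up the temp file if the write or the rename failed
        if os.path.exists(temp_path):
            os.unlink(temp_path)
        raise
```

On Windows os.rename() refuses to overwrite an existing file, so a port there would probably want os.replace() (Python 3.3+) instead.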
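
Rough sketch for "Skipping compression of chunks that don't compress well", combined with "Put compression info at the beginning of the .data files". Assumptions, none of them from the project: the one-byte tags and the pack_chunk()/unpack_chunk() names are hypothetical, and the standard library lzma module (Python 3.3+) stands in for xz_mod.

```python
import lzma

RAW_TAG = b'0'  # hypothetical marker: payload stored as-is
XZ_TAG = b'1'   # hypothetical marker: payload is xz-compressed


def pack_chunk(chunk):
    """Return the chunk prefixed with a tag, compressed only if that saves space."""
    compressed = lzma.compress(chunk)
    if len(compressed) < len(chunk):
        return XZ_TAG + compressed
    # Compression did not help (for example, already-compressed data), so store raw
    return RAW_TAG + chunk


def unpack_chunk(stored):
    """Reverse pack_chunk(), using the leading tag to decide whether to decompress."""
    tag, payload = stored[:1], stored[1:]
    if tag == XZ_TAG:
        return lzma.decompress(payload)
    if tag == RAW_TAG:
        return payload
    raise ValueError('unknown compression tag: %r' % (tag,))
```

Keeping the tag inside the .data payload would leave the .time files single-purpose, as wanted above.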