From the backshift project:
- Classes - with methods and attributes
- Module, class, function and method documentation
- Imports
- Including external modules
- Not including external modules
- rolling_checksum_pyx_mod Cython report
- Avant-garde datastructures used:
- Treap: Hybrid of a binary tree and a binary heap. A randomized, ordered dictionary
datastructure used for the database (dohdbm) cache. It's not just in order of addition, it's in key order.
- Bloom filter: Probabilistic set operations well suited to huge quantities of data; used for hardlink detection.
Not from the backshift project, but relevant:
- Compression: Time vs Size