The chunks are content-based and variable-length.
The point of the algorithm is that, for example, if you have a 4 gigabyte file and you insert one byte at a random position in it, most of the blocks will remain the same, even those after the inserted byte.
The algorithm has been in use in production for years, and is considered stable. However, the packaging of the modules implementing it for PyPI is new (2021-04-23).
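To make the idea concrete, here is a minimal sketch of content-defined chunking. It is not the algorithm these modules implement: the window sum, the `WINDOW` and `MASK` values, and the function name `chunk` are all assumptions chosen for illustration. It only shows why a cut point that depends on nearby bytes, rather than on the file offset, lets the blocks after an insertion line up again.

```python
import random

WINDOW = 16   # bytes in the rolling window (hypothetical value)
MASK = 0x3F   # cut when the low 6 bits of the window sum are all set (hypothetical)


def chunk(data):
    """Split data into variable-length chunks at content-defined cut points."""
    chunks = []
    start = 0      # offset where the current chunk begins
    rolling = 0    # sum of the last WINDOW bytes seen
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]   # drop the byte that left the window
        long_enough = i - start + 1 >= WINDOW
        if long_enough and (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])       # trailing partial chunk
    return chunks


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20000))
edited = original[:10000] + b"x" + original[10000:]   # insert one byte mid-file

before = chunk(original)
after = chunk(edited)
shared = len(set(before) & set(after))
print("chunks before:", len(before), "after:", len(after), "shared:", shared)
```

With fixed-size blocks, every block after the inserted byte would shift by one and stop matching. Here the cut points are found again from the content itself, so only the chunks near the insertion differ.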
The three modules are:
Name of module | Required? | Suitable for Pypy3? | Suitable for CPython? | What does it do? |
rolling_checksum_mod | Yes | Yes | Yes | Tries to import rolling_checksum_pyx_mod. If that fails, it imports rolling_checksum_py_mod |
rolling_checksum_py_mod | Yes | Yes | Yes, but it's slow | Provides the blocking algorithm in Pure Python |
rolling_checksum_pyx_mod | No | No | Yes | Provides the blocking algorithm in Cython for speed |
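The fallback described for rolling_checksum_mod is the usual "try the fast module, fall back to the pure Python one" import pattern. The sketch below shows that pattern in general form; the helper name `import_first_available` is hypothetical, and this is not the module's actual source.

```python
import importlib


def import_first_available(*names):
    """Return the first module in names that imports successfully."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue   # not installed (or not importable here): try the next one
    raise ImportError("none of these modules could be imported: %s" % ", ".join(names))


# With the modules from the table installed, the equivalent call would be:
#   impl = import_first_available("rolling_checksum_pyx_mod", "rolling_checksum_py_mod")
# Demonstrated here with the standard library so the sketch runs anywhere:
impl = import_first_available("no_such_module_for_demo", "json")
print(impl.__name__)
```

Listing the Cython module first means it is preferred whenever it is present, while a missing build simply drops through to the pure Python implementation.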
I'd like to stress that rolling_checksum_pyx_mod is not needed for speed on Pypy3, and may actually make things slower.
To install rolling_checksum_mod and rolling_checksum_py_mod for pypy3:
Please note that on some systems rolling_checksum_mod is faster on Pypy3, and on other systems it is faster on CPython+Cython. It's a good idea to compare the two on your own system.
(Cython transpiles .pyx files to .c, which can be compiled using a C compiler to produce a C extension module for CPython to use.)
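One way to compare the two interpreters is to run the same small timing script under each and look at the numbers side by side. The sketch below is a generic harness only: `best_time` and `report` are hypothetical helpers, and the function being timed is a stand-in, since the modules' own API is not shown here.

```python
import platform
import time


def best_time(func, data, repeats=3):
    """Return the fastest wall-clock time, in seconds, of func(data) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        started = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - started)
    return best


def report(func, data):
    """Format one result line tagged with the interpreter that produced it."""
    return "%s %s: %.6f s" % (
        platform.python_implementation(),   # "CPython" or "PyPy"
        platform.python_version(),
        best_time(func, data),
    )


# Stand-in workload; replace sum with the chunking call you want to measure.
print(report(sum, bytes(1000000)))
```

Taking the best of several runs reduces noise from other processes, and on Pypy3 it also gives the JIT a chance to warm up before the run that counts.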
You can e-mail the author with questions or comments: