This is a Python-callable Rabin Fingerprinting class. It wraps Hyang-Ah Kim and David Mazieres's C++ Rabin Fingerprinting implementation.

Rabin Fingerprinting is useful for dividing a large file up into content-based, variable-length blocks, for deduplication. This is done as a file is read, and the chunks are fed individually to a cryptographic digest algorithm.
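To make the idea concrete, here is a minimal sketch of content-based chunking in pure Python. It is not the wrapped C++ code, and it uses a plain polynomial rolling hash rather than a true Rabin fingerprint. The function names, the window size, the chunk size limits and the choice of SHA-256 as the digest are all assumptions made for illustration.

```python
import hashlib

# All of these constants are illustrative assumptions, not values from the C++ code.
WINDOW = 48            # bytes covered by the sliding window
BASE = 257             # multiplier for the polynomial rolling hash
MOD = (1 << 61) - 1    # large prime modulus
MASK = (1 << 12) - 1   # cut when the low 12 bits of the hash are zero
MIN_CHUNK = 1024       # never cut before this many bytes
MAX_CHUNK = 65536      # always cut at this many bytes


def chunk_boundaries(data):
    """Yield (start, end) offsets of content-defined chunks in a bytes object."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    start = 0
    h = 0
    for i, byte in enumerate(data):
        pos = i - start  # offset of this byte within the current chunk
        if pos >= WINDOW:
            # Slide the window: drop the oldest byte's contribution.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        length = pos + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            yield start, i + 1
            start = i + 1
            h = 0  # start a fresh window for the next chunk
    if start < len(data):
        yield start, len(data)  # trailing partial chunk


def chunk_digests(data):
    """Return one SHA-256 hex digest per chunk (digest choice is an assumption)."""
    return [hashlib.sha256(data[s:e]).hexdigest()
            for s, e in chunk_boundaries(data)]
```

Because a cut point depends only on the last WINDOW bytes (subject to the minimum and maximum sizes), inserting or deleting data near the start of a file typically shifts only the nearby chunks. Later chunks tend to come out the same, so their digests match and they can be deduplicated.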

You may wish to note that I ultimately decided against using this code, instead going with a pure Python variable-length blocking algorithm of my own design that's acceptably fast on PyPy or with Cython.

Anyway, you can download it from here


Timestamp: 2024-12-27 07:52:28 PST

