This is a Python-callable Rabin Fingerprinting class. It wraps Hyang-Ah Kim and David Mazieres's C++ Rabin Fingerprinting implementation.

Rabin Fingerprinting is useful for dividing a large file into content-based, variable-length blocks for deduplication. The division happens as the file is read, and each chunk is then fed individually to a cryptographic digest algorithm.
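
To make the idea concrete, here is a minimal sketch of content-based chunking followed by per-chunk digests. It is not the API of the wrapped C++ library, and it is not a true Rabin fingerprint: a simple polynomial rolling hash stands in for it. The names chunks and chunk_digests, and every constant (window size, mask, minimum and maximum chunk lengths), are assumptions chosen purely for illustration.

```python
import hashlib
import io

# All constants below are hypothetical, chosen only for illustration.
WINDOW = 16            # bytes covered by the rolling hash
BASE = 257             # polynomial base of the rolling hash
MOD = (1 << 31) - 1    # modulus that keeps the hash small
MASK = (1 << 6) - 1    # cut where the low 6 bits of the hash are zero
MIN_CHUNK = 32         # never cut before this many bytes (>= WINDOW)
MAX_CHUNK = 1024       # always cut at this many bytes


def chunks(stream):
    """Yield variable-length, content-defined chunks from a binary stream."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    window = bytearray()
    chunk = bytearray()
    h = 0
    while True:
        block = stream.read(4096)
        if not block:
            break
        for byte in block:
            if len(window) == WINDOW:
                # Slide the window: remove the oldest byte's contribution.
                h = (h - window[0] * top) % MOD
                del window[0]
            h = (h * BASE + byte) % MOD
            window.append(byte)
            chunk.append(byte)
            at_boundary = len(chunk) >= MIN_CHUNK and (h & MASK) == 0
            if at_boundary or len(chunk) >= MAX_CHUNK:
                yield bytes(chunk)
                chunk.clear()
                window.clear()
                h = 0
    if chunk:
        yield bytes(chunk)  # trailing partial chunk


def chunk_digests(stream):
    """Feed each chunk individually to a cryptographic digest (SHA-256 here)."""
    return [hashlib.sha256(c).hexdigest() for c in chunks(stream)]


if __name__ == "__main__":
    data = bytes((i * 7919) % 251 for i in range(5000))
    for digest in chunk_digests(io.BytesIO(data)):
        print(digest)
```

Because a cut point depends only on the last few bytes seen (plus the length limits), an insertion near the start of a file tends to disturb only the nearby chunks, and later chunks usually produce the same digests as before. That is what makes this kind of blocking useful for deduplication.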

You may wish to note that I ultimately decided against using this code, instead going with a pure Python variable-length blocking algorithm of my own design that's acceptably fast on PyPy or with Cython.

Anyway, you can download it from here


Timestamp: 2025-01-09 18:25:12 PST

