This is a Python-callable Rabin fingerprinting class. It wraps Hyang-Ah Kim / David Mazieres's C++ Rabin fingerprinting implementation.

Rabin fingerprinting is useful for dividing a large file into content-based, variable-length blocks for deduplication. The blocks are cut as the file is read, and each one is fed individually to a cryptographic digest algorithm.
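
To illustrate the idea, here is a minimal sketch of content-based, variable-length blocking. It is not the interface of the wrapped C++ code, and it substitutes a simple polynomial rolling hash for a true Rabin fingerprint. Every name and constant in it (chunks, digest_chunks, WINDOW, MASK, the size limits, the choice of SHA-256) is an assumption made for illustration.

```python
import hashlib
import io

# All constants below are illustrative assumptions, not values from the wrapped library.
WINDOW = 48            # bytes covered by the rolling hash
BASE = 257             # base of the polynomial rolling hash
MOD = (1 << 61) - 1    # large prime modulus
MASK = (1 << 12) - 1   # cut when the low 12 bits of the hash are zero
MIN_CHUNK = 1024       # never cut a block shorter than this
MAX_CHUNK = 65536      # always cut once a block reaches this size


def chunks(stream, read_size=65536):
    """Yield variable-length blocks whose boundaries depend on content."""
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    window = bytearray(WINDOW)         # circular buffer of the last WINDOW bytes
    pos = 0
    h = 0
    chunk = bytearray()
    while True:
        data = stream.read(read_size)
        if not data:
            break
        for byte in data:
            old = window[pos]
            window[pos] = byte
            pos = (pos + 1) % WINDOW
            # Slide the window: drop the oldest byte, append the newest.
            h = ((h - old * top) * BASE + byte) % MOD
            chunk.append(byte)
            at_boundary = len(chunk) >= MIN_CHUNK and (h & MASK) == 0
            if at_boundary or len(chunk) >= MAX_CHUNK:
                yield bytes(chunk)
                chunk.clear()
    if chunk:
        yield bytes(chunk)   # trailing partial block


def digest_chunks(stream):
    """Return (length, SHA-256 hex digest) for each block of the stream."""
    return [(len(c), hashlib.sha256(c).hexdigest()) for c in chunks(stream)]


if __name__ == "__main__":
    sample = io.BytesIO(b"some example data " * 20000)
    for size, digest in digest_chunks(sample):
        print(size, digest)
```

Because a cut depends only on the last WINDOW bytes, inserting or deleting data near the start of a file usually changes only the blocks near the edit; later blocks tend to keep the same digests, which is what makes the scheme useful for deduplication.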

You may wish to note that I ultimately decided against using this code, instead going with a pure Python variable-length blocking algorithm of my own design that's acceptably fast on PyPy or with Cython.

Anyway, you can download it from here


Timestamp: 2024-09-11 19:04:29 PDT


You can e-mail the author with questions or comments: