Rabin fingerprinting is useful for dividing a large file into content-based, variable-length blocks for deduplication. The blocks are cut as the file is read, and each one is fed individually to a cryptographic digest algorithm.
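To make the idea concrete, here is a minimal sketch of content-based chunking in Python. It is not the code offered on this page, and it does not use a true Rabin fingerprint (a polynomial over GF(2)); it swaps in a simpler polynomial rolling hash modulo a prime, which gives the same kind of content-dependent cut points. The names `chunk_stream` and `chunk_digests` and all of the constants (window size, mask, minimum and maximum chunk sizes) are assumptions chosen for illustration.

```python
import hashlib
import io

# All constants below are illustrative assumptions, not values from the original code.
WINDOW = 48             # number of trailing bytes the rolling hash covers
BASE = 257              # multiplier of the polynomial rolling hash
MOD = (1 << 61) - 1     # prime modulus that keeps the hash bounded
MASK = (1 << 12) - 1    # cut when the low 12 bits are zero (about every 4 KiB)
MIN_CHUNK = 1024        # never cut before this many bytes (must be >= WINDOW)
MAX_CHUNK = 65536       # always cut once a chunk reaches this size


def chunk_stream(stream, read_size=65536):
    """Yield variable-length chunks from a binary file-like object.

    A cut point is declared wherever the rolling hash of the last WINDOW
    bytes has its low bits equal to zero, so cut points depend on the
    content rather than on absolute offsets in the file.
    """
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    chunk = bytearray()
    h = 0
    while True:
        data = stream.read(read_size)
        if not data:
            break
        for byte in data:
            if len(chunk) >= WINDOW:
                # Drop the contribution of the oldest byte in the window.
                h = (h - chunk[-WINDOW] * top) % MOD
            h = (h * BASE + byte) % MOD
            chunk.append(byte)
            at_boundary = len(chunk) >= MIN_CHUNK and (h & MASK) == 0
            if at_boundary or len(chunk) >= MAX_CHUNK:
                yield bytes(chunk)
                chunk.clear()
                h = 0
    if chunk:
        yield bytes(chunk)  # trailing partial chunk


def chunk_digests(stream):
    """Return the SHA-256 hex digest of each chunk, in file order."""
    return [hashlib.sha256(c).hexdigest() for c in chunk_stream(stream)]
```

Calling `chunk_digests(open(path, "rb"))` on two versions of a file gives two lists of digests; chunks whose digests appear in both lists only need to be stored once. Because the cut points follow the content, an insertion near the start of a file should disturb only the nearby chunks, whereas fixed-size blocks would shift every later block.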
You may wish to note that I ultimately decided against using this code, instead going with a pure Python variable-length blocking algorithm of my own design that's acceptably fast on PyPy or with Cython.
Anyway, you can download it from here
You can e-mail the author with questions or comments: