This is a pure-Python Bloom filter: a probabilistic set data structure with low storage requirements.

It is known to work on CPython 2.x, CPython 3.x, PyPy and Jython.

It is derived from this, but many changes have been made, including:

The user specifies the desired maximum number of elements and the desired maximum false positive probability, and the module calculates the rest.

The hash functions used are not exorbitantly expensive, but they give a much better distribution of values - confirmed by the automated tests.

It includes mmap, in-memory (array-based) and file-seek backends.
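To illustrate how the number of bits and hash functions can be derived from the two user-supplied values, here is a minimal sketch using the textbook Bloom filter formulas with an in-memory bytearray as the bit store. It is not this module's code: the names bloom_parameters and TinyBloom are hypothetical, and the SHA-256 double hashing is an assumption made for the sketch, not the hash functions the module actually uses.

```python
import hashlib
import math
import struct


def bloom_parameters(max_elements, error_rate):
    """Return (num_bits, num_hashes) from the textbook Bloom filter formulas."""
    if max_elements <= 0 or not 0.0 < error_rate < 1.0:
        raise ValueError("need max_elements > 0 and 0 < error_rate < 1")
    # m = -n * ln(p) / (ln 2)^2
    num_bits = int(math.ceil(-max_elements * math.log(error_rate) / math.log(2) ** 2))
    # k = (m / n) * ln 2, with at least one hash function
    num_hashes = max(1, int(round(num_bits / float(max_elements) * math.log(2))))
    return num_bits, num_hashes


class TinyBloom(object):
    """Hypothetical in-memory filter; the bits live in a bytearray."""

    def __init__(self, max_elements, error_rate):
        self.num_bits, self.num_hashes = bloom_parameters(max_elements, error_rate)
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, key):
        # Assumption for this sketch: derive k positions by double hashing
        # two 64-bit halves of a SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1, h2 = struct.unpack(">QQ", digest[:16])
        h2 |= 1  # keep the step odd so it is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

For example, bloom_parameters(1000, 0.01) gives (9586, 7): roughly 1.2 KB of bits and seven hash probes per lookup. A key that was added is always reported as present; a key that was never added is occasionally reported as present, at about the requested error rate once the filter holds the stated maximum number of elements.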

You can get it here.

See also this list of data structures I've worked on.




You can e-mail the author with questions or comments: