It is known to work on CPython 2.x, CPython 3.x, PyPy and Jython.
It is derived from this, but many changes have been made, including:
The user specifies the desired maximum number of elements and the desired maximum false positive probability, and the module calculates the rest.
The hash functions used are not exorbitantly expensive, but they give a much better distribution of values - confirmed by the automated tests.
It includes mmap, in-memory (array-based) and file-seek backends.
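As a rough illustration of how a filter can be sized from the two user-supplied values (maximum element count and maximum false positive probability), here is a minimal sketch using the standard textbook Bloom filter formulas. The function name `bloom_parameters` and its argument names are hypothetical, and the module itself may round or compute these values differently.

```python
import math


def bloom_parameters(max_elements, error_rate):
    """Return (num_bits, num_hashes) from the textbook Bloom filter formulas.

    Hypothetical helper for illustration; not part of bloom_filter_mod.
    """
    if max_elements <= 0:
        raise ValueError('max_elements must be positive')
    if not 0.0 < error_rate < 1.0:
        raise ValueError('error_rate must be strictly between 0 and 1')
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits.
    num_bits = math.ceil(-max_elements * math.log(error_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest integer, and at least 1.
    num_hashes = max(1, round(num_bits / max_elements * math.log(2)))
    return num_bits, num_hashes


num_bits, num_hashes = bloom_parameters(max_elements=20, error_rate=0.01)
print(num_bits, num_hashes)
```

For 20 elements at a 1% error rate, these formulas call for roughly ten bits per element, which shows how small the bit array can be compared with storing the elements themselves.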
Example use:
#!/usr/bin/env python3

"""Demonstrate usage."""

import bloom_filter_mod

# These replicate the problem
# n = 10
# m = 3

# n = 4
# m = 1

n = 10
m = 1

for means, filename in (
    # These 3 are all in-memory as well as partially file seek (if needed). Somewhat persistent. Works fully.
    (f'part in memory, part file seek: {n // 2}', ('bloom-filter.bin', n // 2)),  # works fully.
    ('part in memory, part file seek: 10000', ('bloom-filter.bin', 10000)),  # works fully
    ('part in memory, part file seek: 0', ('bloom-filter.bin', 0)),  # works fully
    # File seek alone: persistent. Works fully.
    ('file seek', 'bloom-filter.bin'),
    # Wholly in-memory. Not persistent. Works fully.
    ('in memory', None),
    # mmap alone: persistent. Works fully.
    ('mmap', ('bloom-filter.bin', -1)),
):
    bloom_filter = bloom_filter_mod.Bloom_filter(
        ideal_num_elements_n=n * 2,
        error_rate_p=0.01,
        filename=filename,
        start_fresh=True,
    )

    for i in range(0, n * 2 - m, 2):
        bloom_filter.add(i)

    in_count = 0
    for i in range(0, n * 2):
        if i in bloom_filter:
            in_count += 1

    bloom_filter.close()

    print(f'in_count for {means:40}: {in_count}')
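To make the example's `in_count` easier to interpret, here is a toy in-memory Bloom filter that runs the same add-then-count loop. It is only a sketch: the class `ToyBloomFilter`, its bit-packing in a `bytearray`, and the SHA-256 double-hashing scheme are all assumptions chosen for illustration, not the hash functions or backends that bloom_filter_mod actually uses. The bit and hash counts are hard-coded to values that the textbook sizing formulas give for 20 elements at a 1% error rate.

```python
import hashlib


class ToyBloomFilter:
    """Illustrative in-memory Bloom filter; not the bloom_filter_mod implementation."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive the bit positions from two 64-bit
        # slices of a single SHA-256 digest of the key's repr().
        digest = hashlib.sha256(repr(key).encode('utf-8')).digest()
        h1 = int.from_bytes(digest[:8], 'big')
        h2 = int.from_bytes(digest[8:16], 'big') | 1  # force a nonzero stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


n = 10
m = 1
toy = ToyBloomFilter(num_bits=192, num_hashes=7)

# Same loop shape as the usage example: add the even numbers 0, 2, ..., 18.
added = list(range(0, n * 2 - m, 2))
for i in added:
    toy.add(i)

# Count how many of 0..19 the filter reports as present.
in_count = sum(1 for i in range(n * 2) if i in toy)
print(f'added {len(added)}, in_count {in_count}')
```

Because a Bloom filter never reports a false negative, `in_count` is at least the number of added elements (10 here). Any excess over 10 comes from false positives among the odd numbers.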
You can get it here.
See also this list of data structures I've worked on.
You can e-mail the author with questions or comments: