drs-bloom-filter

This is a pure-Python Bloom filter: a probabilistic set data structure with low storage requirements.

It is known to work on CPython 2.x, CPython 3.x, PyPy and Jython.

It is derived from this, but many changes have been made, including:

better hash functions
better automated tests
a selection of backends
more intuitive arguments to the __init__ method

The user specifies the desired maximum number of elements and the desired maximum false positive probability, and the module calculates the rest.
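As a rough illustration of what "calculates the rest" can mean, the sketch below applies the standard textbook sizing formulas for a Bloom filter. The function name bloom_parameters and its argument names are hypothetical, and the module itself may round or compute these values differently.

```python
import math


def bloom_parameters(max_elements, error_rate):
    """Return (num_bits, num_hashes) from the textbook Bloom filter formulas.

    Hypothetical helper for illustration; not the module's own code.
    """
    if max_elements <= 0:
        raise ValueError('max_elements must be positive')
    if not 0.0 < error_rate < 1.0:
        raise ValueError('error_rate must be strictly between 0 and 1')
    # Bits needed: m = -n * ln(p) / (ln 2)^2
    num_bits = math.ceil(-max_elements * math.log(error_rate) / (math.log(2) ** 2))
    # Hash functions needed: k = (m / n) * ln 2, at least one
    num_hashes = max(1, round(num_bits / max_elements * math.log(2)))
    return num_bits, num_hashes


num_bits, num_hashes = bloom_parameters(max_elements=20, error_rate=0.01)
print(num_bits, num_hashes)  # 192 7
```

With 20 elements and a 1% error rate, this comes to roughly 9.6 bits per element and 7 hash functions.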

The hash functions used are not exorbitantly expensive, yet they give a much better distribution of values, as the automated tests confirm.
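One way to check that a set of hash functions distributes values well is to measure the false positive rate empirically and compare it with the target. The sketch below does that for a toy filter. It uses double hashing over SHA-256 from the standard library as a stand-in; this is not the module's hashing scheme, and bit_positions and measure_false_positive_rate are hypothetical names.

```python
import hashlib


def bit_positions(key, num_bits, num_hashes):
    """Yield num_hashes bit indexes for key using double hashing over SHA-256."""
    digest = hashlib.sha256(repr(key).encode('utf-8')).digest()
    first = int.from_bytes(digest[:8], 'big')
    # Force the step to be odd so that it is never zero.
    second = int.from_bytes(digest[8:16], 'big') | 1
    for i in range(num_hashes):
        yield (first + i * second) % num_bits


def measure_false_positive_rate(num_bits, num_hashes, num_members, num_probes):
    """Insert 0..num_members-1, then probe keys that were never inserted."""
    bits = bytearray(num_bits)  # one byte per bit, for simplicity
    for key in range(num_members):
        for position in bit_positions(key, num_bits, num_hashes):
            bits[position] = 1
    false_positives = 0
    for key in range(num_members, num_members + num_probes):
        if all(bits[position] for position in bit_positions(key, num_bits, num_hashes)):
            false_positives += 1
    return false_positives / num_probes


# 9586 bits and 7 hashes are the textbook sizes for 1000 elements at p = 0.01.
rate = measure_false_positive_rate(
    num_bits=9586, num_hashes=7, num_members=1000, num_probes=10000,
)
print(f'observed false positive rate: {rate:.4f}')
```

If the hashes are well distributed, the observed rate should land near the 1% target; a rate far above it would suggest clustering.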

It includes mmap, in-memory (array-based) and file-seek backends.
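To make the difference between backends concrete, here is a minimal sketch of two interchangeable bit stores: one held in an in-memory array, one that seeks within a file for every access. The class names ArrayBits and FileSeekBits and their methods are hypothetical; they only illustrate the idea and are not the module's actual backend classes.

```python
import array
import os
import tempfile


class ArrayBits:
    """Hypothetical in-memory backend: bits packed into an array of bytes."""

    def __init__(self, num_bits):
        self.num_bits = num_bits
        self._bytes = array.array('B', bytes((num_bits + 7) // 8))

    def set(self, index):
        self._bytes[index // 8] |= 1 << (index % 8)

    def is_set(self, index):
        return bool(self._bytes[index // 8] & (1 << (index % 8)))

    def close(self):
        pass  # nothing to release; contents are lost when the object goes away


class FileSeekBits:
    """Hypothetical file-seek backend: each access seeks to one byte on disk."""

    def __init__(self, num_bits, filename):
        self.num_bits = num_bits
        num_bytes = (num_bits + 7) // 8
        # Create a zero-filled file if it is missing or has the wrong size.
        if not os.path.exists(filename) or os.path.getsize(filename) != num_bytes:
            with open(filename, 'wb') as file_:
                file_.write(bytes(num_bytes))
        self._file = open(filename, 'r+b')

    def _read_byte(self, offset):
        self._file.seek(offset)
        return self._file.read(1)[0]

    def set(self, index):
        offset = index // 8
        value = self._read_byte(offset) | (1 << (index % 8))
        self._file.seek(offset)
        self._file.write(bytes([value]))

    def is_set(self, index):
        return bool(self._read_byte(index // 8) & (1 << (index % 8)))

    def close(self):
        self._file.close()


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'bits.bin')
    for backend in (ArrayBits(64), FileSeekBits(64, path)):
        backend.set(3)
        backend.set(42)
        print(type(backend).__name__, backend.is_set(3), backend.is_set(4), backend.is_set(42))
        backend.close()
```

The in-memory store is fast but not persistent, while the file-seek store survives a restart at the cost of a seek per bit access. An mmap-based store would sit between the two.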

Example use:

#!/usr/bin/env python3

"""Demonstrate usage."""

import bloom_filter_mod

# These replicate the problem
# n = 10
# m = 3

# n = 4
# m = 1

n = 10
m = 1

for means, filename in (
    # These 3 are all in-memory as well as partially file seek (if needed).  Somewhat persistent.  Works fully.
    (f'part in memory, part file seek: {n // 2}', ('bloom-filter.bin', n // 2)),  # works fully.
    ('part in memory, part file seek: 10000', ('bloom-filter.bin', 10000)),  # works fully
    ('part in memory, part file seek: 0', ('bloom-filter.bin', 0)),  # works fully
    # File seek alone: persistent.  Works fully.
    ('file seek', 'bloom-filter.bin'),
    # Wholly in-memory.  Not persistent.  Works fully.
    ('in memory', None),
    # mmap alone: persistent.  Works fully.
    ('mmap', ('bloom-filter.bin', -1)),
):
    bloom_filter = bloom_filter_mod.Bloom_filter(
        ideal_num_elements_n=n * 2,
        error_rate_p=0.01,
        filename=filename,
        start_fresh=True,
    )

    for i in range(0, n * 2 - m, 2):
        bloom_filter.add(i)

    in_count = 0
    for i in range(0, n * 2):
        if i in bloom_filter:
            in_count += 1

    bloom_filter.close()

    print(f'in_count for {means:40}: {in_count}')

You can get it here.

Timestamp: 2025-08-04 13:55:37 PDT
