Bloom Filters in Pythonic Art

Mohammed Zaheeruddin Malick
2 min readJan 12, 2020

Use Case: User login handle existence test during profile creation using Bloom Filter, in Python.

Part I of the 2020 weekend coding series!

Bloom Filter is a highly space efficient probabilistic data structure that guarantees zero false negatives and tunable probability false positives.

Bloom filters are popularly used for filtering use cases, requiring fast and space efficient key existence checks, such as testing for existing user handles during login and profile creation, Access control lists for security applications such as spam filtering, Cache filtering in CDNs, Database design (Cassandra uses bloom filter) among others.

Bloom filters are implemented using bitmaps offering orders of magnitude higher memory efficiency than hash maps.

https://en.wikipedia.org/wiki/Bloom_filter

Run the program multiple times to see instances of False positives and FPR.

...
...
Trial: 199/200 [.....1...1...1..11.11.113..111..7.3..1..211221.3.21...27231422.1.21.3711142321224243173.32211327322233214133]
Trial: 200/200 [.........1...2...1..2.11...1212172211..22.21.121321.431712142.211.221723333124311264372353.12227231133224424]

[Legend] Hash Collision = 'n' functions, Clear = '.'

Test Data: {False: ['zu', 'zzzu', 'jaggu', 'ninja', 'neo', 'trinity', 'morpheus', 'po', 'shifu', 'tigress'], True: ['mz', 'maryam', 'isabelle', 'jo', 'ch']} Positives

Trials=200, Filter Size(m)=1513, Hash Functions(k)=7 Items(n)=108

True Negatives: Counter({'zu': 200, 'zzzu': 200, 'jaggu': 200, 'ninja': 200, 'neo': 200, 'trinity': 200, 'morpheus': 200, 'shifu': 200, 'tigress': 200, 'po': 198})
True Positives: Counter({'mz': 200, 'maryam': 200, 'isabelle': 200, 'jo': 200, 'ch': 200})
False Positives: Counter({'po': 2})

False Positive Rate: 0.001
Avg Collisions on all k: 5.38
Bloom Filter BitArray Size: 190 bytes

--

--

Mohammed Zaheeruddin Malick

Tireless Learner | Geek | Father of two budding Women leaders!