Bloom Filters

Due
Fri Nov 22 11:59 pm
Starter files
lab9.zip
Submission
Please submit your work using Gradescope.

Introduction

Your task is to implement a Bloom filter, analyze its false postive rate, and conduct an experiment to empirically measure its false positive rate.

Bloom filters are described in the book (see exercise DLN 10.99) and on this site.

Overview of Starter Code

You are expected to modify these files:

  • BloomFilter.java – where you implement your BloomFilter

You are expected to read, use, but not necessarily modify these files:

  • HashFamily.java – interface for hash families (just one method)
  • HashFamilyDumb.java – a concrete implementation of a hash family. This implementation is dumb but is useful for testing.
  • HashFamilySmart.java – a somehwat smarter implementation of a hash family

Analysis

In the method called selectBestKParameter you should analytically determine the best setting of $k$ for the given $m$ and $n$. Try all values of $k$ from 1 to 100 and see which one has the lowest false positive rate. You should compute the false positive rate using the formula that is derived in a series of exercises (see DLN 10.99 to 10.103).

Experiment

In the method called runExperiment you should do the following experiment.

  • Create a bloom filter with m = 1,000,000 slots and k = 7 hash functions from the smart hash family.
  • Important: The bloom filter should be parameterized to store data of type String, not Integer.
  • Insert n = 100,000 strings. The strings inserted should be: “0”, “2”, “4”, …, “199998”. These should be strings and not integers.
  • Lookup n = 100,000 strings. The strings that are looked up should be: “1”, “3”, “5”, …, “199999”
  • Caclculate the false positive rate: for what fraction of the strings that are looked up did the bloom filter return true.
  • runExperiment should return the false positive rate.