Question
I'm considering using a Cuckoo filter for a business case. To simplify the explanation, here is an analogy of my needs:
- There are over $n = 30 000$ first names that exist in the entire world
- I have a Cuckoo filter storing the names of people who gave me a gift at least once
- I am sure that the Cuckoo filter will never need to store more than 600 names
- The names the Cuckoo filter will store are evenly distributed across the 30 000 names
- Of course, the final goal is to be able to know which names gave me a gift at least once, so I might want to query for any of the 30 000 names.
I want to design my filter for 600 elements; however, the original paper about Cuckoo filters assumes that the number of buckets $m$ is a multiple of $n$:
"Now consider a construction process that inserts $n$ random items to an empty table of $m=cn$ buckets for a constant $c$"
Am I supposed to size the filter according to the set of all names in the world? Is there something I'm missing? This isn't addressed in the original publication.
Attempt at a self-answer
Considering that the fingerprint is $f$ bits long and that I have $m$ buckets, the probability that the name Bob has the same features (index and fingerprint) as the name Alice is:
$$
\frac{1}{2^f} \cdot \frac{1}{m}
$$
Which means that the probability that some other name collides with the name Alice is approximately:
$$
(n - 1) \cdot \frac{1}{2^f} \cdot \frac{1}{m}
$$
With 30 000 names, a fingerprint of 8 bits and 600 buckets, the result is approximately $0.195$, which means a false positive rate of about 20%.
To reduce the false positive rate, I can increase either the fingerprint size or the number of buckets. Using 6000 buckets instead of 600 gives me a false positive rate of about 2%. This hack might work in my example case, but in reality we are talking about $10^{19}$ names in the whole world and $10000$ names to store in the filter.
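As a quick sanity check, the expression above can be evaluated directly. This is only a sketch: `collision_estimate` is a name made up for illustration, and it evaluates the formula exactly as written. Because the formula sums over all other names rather than computing a true probability, the value can exceed 1 for a large universe. The last parameter set (one bucket per stored name, 8-bit fingerprints) is an assumption for the real-world case, not something fixed by the problem.

```python
def collision_estimate(n: int, f: int, m: int) -> float:
    """Evaluate (n - 1) * (1 / 2**f) * (1 / m).

    n: number of names in the universe
    f: fingerprint length in bits
    m: number of buckets
    """
    return (n - 1) / (2 ** f * m)


if __name__ == "__main__":
    cases = [
        (30_000, 8, 600),      # example from the question
        (30_000, 8, 6_000),    # ten times more buckets
        (10 ** 19, 8, 10_000), # assumed sizing for the real-world case
    ]
    for n, f, m in cases:
        print(f"n={n}, f={f}, m={m}: {collision_estimate(n, f, m):.4g}")
```

Multiplying the number of buckets by ten divides the estimate by ten, and each extra fingerprint bit halves it, so the two knobs can be traded against each other.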
It seems to me that Cuckoo filters were not designed with this use case in mind, and that when people use a Cuckoo filter, they expect to store almost every existing item in it at some point.