object MutableScalableBloomFilter extends Serializable
A mutable, scalable wrapper around a Guava BloomFilter. Not thread-safe.
Scalable bloom filters use a series of bloom filters, adding a new one and scaling its size by
growthRate
once the previous filter is saturated. A scalable bloom filter contains
("might
contain") an item if any of its filters contains the item.
fpProb
, the initial false positive probability, and tighteningRatio
must be carefully chosen
to maintain the desired overall false positive probability. Similarly, the initialCapacity
determines how often the SBF must scale to support a given capacity and influences the effective
false positive probability. See below for more details.
Import magnolify.guava.auto._
to get common instances of Guava
Funnel s.
When a SBF scales a new filter is appended. The false positive probability for a series of
numFilters
appended filters is:
1 - Range(0, numFilters).map { i => (1 - fpProb * scala.math.pow(tighteningRatio, i)) }.product
For the defaults, the false positive probability after each append is 3%, 5.6%, 7.9%, 9.9%, 11.7% and so on. It is therefore in the interest of the user to appropriately size the initial filter so that the number of appends is small.
An approximation of the long-term upper bound of the false positive probability is fpProb / (1 -
tighteningRatio)
. For the defaults of fpProb = 0.03
and tighteningRatio = 0.9
this gives
0.03 / (1 - 0.9)
, or around 30% false positives as an upper bound.
Bloom filters inherently trade precision for space. For a single filter, the number of bits is:
-1 * capacity * scala.math.log(fpProb) / scala.math.pow(scala.math.log(2), 2)
For example, for an initialCapacity
of 1 million and the default fpProb
of 3%, a filter will
be 912 kilobytes; if fpProb
is instead 0.1%, then the size grows to 1797 kilobytes.
When using scalable bloom filters, if possible, always size the first filter to be larger than
the known number of items to be inserted rather than choosing a small default and letting the
filters scale to fit. For example, if a SBF must contain ~65.5 million items, then with the
defaults and an initialCapacity
of 1000, the SBF will have scaled 16 times, will consume ~84
megabytes and have a false positive probability of ~22%. If, on the other hand, a SBF is
constructed with an initial capacity of 65.5 million items, it will not have scaled at all and
therefore have its initial false positive probability of 3%, and consume only ~60 megabytes.
- Alphabetic
- By Inheritance
- MutableScalableBloomFilter
- Serializable
- AnyRef
- Any
- Hide All
- Show All
- Public
- Protected
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def apply[T](initialCapacity: Long, fpProb: Double = 0.03, growthRate: Int = 2, tighteningRatio: Double = 0.9)(implicit arg0: Funnel[T]): MutableScalableBloomFilter[T]
The default parameter values for this implementation are based on the findings in "Scalable Bloom Filters", Almeida, Baquero, et al.: http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf
The default parameter values for this implementation are based on the findings in "Scalable Bloom Filters", Almeida, Baquero, et al.: http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf
Import
magnolify.guava.auto._
to get common instances of Guava Funnel s.- T
The type of objects inserted into the filter
- initialCapacity
The capacity of the first filter. Must be positive
- fpProb
The initial false positive probability
- growthRate
The growth rate of each subsequent filter added to
filters
- tighteningRatio
The tightening ratio to be applied to the current
fpProb
- returns
A scalable bloom filter
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- implicit def coder[T](implicit funnel: Funnel[T]): Coder[MutableScalableBloomFilter[T]]
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def fromBytes[T](bytes: Array[Byte])(implicit funnel: Funnel[T]): MutableScalableBloomFilter[T]
Import
magnolify.guava.auto._
to get common instances of Guava Funnel s. - final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toBytes[T](sbf: MutableScalableBloomFilter[T]): Array[Byte]
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()