Packages

object MutableScalableBloomFilter extends Serializable

A mutable, scalable wrapper around a Guava BloomFilter. Not thread-safe.

Scalable bloom filters use a series of bloom filters, adding a new one and scaling its size by growthRate once the previous filter is saturated. A scalable bloom filter contains ("might contain") an item if any of its filters contains the item.

fpProb, the initial false positive probability, and tighteningRatio must be carefully chosen to maintain the desired overall false positive probability. Similarly, the initialCapacity determines how often the SBF must scale to support a given capacity and influences the effective false positive probability. See below for more details.

Import magnolify.guava.auto._ to get common instances of Guava Funnel s.

When a SBF scales a new filter is appended. The false positive probability for a series of numFilters appended filters is:

1 - Range(0, numFilters).map { i => (1 - fpProb * scala.math.pow(tighteningRatio, i)) }.product

For the defaults, the false positive probability after each append is 3%, 5.6%, 7.9%, 9.9%, 11.7% and so on. It is therefore in the interest of the user to appropriately size the initial filter so that the number of appends is small.

An approximation of the long-term upper bound of the false positive probability is fpProb / (1 - tighteningRatio). For the defaults of fpProb = 0.03 and tighteningRatio = 0.9 this gives 0.03 / (1 - 0.9), or around 30% false positives as an upper bound.

Bloom filters inherently trade precision for space. For a single filter, the number of bits is:

-1 * capacity * scala.math.log(fpProb) / scala.math.pow(scala.math.log(2), 2)

For example, for an initialCapacity of 1 million and the default fpProb of 3%, a filter will be 912 kilobytes; if fpProb is instead 0.1%, then the size grows to 1797 kilobytes.

When using scalable bloom filters, if possible, always size the first filter to be larger than the known number of items to be inserted rather than choosing a small default and letting the filters scale to fit. For example, if a SBF must contain ~65.5 million items, then with the defaults and an initialCapacity of 1000, the SBF will have scaled 16 times, will consume ~84 megabytes and have a false positive probability of ~22%. If, on the other hand, a SBF is constructed with an initial capacity of 65.5 million items, it will not have scaled at all and therefore have its initial false positive probability of 3%, and consume only ~60 megabytes.

Source
MutableScalableBloomFilter.scala
Linear Supertypes
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. MutableScalableBloomFilter
  2. Serializable
  3. AnyRef
  4. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def apply[T](initialCapacity: Long, fpProb: Double = 0.03, growthRate: Int = 2, tighteningRatio: Double = 0.9)(implicit arg0: Funnel[T]): MutableScalableBloomFilter[T]

    The default parameter values for this implementation are based on the findings in "Scalable Bloom Filters", Almeida, Baquero, et al.: http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf

    The default parameter values for this implementation are based on the findings in "Scalable Bloom Filters", Almeida, Baquero, et al.: http://gsd.di.uminho.pt/members/cbm/ps/dbloom.pdf

    Import magnolify.guava.auto._ to get common instances of Guava Funnel s.

    T

    The type of objects inserted into the filter

    initialCapacity

    The capacity of the first filter. Must be positive

    fpProb

    The initial false positive probability

    growthRate

    The growth rate of each subsequent filter added to filters

    tighteningRatio

    The tightening ratio to be applied to the current fpProb

    returns

    A scalable bloom filter

  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  7. implicit def coder[T](implicit funnel: Funnel[T]): Coder[MutableScalableBloomFilter[T]]
  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  11. def fromBytes[T](bytes: Array[Byte])(implicit funnel: Funnel[T]): MutableScalableBloomFilter[T]

    Import magnolify.guava.auto._ to get common instances of Guava Funnel s.

  12. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  13. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  14. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  15. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  16. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  17. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  18. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  19. def toBytes[T](sbf: MutableScalableBloomFilter[T]): Array[Byte]
  20. def toString(): String
    Definition Classes
    AnyRef → Any
  21. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  22. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  23. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()

Inherited from Serializable

Inherited from AnyRef

Inherited from Any

Ungrouped