hash

package hash

Main package for hash APIs. Import all.

import com.spotify.scio.hash._

Source: package.scala

Linear Supertypes

AnyRef, Any

Ordering

Alphabetic
By Inheritance

Inherited

hash
AnyRef
Any

Hide All
Show All

Visibility

Public
Protected

Type Members

sealed trait ApproxFilter[T] extends Serializable
An approximate filter for instances of T, e.g.
An approximate filter for instances of T, e.g. a Bloom filter. A Bloom filter offers an approximate containment test with one-sided error: if it claims that an element is contained in it, this might be in error, but if it claims that an element is not contained in it, then this is definitely true.
sealed trait ApproxFilterCompanion extends AnyRef
A trait for all ApproxFilter companion objects.
implicit final class ApproxFilterIterable[T] extends AnyVal
implicit final class ApproxFilterSCollection[T] extends AnyVal
class BloomFilter[T] extends ApproxFilter[T]
An ApproxFilter implementation backed by a Guava BloomFilter.
An ApproxFilter implementation backed by a Guava BloomFilter.
Import magnolify.guava.auto._ to get common instances of Guava Funnel s.
case class MutableScalableBloomFilter[T](fpProb: Double, headCapacity: Long, growthRate: Int, tighteningRatio: Double, headFPProb: Double, headCount: Long, head: Option[google.common.hash.BloomFilter[T]], tail: List[Either[google.common.hash.BloomFilter[T], SerializedBloomFilters]])(implicit funnel: Funnel[T]) extends Serializable with Product
Import magnolify.guava.auto._ to get common instances of Guava Funnel s.
Import magnolify.guava.auto._ to get common instances of Guava Funnel s.
T
The type of objects inserted into the filter
fpProb
The initial false positive probability
headCapacity
The capacity of the filter at the head of filters
growthRate
The growth rate of each subsequent filter added to filters
tighteningRatio
The tightening ratio applied to headFPProb when scaling
headFPProb
The false positive probability of the head of filters
headCount
The number of items currently in the filter at the head of filters
head
The underlying bloom filter currently being inserted into, or None if this scalable filter has just been initialized
tail
The underlying already-saturated bloom filters, lazily deserialized from bytes as necessary.
funnel
The funnel to turn Ts into bytes
case class SerializedBloomFilters(numFilters: Int, filterBytes: Array[Byte]) extends Product with Serializable

Value Members

object BloomFilter extends ApproxFilterCompanion with Serializable
Companion object for BloomFilter.
object MutableScalableBloomFilter extends Serializable
A mutable, scalable wrapper around a Guava BloomFilter.
A mutable, scalable wrapper around a Guava BloomFilter. Not thread-safe.
Scalable bloom filters use a series of bloom filters, adding a new one and scaling its size by growthRate once the previous filter is saturated. A scalable bloom filter contains ("might contain") an item if any of its filters contains the item.
fpProb, the initial false positive probability, and tighteningRatio must be carefully chosen to maintain the desired overall false positive probability. Similarly, the initialCapacity determines how often the SBF must scale to support a given capacity and influences the effective false positive probability. See below for more details.
Import magnolify.guava.auto._ to get common instances of Guava Funnel s.
When a SBF scales a new filter is appended. The false positive probability for a series of numFilters appended filters is:
```
1 - Range(0, numFilters).map { i => (1 - fpProb * scala.math.pow(tighteningRatio, i)) }.product
```
For the defaults, the false positive probability after each append is 3%, 5.6%, 7.9%, 9.9%, 11.7% and so on. It is therefore in the interest of the user to appropriately size the initial filter so that the number of appends is small.
An approximation of the long-term upper bound of the false positive probability is fpProb / (1 - tighteningRatio). For the defaults of fpProb = 0.03 and tighteningRatio = 0.9 this gives 0.03 / (1 - 0.9), or around 30% false positives as an upper bound.
Bloom filters inherently trade precision for space. For a single filter, the number of bits is:
```
-1 * capacity * scala.math.log(fpProb) / scala.math.pow(scala.math.log(2), 2)
```
For example, for an initialCapacity of 1 million and the default fpProb of 3%, a filter will be 912 kilobytes; if fpProb is instead 0.1%, then the size grows to 1797 kilobytes.
When using scalable bloom filters, if possible, always size the first filter to be larger than the known number of items to be inserted rather than choosing a small default and letting the filters scale to fit. For example, if a SBF must contain ~65.5 million items, then with the defaults and an initialCapacity of 1000, the SBF will have scaled 16 times, will consume ~84 megabytes and have a false positive probability of ~22%. If, on the other hand, a SBF is constructed with an initial capacity of 65.5 million items, it will not have scaled at all and therefore have its initial false positive probability of 3%, and consume only ~60 megabytes.

Packages

hash

package hash

Type Members

Value Members

Inherited from AnyRef

Inherited from Any

Ungrouped

Packages

hash

package hash

Type Members

Value Members

Inherited from AnyRef

Inherited from Any

Ungrouped

hash