com.spotify.scio.extra.hll.sketching

SketchHllPlusPlus

case class SketchHllPlusPlus[T](p: Int, sp: Int) extends ApproxDistinctCounter[T] with Product with Serializable

A com.spotify.scio.estimators.ApproxDistinctCounter implementation backed by org.apache.beam.sdk.extensions.sketching.ApproximateDistinct, which estimates the distinct count using HyperLogLog++.

The HyperLogLog++ (HLL++) algorithm estimates the number of distinct values in a data stream. It builds on HyperLogLog and gives more accurate estimates for both very large and very small data streams.

p

Precision. Controls the accuracy of the estimate by determining the number of buckets used to store information about the distinct elements. In general, one can expect a relative error of about 1.1 / sqrt(2^p). The value should be at least 4 to guarantee a minimal accuracy.

sp

Sparse precision. Used to create a sparse representation that reduces memory use and improves accuracy at small cardinalities. The value of sp should be greater than p (the precision) but lower than 32.
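To make the trade-off behind p concrete, the sketch below evaluates the 1.1 / sqrt(2^p) rule of thumb quoted above for a few precisions. The object and method names are hypothetical and not part of scio, and the bucket count assumes the usual HyperLogLog layout of 2^p buckets.

```scala
// Hypothetical helper, not part of scio: evaluates the documented rule of thumb.
object HllError {
  // Number of buckets, assuming the usual HyperLogLog layout of 2^p buckets.
  def buckets(p: Int): Long = 1L << p

  // Approximate relative error quoted in the parameter docs: 1.1 / sqrt(2^p).
  def relativeError(p: Int): Double = 1.1 / math.sqrt(math.pow(2.0, p))

  def main(args: Array[String]): Unit =
    // p = 4 gives 1.1 / sqrt(16) = 0.275; each extra 2 bits of precision halves the error.
    for (p <- Seq(4, 8, 12, 16))
      println(f"p=$p%2d buckets=${buckets(p)}%6d relativeError=${relativeError(p) * 100}%.2f%%")
}
```

Raising p by 2 quadruples the number of buckets and halves the expected relative error, so the choice is a direct trade between memory and accuracy.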

Source
SketchHllPlusPlus.scala

Instance Constructors

  1. new SketchHllPlusPlus(p: Int, sp: Int)

    p

    Precision. Controls the accuracy of the estimate by determining the number of buckets used to store information about the distinct elements. In general, one can expect a relative error of about 1.1 / sqrt(2^p). The value should be at least 4 to guarantee a minimal accuracy.

    sp

    Sparse precision. Used to create a sparse representation that reduces memory use and improves accuracy at small cardinalities. The value of sp should be greater than p (the precision) but lower than 32.
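The constructor parameters come with constraints that are easy to get wrong. A minimal sketch of a pre-flight check follows. It is a hypothetical helper, not part of scio, and it encodes only the constraints as stated in the parameter descriptions above (p at least 4, sp greater than p and lower than 32).

```scala
// Hypothetical helper, not part of scio: checks p and sp against the documented constraints.
object HllParams {
  def validate(p: Int, sp: Int): Either[String, (Int, Int)] =
    if (p < 4) Left(s"p must be at least 4, got $p")
    else if (sp <= p) Left(s"sp must be greater than p ($p), got $sp")
    else if (sp >= 32) Left(s"sp must be lower than 32, got $sp")
    else Right((p, sp))
}
```

A caller could pattern match on the result and pass the validated pair to new SketchHllPlusPlus(p, sp) only in the Right case.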

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def estimateDistinctCount(in: SCollection[T]): SCollection[Long]

    Returns an SCollection containing a single Long value: the estimated number of distinct elements in the given SCollection of type T.

    Definition Classes
    SketchHllPlusPlus → ApproxDistinctCounter
  8. def estimateDistinctCountPerKey[K](in: SCollection[(K, T)]): SCollection[(K, Long)]

    Approximates the number of distinct elements for each key in the given key-value SCollection. The output contains one estimated distinct count per unique key.

    Definition Classes
    SketchHllPlusPlus → ApproxDistinctCounter
  9. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  10. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  14. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  15. val p: Int
  16. def productElementNames: Iterator[String]
    Definition Classes
    Product
  17. val sp: Int
  18. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  19. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  20. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  21. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
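
The two estimator methods listed above, estimateDistinctCount and estimateDistinctCountPerKey, might be used roughly as follows. This is a sketch that assumes scio is on the classpath. The wrapper object, its method names, the String element type, and the values p = 12 and sp = 25 are illustrative choices, not recommendations from the library.

```scala
import com.spotify.scio.extra.hll.sketching.SketchHllPlusPlus
import com.spotify.scio.values.SCollection

// Hypothetical wrapper for illustration; only SketchHllPlusPlus and its two methods come from scio.
object ApproxDistinctSketch {
  // Illustrative parameters: p = 12 (about 1.7% relative error by the documented rule), sp = 25.
  val counter: SketchHllPlusPlus[String] = SketchHllPlusPlus[String](p = 12, sp = 25)

  // Single estimated distinct count over the whole collection.
  def distinctUsers(userIds: SCollection[String]): SCollection[Long] =
    counter.estimateDistinctCount(userIds)

  // One estimated distinct count per key, here (country, userId) pairs.
  def distinctUsersPerCountry(pairs: SCollection[(String, String)]): SCollection[(String, Long)] =
    counter.estimateDistinctCountPerKey(pairs)
}
```

Because SketchHllPlusPlus is a Serializable case class, a single instance can be shared between both calls.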
