case class SketchHllPlusPlus[T](p: Int, sp: Int) extends ApproxDistinctCounter[T] with Product with Serializable
com.spotify.scio.estimators.ApproxDistinctCounter implementation for org.apache.beam.sdk.extensions.sketching.ApproximateDistinct. ApproximateDistinct estimates the distinct count using HyperLogLog++.
The HyperLogLog++ (HLL++) algorithm estimates the number of distinct values in a data stream. HLL++ is based on HyperLogLog and estimates the number of distinct values more accurately in both very large and small data streams.
- p
Precision. Controls the accuracy of the estimation. The precision value affects the number of buckets used to store information about the distinct elements. In general, one can expect a relative error of about 1.1 / sqrt(2^p). The value should be at least 4 to guarantee a minimal accuracy.
- sp
Sparse precision. Used to create a sparse representation in order to optimize memory and improve accuracy at small cardinalities. The value of sp should be greater than p (precision), but lower than 32.
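The error formula and parameter constraints above can be made concrete with a small sketch. It is plain Scala and does not call Scio or Beam; the object `HllPlusPlusParams` and its helpers are hypothetical names introduced only for illustration, and the bucket count of 2^p is the usual HyperLogLog register count, assumed here from the sqrt(2^p) term in the documented formula.

```scala
// Hypothetical helper, not part of Scio or Beam.
object HllPlusPlusParams {
  /** Number of buckets implied by precision p, assumed to be 2^p. */
  def buckets(p: Int): Long = 1L << p

  /** Approximate relative error from the documentation: 1.1 / sqrt(2^p). */
  def relativeError(p: Int): Double = 1.1 / math.sqrt(buckets(p).toDouble)

  /** Checks the documented constraints: p >= 4 and p < sp < 32. */
  def isValid(p: Int, sp: Int): Boolean = p >= 4 && sp > p && sp < 32

  def main(args: Array[String]): Unit = {
    // Print the expected error for a few precision values.
    for (p <- Seq(4, 10, 12, 14, 16)) {
      val errorPct = relativeError(p) * 100
      println(f"p=$p%2d buckets=${buckets(p)}%6d relativeError=$errorPct%.2f%%")
    }
  }
}
```

Under this formula, p = 12 gives a relative error of about 1.7% and p = 16 about 0.43%, so each increase of p by 2 roughly halves the error while quadrupling the number of buckets.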
- Source
- SketchHllPlusPlus.scala
Instance Constructors
- new SketchHllPlusPlus(p: Int, sp: Int)
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def estimateDistinctCount(in: SCollection[T]): SCollection[Long]
Returns an SCollection with a single Long value: the estimated distinct count of the given SCollection of type T.
- Definition Classes
- SketchHllPlusPlus → ApproxDistinctCounter
- def estimateDistinctCountPerKey[K](in: SCollection[(K, T)]): SCollection[(K, Long)]
Approximates the number of distinct elements for each key in the given key-value SCollection. The output contains one estimated distinct count per unique key.
- Definition Classes
- SketchHllPlusPlus → ApproxDistinctCounter
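To make the output shape of estimateDistinctCount and estimateDistinctCountPerKey concrete, the following sketch computes the exact quantities that these methods approximate. It uses plain Scala collections rather than SCollection and exact counting rather than HLL++; `ExactDistinctReference` and its methods are hypothetical names for illustration only, not Scio API.

```scala
// Hypothetical exact reference, not part of Scio; it does not use HLL++.
object ExactDistinctReference {
  /** Exact number of distinct values: the quantity estimateDistinctCount approximates. */
  def distinctCount[T](in: Seq[T]): Long = in.distinct.size.toLong

  /** Exact distinct values per key: the quantity estimateDistinctCountPerKey approximates. */
  def distinctCountPerKey[K, T](in: Seq[(K, T)]): Map[K, Long] =
    in.groupBy(_._1).map { case (key, pairs) =>
      key -> pairs.map(_._2).distinct.size.toLong
    }

  def main(args: Array[String]): Unit = {
    val events = Seq("a" -> 1, "a" -> 1, "a" -> 2, "b" -> 7)
    // One global count over all values.
    println(distinctCount(events.map(_._2)))
    // One count per unique key.
    println(distinctCountPerKey(events))
  }
}
```

The approximate methods return the same shape (a single Long, or one (K, Long) pair per key), with values that can deviate from these exact counts by roughly the relative error determined by p.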
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val p: Int
- def productElementNames: Iterator[String]
- Definition Classes
- Product
- val sp: Int
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()