package sketching

Type Members

  1. case class SketchHllPlusPlus[T](p: Int, sp: Int) extends ApproxDistinctCounter[T] with Product with Serializable

    com.spotify.scio.estimators.ApproxDistinctCounter implementation for org.apache.beam.sdk.extensions.sketching.ApproximateDistinct. ApproximateDistinct estimates the distinct count using HyperLogLog++.

    The HyperLogLog++ (HLL++) algorithm estimates the number of distinct values in a data stream. HLL++ builds on HyperLogLog and gives more accurate estimates for both very large and very small data streams.
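    To make the description above concrete, here is a minimal sketch of plain HyperLogLog (not HLL++) in standalone Scala, with no Scio or Beam dependency. It assumes a 32-bit MurmurHash3 hash from the Scala standard library and omits what HLL++ adds on top (64-bit hashing, empirical bias correction, the sparse representation), as well as the large-range correction. The class name ToyHyperLogLog is hypothetical. Each value is hashed, the top p bits pick one of 2^p registers, and each register keeps the largest "first 1-bit" position it has seen; a harmonic mean over the registers gives the estimate.

```scala
import scala.util.hashing.MurmurHash3

// Toy, plain HyperLogLog for illustration only (NOT HLL++ and NOT the Beam/Scio
// implementation): 32-bit hash, no sparse mode, no empirical bias correction,
// and no large-range correction.
final class ToyHyperLogLog(p: Int) {
  require(p >= 4 && p <= 16, "this toy sketch supports 4 <= p <= 16")

  private val m: Int = 1 << p               // number of registers (buckets), 2^p
  private val registers = new Array[Int](m) // largest rank observed per register

  // Standard HyperLogLog bias-correction constant alpha_m.
  private val alpha: Double = m match {
    case 16 => 0.673
    case 32 => 0.697
    case 64 => 0.709
    case _  => 0.7213 / (1.0 + 1.079 / m)
  }

  def add(value: String): Unit = {
    val h = MurmurHash3.stringHash(value) // 32-bit hash of the value
    val idx = h >>> (32 - p)              // top p bits choose the register
    val rest = h << p                     // remaining 32 - p bits, left-aligned
    // Rank = 1-based position of the first 1-bit in the remaining bits,
    // capped at 32 - p + 1 when they are all zero.
    val rank = math.min(java.lang.Integer.numberOfLeadingZeros(rest), 32 - p) + 1
    if (rank > registers(idx)) registers(idx) = rank
  }

  def estimate: Double = {
    // Raw estimate: alpha * m^2 / sum(2^-register).
    val sum = registers.foldLeft(0.0)((acc, r) => acc + math.pow(2.0, -r))
    val raw = alpha * m * m / sum
    val zeros = registers.count(_ == 0)
    // Small-range correction: use linear counting while empty registers remain.
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros)
    else raw
  }
}
```

    Adding the same value twice leaves the registers unchanged, which is why the sketch counts distinct values rather than occurrences, and its memory stays fixed at 2^p small integers regardless of input size.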

    p

    Precision. Controls the accuracy of the estimation. The precision determines the number of buckets (2^p) used to store information about the distinct elements. In general, one can expect a relative error of about 1.1 / sqrt(2^p). The value should be at least 4 to guarantee a minimal accuracy.
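    The precision rule of thumb above can be tabulated with a small standalone helper. This is a sketch, not part of Scio or Beam: HllPrecision and its methods are hypothetical names, it assumes the 2^p bucket count of standard HyperLogLog, and the search is capped at p = 30 because the sparse-precision constraint (p < sp < 32) leaves no room for a larger p.

```scala
// Hypothetical helper (not part of Scio or Beam) applying the documented rule
// of thumb: relative error ~ 1.1 / sqrt(2^p), with 2^p buckets.
object HllPrecision {
  // Number of buckets used by HyperLogLog at precision p.
  def buckets(p: Int): Int = 1 << p

  // Expected relative error for precision p, per the rule of thumb.
  def expectedRelativeError(p: Int): Double =
    1.1 / math.sqrt(buckets(p).toDouble)

  // Smallest p in [4, 30] whose expected error is at most `target`;
  // the upper bound assumes p < sp < 32, so p can be at most 30.
  def precisionFor(target: Double): Option[Int] =
    (4 to 30).find(p => expectedRelativeError(p) <= target)
}
```

    For example, p = 12 uses 4,096 buckets with an expected error near 1.7%, and asking for an error of at most 1% yields p = 14. Each increment of p doubles the bucket count but only divides the error by sqrt(2).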

    sp

    Sparse precision. Used to create a sparse representation that reduces memory use and improves accuracy at small cardinalities. The value of sp should be greater than p (the precision) but lower than 32.
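    A sketch of a validation step that checks the documented constraints (p at least 4, and p < sp < 32) before the values reach the estimator. HllParams is a hypothetical name, not part of Scio, and it checks only the rules stated on this page; the underlying Beam transform may accept or reject other combinations.

```scala
// Hypothetical helper (not part of Scio): checks only the constraints stated
// in this documentation, p >= 4 and p < sp < 32, and reports the first violation.
object HllParams {
  def validate(p: Int, sp: Int): Either[String, (Int, Int)] =
    if (p < 4) Left(s"p = $p: precision should be at least 4")
    else if (sp <= p) Left(s"sp = $sp: sparse precision should be greater than p = $p")
    else if (sp >= 32) Left(s"sp = $sp: sparse precision should be lower than 32")
    else Right((p, sp))
}
```

    A caller could then construct the estimator only on success, for example `HllParams.validate(12, 20).map { case (p, sp) => SketchHllPlusPlus[String](p, sp) }`.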
