package com.spotify.scio.estimators

Type Members

  1. trait ApproxDistinctCounter[T] extends AnyRef

    Approximate distinct element counter for type T, e.g. HyperLogLog or HyperLogLog++. It exposes two APIs: one estimates the total distinct count for a given SCollection, and the other estimates the distinct count per key in a key-value SCollection.

  2. case class ApproximateUniqueCounter[T](sampleSize: Int) extends ApproxDistinctCounter[T] with Product with Serializable

    ApproxDistinctCounter implementation backed by org.apache.beam.sdk.transforms.ApproximateUnique, configured with a sample size.

    Counts the approximate number of distinct values in the SCollection, either in total or for each key.

    sampleSize: the number of entries in the statistical sample. The higher this number, the more accurate the estimate will be; it should be >= 16.

  3. case class ApproximateUniqueCounterByError[T](maximumEstimationError: Double = 0.02) extends ApproxDistinctCounter[T] with Product with Serializable

    ApproxDistinctCounter implementation backed by org.apache.beam.sdk.transforms.ApproximateUnique, configured with a maximum estimation error.

    Counts the approximate number of distinct elements in the SCollection, either in total or for each key.

    maximumEstimationError: the maximum estimation error, which should be in the range [0.01, 0.5]. Defaults to 0.02.
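
The two parameters above are two ways of sizing the same statistical sample. Beam's ApproximateUnique documentation describes the expected error as roughly 2 / sqrt(sampleSize). Assuming that relationship, the sketch below converts between the two. `SampleSizeSketch`, `approxError` and `sampleSizeFor` are hypothetical names used only for illustration; they are not part of Scio or Beam.

```scala
object SampleSizeSketch {
  // Assumption: expected error is roughly 2 / sqrt(sampleSize),
  // as described in Beam's ApproximateUnique documentation.
  def approxError(sampleSize: Int): Double = {
    require(sampleSize >= 16, "sampleSize should be >= 16")
    2.0 / math.sqrt(sampleSize.toDouble)
  }

  // Inverse of the assumed formula: the smallest sample size whose
  // approximate error does not exceed maximumEstimationError.
  def sampleSizeFor(maximumEstimationError: Double): Long = {
    require(
      maximumEstimationError >= 0.01 && maximumEstimationError <= 0.5,
      "maximumEstimationError should be in the range [0.01, 0.5]"
    )
    math.ceil(4.0 / (maximumEstimationError * maximumEstimationError)).toLong
  }
}
```

Under this assumed formula, the default maximumEstimationError of 0.02 corresponds to a sample of 10000 entries, and an error of 0.5 corresponds to the minimum sample size of 16, which is consistent with the bounds documented for both parameters.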

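The two APIs described for ApproxDistinctCounter can be sketched in a small pipeline. This is a minimal sketch, not a definitive usage guide: the method names `estimateDistinctCount` and `estimateDistinctCountPerKey` are assumptions, since this page does not list the trait's members, so check them against the Scio version you use. The object name and sample data are hypothetical.

```scala
import com.spotify.scio.ContextAndArgs
import com.spotify.scio.estimators.{ApproximateUniqueCounter, ApproximateUniqueCounterByError}
import com.spotify.scio.values.SCollection

object ApproxDistinctSketch {
  def main(cmdlineArgs: Array[String]): Unit = {
    val (sc, _) = ContextAndArgs(cmdlineArgs)

    // Hypothetical input: (country, userId) pairs.
    val events: SCollection[(String, String)] = sc.parallelize(
      Seq(("us", "alice"), ("us", "bob"), ("us", "alice"), ("se", "carol"))
    )

    // Total distinct count, sized by maximum estimation error (default 0.02).
    // Method name is an assumption; verify it against your Scio version.
    val total: SCollection[Long] =
      ApproximateUniqueCounterByError[String]().estimateDistinctCount(events.values)

    // Distinct count per key, sized by an explicit sample size (>= 16).
    // Method name is an assumption; verify it against your Scio version.
    val perKey: SCollection[(String, Long)] =
      ApproximateUniqueCounter[String](sampleSize = 1024).estimateDistinctCountPerKey(events)

    // Print the estimates when the pipeline runs.
    total.debug(prefix = "distinct users: ")
    perKey.debug(prefix = "distinct users per country: ")

    sc.run().waitUntilFinish()
  }
}
```

Either counter can serve either API; the choice between them only affects how the underlying sample is sized. The results are estimates, so on larger inputs they may differ from the exact distinct counts.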