package estimators
Type Members
- trait ApproxDistinctCounter[T] extends AnyRef
Approximate distinct element counter for type T, e.g. HyperLogLog or HyperLogLog++. It has two APIs: one estimates the total distinct count of a given SCollection, and the other estimates the distinct count for each key of a key-value SCollection.
- case class ApproximateUniqueCounter[T](sampleSize: Int) extends ApproxDistinctCounter[T] with Product with Serializable
ApproxDistinctCounter impl for org.apache.beam.sdk.transforms.ApproximateUnique with sample size.
Count the approximate number of distinct values for each key in the SCollection.
- sampleSize
the number of entries in the statistical sample; the higher this number, the more accurate the estimate will be; should be >= 16.
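To illustrate what a sample-size-based estimator does, here is a dependency-free Scala sketch of a "k minimum values" counter over ordinary collections rather than SCollections. It assumes that Beam's ApproximateUnique works along similar lines (keeping the sampleSize smallest hashes and extrapolating from how densely they fill the hash space); KmvDistinctCounter, estimate and estimatePerKey are hypothetical names, not part of Scio or Beam. The two methods mirror the trait's two APIs: a global count and a per-key count.

```scala
import scala.collection.immutable.TreeSet
import scala.util.hashing.MurmurHash3

// Hypothetical analogue of a sample-size-based distinct counter (not Scio/Beam code).
// Keeps the `sampleSize` smallest 32-bit hashes seen ("k minimum values" sketch).
final case class KmvDistinctCounter(sampleSize: Int) {
  require(sampleSize >= 16, "sampleSize should be >= 16")

  // Hash an element to an unsigned value in [0, 2^32).
  private def unsignedHash(x: Any): Long =
    MurmurHash3.stringHash(x.toString).toLong & 0xffffffffL

  // Global API: estimated number of distinct elements in `xs`.
  def estimate[T](xs: Iterable[T]): Long = {
    // Retain only the sampleSize smallest distinct hashes.
    val sample = xs.foldLeft(TreeSet.empty[Long]) { (acc, x) =>
      val added = acc + unsignedHash(x)
      if (added.size > sampleSize) added - added.last else added
    }
    if (sample.size < sampleSize) {
      // Fewer distinct hashes than the sample size: the count is exact
      // (up to hash collisions).
      sample.size.toLong
    } else {
      // Position of the k-th smallest hash as a fraction of the hash space;
      // (k - 1) / fraction is the standard KMV estimate.
      val kth = (sample.last + 1).toDouble / math.pow(2, 32)
      math.round((sampleSize - 1) / kth)
    }
  }

  // Per-key API: estimated number of distinct values for each key.
  def estimatePerKey[K, T](xs: Iterable[(K, T)]): Map[K, Long] =
    xs.groupBy(_._1).map { case (k, kvs) => k -> estimate(kvs.map(_._2)) }
}
```

For inputs with more distinct values than sampleSize, the result is an estimate whose relative error is typically on the order of 1/sqrt(sampleSize). A larger sample buys accuracy at the cost of memory.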
- case class ApproximateUniqueCounterByError[T](maximumEstimationError: Double = 0.02) extends ApproxDistinctCounter[T] with Product with Serializable
ApproxDistinctCounter impl for org.apache.beam.sdk.transforms.ApproximateUnique with maximum estimation error.
Count the approximate number of distinct elements in the SCollection.
- maximumEstimationError
the maximum estimation error, which should be in the range [0.01, 0.5].
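The two ApproximateUnique-based counters are related. Beam's ApproximateUnique documentation gives the estimation error as roughly 2 / sqrt(sampleSize), so a target error can be converted into a sample size by inverting that relation. The helper below is a hypothetical sketch of that conversion, not Scio's or Beam's actual code, and the exact rounding is an assumption.

```scala
// Hypothetical helper (not Scio/Beam API): invert error ≈ 2 / sqrt(sampleSize)
// to find the sample size needed for a target maximum estimation error.
def sampleSizeForError(maximumEstimationError: Double): Int = {
  require(
    maximumEstimationError >= 0.01 && maximumEstimationError <= 0.5,
    "maximumEstimationError should be in the range [0.01, 0.5]"
  )
  // sampleSize ≈ (2 / error)^2 = 4 / error^2, rounded up.
  math.ceil(4.0 / (maximumEstimationError * maximumEstimationError)).toInt
}
```

Under this relation, the upper end of the allowed range (0.5) maps to a sample size of exactly 16. That is the minimum sampleSize accepted by ApproximateUniqueCounter. The default of 0.02 corresponds to about 10,000 samples.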