class PairSkewedSCollectionFunctions[K, V] extends AnyRef
Extra functions available on SCollections of (key, value) pairs for skewed joins through an implicit conversion.
- Alphabetic
- By Inheritance
- PairSkewedSCollectionFunctions
- AnyRef
- Any
- Hide All
- Show All
- Public
- Protected
Instance Constructors
- new PairSkewedSCollectionFunctions(self: SCollection[(K, V)])
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val self: SCollection[(K, V)]
- def skewedFullOuterJoin[W](rhs: SCollection[(K, W)], cms: SCollection[TopCMS[K]]): SCollection[(K, (Option[V], Option[W]))]
N to 1 skew-proof flavor of PairSCollectionFunctions.fullOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.fullOuterJoin.
Perform a skewed full outer join where some keys on the left hand may be hot. Frequency of a key is estimated with
1 - delta
probability, and the estimate is withineps * N
of the true frequency.true frequency <= estimate <= true frequency + eps * N
where N is the total size of the left hand side stream so far.
- cms
left hand side key com.twitter.algebird.TopCMS
// Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val keyAggregator = TopNCMS.aggregator[K](eps, delta, seed, count) val hotKeyCMS = self.keys.aggregate(keyAggregator) val p = logs.skewedFullOuterJoin(logMetadata, hotKeyCMS)
Read more about TopCMS: com.twitter.algebird.TopCMS.
- Note
Make sure to
import com.twitter.algebird.CMSHasherImplicits
before using this join.
Example: - def skewedFullOuterJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long, cms: SCollection[CMS[K]]): SCollection[(K, (Option[V], Option[W]))]
N to 1 skew-proof flavor of PairSCollectionFunctions.fullOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.fullOuterJoin.
Perform a skewed full outer join where some keys on the left hand may be hot, i.e.appear more than
hotKeyThreshold
times. Frequency of a key is estimated with1 - delta
probability, and the estimate is withineps * N
of the true frequency.true frequency <= estimate <= true frequency + eps * N
where N is the total size of the left hand side stream so far.
- hotKeyThreshold
key with
hotKeyThreshold
values will be considered hot. Some runners have inefficientGroupByKey
implementation for groups with more than 10K values. Thus it is recommended to sethotKeyThreshold
to below 10K, keep upper estimation error in mind.- cms
left hand side key com.twitter.algebird.CMSMonoid
// Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val keyAggregator = CMS.aggregator[K](eps, delta, seed) val hotKeyCMS = self.keys.aggregate(keyAggregator) val p = logs.skewedFullOuterJoin(logMetadata, hotKeyThreshold=8500, cms=hotKeyCMS)
Read more about CMS: com.twitter.algebird.CMSMonoid.
- Note
Make sure to
import com.twitter.algebird.CMSHasherImplicits
before using this join.
Example: - def skewedFullOuterJoin[W](rhs: SCollection[(K, W)], hotKeyMethod: HotKeyMethod = SkewedJoins.DefaultHotKeyMethod, hotKeyFanout: Int = SkewedJoins.DefaultHotKeyFanout, cmsEps: Double = SkewedJoins.DefaultCmsEpsilon, cmsDelta: Double = SkewedJoins.DefaultCmsDelta, cmsSeed: Int = SkewedJoins.DefaultCmsSeed, sampleFraction: Double = SkewedJoins.DefaultSampleFraction, sampleWithReplacement: Boolean = SkewedJoins.DefaultSampleWithReplacement)(implicit hasher: CMSHasher[K]): SCollection[(K, (Option[V], Option[W]))]
N to 1 skew-proof flavor of PairSCollectionFunctions.fullOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.fullOuterJoin.
Perform a skewed full join where some keys on the left hand may be hot. Frequency of a key is estimated with
1 - delta
probability, and the estimate is withineps * N
of the true frequency.true frequency <= estimate <= true frequency + eps * N
where N is the total size of the left hand side stream so far.
- hotKeyMethod
Method used to compute hot-keys from the left side collection.
- hotKeyFanout
The number of intermediate keys that will be used during the CMS computation.
- cmsEps
One-sided error bound on the error of each point query, i.e. frequency estimate. Must lie in
(0, 1)
.- cmsDelta
A bound on the probability that a query estimate does not lie within some small interval (an interval that depends on
eps
) around the truth. Must lie in(0, 1)
.- cmsSeed
A seed to initialize the random number generator used to create the pairwise independent hash functions.
- sampleFraction
left side sample fraction.
- sampleWithReplacement
whether to use sampling with replacement, see SCollection.sample.
// Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val p = logs.skewedFullOuterJoin(logMetadata)
Read more about CMS: com.twitter.algebird.CMS.
- Note
Make sure to
import com.twitter.algebird.CMSHasherImplicits
before using this join.
Example: - def skewedJoin[W](rhs: SCollection[(K, W)], cms: SCollection[TopCMS[K]]): SCollection[(K, (V, W))]
N to 1 skew-proof flavor of PairSCollectionFunctions.join.
N to 1 skew-proof flavor of PairSCollectionFunctions.join.
Perform a skewed join where some keys on the left hand may be hot. Frequency of a key is estimated with
1 - delta
probability, and the estimate is withineps * N
of the true frequency.true frequency <= estimate <= true frequency + eps * N
where N is the total size of the left hand side stream so far.
- cms
left hand side key com.twitter.algebird.TopCMS
// Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val keyAggregator = TopNCMS.aggregator[K](eps, delta, seed, count) val hotKeyCMS = self.keys.aggregate(keyAggregator) val p = logs.skewedJoin(logMetadata, hotKeyCMS)
Read more about TopCMS: com.twitter.algebird.TopCMS.
- Note
Make sure to
import com.twitter.algebird.CMSHasherImplicits
before using this join.
Example: - def skewedJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long, cms: SCollection[CMS[K]]): SCollection[(K, (V, W))]
N to 1 skew-proof flavor of PairSCollectionFunctions.join.
N to 1 skew-proof flavor of PairSCollectionFunctions.join.
Perform a skewed join where some keys on the left hand may be hot, i.e. appear more than
hotKeyThreshold
times. Frequency of a key is estimated with1 - delta
probability, and the estimate is withineps * N
of the true frequency.true frequency <= estimate <= true frequency + eps * N
where N is the total size of the left hand side stream so far.
- hotKeyThreshold
key with
hotKeyThreshold
values will be considered hot. Some runners have inefficientGroupByKey
implementation for groups with more than 10K values. Thus it is recommended to sethotKeyThreshold
to below 10K, keep upper estimation error in mind.- cms
left hand side key com.twitter.algebird.CMS
// Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val keyAggregator = CMS.aggregator[K](eps, delta, seed) val hotKeyCMS = self.keys.aggregate(keyAggregator) val p = logs.skewedJoin(logMetadata, hotKeyThreshold=8500, cms=hotKeyCMS)
Read more about CMS: com.twitter.algebird.CMS.
- Note
Make sure to
import com.twitter.algebird.CMSHasherImplicits
before using this join.
Example: - def skewedJoin[W](rhs: SCollection[(K, W)], hotKeyMethod: HotKeyMethod = SkewedJoins.DefaultHotKeyMethod, hotKeyFanout: Int = SkewedJoins.DefaultHotKeyFanout, cmsEps: Double = SkewedJoins.DefaultCmsEpsilon, cmsDelta: Double = SkewedJoins.DefaultCmsDelta, cmsSeed: Int = SkewedJoins.DefaultCmsSeed, sampleFraction: Double = SkewedJoins.DefaultSampleFraction, sampleWithReplacement: Boolean = SkewedJoins.DefaultSampleWithReplacement)(implicit hasher: CMSHasher[K]): SCollection[(K, (V, W))]
N to 1 skew-proof flavor of PairSCollectionFunctions.join.
N to 1 skew-proof flavor of PairSCollectionFunctions.join.
- hotKeyMethod
Method used to compute hot-keys from the left side collection.
- hotKeyFanout
The number of intermediate keys that will be used during the CMS computation.
- cmsEps
One-sided error bound on the error of each point query, i.e. frequency estimate. Must lie in
(0, 1)
.- cmsDelta
A bound on the probability that a query estimate does not lie within some small interval (an interval that depends on
eps
) around the truth. Must lie in(0, 1)
.- cmsSeed
A seed to initialize the random number generator used to create the pairwise independent hash functions.
- sampleFraction
left side sample fraction.
- sampleWithReplacement
whether to use sampling with replacement, see SCollection.sample.
// Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val p = logs.skewedJoin(logMetadata)
Read more about CMS: com.twitter.algebird.CMS.
- Note
Make sure to
import com.twitter.algebird.CMSHasherImplicits
before using this join.
Example: - def skewedLeftOuterJoin[W](rhs: SCollection[(K, W)], cms: SCollection[TopCMS[K]]): SCollection[(K, (V, Option[W]))]
N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
Perform a skewed left join where some keys on the left hand may be hot. Frequency of a key is estimated with
1 - delta
probability, and the estimate is withineps * N
of the true frequency.true frequency <= estimate <= true frequency + eps * N
where N is the total size of the left hand side stream so far.
- cms
left hand side key com.twitter.algebird.TopCMS
// Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val keyAggregator = TopNCMS.aggregator[K](eps, delta, seed, count) val hotKeyCMS = self.keys.aggregate(keyAggregator) val p = logs.skewedLeftOuterJoin(logMetadata, hotKeyCMS)
Read more about TopCMS: com.twitter.algebird.TopCMS.
- Note
Make sure to
import com.twitter.algebird.CMSHasherImplicits
before using this join.
Example: - def skewedLeftOuterJoin[W](rhs: SCollection[(K, W)], hotKeyThreshold: Long, cms: SCollection[CMS[K]]): SCollection[(K, (V, Option[W]))]
N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
Perform a skewed left join where some keys on the left hand may be hot, i.e. appear more than
hotKeyThreshold
times. Frequency of a key is estimated with1 - delta
probability, and the estimate is withineps * N
of the true frequency.true frequency <= estimate <= true frequency + eps * N
where N is the total size of the left hand side stream so far.
- hotKeyThreshold
key with
hotKeyThreshold
values will be considered hot. Some runners have inefficientGroupByKey
implementation for groups with more than 10K values. Thus it is recommended to sethotKeyThreshold
to below 10K, keep upper estimation error in mind.- cms
left hand side key com.twitter.algebird.CMS
// Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val keyAggregator = CMS.aggregator[K](eps, delta, seed) val hotKeyCMS = self.keys.aggregate(keyAggregator) val p = logs.skewedLeftOuterJoin(logMetadata, hotKeyThreshold=8500, cms=hotKeyCMS)
Read more about CMS: com.twitter.algebird.CMS.
- Note
Make sure to
import com.twitter.algebird.CMSHasherImplicits
before using this join.
Example: - def skewedLeftOuterJoin[W](rhs: SCollection[(K, W)], hotKeyMethod: HotKeyMethod = SkewedJoins.DefaultHotKeyMethod, hotKeyFanout: Int = SkewedJoins.DefaultHotKeyFanout, cmsEps: Double = SkewedJoins.DefaultCmsEpsilon, cmsDelta: Double = SkewedJoins.DefaultCmsDelta, cmsSeed: Int = SkewedJoins.DefaultCmsSeed, sampleFraction: Double = SkewedJoins.DefaultSampleFraction, sampleWithReplacement: Boolean = SkewedJoins.DefaultSampleWithReplacement)(implicit hasher: CMSHasher[K]): SCollection[(K, (V, Option[W]))]
N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
N to 1 skew-proof flavor of PairSCollectionFunctions.leftOuterJoin.
Perform a skewed left join where some keys on the left hand may be hot. Frequency of a key is estimated with
1 - delta
probability, and the estimate is withineps * N
of the true frequency.true frequency <= estimate <= true frequency + eps * N
where N is the total size of the left hand side stream so far.
- hotKeyMethod
Method used to compute hot-keys from the left side collection.
- hotKeyFanout
The number of intermediate keys that will be used during the CMS computation.
- cmsEps
One-sided error bound on the error of each point query, i.e. frequency estimate. Must lie in
(0, 1)
.- cmsDelta
A bound on the probability that a query estimate does not lie within some small interval (an interval that depends on
eps
) around the truth. Must lie in(0, 1)
.- cmsSeed
A seed to initialize the random number generator used to create the pairwise independent hash functions.
- sampleFraction
left side sample fraction.
- sampleWithReplacement
whether to use sampling with replacement, see SCollection.sample.
// Implicits that enabling CMS-hashing import com.twitter.algebird.CMSHasherImplicits._ val p = logs.skewedLeftJoin(logMetadata)
Read more about CMS: com.twitter.algebird.CMS.
- Note
Make sure to
import com.twitter.algebird.CMSHasherImplicits
before using this join.
Example: - final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()