class PairSCollectionFunctions[K, V] extends AnyRef
Extra functions available on SCollections of (key, value) pairs through an implicit conversion.
 Alphabetic
 By Inheritance
 PairSCollectionFunctions
 AnyRef
 Any
 Hide All
 Show All
 Public
 All
Instance Constructors
 new PairSCollectionFunctions(self: SCollection[(K, V)])
Value Members

final
def
!=(arg0: Any): Boolean
 Definition Classes
 AnyRef → Any

final
def
##(): Int
 Definition Classes
 AnyRef → Any

final
def
==(arg0: Any): Boolean
 Definition Classes
 AnyRef → Any

def
aggregateByKey[A, U](aggregator: Aggregator[V, A, U])(implicit arg0: Coder[A], arg1: Coder[U], koder: Coder[K], voder: Coder[V]): SCollection[(K, U)]
Aggregate the values of each key with Aggregator.
Aggregate the values of each key with Aggregator. First each value
V
is mapped toA
, then we reduce with a Semigroup ofA
, then finally we present the results asU
. This could be more powerful and better optimized in some cases. 
def
aggregateByKey[U](zeroValue: U)(seqOp: (U, V) ⇒ U, combOp: (U, U) ⇒ U)(implicit arg0: Coder[U], koder: Coder[K], voder: Coder[V]): SCollection[(K, U)]
Aggregate the values of each key, using given combine functions and a neutral "zero value".
Aggregate the values of each key, using given combine functions and a neutral "zero value". This function can return a different result type,
U
, than the type of the values in this SCollection,V
. Thus, we need one operation for merging aV
into aU
and one operation for merging twoU
's. To avoid memory allocation, both of these functions are allowed to modify and return their first argument instead of creating a new
U.

def
applyPerKeyDoFn[U](t: DoFn[KV[K, V], KV[K, U]])(implicit arg0: Coder[U], koder: Coder[K], vcoder: Coder[V]): SCollection[(K, U)]
Apply a DoFn that processes KVs and wrap the output in an SCollection.
 def approxQuantilesByKey(numQuantiles: Int)(implicit ord: Ordering[V], koder: Coder[K], voder: Coder[V], dummy: DummyImplicit): SCollection[(K, Iterable[V])]

def
approxQuantilesByKey(numQuantiles: Int, ord: Ordering[V])(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, Iterable[V])]
For each key, compute the values' data distribution using approximate
N
tiles.For each key, compute the values' data distribution using approximate
N
tiles. returns
a new SCollection whose values are
Iterable
s of the approximateN
tiles of the elements.

final
def
asInstanceOf[T0]: T0
 Definition Classes
 Any

def
asMapSideInput(implicit koder: Coder[K], voder: Coder[V]): SideInput[Map[K, V]]
Convert this SCollection to a SideInput, mapping keyvalue pairs of each window to a
Map[key, value]
, to be used with SCollection.withSideInputs.Convert this SCollection to a SideInput, mapping keyvalue pairs of each window to a
Map[key, value]
, to be used with SCollection.withSideInputs. It is required that each key of the input be associated with a single value.Currently, the resulting map is required to fit into memory.

def
asMultiMapSideInput(implicit koder: Coder[K], voder: Coder[V]): SideInput[Map[K, Iterable[V]]]
Convert this SCollection to a SideInput, mapping keyvalue pairs of each window to a
Map[key, Iterable[value]]
, to be used with SCollection.withSideInputs.Convert this SCollection to a SideInput, mapping keyvalue pairs of each window to a
Map[key, Iterable[value]]
, to be used with SCollection.withSideInputs. In contrast to asMapSideInput, it is not required that the keys in the input collection be unique.Currently, the resulting map is required to fit into memory.

def
batchByKey(batchSize: Long)(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, Iterable[V])]
Batches inputs to a desired batch size.
Batches inputs to a desired batch size. Batches will contain only elements of a single key.
Elements are buffered until there are batchSize elements buffered, at which point they are outputed to the output SCollection.
Windows are preserved (batches contain elements from the same window). Batches may contain elements from more than one bundle.

def
clone(): AnyRef
 Attributes
 protected[java.lang]
 Definition Classes
 AnyRef
 Annotations
 @native() @throws( ... )

def
cogroup[W1, W2, W3](that1: SCollection[(K, W1)], that2: SCollection[(K, W2)], that3: SCollection[(K, W3)])(implicit arg0: Coder[W1], arg1: Coder[W2], arg2: Coder[W3], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Iterable[V], Iterable[W1], Iterable[W2], Iterable[W3]))]
For each key k in
this
orthat1
orthat2
orthat3
, return a resulting SCollection that contains a tuple with the list of values for that key inthis
,that1
,that2
andthat3
. 
def
cogroup[W1, W2](that1: SCollection[(K, W1)], that2: SCollection[(K, W2)])(implicit arg0: Coder[W1], arg1: Coder[W2], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Iterable[V], Iterable[W1], Iterable[W2]))]
For each key k in
this
orthat1
orthat2
, return a resulting SCollection that contains a tuple with the list of values for that key inthis
,that1
andthat2
. 
def
cogroup[W](that: SCollection[(K, W)])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Iterable[V], Iterable[W]))]
For each key k in
this
orthat
, return a resulting SCollection that contains a tuple with the list of values for that key inthis
as well asthat
. 
def
combineByKey[C](createCombiner: (V) ⇒ C)(mergeValue: (C, V) ⇒ C)(mergeCombiners: (C, C) ⇒ C)(implicit arg0: Coder[C], koder: Coder[K], voder: Coder[V]): SCollection[(K, C)]
Generic function to combine the elements for each key using a custom set of aggregation functions.
Generic function to combine the elements for each key using a custom set of aggregation functions. Turns an
SCollection[(K, V)]
into a result of typeSCollection[(K, C)]
, for a "combined type"C
Note thatV
andC
can be different  for example, one might group an SCollection of type(Int, Int)
into an SCollection of type(Int, Seq[Int])
. Users provide three functions:
createCombiner
, which turns aV
into aC
(e.g., creates a oneelement list)
mergeValue
, to merge aV
into aC
(e.g., adds it to the end of a list)
mergeCombiners
, to combine twoC
's into a single one. 
def
countApproxDistinctByKey(maximumEstimationError: Double = 0.02)(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, Long)]
Count approximate number of distinct values for each key in the SCollection.
Count approximate number of distinct values for each key in the SCollection.
 maximumEstimationError
the maximum estimation error, which should be in the range
[0.01, 0.5]
.

def
countApproxDistinctByKey(sampleSize: Int)(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, Long)]
Count approximate number of distinct values for each key in the SCollection.
Count approximate number of distinct values for each key in the SCollection.
 sampleSize
the number of entries in the statistical sample; the higher this number, the more accurate the estimate will be; should be
>= 16
.

def
countByKey(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, Long)]
Count the number of elements for each key.
Count the number of elements for each key.
 returns
a new SCollection of (key, count) pairs

def
distinctByKey(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, V)]
Return a new SCollection of (key, value) pairs without duplicates based on the keys.
Return a new SCollection of (key, value) pairs without duplicates based on the keys. The value is taken randomly for each key.
 returns
a new SCollection of (key, value) pairs

final
def
eq(arg0: AnyRef): Boolean
 Definition Classes
 AnyRef

def
equals(arg0: Any): Boolean
 Definition Classes
 AnyRef → Any

def
filterValues(f: (V) ⇒ Boolean)(implicit koder: Coder[K]): SCollection[(K, V)]
Pass each value in the keyvalue pair SCollection through a
filter
function without changing the keys. 
def
finalize(): Unit
 Attributes
 protected[java.lang]
 Definition Classes
 AnyRef
 Annotations
 @throws( classOf[java.lang.Throwable] )

def
flatMapValues[U](f: (V) ⇒ TraversableOnce[U])(implicit arg0: Coder[U], koder: Coder[K]): SCollection[(K, U)]
Pass each value in the keyvalue pair SCollection through a
flatMap
function without changing the keys. 
def
flattenValues[U](implicit arg0: Coder[U], ev: <:<[V, TraversableOnce[U]], koder: Coder[K]): SCollection[(K, U)]
Return an SCollection having its values flattened.

def
foldByKey(implicit mon: Monoid[V], koder: Coder[K], voder: Coder[V]): SCollection[(K, V)]
Fold by key with Monoid, which defines the associative function and "zero value" for
V
.Fold by key with Monoid, which defines the associative function and "zero value" for
V
. This could be more powerful and better optimized in some cases. 
def
foldByKey(zeroValue: V)(op: (V, V) ⇒ V)(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, V)]
Merge the values for each key using an associative function and a neutral "zero value" which may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication.).

def
fullOuterJoin[W](that: SCollection[(K, W)])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Option[V], Option[W]))]
Perform a full outer join of
this
andthat
.Perform a full outer join of
this
andthat
. For each element (k, v) inthis
, the resulting SCollection will either contain all pairs (k, (Some(v), Some(w))) for w inthat
, or the pair (k, (Some(v), None)) if no elements inthat
have key k. Similarly, for each element (k, w) inthat
, the resulting SCollection will either contain all pairs (k, (Some(v), Some(w))) for v inthis
, or the pair (k, (None, Some(w))) if no elements inthis
have key k. 
final
def
getClass(): Class[_]
 Definition Classes
 AnyRef → Any
 Annotations
 @native()

def
groupByKey(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, Iterable[V])]
Group the values for each key in the SCollection into a single sequence.
Group the values for each key in the SCollection into a single sequence. The ordering of elements within each group is not guaranteed, and may even differ each time the resulting SCollection is evaluated.
Note: This operation may be very expensive. If you are grouping in order to perform an aggregation (such as a sum or average) over each key, using PairSCollectionFunctions.aggregateByKey or PairSCollectionFunctions.reduceByKey will provide much better performance.
Note: As currently implemented,
groupByKey
must be able to hold all the keyvalue pairs for any key in memory. If a key has too many values, it can result in anOutOfMemoryError
. 
def
groupWith[W1, W2, W3](that1: SCollection[(K, W1)], that2: SCollection[(K, W2)], that3: SCollection[(K, W3)])(implicit arg0: Coder[W1], arg1: Coder[W2], arg2: Coder[W3], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Iterable[V], Iterable[W1], Iterable[W2], Iterable[W3]))]
Alias for
cogroup
. 
def
groupWith[W1, W2](that1: SCollection[(K, W1)], that2: SCollection[(K, W2)])(implicit arg0: Coder[W1], arg1: Coder[W2], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Iterable[V], Iterable[W1], Iterable[W2]))]
Alias for
cogroup
. 
def
groupWith[W](that: SCollection[(K, W)])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Iterable[V], Iterable[W]))]
Alias for
cogroup
. 
def
hashCode(): Int
 Definition Classes
 AnyRef → Any
 Annotations
 @native()

def
hashPartitionByKey(numPartitions: Int): Seq[SCollection[(K, V)]]
Partition this SCollection using K.hashCode() into
n
partitionsPartition this SCollection using K.hashCode() into
n
partitions numPartitions
number of output partitions
 returns
partitioned SCollections in a
Seq

def
intersectByKey(that: SCollection[K])(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, V)]
Return an SCollection with the pairs from
this
whose keys are inthat
.Return an SCollection with the pairs from
this
whose keys are inthat
.Unlike SCollection.intersection this preserves duplicates in
this
. 
final
def
isInstanceOf[T0]: Boolean
 Definition Classes
 Any

def
join[W](that: SCollection[(K, W)])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, W))]
Return an SCollection containing all pairs of elements with matching keys in
this
andthat
.Return an SCollection containing all pairs of elements with matching keys in
this
andthat
. Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is inthis
and (k, v2) is inthat
. 
def
keys(implicit koder: Coder[K], voder: Coder[V]): SCollection[K]
Return an SCollection with the keys of each tuple.

def
leftOuterJoin[W](that: SCollection[(K, W)])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Option[W]))]
Perform a left outer join of
this
andthat
.Perform a left outer join of
this
andthat
. For each element (k, v) inthis
, the resulting SCollection will either contain all pairs (k, (v, Some(w))) for w inthat
, or the pair (k, (v, None)) if no elements inthat
have key k. 
def
mapValues[U](f: (V) ⇒ U)(implicit arg0: Coder[U], koder: Coder[K]): SCollection[(K, U)]
Pass each value in the keyvalue pair SCollection through a
map
function without changing the keys.  def maxByKey(ord: Ordering[V])(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, V)]

def
maxByKey(implicit ord: Ordering[V], koder: Coder[K], voder: Coder[V], dummy: DummyImplicit): SCollection[(K, V)]
Return the max of values for each key as defined by the implicit
Ordering[T]
.Return the max of values for each key as defined by the implicit
Ordering[T]
. returns
a new SCollection of (key, maximum value) pairs
 def minByKey(ord: Ordering[V])(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, V)]

def
minByKey(implicit koder: Coder[K], voder: Coder[V], ord: Ordering[V]): SCollection[(K, V)]
Return the min of values for each key as defined by the implicit
Ordering[T]
.Return the min of values for each key as defined by the implicit
Ordering[T]
. returns
a new SCollection of (key, minimum value) pairs

final
def
ne(arg0: AnyRef): Boolean
 Definition Classes
 AnyRef

final
def
notify(): Unit
 Definition Classes
 AnyRef
 Annotations
 @native()

final
def
notifyAll(): Unit
 Definition Classes
 AnyRef
 Annotations
 @native()

def
reduceByKey(op: (V, V) ⇒ V)(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, V)]
Merge the values for each key using an associative reduce function.
Merge the values for each key using an associative reduce function. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce.

def
rightOuterJoin[W](that: SCollection[(K, W)])(implicit arg0: Coder[W], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Option[V], W))]
Perform a right outer join of
this
andthat
.Perform a right outer join of
this
andthat
. For each element (k, w) inthat
, the resulting SCollection will either contain all pairs (k, (Some(v), w)) for v inthis
, or the pair (k, (None, w)) if no elements inthis
have key k. 
def
sampleByKey(withReplacement: Boolean, fractions: Map[K, Double])(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, V)]
Return a subset of this SCollection sampled by key (via stratified sampling).
Return a subset of this SCollection sampled by key (via stratified sampling).
Create a sample of this SCollection using variable sampling rates for different keys as specified by
fractions
, a key to sampling rate map, via simple random sampling with one pass over the SCollection, to produce a sample of size that's approximately equal to the sum ofmath.ceil(numItems * samplingRate)
over all key values. withReplacement
whether to sample with or without replacement
 fractions
map of specific keys to sampling rates
 returns
SCollection containing the sampled subset

def
sampleByKey(sampleSize: Int)(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, Iterable[V])]
Return a sampled subset of values for each key of this SCollection.
Return a sampled subset of values for each key of this SCollection.
 returns
a new SCollection of (key, sampled values) pairs
 val self: SCollection[(K, V)]

def
sparseIntersectByKey(that: SCollection[K], thatNumKeys: Int, computeExact: Boolean = false, fpProb: Double = 0.01)(implicit koder: Coder[K], voder: Coder[V], hash: Hash128[K]): SCollection[(K, V)]
Return an SCollection with the pairs from
this
whose keys are inthat
when the cardinality ofthis
>>that
, but neither can fit in memory (see PairHashSCollectionFunctions.hashIntersectByKey).Return an SCollection with the pairs from
this
whose keys are inthat
when the cardinality ofthis
>>that
, but neither can fit in memory (see PairHashSCollectionFunctions.hashIntersectByKey).Unlike SCollection.intersection this preserves duplicates in
this
. thatNumKeys
An estimate of the number of keys in
that
. This estimate is used to find the size and number of BloomFilters that Scio would use to prefilterthis
in a "map" step before any join. Having a value close to the actual number improves the false positives in output. WhencomputeExact
is set to true, a more accurate estimate of the number of keys inthat
would mean less shuffle when finding the exact value. computeExact
Whether or not to directly pass through bloom filter results (with a small false positive rate) or perform an additional inner join to confirm exact result set. By default this is set to false.
 fpProb
A fraction in range (0, 1) which would be the accepted false positive probability for this transform. By default when
computeExact
is set tofalse
, this reflects the probability that an output element is an incorrect intersect (meaning it may not be present inthat
) WhencomputeExact
is set totrue
, this fraction is used to find the acceptable false positive in the intermediate step before computing exact. Note: having fpProb = 0 doesn't mean an exact computation. This value along withthatNumKeys
is used for creating a BloomFilter.

def
sparseJoin[W](that: SCollection[(K, W)], thatNumKeys: Long, fpProb: Double = 0.01)(implicit arg0: Coder[W], hash: Hash128[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, W))]
Inner join for cases when
this
is much larger thanthat
which cannot fit in memory, but contains a mostly overlapping set of keys asthis
, i.e.Inner join for cases when
this
is much larger thanthat
which cannot fit in memory, but contains a mostly overlapping set of keys asthis
, i.e. when the intersection of keys is sparse inthis
. A Bloom Filter of keys inthat
is used to splitthis
into 2 partitions. Only those with keys in the filter go through the join and the rest are filtered out before the join. Read more about Bloom Filter: com.twitter.algebird.BloomFilter. thatNumKeys
An estimate of the number of keys in
that
. This estimate is used to find the size and number of BloomFilters that Scio would use to splitthis
into overlap and intersection in a "map" step before an exact join. Having a value close to the actual number improves the false positives in intermediate steps which means less shuffle. fpProb
A fraction in range (0, 1) which would be the accepted false positive probability when computing the overlap. Note: having fpProb = 0 doesn't mean that Scio would calculate an exact overlap.

def
sparseLeftOuterJoin[W](that: SCollection[(K, W)], thatNumKeys: Long, fpProb: Double = 0.01)(implicit arg0: Coder[W], hash: Hash128[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Option[W]))]
Left outer join for cases when
this
is much larger thanthat
which cannot fit in memory, but contains a mostly overlapping set of keys asthis
, i.e.Left outer join for cases when
this
is much larger thanthat
which cannot fit in memory, but contains a mostly overlapping set of keys asthis
, i.e. when the intersection of keys is sparse inthis
. A Bloom Filter of keys inthat
is used to splitthis
into 2 partitions. Only those with keys in the filter go through the join and the rest are concatenated. This is useful for joining historical aggregates with incremental updates. Read more about Bloom Filter: com.twitter.algebird.BloomFilter. thatNumKeys
An estimate of the number of keys in
that
. This estimate is used to find the size and number of BloomFilters that Scio would use to splitthis
into overlap and intersection in a "map" step before an exact join. Having a value close to the actual number improves the false positives in intermediate steps which means less shuffle. fpProb
A fraction in range (0, 1) which would be the accepted false positive probability when computing the overlap. Note: having fpProb = 0 doesn't mean that Scio would calculate an exact overlap.

def
sparseLookup[A, B](that1: SCollection[(K, A)], that2: SCollection[(K, B)], thisNumKeys: Long)(implicit arg0: Coder[A], arg1: Coder[B], hash: Hash128[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Iterable[A], Iterable[B]))]
Look up values from
that
wherethat
is much larger and keys fromthis
wont fit in memory, and is sparse inthat
.Look up values from
that
wherethat
is much larger and keys fromthis
wont fit in memory, and is sparse inthat
. A Bloom Filter of keys inthis
is used to filter out irrelevant keys inthat
. This is useful when searching for a limited number of values from one or more very large tables. Read more about Bloom Filter: com.twitter.algebird.BloomFilter. thisNumKeys
An estimate of the number of keys in
this
. This estimate is used to find the size and number of BloomFilters that Scio would use to prefilterthat
before doing a cogroup. Having a value close to the actual number improves the false positives in intermediate steps which means less shuffle.

def
sparseLookup[A, B](that1: SCollection[(K, A)], that2: SCollection[(K, B)], thisNumKeys: Long, fpProb: Double)(implicit arg0: Coder[A], arg1: Coder[B], hash: Hash128[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Iterable[A], Iterable[B]))]
Look up values from
that
wherethat
is much larger and keys fromthis
wont fit in memory, and is sparse inthat
.Look up values from
that
wherethat
is much larger and keys fromthis
wont fit in memory, and is sparse inthat
. A Bloom Filter of keys inthis
is used to filter out irrelevant keys inthat
. This is useful when searching for a limited number of values from one or more very large tables. Read more about Bloom Filter: com.twitter.algebird.BloomFilter. thisNumKeys
An estimate of the number of keys in
this
. This estimate is used to find the size and number of BloomFilters that Scio would use to prefilterthat1
andthat2
before doing a cogroup. Having a value close to the actual number improves the false positives in intermediate steps which means less shuffle. fpProb
A fraction in range (0, 1) which would be the accepted false positive probability when discarding elements of
that1
andthat2
in the prefilter step.

def
sparseLookup[A](that: SCollection[(K, A)], thisNumKeys: Long)(implicit arg0: Coder[A], hash: Hash128[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Iterable[A]))]
Look up values from
that
wherethat
is much larger and keys fromthis
wont fit in memory, and is sparse inthat
.Look up values from
that
wherethat
is much larger and keys fromthis
wont fit in memory, and is sparse inthat
. A Bloom Filter of keys inthis
is used to filter out irrelevant keys inthat
. This is useful when searching for a limited number of values from one or more very large tables. Read more about Bloom Filter: com.twitter.algebird.BloomFilter. thisNumKeys
An estimate of the number of keys in
this
. This estimate is used to find the size and number of BloomFilters that Scio would use to prefilterthat
before doing a cogroup. Having a value close to the actual number improves the false positives in intermediate steps which means less shuffle.

def
sparseLookup[A](that: SCollection[(K, A)], thisNumKeys: Long, fpProb: Double)(implicit arg0: Coder[A], hash: Hash128[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (V, Iterable[A]))]
Look up values from
that
wherethat
is much larger and keys fromthis
wont fit in memory, and is sparse inthat
.Look up values from
that
wherethat
is much larger and keys fromthis
wont fit in memory, and is sparse inthat
. A Bloom Filter of keys inthis
is used to filter out irrelevant keys inthat
. This is useful when searching for a limited number of values from one or more very large tables. Read more about Bloom Filter: com.twitter.algebird.BloomFilter. thisNumKeys
An estimate of the number of keys in
this
. This estimate is used to find the size and number of BloomFilters that Scio would use to prefilterthat
before doing a cogroup. Having a value close to the actual number improves the false positives in intermediate steps which means less shuffle. fpProb
A fraction in range (0, 1) which would be the accepted false positive probability when discarding elements of
that
in the prefilter step.

def
sparseOuterJoin[W](that: SCollection[(K, W)], thatNumKeys: Long, fpProb: Double = 0.01)(implicit arg0: Coder[W], hash: Hash128[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Option[V], Option[W]))]
Full outer join for cases when
this
is much larger thanthat
which cannot fit in memory, but contains a mostly overlapping set of keys asthis
, i.e.Full outer join for cases when
this
is much larger thanthat
which cannot fit in memory, but contains a mostly overlapping set of keys asthis
, i.e. when the intersection of keys is sparse inthis
. A Bloom Filter of keys inthat
is used to splitthis
into 2 partitions. Only those with keys in the filter go through the join and the rest are concatenated. This is useful for joining historical aggregates with incremental updates. Read more about Bloom Filter: com.twitter.algebird.BloomFilter. thatNumKeys
An estimate of the number of keys in
that
. This estimate is used to find the size and number of BloomFilters that Scio would use to splitthis
into overlap and intersection in a "map" step before an exact join. Having a value close to the actual number improves the false positives in intermediate steps which means less shuffle. fpProb
A fraction in range (0, 1) which would be the accepted false positive probability when computing the overlap. Note: having fpProb = 0 doesn't mean that Scio would calculate an exact overlap.

def
sparseRightOuterJoin[W](that: SCollection[(K, W)], thatNumKeys: Long, fpProb: Double = 0.01)(implicit arg0: Coder[W], hash: Hash128[K], koder: Coder[K], voder: Coder[V]): SCollection[(K, (Option[V], W))]
Right outer join for cases when
this
is much larger thanthat
which cannot fit in memory, but contains a mostly overlapping set of keys asthis
, i.e.Right outer join for cases when
this
is much larger thanthat
which cannot fit in memory, but contains a mostly overlapping set of keys asthis
, i.e. when the intersection of keys is sparse inthis
. A Bloom Filter of keys inthat
is used to splitthis
into 2 partitions. Only those with keys in the filter go through the join and the rest are concatenated. This is useful for joining historical aggregates with incremental updates. Read more about Bloom Filter: com.twitter.algebird.BloomFilter. thatNumKeys
An estimate of the number of keys in
that
. This estimate is used to find the size and number of BloomFilters that Scio would use to splitthis
into overlap and intersection in a "map" step before an exact join. Having a value close to the actual number improves the false positives in intermediate steps which means less shuffle. fpProb
A fraction in range (0, 1) which would be the accepted false positive probability when computing the overlap. Note: having fpProb = 0 doesn't mean that Scio would calculate an exact overlap.

def
subtractByKey(that: SCollection[K])(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, V)]
Return an SCollection with the pairs from
this
whose keys are not inthat
.  def sumByKey(sg: Semigroup[V])(implicit koder: Coder[K], voder: Coder[V], d: DummyImplicit): SCollection[(K, V)]

def
sumByKey(implicit sg: Semigroup[V], koder: Coder[K], voder: Coder[V]): SCollection[(K, V)]
Reduce by key with Semigroup.
Reduce by key with Semigroup. This could be more powerful and better optimized than reduceByKey in some cases.

def
swap(implicit koder: Coder[K], voder: Coder[V]): SCollection[(V, K)]
Swap the keys with the values.

final
def
synchronized[T0](arg0: ⇒ T0): T0
 Definition Classes
 AnyRef

def
toString(): String
 Definition Classes
 AnyRef → Any
 def topByKey(num: Int, ord: Ordering[V])(implicit koder: Coder[K], voder: Coder[V]): SCollection[(K, Iterable[V])]

def
topByKey(num: Int)(implicit ord: Ordering[V], koder: Coder[K], voder: Coder[V], dummy: DummyImplicit): SCollection[(K, Iterable[V])]
Return the top k (largest) values for each key from this SCollection as defined by the specified implicit
Ordering[T]
.Return the top k (largest) values for each key from this SCollection as defined by the specified implicit
Ordering[T]
. returns
a new SCollection of (key, top k) pairs

def
values(implicit voder: Coder[V]): SCollection[V]
Return an SCollection with the values of each tuple.

final
def
wait(): Unit
 Definition Classes
 AnyRef
 Annotations
 @throws( ... )

final
def
wait(arg0: Long, arg1: Int): Unit
 Definition Classes
 AnyRef
 Annotations
 @throws( ... )

final
def
wait(arg0: Long): Unit
 Definition Classes
 AnyRef
 Annotations
 @native() @throws( ... )

def
withHotKeyFanout(hotKeyFanout: Int)(implicit koder: Coder[K], voder: Coder[V]): SCollectionWithHotKeyFanout[K, V]
Convert this SCollection to an SCollectionWithHotKeyFanout that uses an intermediate node to combine "hot" keys partially before performing the full combine.
Convert this SCollection to an SCollectionWithHotKeyFanout that uses an intermediate node to combine "hot" keys partially before performing the full combine.
 hotKeyFanout
constant value for every key

def
withHotKeyFanout(hotKeyFanout: (K) ⇒ Int)(implicit koder: Coder[K], voder: Coder[V]): SCollectionWithHotKeyFanout[K, V]
Convert this SCollection to an SCollectionWithHotKeyFanout that uses an intermediate node to combine "hot" keys partially before performing the full combine.
Convert this SCollection to an SCollectionWithHotKeyFanout that uses an intermediate node to combine "hot" keys partially before performing the full combine.
 hotKeyFanout
a function from keys to an integer N, where the key will be spread among N intermediate nodes for partial combining. If N is less than or equal to 1, this key will not be sent through an intermediate node.