com.spotify.scio.smb.syntax

SortedBucketScioContext

final class SortedBucketScioContext extends Serializable

Linear Supertypes

SortedBucketScioContext → Serializable → AnyRef → Any

Instance Constructors

  1. new SortedBucketScioContext(self: ScioContext)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  6. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  7. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  8. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  9. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  10. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  11. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  12. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  14. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  15. def sortMergeCoGroup[K, A, B, C, D](keyClass: Class[K], a: Read[A], b: Read[B], c: Read[C], d: Read[D])(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B], arg3: Coder[C], arg4: Coder[D]): SCollection[(K, (Iterable[A], Iterable[B], Iterable[C], Iterable[D]))]
    Annotations
    @experimental()
  16. def sortMergeCoGroup[K, A, B, C, D](keyClass: Class[K], a: Read[A], b: Read[B], c: Read[C], d: Read[D], targetParallelism: TargetParallelism)(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B], arg3: Coder[C], arg4: Coder[D]): SCollection[(K, (Iterable[A], Iterable[B], Iterable[C], Iterable[D]))]

For each key K in a or b or c or d, return a resulting SCollection that contains a tuple with the list of values for that key in a, b, c and d.

    See note on SortedBucketScioContext.sortMergeJoin for information on how an SMB cogroup differs from a regular org.apache.beam.sdk.transforms.join.CoGroupByKey operation.

    keyClass

    cogroup key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry as custom key coders are not supported yet.

    targetParallelism

    the desired parallelism of the job. See org.apache.beam.sdk.extensions.smb.TargetParallelism for more information.

    Annotations
    @experimental()
  17. def sortMergeCoGroup[K, A, B, C](keyClass: Class[K], a: Read[A], b: Read[B], c: Read[C])(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B], arg3: Coder[C]): SCollection[(K, (Iterable[A], Iterable[B], Iterable[C]))]
    Annotations
    @experimental()
  18. def sortMergeCoGroup[K, A, B, C](keyClass: Class[K], a: Read[A], b: Read[B], c: Read[C], targetParallelism: TargetParallelism)(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B], arg3: Coder[C]): SCollection[(K, (Iterable[A], Iterable[B], Iterable[C]))]

For each key K in a or b or c, return a resulting SCollection that contains a tuple with the list of values for that key in a, b and c.

    See note on SortedBucketScioContext.sortMergeJoin for information on how an SMB cogroup differs from a regular org.apache.beam.sdk.transforms.join.CoGroupByKey operation.

    keyClass

    cogroup key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry as custom key coders are not supported yet.

    targetParallelism

    the desired parallelism of the job. See org.apache.beam.sdk.extensions.smb.TargetParallelism for more information.

    Annotations
    @experimental()
  19. def sortMergeCoGroup[K, A, B](keyClass: Class[K], a: Read[A], b: Read[B])(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B]): SCollection[(K, (Iterable[A], Iterable[B]))]
    Annotations
    @experimental()
  20. def sortMergeCoGroup[K, A, B](keyClass: Class[K], a: Read[A], b: Read[B], targetParallelism: TargetParallelism)(implicit arg0: Coder[K], arg1: Coder[A], arg2: Coder[B]): SCollection[(K, (Iterable[A], Iterable[B]))]

For each key K in a or b, return a resulting SCollection that contains a tuple with the list of values for that key in a and b.

    See note on SortedBucketScioContext.sortMergeJoin for information on how an SMB cogroup differs from a regular org.apache.beam.sdk.transforms.join.CoGroupByKey operation.

    keyClass

    cogroup key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry as custom key coders are not supported yet.

    targetParallelism

    the desired parallelism of the job. See org.apache.beam.sdk.extensions.smb.TargetParallelism for more information.

    Annotations
    @experimental()
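
    As a hedged sketch of the 2-way overload above: the record classes (User, Event), the key type, the TupleTag names, and the paths are all hypothetical, and both inputs must already have been written with SortedBucketIO using the same key and a compatible bucketing scheme.

    ```scala
    import com.spotify.scio.ScioContext
    import com.spotify.scio.avro._ // implicit Avro coders (assumed import location)
    import com.spotify.scio.smb._
    import org.apache.beam.sdk.extensions.smb.AvroSortedBucketIO
    import org.apache.beam.sdk.values.TupleTag

    val sc = ScioContext()

    // User and Event are assumed Avro-generated SpecificRecord classes,
    // bucketed and sorted on the same String key when written.
    val users = AvroSortedBucketIO
      .read(new TupleTag[User]("users"), classOf[User])
      .from("gs://my-bucket/users")
    val events = AvroSortedBucketIO
      .read(new TupleTag[Event]("events"), classOf[Event])
      .from("gs://my-bucket/events")

    // SCollection[(String, (Iterable[User], Iterable[Event]))], merged
    // directly from the sorted buckets with no shuffle step.
    val cogrouped = sc.sortMergeCoGroup(classOf[String], users, events)
    ```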
  21. def sortMergeGroupByKey[K, V](keyClass: Class[K], read: Read[V], targetParallelism: TargetParallelism)(implicit arg0: Coder[K], arg1: Coder[V]): SCollection[(K, Iterable[V])]

For each key K in read, return a resulting SCollection that contains a tuple with the list of values for that key in read.

    See note on SortedBucketScioContext.sortMergeJoin for information on how an SMB group differs from a regular org.apache.beam.sdk.transforms.GroupByKey operation.

    keyClass

grouping key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry as custom key coders are not supported yet.

    targetParallelism

    the desired parallelism of the job. See org.apache.beam.sdk.extensions.smb.TargetParallelism for more information.

    Annotations
    @experimental()
  22. def sortMergeGroupByKey[K, V](keyClass: Class[K], read: Read[V])(implicit arg0: Coder[K], arg1: Coder[V]): SCollection[(K, Iterable[V])]

For each key K in read, return a resulting SCollection that contains a tuple with the list of values for that key in read.

    See note on SortedBucketScioContext.sortMergeJoin for information on how an SMB group differs from a regular org.apache.beam.sdk.transforms.GroupByKey operation.

    keyClass

    grouping key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry as custom key coders are not supported yet.

    Annotations
    @experimental()
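
    A minimal sketch of sortMergeGroupByKey, assuming a single Avro source already bucketed and sorted on a String key (the Event class, tag name, and path are hypothetical):

    ```scala
    import com.spotify.scio.ScioContext
    import com.spotify.scio.avro._ // implicit Avro coders (assumed import location)
    import com.spotify.scio.smb._
    import org.apache.beam.sdk.extensions.smb.AvroSortedBucketIO
    import org.apache.beam.sdk.values.TupleTag

    val sc = ScioContext()

    // Groups the pre-sorted buckets by key without a GroupByKey shuffle.
    // Result type: SCollection[(String, Iterable[Event])]
    val grouped = sc.sortMergeGroupByKey(
      classOf[String],
      AvroSortedBucketIO
        .read(new TupleTag[Event]("events"), classOf[Event])
        .from("gs://my-bucket/events")
    )
    ```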
  23. def sortMergeJoin[K, L, R](keyClass: Class[K], lhs: Read[L], rhs: Read[R], targetParallelism: TargetParallelism = TargetParallelism.auto())(implicit arg0: Coder[K], arg1: Coder[L], arg2: Coder[R]): SCollection[(K, (L, R))]

Return an SCollection containing all pairs of elements with matching keys in lhs and rhs. Each pair of elements will be returned as a (k, (v1, v2)) tuple, where (k, v1) is in lhs and (k, v2) is in rhs.

    Unlike a regular PairSCollectionFunctions.join, the key information (namely, how to extract a comparable K from L and R) is remotely encoded in a org.apache.beam.sdk.extensions.smb.BucketMetadata file in the same directory as the input records. This transform requires a filesystem lookup to ensure that the metadata for each source are compatible. In return for reading pre-sorted data, the shuffle step in a typical org.apache.beam.sdk.transforms.GroupByKey operation can be eliminated.

    keyClass

    join key class. Must have a Coder in Beam's default org.apache.beam.sdk.coders.CoderRegistry as custom key coders are not supported yet.

    targetParallelism

    the desired parallelism of the job. See org.apache.beam.sdk.extensions.smb.TargetParallelism for more information.

    Annotations
    @experimental()
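
    A hedged sketch of sortMergeJoin with an explicit targetParallelism. The record classes, key field, and paths are hypothetical; TargetParallelism.auto() is the default, per the signature above.

    ```scala
    import com.spotify.scio.ScioContext
    import com.spotify.scio.avro._ // implicit Avro coders (assumed import location)
    import com.spotify.scio.smb._
    import org.apache.beam.sdk.extensions.smb.{AvroSortedBucketIO, TargetParallelism}
    import org.apache.beam.sdk.values.TupleTag

    val sc = ScioContext()

    // Inner join of two pre-bucketed sources on their shared String key.
    // Result type: SCollection[(String, (User, Event))]
    val joined = sc.sortMergeJoin(
      classOf[String],
      AvroSortedBucketIO.read(new TupleTag[User]("lhs"), classOf[User])
        .from("gs://my-bucket/users"),
      AvroSortedBucketIO.read(new TupleTag[Event]("rhs"), classOf[Event])
        .from("gs://my-bucket/events"),
      TargetParallelism.max() // use the largest input's parallelism
    )
    ```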
  24. def sortMergeTransform[K, A, B, C](keyClass: Class[K], readA: Read[A], readB: Read[B], readC: Read[C], targetParallelism: TargetParallelism): SortMergeTransformReadBuilder[K, (Iterable[A], Iterable[B], Iterable[C])]

Perform a 3-way SortedBucketScioContext.sortMergeCoGroup operation, then immediately apply a transformation function to the merged cogroups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.

    Annotations
    @experimental()
  25. def sortMergeTransform[K, A, B, C](keyClass: Class[K], readA: Read[A], readB: Read[B], readC: Read[C]): SortMergeTransformReadBuilder[K, (Iterable[A], Iterable[B], Iterable[C])]

Perform a 3-way SortedBucketScioContext.sortMergeCoGroup operation, then immediately apply a transformation function to the merged cogroups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.

    Annotations
    @experimental()
  26. def sortMergeTransform[K, A, B](keyClass: Class[K], readA: Read[A], readB: Read[B], targetParallelism: TargetParallelism): SortMergeTransformReadBuilder[K, (Iterable[A], Iterable[B])]

Perform a 2-way SortedBucketScioContext.sortMergeCoGroup operation, then immediately apply a transformation function to the merged cogroups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.

    Annotations
    @experimental()
  27. def sortMergeTransform[K, A, B](keyClass: Class[K], readA: Read[A], readB: Read[B]): SortMergeTransformReadBuilder[K, (Iterable[A], Iterable[B])]

Perform a 2-way SortedBucketScioContext.sortMergeCoGroup operation, then immediately apply a transformation function to the merged cogroups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.

    Annotations
    @experimental()
  28. def sortMergeTransform[K, R](keyClass: Class[K], read: Read[R], targetParallelism: TargetParallelism): SortMergeTransformReadBuilder[K, Iterable[R]]

Perform a SortedBucketScioContext.sortMergeGroupByKey operation, then immediately apply a transformation function to the merged groups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.

    Annotations
    @experimental()
  29. def sortMergeTransform[K, R](keyClass: Class[K], read: Read[R]): SortMergeTransformReadBuilder[K, Iterable[R]]

Perform a SortedBucketScioContext.sortMergeGroupByKey operation, then immediately apply a transformation function to the merged groups and re-write using the same bucketing key and hashing scheme. By applying the read, transform, and write in the same transform, an extra shuffle step can be avoided.

    Annotations
    @experimental()
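
    A hedged sketch of the single-source sortMergeTransform: read, transform, and re-write in one step. Event, Summary, the "userId" key field, the paths, and buildSummary are all hypothetical; the .to(...)/.via(...) builder chain follows the SortMergeTransformReadBuilder returned above, where the third argument to .via is an output collector whose accept method emits records to the re-bucketed output.

    ```scala
    import com.spotify.scio.ScioContext
    import com.spotify.scio.avro._ // implicit Avro coders (assumed import location)
    import com.spotify.scio.smb._
    import org.apache.beam.sdk.extensions.smb.AvroSortedBucketIO
    import org.apache.beam.sdk.values.TupleTag

    val sc = ScioContext()

    sc.sortMergeTransform(
        classOf[String],
        AvroSortedBucketIO
          .read(new TupleTag[Event]("events"), classOf[Event])
          .from("gs://my-bucket/events")
      )
      .to(
        // Re-write with the same key field and bucketing scheme as the input
        AvroSortedBucketIO
          .transformOutput(classOf[String], "userId", classOf[Summary])
          .to("gs://my-bucket/summaries")
      )
      .via { case (key, events, outputCollector) =>
        // buildSummary is a hypothetical helper producing one Summary per key
        outputCollector.accept(buildSummary(key, events))
      }
    ```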
  30. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  31. def toString(): String
    Definition Classes
    AnyRef → Any
  32. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  33. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  34. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()

Inherited from Serializable

Inherited from AnyRef

Inherited from Any
