abstract class SortedBucketSource[KeyType] extends BoundedSource[KV[KeyType, CoGbkResult]]
A PTransform for co-grouping sources written using compatible SortedBucketSink
transforms. It differs from org.apache.beam.sdk.transforms.join.CoGroupByKey because no
shuffle step is required, since the source files are written in pre-sorted order. Instead,
matching buckets' files are sequentially read in a merge-sort style, and outputs resulting value
groups as org.apache.beam.sdk.transforms.join.CoGbkResult.
Source compatibility
Each of the BucketedInput sources must use the same key function and hashing scheme.
Since SortedBucketSink writes an additional file representing BucketMetadata,
SortedBucketSource begins by reading each metadata file and using BucketMetadata#isCompatibleWith(BucketMetadata) to check compatibility.
The number of buckets, N, does not have to match across sources. Since that value is
required be to a power of 2, all values of N are compatible, albeit requiring a fan-out
from the source with smallest N.
- Source
- SortedBucketSource.java
- Alphabetic
- By Inheritance
- SortedBucketSource
- BoundedSource
- Source
- HasDisplayData
- Serializable
- AnyRef
- Any
- Hide All
- Show All
- Public
- Protected
Instance Constructors
- new SortedBucketSource(sources: List[BucketedInput[_ <: AnyRef]], targetParallelism: TargetParallelism, bucketOffsetId: Int, effectiveParallelism: Int, metricsKey: String, estimatedSizeBytes: Long)
- Attributes
- protected[smb]
- new SortedBucketSource(sources: List[BucketedInput[_ <: AnyRef]], targetParallelism: TargetParallelism, metricsKey: String)
- new SortedBucketSource(sources: List[BucketedInput[_ <: AnyRef]], targetParallelism: TargetParallelism)
- new SortedBucketSource(sources: List[BucketedInput[_ <: AnyRef]])
Abstract Value Members
- abstract def comparator(): Comparator[ComparableKeyBytes]
- Attributes
- protected[smb]
- abstract def createSplitSource(splitNum: Int, totalParallelism: Int, estSplitSize: Long): SortedBucketSource[KeyType]
- returns
A split source of the implementing subtype
- Attributes
- protected[smb]
- abstract def keyTypeCoder(): Coder[KeyType]
- Attributes
- protected[smb]
- abstract def toKeyFn(): Function[ComparableKeyBytes, KeyType]
- Attributes
- protected[smb]
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- def coGbkResultSchema(): CoGbkResultSchema
- Attributes
- protected[smb]
- def createReader(options: PipelineOptions): BoundedReader[KV[KeyType, CoGbkResult]]
- Definition Classes
- SortedBucketSource → BoundedSource
- Annotations
- @Override()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def getEstimatedSizeBytes(options: PipelineOptions): Long
- Definition Classes
- SortedBucketSource → BoundedSource
- Annotations
- @Override()
- def getOrComputeSourceSpec(): SourceSpec
- Attributes
- protected[smb]
- def getOutputCoder(): Coder[KV[KeyType, CoGbkResult]]
- Definition Classes
- SortedBucketSource → Source
- Annotations
- @Override()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def populateDisplayData(builder: Builder): Unit
- Definition Classes
- SortedBucketSource → Source → HasDisplayData
- Annotations
- @Override()
- def split(desiredBundleSizeBytes: Long, options: PipelineOptions): List[_ <: BoundedSource[KV[KeyType, CoGbkResult]]]
- Definition Classes
- SortedBucketSource → BoundedSource
- Annotations
- @Override()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- def validate(): Unit
- Definition Classes
- Source
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
Deprecated Value Members
- def getDefaultOutputCoder(): Coder[KV[KeyType, CoGbkResult]]
- Definition Classes
- Source
- Annotations
- @Deprecated
- Deprecated