Sorter
The sortValues transform sorts values by a secondary key following a groupByKey on the primary key, spilling to disk if required. The memoryMB parameter controls the in-memory overhead allowed before the sorter spills data to disk. Keys are compared by the byte-array representations produced by their Beam coder, so the resulting order can differ from the natural ordering of the key type.
import com.spotify.scio.values.SCollection
import com.spotify.scio.extra.sorter._
val elements: SCollection[(String, (String, Int))] = ???
val sorted: SCollection[(String, Iterable[(String, Int)])] = elements
  .groupByKey
  .sortValues(memoryMB = 100)
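
Because secondary keys are compared as encoded bytes, the sorted order may not match what you would expect from the key type. The following is a minimal sketch in plain Scala with no Scio or Beam dependency. It assumes the comparison is an unsigned, lexicographic one over the bytes, and it uses raw UTF-8 encoding as a stand-in for a string coder. `ByteOrderSketch`, `compareBytes`, `encode`, and `sortByEncodedBytes` are hypothetical names for illustration only and are not part of the Scio API.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object ByteOrderSketch {
  // Hypothetical helper: unsigned lexicographic comparison of two byte arrays.
  // Assumption: this approximates how encoded keys are ordered by the sorter.
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
    val n = math.min(a.length, b.length)
    var i = 0
    while (i < n) {
      // Mask to 0..255 so bytes are compared as unsigned values.
      val c = Integer.compare(a(i) & 0xff, b(i) & 0xff)
      if (c != 0) return c
      i += 1
    }
    // When one array is a prefix of the other, the shorter one sorts first.
    Integer.compare(a.length, b.length)
  }

  // Stand-in for a coder: encode a String as raw UTF-8 bytes.
  def encode(s: String): Array[Byte] = s.getBytes(UTF_8)

  // Sort keys by their encoded bytes rather than by String ordering rules.
  def sortByEncodedBytes(keys: Seq[String]): Seq[String] =
    keys.sortWith((x, y) => compareBytes(encode(x), encode(y)) < 0)

  def main(args: Array[String]): Unit =
    println(sortByEncodedBytes(Seq("9", "10", "b", "B", "a")))
}
```

Under these assumptions "10" sorts before "9" and "B" sorts before "a", since only the byte values are compared. If a numeric or locale-aware order is needed, one option is to choose a secondary key whose encoded form preserves that order, for example zero-padded strings.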