Packages

  • package root
    Definition Classes
    root
  • package com
    Definition Classes
    root
  • package spotify
    Definition Classes
    com
  • package scio
    Definition Classes
    spotify
  • package extra
    Definition Classes
    scio
  • package sparkey

    Main package for Sparkey side input APIs.

    Main package for Sparkey side input APIs. Import all.

    import com.spotify.scio.extra.sparkey._

    To save an SCollection[(String, String)] to a Sparkey fileset:

    val s = sc.parallelize(Seq("a" -> "one", "b" -> "two"))
    s.saveAsSparkey("gs://<bucket>/<path>/<sparkey-prefix>")
    
    // with multiple shards, sharded by MurmurHash3 of the key
    s.saveAsSparkey("gs://<bucket>/<path>/<sparkey-dir>", numShards=2)

    A previously-saved sparkey can be loaded as a side input:

    sc.sparkeySideInput("gs://<bucket>/<path>/<sparkey-prefix>")

    A sharded collection of Sparkey files can also be used as a side input by specifying a glob path:

    sc.sparkeySideInput("gs://<bucket>/<path>/<sparkey-dir>/part-*")

    When the sparkey is needed only temporarily, the save step can be elided:

    val side: SideInput[SparkeyReader] = sc
      .parallelize(Seq("a" -> "one", "b" -> "two"))
      .asSparkeySideInput

    SparkeyReader can be used like a lookup table in a side input operation:

    val main: SCollection[String] = sc.parallelize(Seq("a", "b", "c"))
    val side: SideInput[SparkeyReader] = sc
      .parallelize(Seq("a" -> "one", "b" -> "two"))
      .asSparkeySideInput
    
    main.withSideInputs(side)
      .map { (x, s) =>
        s(side).getOrElse(x, "unknown")
      }

    A SparkeyMap can store any types of keys and values, but can only be used as a SideInput:

    val main: SCollection[String] = sc.parallelize(Seq("a", "b", "c"))
    val side: SideInput[SparkeyMap[String, Int]] = sc
      .parallelize(Seq("a" -> 1, "b" -> 2, "c" -> 3))
      .asLargeMapSideInput()
    
    val objects: SCollection[MyObject] = main
      .withSideInputs(side)
      .map { (x, s) => s(side).get(x) }
      .toSCollection

    To read a static Sparkey collection and use it as a typed SideInput, use TypedSparkeyReader. TypedSparkeyReader can also accept a Caffeine cache to reduce IO and deserialization load:

    val main: SCollection[String] = sc.parallelize(Seq("a", "b", "c"))
    val cache: Cache[String, MyObject] = ...
    val side: SideInput[TypedSparkeyReader[MyObject]] = sc
      .typedSparkeySideInput("gs://<bucket>/<path>/<sparkey-prefix>", MyObject.decode, cache)
    
    val objects: SCollection[MyObject] = main
      .withSideInputs(side)
      .map { (x, s) => s(side).get(x) }
      .toSCollection
    Definition Classes
    extra
  • package instances
    Definition Classes
    sparkey
  • CachedStringSparkeyReader
  • MockByteArrayEntry
  • MockByteArraySparkeyReader
  • MockEntry
  • MockSparkeyReader
  • MockStringEntry
  • MockStringSparkeyReader
  • ShardedSparkeyReader
  • SparkeyMap
  • SparkeyMapBase
  • SparkeyReaderInstances
  • SparkeySet
  • SparkeySetBase
  • StringSparkeyReader
  • TypedSparkeyReader
c

com.spotify.scio.extra.sparkey.instances

ShardedSparkeyReader

class ShardedSparkeyReader extends SparkeyReader

A wrapper class around SparkeyReader that allows the reading of multiple Sparkey files, sharded by their keys (via MurmurHash3). At most 32,768 Sparkey files are supported.

Source
ShardedSparkeyReader.scala
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. ShardedSparkeyReader
  2. SparkeyReader
  3. Closeable
  4. AutoCloseable
  5. Iterable
  6. AnyRef
  7. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Instance Constructors

  1. new ShardedSparkeyReader(sparkeys: Map[Short, SparkeyReader], numShards: Short)

    sparkeys

    a map of shard ID to sparkey reader

    numShards

    the total count of shards used (needed for keying as some shards may be empty)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  6. def close(): Unit
    Definition Classes
    ShardedSparkeyReader → SparkeyReader → Closeable → AutoCloseable
  7. def duplicate(): SparkeyReader
    Definition Classes
    ShardedSparkeyReader → SparkeyReader
  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  11. def forEach(arg0: Consumer[_ >: Entry <: AnyRef]): Unit
    Definition Classes
    Iterable
  12. def getAsByteArray(key: Array[Byte]): Array[Byte]
    Definition Classes
    ShardedSparkeyReader → SparkeyReader
  13. def getAsEntry(key: Array[Byte]): Entry
    Definition Classes
    ShardedSparkeyReader → SparkeyReader
  14. def getAsString(key: String): String
    Definition Classes
    ShardedSparkeyReader → SparkeyReader
  15. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. def getIndexHeader(): IndexHeader
    Definition Classes
    ShardedSparkeyReader → SparkeyReader
  17. def getLoadedBytes(): Long
    Definition Classes
    ShardedSparkeyReader → SparkeyReader
  18. def getLogHeader(): LogHeader
    Definition Classes
    ShardedSparkeyReader → SparkeyReader
  19. def getTotalBytes(): Long
    Definition Classes
    ShardedSparkeyReader → SparkeyReader
  20. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  21. def hashKey(str: String): Short
  22. def hashKey(arr: Array[Byte]): Short
  23. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  24. def iterator(): Iterator[Entry]
    Definition Classes
    ShardedSparkeyReader → SparkeyReader → Iterable
  25. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  26. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  27. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  28. val numShards: Short
  29. val sparkeys: Map[Short, SparkeyReader]
  30. def spliterator(): Spliterator[Entry]
    Definition Classes
    Iterable
  31. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  32. def toString(): String
    Definition Classes
    AnyRef → Any
  33. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  34. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  35. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()

Inherited from SparkeyReader

Inherited from Closeable

Inherited from AutoCloseable

Inherited from Iterable[Entry]

Inherited from AnyRef

Inherited from Any

Ungrouped