Packages

  • package root
    Definition Classes
    root
  • package com
    Definition Classes
    root
  • package spotify
    Definition Classes
    com
  • package scio
    Definition Classes
    spotify
  • package extra
    Definition Classes
    scio
  • package sparkey

    Main package for Sparkey side input APIs.

    Main package for Sparkey side input APIs. Import all.

    import com.spotify.scio.extra.sparkey._

    To save an SCollection[(String, String)] to a Sparkey fileset:

    val s = sc.parallelize(Seq("a" -> "one", "b" -> "two"))
    s.saveAsSparkey("gs://<bucket>/<path>/<sparkey-prefix>")
    
    // with multiple shards, sharded by MurmurHash3 of the key
    s.saveAsSparkey("gs://<bucket>/<path>/<sparkey-dir>", numShards=2)

    A previously-saved sparkey can be loaded as a side input:

    sc.sparkeySideInput("gs://<bucket>/<path>/<sparkey-prefix>")

    A sharded collection of Sparkey files can also be used as a side input by specifying a glob path:

    sc.sparkeySideInput("gs://<bucket>/<path>/<sparkey-dir>/part-*")

    When the sparkey is needed only temporarily, the save step can be elided:

    val side: SideInput[SparkeyReader] = sc
      .parallelize(Seq("a" -> "one", "b" -> "two"))
      .asSparkeySideInput

    SparkeyReader can be used like a lookup table in a side input operation:

    val main: SCollection[String] = sc.parallelize(Seq("a", "b", "c"))
    val side: SideInput[SparkeyReader] = sc
      .parallelize(Seq("a" -> "one", "b" -> "two"))
      .asSparkeySideInput
    
    main.withSideInputs(side)
      .map { (x, s) =>
        s(side).getOrElse(x, "unknown")
      }

    A SparkeyMap can store any types of keys and values, but can only be used as a SideInput:

    val main: SCollection[String] = sc.parallelize(Seq("a", "b", "c"))
    val side: SideInput[SparkeyMap[String, Int]] = sc
      .parallelize(Seq("a" -> 1, "b" -> 2, "c" -> 3))
      .asLargeMapSideInput()
    
    val objects: SCollection[MyObject] = main
      .withSideInputs(side)
      .map { (x, s) => s(side).get(x) }
      .toSCollection

    To read a static Sparkey collection and use it as a typed SideInput, use TypedSparkeyReader. TypedSparkeyReader can also accept a Caffeine cache to reduce IO and deserialization load:

    val main: SCollection[String] = sc.parallelize(Seq("a", "b", "c"))
    val cache: Cache[String, MyObject] = ...
    val side: SideInput[TypedSparkeyReader[MyObject]] = sc
      .typedSparkeySideInput("gs://<bucket>/<path>/<sparkey-prefix>", MyObject.decode, cache)
    
    val objects: SCollection[MyObject] = main
      .withSideInputs(side)
      .map { (x, s) => s(side).get(x) }
      .toSCollection
    Definition Classes
    extra
  • package instances
    Definition Classes
    sparkey
  • InvalidNumShardsException
  • LargeHashSCollectionFunctions
  • PairLargeHashSCollectionFunctions
  • SparkeyIO
  • SparkeyPairSCollection
  • SparkeySCollection
  • SparkeyScioContext
  • SparkeySetSCollection
  • SparkeyUri
  • SparkeyWritable
c

com.spotify.scio.extra.sparkey

SparkeyScioContext

implicit final class SparkeyScioContext extends AnyVal

Enhanced version of ScioContext with Sparkey methods.

Source
package.scala
Linear Supertypes
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. SparkeyScioContext
  2. AnyVal
  3. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Instance Constructors

  1. new SparkeyScioContext(self: ScioContext)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    Any
  2. final def ##: Int
    Definition Classes
    Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def cachedStringSparkeySideInput[T](basePath: String, cache: Cache[String, String]): SideInput[CachedStringSparkeyReader]

    Create a SideInput of CachedStringSparkeyReader from a SparkeyUri base path, to be used with SCollection.withSideInputs.

    Create a SideInput of CachedStringSparkeyReader from a SparkeyUri base path, to be used with SCollection.withSideInputs.

    Annotations
    @experimental()
  6. def getClass(): Class[_ <: AnyVal]
    Definition Classes
    AnyVal → Any
  7. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  8. def sparkeySideInput(basePath: String): SideInput[SparkeyReader]

    Create a SideInput of SparkeyReader from a SparkeyUri base path, to be used with SCollection.withSideInputs.

    Create a SideInput of SparkeyReader from a SparkeyUri base path, to be used with SCollection.withSideInputs. If the provided base path ends with "*", it will be treated as a sharded collection of Sparkey files.

    Annotations
    @experimental()
  9. def toString(): String
    Definition Classes
    Any
  10. def typedSparkeySideInput[T](basePath: String, decoder: (Array[Byte]) => T, cache: Cache[String, T] = null): SideInput[TypedSparkeyReader[T]]

    Create a SideInput of TypedSparkeyReader from a SparkeyUri base path, to be used with SCollection.withSideInputs.

    Create a SideInput of TypedSparkeyReader from a SparkeyUri base path, to be used with SCollection.withSideInputs. The provided decoder function will map from the underlying byte array to a JVM type, and the optional Cache object can be used to cache reads in memory after decoding.

    Annotations
    @experimental()

Inherited from AnyVal

Inherited from Any

Ungrouped