Packages

  • package annoy

    Main package for Annoy side input APIs. Import all.

    import com.spotify.scio.extra.annoy._

    Two metrics are available, Angular and Euclidean.
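To make the difference between the two metrics concrete, here is a minimal plain-Scala sketch that does not use Scio. The object and method names are hypothetical. The angular formula is the one the Annoy library documents (the Euclidean distance between the normalized vectors); treat it as an illustration rather than as the code this package runs.

```scala
object MetricSketch {
  // Euclidean (L2) distance between two vectors of equal length.
  def euclidean(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    math.sqrt(a.zip(b).map { case (x, y) => val d = (x - y).toDouble; d * d }.sum)
  }

  // Cosine similarity; assumes neither vector is all zeros.
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x.toDouble * y }.sum
    val na = math.sqrt(a.map(x => x.toDouble * x).sum)
    val nb = math.sqrt(b.map(x => x.toDouble * x).sum)
    dot / (na * nb)
  }

  // Angular distance as sqrt(2 * (1 - cos)); clamped at 0 to absorb rounding error.
  def angular(a: Array[Float], b: Array[Float]): Double =
    math.sqrt(math.max(0.0, 2.0 * (1.0 - cosine(a, b))))
}
```

Angular distance ignores vector magnitude, so (1, 2) and (2, 4) are at distance close to zero, while Euclidean distance treats them as different points.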

    To save an SCollection[(Int, Array[Float])] to an Annoy file:

    val s = sc.parallelize(Seq(1 -> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))

    Save to a temporary location:

    val s1 = s.asAnnoy(Angular, 2, 10)

    Save to a specific location:

    val s2 = s.asAnnoy("gs://<bucket>/<path>", Angular, 2, 10)

    SCollection[AnnoyUri] can be converted into a side input:

    val s = sc.parallelize(Seq(1 -> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))
    val side = s.asAnnoySideInput(metric, dimension, numTrees)

    There's syntactic sugar for saving an SCollection and converting it to a side input:

    val s = sc
      .parallelize(Seq(1 -> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))
      .asAnnoySideInput(metric, dimension, numTrees)

    An existing Annoy file can be converted to a side input directly:

    sc.annoySideInput(metric, dimension, numTrees, "gs://<bucket>/<path>")

    AnnoyReader provides nearest neighbor lookups by vector as well as item lookups:

    // metric, dimension and numTrees are placeholders for your own values
    val r = new scala.util.Random
    val data = (0 until 1000).map(x => (x, Array.fill(dimension)(r.nextFloat())))
    val main = sc.parallelize(data)
    val side = main.asAnnoySideInput(metric, dimension, numTrees)

    main.keys.withSideInput(side)
      .map { (i, s) =>
        val annoyReader = s(side)

        // get vector by item id, allocating a new Array[Float] each time
        val v1 = annoyReader.getItemVector(i)

        // get vector by item id, copy vector into pre-allocated Array[Float]
        val v2 = Array.fill(dimension)(-1.0f)
        annoyReader.getItemVector(i, v2)

        // get 10 nearest neighbors by vector
        val results = annoyReader.getNearest(v2, 10)
      }
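Annoy lookups are approximate, so an exact baseline can help when sanity-checking results on a small sample. The sketch below is plain Scala without Scio; `BruteForceNearest` and its methods are hypothetical names, and it covers only the Euclidean metric.

```scala
object BruteForceNearest {
  // Squared Euclidean distance; sufficient for ranking neighbors.
  private def sqDist(a: Array[Float], b: Array[Float]): Double = {
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = (a(i) - b(i)).toDouble
      sum += d * d
      i += 1
    }
    sum
  }

  // Ids of the k items closest to `query`, nearest first.
  def nearest(items: Seq[(Int, Array[Float])], query: Array[Float], k: Int): Seq[Int] =
    items.sortBy { case (_, v) => sqDist(v, query) }.take(k).map(_._1)
}
```

This scans every item per query, so it is only practical for small test sets. Ids returned by getNearest for the same query could be compared against this list to estimate recall.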
    Definition Classes
    extra
  • Angular
  • AnnoyMetric
  • AnnoyPairSCollection
  • AnnoyReader
  • AnnoySCollection
  • AnnoyScioContext
  • AnnoyUri
  • Euclidean
com.spotify.scio.extra.annoy

AnnoyPairSCollection

implicit final class AnnoyPairSCollection extends AnyVal

Source
package.scala

Instance Constructors

  1. new AnnoyPairSCollection(self: SCollection[(Int, Array[Float])])

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    Any
  2. final def ##: Int
    Definition Classes
    Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    Any
  4. def asAnnoy(metric: AnnoyMetric, dim: Int, nTrees: Int): SCollection[AnnoyUri]

    Write the key-value pairs of this SCollection as an Annoy file to a temporary location, building the trees in the index according to the parameters provided.

    metric

    One of Angular (cosine distance) or Euclidean

    dim

    Number of dimensions in vectors

    nTrees

    Number of trees to build. More trees give higher precision but a larger index. If nTrees is set to -1, trees are built automatically so that they take at most 2x the memory of the vectors.

    returns

    A singleton SCollection containing the AnnoyUri of the saved files

    Annotations
    @experimental()
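As a rough aid for choosing nTrees, the "at most 2x the memory of the vectors" rule can be turned into a back-of-envelope estimate. This is a sketch under two assumptions: vectors are stored as 4-byte floats, and the 2x bound applies to the whole index. The object and method names are hypothetical.

```scala
object AnnoyMemoryEstimate {
  // Bytes needed to hold the raw Float vectors (4 bytes per component).
  def vectorBytes(numItems: Long, dim: Int): Long = numItems * dim * 4L

  // Upper bound implied by "at most 2x the memory of the vectors" when nTrees = -1.
  def autoIndexBudgetBytes(numItems: Long, dim: Int): Long =
    2L * vectorBytes(numItems, dim)
}
```

For 1000 items of dimension 40, the raw vectors take 160000 bytes, so the automatic setting would be bounded by about 320000 bytes under these assumptions.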
  5. def asAnnoy(path: String, metric: AnnoyMetric, dim: Int, nTrees: Int): SCollection[AnnoyUri]

    Write the key-value pairs of this SCollection as an Annoy file to a specific location, building the trees in the index according to the parameters provided.

    path

    Either a local file or a GCS location, e.g. gs://<bucket>/<path>

    metric

    One of Angular (cosine distance) or Euclidean

    dim

    Number of dimensions in vectors

    nTrees

    Number of trees to build. More trees give higher precision but a larger index. If nTrees is set to -1, trees are built automatically so that they take at most 2x the memory of the vectors.

    returns

    A singleton SCollection containing the AnnoyUri of the saved files

    Annotations
    @experimental()
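Because dim must match the length of every vector, it can be worth checking a sample of the input before building the index. The documentation does not say what happens on a mismatch, so the check below is a cautious assumption rather than a documented requirement. It is plain Scala, and `AnnoyInputCheck` is a hypothetical name.

```scala
object AnnoyInputCheck {
  // Returns the ids whose vector length differs from the expected dimension.
  def mismatchedIds(items: Seq[(Int, Array[Float])], dim: Int): Seq[Int] =
    items.collect { case (id, v) if v.length != dim => id }
}
```

An empty result means every sampled vector has the expected dimension.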
  6. def asAnnoySideInput(metric: AnnoyMetric, dim: Int, nTrees: Int): SideInput[AnnoyReader]

    Write the key-value pairs of this SCollection as an Annoy file to a temporary location, building the trees in the index according to the parameters provided, then load the trees as a side input.

    metric

    One of Angular (cosine distance) or Euclidean

    dim

    Number of dimensions in vectors

    nTrees

    Number of trees to build. More trees give higher precision but a larger index. If nTrees is set to -1, trees are built automatically so that they take at most 2x the memory of the vectors.

    returns

    A SideInput[AnnoyReader] that loads the saved Annoy file

    Annotations
    @experimental()
  7. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  8. def getClass(): Class[_ <: AnyVal]
    Definition Classes
    AnyVal → Any
  9. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  10. def toString(): String
    Definition Classes
    Any
