Packages

  • package root
    Definition Classes
    root
  • package com
    Definition Classes
    root
  • package spotify
    Definition Classes
    com
  • package scio
    Definition Classes
    spotify
  • package extra
    Definition Classes
    scio
  • package annoy

    Main package for Annoy side input APIs.

    Main package for Annoy side input APIs. Import all.

    import com.spotify.scio.extra.annoy._

    Two metrics are available, Angular and Euclidean.

    To save an SCollection[(Int, Array[Float])] to an Annoy file:

    val s = sc.parallelize(Seq( 1-> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))

    Save to a temporary location:

    val s1 = s.asAnnoy(Angular, 40, 10)

    Save to a specific location:

    val s1 = s.asAnnoy(Angular, 40, 10, "gs://<bucket>/<path>")

    SCollection[AnnoyUri] can be converted into a side input:

    val s = sc.parallelize(Seq( 1-> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))
    val side = s.asAnnoySideInput(metric, dimension, numTrees)

    There's syntactic sugar for saving an SCollection and converting it to a side input:

    val s = sc
      .parallelize(Seq( 1-> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))
      .asAnnoySideInput(metric, dimension, numTrees)

    An existing Annoy file can be converted to a side input directly:

    sc.annoySideInput(metric, dimension, numTrees, "gs://<bucket>/<path>")

    AnnoyReader provides nearest neighbor lookups by vector as well as item lookups:

    val data = (0 until 1000).map(x => (x, Array.fill(40)(r.nextFloat())))
    val main = sc.parallelize(data)
    val side = main.asAnnoySideInput(metric, dimension, numTrees)
    
    main.keys.withSideInput(side)
      .map { (i, s) =>
        val annoyReader = s(side)
    
        // get vector by item id, allocating a new Array[Float] each time
        val v1 = annoyReader.getItemVector(i)
    
        // get vector by item id, copy vector into pre-allocated Array[Float]
        val v2 = Array.fill(dim)(-1.0f)
        annoyReader.getItemVector(i, v2)
    
        // get 10 nearest neighbors by vector
        val results = annoyReader.getNearest(v2, 10)
      }
    Definition Classes
    extra
  • Angular
  • AnnoyMetric
  • AnnoyPairSCollection
  • AnnoyReader
  • AnnoySCollection
  • AnnoyScioContext
  • AnnoyUri
  • Euclidean
c

com.spotify.scio.extra.annoy

AnnoySCollection

implicit final class AnnoySCollection extends AnyVal

Enhanced version of SCollection with Annoy methods

Source
package.scala
Linear Supertypes
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. AnnoySCollection
  2. AnyVal
  3. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Instance Constructors

  1. new AnnoySCollection(self: SCollection[AnnoyUri])

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    Any
  2. final def ##: Int
    Definition Classes
    Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    Any
  4. def asAnnoySideInput(metric: AnnoyMetric, dim: Int): SideInput[AnnoyReader]

    Load Annoy index stored at AnnoyUri in this SCollection.

    Load Annoy index stored at AnnoyUri in this SCollection.

    metric

    Metric (Angular, Euclidean) used to build the Annoy index

    dim

    Number of dimensions in vectors used to build the Annoy index

    returns

    SideInput[AnnoyReader]

    Annotations
    @experimental()
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def getClass(): Class[_ <: AnyVal]
    Definition Classes
    AnyVal → Any
  7. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  8. def toString(): String
    Definition Classes
    Any

Inherited from AnyVal

Inherited from Any

Ungrouped