Annoy
** Deprecated since Scio 0.14.11 **
Scio integrates with Spotify’s Annoy, an approximate nearest neighbors library, via annoy-java and annoy4s.
Write
A keyed SCollection with Int keys and Array[Float] vector values can be saved with asAnnoy:
import com.spotify.scio.values.SCollection
import com.spotify.scio.extra.annoy._
val metric: AnnoyMetric = ???
val numDimensions: Int = ???
val numTrees: Int = ???
val itemVectors: SCollection[(Int, Array[Float])] = ???
itemVectors.asAnnoy("gs://output-path", metric, numDimensions, numTrees)
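An Annoy index is built for a fixed number of dimensions, so it can help to check vector lengths before writing. The following is a minimal sketch in plain Scala, with no Scio dependency. The names VectorChecks, hasDimension and partitionByDimension are hypothetical helpers made up for this illustration and are not part of Scio or Annoy.

```scala
object VectorChecks {
  // Hypothetical helper: true when the vector has exactly `dim` components.
  def hasDimension(dim: Int)(item: (Int, Array[Float])): Boolean =
    item._2.length == dim

  // Splits items into (matching, non-matching) by vector length,
  // so that mismatched vectors can be logged or dropped before indexing.
  def partitionByDimension(
    dim: Int,
    items: Seq[(Int, Array[Float])]
  ): (Seq[(Int, Array[Float])], Seq[(Int, Array[Float])]) =
    items.partition(item => hasDimension(dim)(item))
}
```

In a pipeline, the same predicate could presumably be applied with filter on itemVectors before calling asAnnoy, for example itemVectors.filter(item => VectorChecks.hasDimension(numDimensions)(item)).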
Side Input
An Annoy file can be read directly as a SideInput with annoySideInput:
import com.spotify.scio._
import com.spotify.scio.values.SideInput
import com.spotify.scio.extra.annoy._
val sc: ScioContext = ???
val metric: AnnoyMetric = ???
val numDimensions: Int = ???
val annoySI: SideInput[AnnoyReader] = sc.annoySideInput("gs://input-path", metric, numDimensions)
Alternatively, an SCollection can be converted directly to a SideInput with asAnnoySideInput:
import com.spotify.scio.values.{SCollection, SideInput}
import com.spotify.scio.extra.annoy._
val metric: AnnoyMetric = ???
val numDimensions: Int = ???
val numTrees: Int = ???
val itemVectors: SCollection[(Int, Array[Float])] = ???
val annoySI: SideInput[AnnoyReader] = itemVectors.asAnnoySideInput(metric, numDimensions, numTrees)
An AnnoyReader provides access to item vectors and their nearest neighbors:
import com.spotify.scio.values.{SCollection, SideInput}
import com.spotify.scio.extra.annoy._
val annoySI: SideInput[AnnoyReader] = ???
val elements: SCollection[Int] = ???
elements
  .withSideInputs(annoySI)
  .map { case (element, ctx) =>
    val annoyReader: AnnoyReader = ctx(annoySI)
    val vec: Array[Float] = annoyReader.getItemVector(element)
    element -> annoyReader.getNearest(vec, 1)
  }
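For intuition about what a nearest-neighbor query returns, the exact search that Annoy approximates can be sketched as a brute-force scan in plain Scala. This is an illustration only, not how Annoy or AnnoyReader is implemented. It assumes an angular-style metric, for which ranking by descending cosine similarity gives the same order as ranking by ascending angular distance. The names BruteForceNearest, cosine and nearest are hypothetical.

```scala
object BruteForceNearest {
  // Cosine similarity of two equal-length vectors; 0.0 if either has zero norm.
  def cosine(a: Array[Float], b: Array[Float]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    var i = 0
    while (i < a.length) {
      dot += a(i) * b(i)
      normA += a(i) * a(i)
      normB += b(i) * b(i)
      i += 1
    }
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / math.sqrt(normA * normB)
  }

  // Exact top-k item ids, ordered from most to least similar to `query`.
  def nearest(items: Map[Int, Array[Float]], query: Array[Float], k: Int): Seq[Int] =
    items.toSeq
      .map { case (id, vec) => id -> cosine(query, vec) }
      .sortBy { case (_, similarity) => -similarity }
      .take(k)
      .map { case (id, _) => id }
}
```

The scan visits every item for each query, which is the cost that an approximate index such as Annoy is designed to avoid, at the price of occasionally missing a true nearest neighbor.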