package annoy
Main package for Annoy side input APIs. Import all.
import com.spotify.scio.extra.annoy._
Two metrics are available, Angular and Euclidean.
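As a rough illustration of how the two metrics differ, the following sketch computes both distances in plain Scala. It assumes the metrics follow the upstream Annoy library's definitions, where angular distance is sqrt(2 * (1 - cos(u, v))). The object AnnoyMetricSketch and its methods are hypothetical names used only for this example and are not part of this package.

```scala
// Hypothetical helper for illustration only; not part of com.spotify.scio.extra.annoy.
object AnnoyMetricSketch {
  // Euclidean (L2) distance between two vectors of equal length.
  def euclidean(u: Array[Float], v: Array[Float]): Double = {
    require(u.length == v.length, "vectors must have the same dimension")
    math.sqrt(u.indices.map { i => val d = (u(i) - v(i)).toDouble; d * d }.sum)
  }

  // Cosine similarity; both vectors are assumed to be non-zero.
  def cosine(u: Array[Float], v: Array[Float]): Double = {
    require(u.length == v.length, "vectors must have the same dimension")
    val dot = u.indices.map(i => u(i).toDouble * v(i)).sum
    val normU = math.sqrt(u.map(x => x.toDouble * x).sum)
    val normV = math.sqrt(v.map(x => x.toDouble * x).sum)
    dot / (normU * normV)
  }

  // Angular distance, assuming the upstream Annoy definition sqrt(2 * (1 - cos)).
  // math.max guards against tiny negative values caused by rounding.
  def angular(u: Array[Float], v: Array[Float]): Double =
    math.sqrt(math.max(0.0, 2.0 * (1.0 - cosine(u, v))))
}
```

Under this definition, angular distance ignores vector length and depends only on direction, while Euclidean distance is sensitive to both.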
To save an SCollection[(Int, Array[Float])] to an Annoy file:
val s = sc.parallelize(Seq(1 -> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))
Save to a temporary location:
val s1 = s.asAnnoy(Angular, 2, 10)
Save to a specific location:
val s1 = s.asAnnoy(Angular, 2, 10, "gs://<bucket>/<path>")
SCollection[AnnoyUri] can be converted into a side input:
val s = sc.parallelize(Seq(1 -> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))
val side = s.asAnnoySideInput(metric, dimension, numTrees)
There's syntactic sugar for saving an SCollection and converting it to a side input:
val s = sc
  .parallelize(Seq(1 -> Array(1.2f, 3.4f), 2 -> Array(2.2f, 1.2f)))
  .asAnnoySideInput(metric, dimension, numTrees)
An existing Annoy file can be converted to a side input directly:
sc.annoySideInput(metric, dimension, numTrees, "gs://<bucket>/<path>")
AnnoyReader provides nearest neighbor lookups by vector as well as item lookups:
val r = new scala.util.Random()
val data = (0 until 1000).map(x => (x, Array.fill(dimension)(r.nextFloat())))
val main = sc.parallelize(data)
val side = main.asAnnoySideInput(metric, dimension, numTrees)
main.keys.withSideInputs(side)
  .map { (i, s) =>
    val annoyReader = s(side)
    // get vector by item id, allocating a new Array[Float] each time
    val v1 = annoyReader.getItemVector(i)
    // get vector by item id, copy vector into pre-allocated Array[Float]
    val v2 = Array.fill(dimension)(-1.0f)
    annoyReader.getItemVector(i, v2)
    // get 10 nearest neighbors by vector
    val results = annoyReader.getNearest(v2, 10)
  }
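For intuition about what a nearest neighbor lookup by vector returns, here is a minimal brute-force sketch in plain Scala. It is not how Annoy works internally: Annoy builds trees and returns approximate results, whereas this version scans every item and is exact. The object BruteForceNearest is a hypothetical name, and returning item ids ordered by distance is an assumption about the result shape made for this example.

```scala
// Hypothetical reference implementation for illustration only; not part of this package.
object BruteForceNearest {
  // Exact k-nearest search by Euclidean distance over an in-memory
  // map of item id -> vector. Returns item ids, closest first.
  def nearest(items: Map[Int, Array[Float]], query: Array[Float], k: Int): Seq[Int] =
    items.toSeq
      .map { case (id, v) =>
        require(v.length == query.length, "vectors must have the same dimension")
        val dist = math.sqrt(v.indices.map { i =>
          val d = (v(i) - query(i)).toDouble
          d * d
        }.sum)
        (id, dist)
      }
      .sortBy(_._2) // ascending distance
      .take(k)
      .map(_._1)
}
```

A scan like this costs time linear in the number of items for every query, which is the cost an approximate index such as Annoy is meant to avoid.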
- Source: package.scala
Type Members
- sealed abstract class AnnoyMetric extends AnyRef
- implicit final class AnnoyPairSCollection extends AnyVal
- class AnnoyReader extends AnyRef
AnnoyReader class for approximate nearest neighbor lookups. Supports vector lookup by item as well as nearest neighbor lookup by vector.
- implicit final class AnnoySCollection extends AnyVal
Enhanced version of SCollection with Annoy methods.
- implicit final class AnnoyScioContext extends AnyVal
Enhanced version of ScioContext with Annoy methods.
- trait AnnoyUri extends Serializable
Represents the base URI for an Annoy tree, either on the local or a remote file system.
Value Members
- case object Angular extends AnnoyMetric with Product with Serializable
- case object Euclidean extends AnnoyMetric with Product with Serializable