Class StringIndex

java.lang.Object
com.spotify.voyager.jni.StringIndex
All Implemented Interfaces:
Closeable, AutoCloseable

public class StringIndex extends Object implements Closeable
Wrapper around com.spotify.voyager.jni.Index with a simplified interface that maps each item's index ID to a caller-provided String name.

StringIndex can accommodate at most 2^31 - 1 (about 2.1 billion) items, even though the underlying Voyager Index allows up to 2^63 - 1 (about 9.2e18) items.
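The two limits are exactly the largest Java int (Integer.MAX_VALUE) and the largest Java long (Long.MAX_VALUE). If the item count is known up front, a pre-flight check like the following can reject an oversized build early. This is a minimal sketch; StringIndexCapacity and fitsInStringIndex are hypothetical names, not part of Voyager.

```java
// Hypothetical helper (not part of Voyager): checks that a planned item count
// fits within StringIndex's limit of 2^31 - 1 items.
final class StringIndexCapacity {
    // 2^31 - 1 = 2,147,483,647, the largest Java int.
    static final long MAX_STRING_INDEX_ITEMS = Integer.MAX_VALUE;
    // 2^63 - 1 = 9,223,372,036,854,775,807, the largest Java long.
    static final long MAX_INDEX_ITEMS = Long.MAX_VALUE;

    static boolean fitsInStringIndex(long plannedItems) {
        return plannedItems >= 0 && plannedItems <= MAX_STRING_INDEX_ITEMS;
    }

    public static void main(String[] args) {
        System.out.println(MAX_STRING_INDEX_ITEMS);            // 2147483647
        System.out.println(fitsInStringIndex(3_000_000_000L)); // false
    }
}
```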

  • Constructor Details

    • StringIndex

      public StringIndex(Index.SpaceType spaceType, int numDimensions)
      Instantiate a new empty index with the specified space type and dimensionality
      Parameters:
      spaceType - Type of space and distance calculation used when determining distance between embeddings in the index (see com.spotify.voyager.jni.Index.SpaceType)
      numDimensions - Number of dimensions of each embedding stored in the underlying HNSW index
    • StringIndex

      public StringIndex(Index.SpaceType spaceType, int numDimensions, long indexM, long efConstruction, long randomSeed, long maxElements, Index.StorageDataType storageDataType)
      Instantiate an empty index with the specified index parameters
      Parameters:
      spaceType - Type of space and distance calculation used when determining distance between embeddings in the index (see com.spotify.voyager.jni.Index.SpaceType)
      numDimensions - Number of dimensions of each embedding stored in the underlying HNSW index
      indexM - Number of connections made between nodes when inserting an element into the index. Increasing this value can improve recall at the expense of higher memory usage
      efConstruction - Search depth when inserting elements into the index. Increasing this value can improve recall (up to a point) at the cost of increased indexing time
      randomSeed - Random seed used during indexing
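      maxElements - Initial capacity (maximum number of elements) of the underlying HNSW index; if more elements are added, the index is resized automatically, but slowly
      storageDataType - Type to store the embedding values as (see com.spotify.voyager.jni.Index.StorageDataType)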
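The spaceType parameter selects how distance between embeddings is computed. For reference, the sketch below implements common textbook definitions for the three space types: squared Euclidean distance, 1 minus the inner product, and 1 minus cosine similarity. Treat these formulas as assumptions: this page does not state Voyager's exact definitions (for example, whether Euclidean distance is reported squared), and SpaceDistances is a hypothetical helper class.

```java
// Hypothetical helper: common distance definitions for the three space types.
// ASSUMPTION: Voyager's exact formulas (e.g. whether Euclidean is squared) may differ.
final class SpaceDistances {
    // Euclidean space: squared L2 distance, the sum of squared coordinate differences.
    static float squaredEuclidean(float[] a, float[] b) {
        requireSameLength(a, b);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Inner-product space: 1 - dot(a, b), so a larger dot product means "closer".
    static float innerProductDistance(float[] a, float[] b) {
        requireSameLength(a, b);
        return 1f - dot(a, b);
    }

    // Cosine space: 1 - cosine similarity; ignores vector length, NaN for zero vectors.
    static float cosineDistance(float[] a, float[] b) {
        requireSameLength(a, b);
        double norms = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        return (float) (1.0 - dot(a, b) / norms);
    }

    private static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Every embedding must have exactly numDimensions components.
    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
    }
}
```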
  • Method Details

    • load

      public static StringIndex load(String indexFilename, String nameListFilename, Index.SpaceType spaceType, int numDimensions, Index.StorageDataType storageDataType)
      Load a previously constructed index from the provided file locations. The dimensions, space type, and storage data type provided must match those the index was constructed with.
      Parameters:
      indexFilename - Filename of the underlying HNSW index
      nameListFilename - Filename of the JSON encoded names list
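      spaceType - Type of space and distance calculation used when determining distance between embeddings in the index; must match the value the index was constructed with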
      numDimensions - Number of dimensions of each embedding stored in the underlying HNSW index
      storageDataType - Type the embedding values are stored as; must match the value the index was constructed with
      Returns:
      reference to the loaded StringIndex
    • load

      public static StringIndex load(InputStream indexInputStream, InputStream nameListInputStream, Index.SpaceType spaceType, int numDimensions, Index.StorageDataType storageDataType)
      Load a previously constructed index from the provided input streams. The dimensions, space type, and storage data type provided must match those the index was constructed with.
      Parameters:
      indexInputStream - input stream pointing to the underlying HNSW index
      nameListInputStream - input stream pointing to the JSON encoded names list
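      spaceType - Type of space and distance calculation used when determining distance between embeddings in the index; must match the value the index was constructed with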
      numDimensions - Number of dimensions of each embedding stored in the underlying HNSW index
      storageDataType - Type the embedding values are stored as; must match the value the index was constructed with
      Returns:
      reference to the loaded StringIndex
    • load

      public static StringIndex load(String indexFilename, String nameListFilename)
      Load a previously constructed index from the provided file location. The space type, dimensions, and storage data type are read from the file metadata.
      Parameters:
      indexFilename - Filename of the underlying HNSW index
      nameListFilename - Filename of the JSON encoded names list
      Returns:
      reference to the loaded StringIndex
    • load

      public static StringIndex load(InputStream indexInputStream, InputStream nameListInputStream)
      Load a previously constructed index from the provided input streams. The space type, dimensions, and storage data type are read from the file metadata.
      Parameters:
      indexInputStream - input stream pointing to the underlying HNSW index
      nameListInputStream - input stream pointing to the JSON encoded names list
      Returns:
      reference to the loaded StringIndex
    • saveIndex

      public void saveIndex(String outputDirectory) throws IOException
      Save the underlying index and JSON encoded name list to the provided output directory
      Parameters:
      outputDirectory - directory to output files to
      Throws:
      IOException - when there is an error writing to JSON or saving to disk
    • saveIndex

      public void saveIndex(String outputDirectory, String indexFilename, String nameListFilename) throws IOException
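      Save the underlying HNSW index and JSON encoded names list to the provided output directory, using the given filenames.
      Parameters:
      outputDirectory - directory to output files to
      indexFilename - filename to save the HNSW index under
      nameListFilename - filename to save the JSON encoded names list under
      Throws:
      IOException - when there is an error writing to JSON or saving to disk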
    • saveIndex

      public void saveIndex(OutputStream indexOutputStream, OutputStream namesListOutputStream) throws IOException
      Save the underlying HNSW index and JSON encoded names list to the provided output streams
      Parameters:
      indexOutputStream - output stream pointing to the location to save the HNSW index
      namesListOutputStream - output stream pointing to the location to save the JSON names list
      Throws:
      IOException - when there is an error writing to JSON or the output streams
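The names list is what ties each item ID back to its name. Assuming it is a JSON array of strings in ID order (an assumption; this page does not specify the exact format), encoding one to an OutputStream could look like the minimal sketch below. NamesListSketch is hypothetical and uses hand-rolled escaping instead of a JSON library so that it has no dependencies.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch. ASSUMPTION: the names list is a JSON array of strings in which
// position i holds the name of item ID i. Hand-rolled escaping keeps it dependency-free.
final class NamesListSketch {
    static String toJsonArray(List<String> names) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                json.append(',');
            }
            appendJsonString(json, names.get(i));
        }
        return json.append(']').toString();
    }

    // Escapes quotes, backslashes, and control characters, as JSON strings require.
    private static void appendJsonString(StringBuilder json, String s) {
        json.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                json.append('\\').append(c);
            } else if (c < 0x20) {
                json.append(String.format("\\u%04x", (int) c));
            } else {
                json.append(c);
            }
        }
        json.append('"');
    }

    // Writes UTF-8 JSON to the stream; this sketch flushes but does not close it.
    static void write(List<String> names, OutputStream out) throws IOException {
        out.write(toJsonArray(names).getBytes(StandardCharsets.UTF_8));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        write(List.of("track:1", "say \"hi\""), buffer);
        System.out.println(buffer.toString(StandardCharsets.UTF_8)); // ["track:1","say \"hi\""]
    }
}
```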
    • addItem

      public void addItem(String name, float[] vector)
      Add a single item to the index, storing the vector under the given name.
    • addItem

      public void addItem(String name, List<Float> vector)
      Add a single item to the index, storing the vector (given as a List<Float>) under the given name.
    • addItems

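      public void addItems(Map<String,List<Float>> vectors)
      Add multiple items to the index; each map key is an item name and each value is that item's vector.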
    • getNumElements

      public long getNumElements()
      Get the number of elements (vectors) currently stored in this index.
    • getVector

      public float[] getVector(String name)
      Get the vector stored under the given name.
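The methods above share one piece of bookkeeping: each name is paired with an integer index ID. The toy, in-memory model below mirrors those signatures to show that pairing and the List<Float> to float[] conversion. ToyStringStore is hypothetical, uses linear scans rather than an HNSW graph, and returns null for a missing name purely as a choice of this sketch; Voyager's behavior in that case is not documented here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy model (not Voyager's code) of the name <-> ID bookkeeping behind
// addItem/addItems/getNumElements/getVector. Uses linear scans, not an HNSW graph.
final class ToyStringStore {
    private final int numDimensions;
    private final List<String> names = new ArrayList<>();    // position = item ID
    private final List<float[]> vectors = new ArrayList<>(); // same positions as names

    ToyStringStore(int numDimensions) {
        this.numDimensions = numDimensions;
    }

    void addItem(String name, float[] vector) {
        if (vector.length != numDimensions) {
            throw new IllegalArgumentException(
                    "expected " + numDimensions + " dimensions, got " + vector.length);
        }
        names.add(name);             // this item's ID is its position in the list
        vectors.add(vector.clone()); // copy so later caller edits do not leak in
    }

    // List<Float> overload: unbox into a float[] and delegate.
    void addItem(String name, List<Float> vector) {
        float[] unboxed = new float[vector.size()];
        for (int i = 0; i < unboxed.length; i++) {
            unboxed[i] = vector.get(i);
        }
        addItem(name, unboxed);
    }

    // Adds every entry; IDs follow the map's iteration order.
    void addItems(Map<String, List<Float>> items) {
        for (Map.Entry<String, List<Float>> entry : items.entrySet()) {
            addItem(entry.getKey(), entry.getValue());
        }
    }

    long getNumElements() {
        return names.size();
    }

    // Returns a copy of the stored vector, or null if the name is absent (a choice of this sketch).
    float[] getVector(String name) {
        int id = names.indexOf(name);
        return id < 0 ? null : vectors.get(id).clone();
    }
}
```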
    • query

      public StringIndex.QueryResults query(float[] queryVector, int numNeighbors, int ef)
      Find the nearest neighbors of the provided embedding.
      Parameters:
      queryVector - The vector to center the search around.
      numNeighbors - The number of neighbors to return. The number of results returned may be smaller than this value if the index does not contain enough items.
      ef - How many neighbors to explore during search when looking for nearest neighbors. Increasing this value can improve recall (up to a point) at the cost of increased search latency. The minimum value of this parameter is the requested number of neighbors, and the maximum value is the number of items in the index.
      Returns:
      a QueryResults object, containing the names of the neighbors and each neighbor's distance from the query vector, sorted in ascending order of distance
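To make the return contract concrete (at most numNeighbors results, sorted in ascending order of distance), here is an exact brute-force search over a small in-memory list. It is a stand-in, not Voyager's algorithm: Voyager searches an approximate HNSW graph, which is where ef matters. BruteForceQuery, its Results class, and the clampEf helper are hypothetical; clampEf keeps ef within the documented range, but whether Voyager clamps or rejects out-of-range values is not stated here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical stand-in: exact brute-force k-NN over an in-memory list, used only to
// illustrate query() results. Voyager instead searches an approximate HNSW graph.
final class BruteForceQuery {
    // Parallel arrays of neighbor names and distances, nearest first.
    static final class Results {
        final String[] names;
        final float[] distances;

        Results(String[] names, float[] distances) {
            this.names = names;
            this.distances = distances;
        }
    }

    static Results query(List<String> names, List<float[]> vectors, float[] queryVector, int numNeighbors) {
        // Return fewer than numNeighbors results when there are not enough items.
        int k = Math.min(numNeighbors, names.size());
        // Sort all item IDs by distance to the query and keep the k closest.
        Integer[] nearest = IntStream.range(0, names.size()).boxed()
                .sorted(Comparator.comparingDouble(id -> squaredL2(vectors.get(id), queryVector)))
                .limit(k)
                .toArray(Integer[]::new);
        String[] outNames = new String[k];
        float[] outDistances = new float[k];
        for (int j = 0; j < k; j++) {
            outNames[j] = names.get(nearest[j]);
            outDistances[j] = squaredL2(vectors.get(nearest[j]), queryVector);
        }
        return new Results(outNames, outDistances);
    }

    // Caller-side helper keeping ef within the documented range [numNeighbors, numItems].
    static int clampEf(int ef, int numNeighbors, int numItems) {
        return Math.min(Math.max(ef, numNeighbors), numItems);
    }

    private static float squaredL2(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```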
    • query

      public StringIndex.QueryResults[] query(float[][] queryVectors, int numNeighbors, int numThreads, int ef)
      Find the nearest neighbors of multiple query vectors in parallel.
      Parameters:
      queryVectors - Array of query vectors to search around
      numNeighbors - Number of neighbors to return for each query vector
      numThreads - Number of threads to use for the underlying index search. -1 uses all available CPU cores
      ef - How many neighbors to explore during search; as with the single-vector query, increasing this value can improve recall (up to a point) at the cost of increased search latency
      Returns:
      Array of QueryResults, one for each query vector
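The threading convention of this overload can be illustrated with a fixed thread pool from java.util.concurrent. ParallelBatchQuery is hypothetical: it runs any single-vector query function over each input on resolveThreads(numThreads) threads, maps -1 to all available cores, and returns results in input order. The documented overload delegates threading to the underlying index, so this only sketches the contract, not Voyager's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch of the batch-query threading contract: run a single-vector
// query function over many vectors on a fixed thread pool, keeping input order.
final class ParallelBatchQuery {
    // Mirrors the documented convention: -1 means "use all available CPU cores".
    static int resolveThreads(int numThreads) {
        return numThreads == -1 ? Runtime.getRuntime().availableProcessors() : numThreads;
    }

    static <R> List<R> queryAll(float[][] queryVectors, int numThreads, Function<float[], R> singleQuery) {
        ExecutorService pool = Executors.newFixedThreadPool(resolveThreads(numThreads));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (float[] queryVector : queryVectors) {
                futures.add(pool.submit(() -> singleQuery.apply(queryVector)));
            }
            List<R> results = new ArrayList<>(queryVectors.length);
            for (Future<R> future : futures) {
                results.add(future.get()); // waits in submission order, so results line up with inputs
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```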
    • close

      public void close() throws IOException
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Throws:
      IOException
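Because StringIndex implements Closeable, try-with-resources guarantees that close() runs even when querying throws. The sketch below demonstrates that guarantee with StandInIndex, a hypothetical Closeable stand-in used so the example compiles without Voyager on the classpath; the comment shows the same pattern with the documented two-argument load method.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for a Closeable index (not Voyager's StringIndex) that
// records whether close() ran, to show try-with-resources behavior.
final class CloseDemo {
    static final class StandInIndex implements Closeable {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true; // a real index would release its resources here
        }
    }

    // With Voyager, the same pattern would read:
    //   try (StringIndex index = StringIndex.load(indexFilename, nameListFilename)) { ... }
    static StandInIndex useAndClose(boolean fail) {
        StandInIndex index = new StandInIndex();
        try (StandInIndex resource = index) {
            if (fail) {
                throw new IllegalStateException("simulated query failure");
            }
        } catch (IOException | IllegalStateException e) {
            // close() has already run by the time this catch block executes
        }
        return index;
    }
}
```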
    • resizeIndex

      public void resizeIndex(long newSize)
      Change the maximum number of elements currently storable by this Index. This operation reallocates the memory used by the index and can be quite slow, so it may be useful to set the maximum number of elements in advance if that number is known.
      Parameters:
      newSize - The new maximum number of elements for this Index.
    • getMaxElements

      public long getMaxElements()
      Get the maximum number of elements currently storable by this Index. If more elements are added than getMaxElements() allows, the index is automatically (but slowly) resized.
      Returns:
      The number of elements (vectors) that are currently storable in this Index.
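Because automatic growth reallocates the index, pre-sizing matters for bulk loads. The sketch below counts reallocations when an index grows from its initial capacity to a target size under an assumed doubling policy; the doubling is illustrative only, since this page does not document Voyager's actual growth policy, and CapacityPlanning is a hypothetical class. Under that assumption, growing from 1,000 to 1,000,000 elements takes 10 reallocations, whereas passing maxElements = 1,000,000 to the constructor takes none and calling resizeIndex(1_000_000) once before inserting takes exactly one.

```java
// Hypothetical capacity model. ASSUMPTION: the index doubles its capacity whenever
// it fills up; Voyager's real automatic resize policy is not documented on this page.
final class CapacityPlanning {
    // Counts reallocations needed to hold itemsToAdd items, starting from initialMaxElements.
    static int countResizes(long initialMaxElements, long itemsToAdd) {
        long capacity = Math.max(1L, initialMaxElements);
        int resizes = 0;
        while (capacity < itemsToAdd) {
            capacity *= 2; // each step stands for one (potentially slow) reallocation
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(countResizes(1_000, 1_000_000));     // 10
        System.out.println(countResizes(1_000_000, 1_000_000)); // 0
    }
}
```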