Package com.spotify.voyager.jni
Class StringIndex
java.lang.Object
com.spotify.voyager.jni.StringIndex
- All Implemented Interfaces:
Closeable, AutoCloseable
Wrapper around com.spotify.voyager.jni.Index with a simplified interface that maps each index ID to a provided String.
StringIndex can only accommodate up to 2^31 - 1 (about 2.1 billion) items, even though typical Voyager indices allow up to 2^63 - 1 (about 9.2e18) items.
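The two limits above correspond to the ranges of Java's int and long. As a minimal sketch (the class and method names here are hypothetical and not part of Voyager), a caller could check a planned item count against the StringIndex limit before building an index:

```java
/** Hypothetical helper, not part of Voyager: checks a planned item count against the StringIndex limit. */
final class CapacityCheck {
    /** Largest item count a StringIndex can hold: 2^31 - 1. */
    static final long STRING_INDEX_MAX_ITEMS = Integer.MAX_VALUE;

    /** Returns true if the planned number of items fits in a StringIndex. */
    static boolean fitsInStringIndex(long plannedItems) {
        return plannedItems >= 0 && plannedItems <= STRING_INDEX_MAX_ITEMS;
    }
}
```

A data set larger than this limit would need the underlying Index, or several StringIndex instances.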
-
Nested Class Summary
Nested Classes
static class StringIndex.QueryResults
- A wrapper class for nearest neighbor query results.
-
Constructor Summary
Constructors
StringIndex(Index.SpaceType spaceType, int numDimensions)
- Instantiate a new empty index with the specified space type and dimensionality.
StringIndex(Index.SpaceType spaceType, int numDimensions, long indexM, long efConstruction, long randomSeed, long maxElements, Index.StorageDataType storageDataType)
- Instantiate an empty index with the specified index parameters.
-
Method Summary
void addItem
void addItem
void addItems
void close()
long getMaxElements()
- Get the maximum number of elements currently storable by this Index.
long getNumElements()
float[] getVector
static StringIndex load(InputStream indexInputStream, InputStream nameListInputStream)
- Load a previously constructed index from the provided input streams.
static StringIndex load(InputStream indexInputStream, InputStream nameListInputStream, Index.SpaceType spaceType, int numDimensions, Index.StorageDataType storageDataType)
- Load a previously constructed index from the provided input streams.
static StringIndex load(String indexFilename, String nameListFilename)
- Load a previously constructed index from the provided file location.
static StringIndex load(String indexFilename, String nameListFilename, Index.SpaceType spaceType, int numDimensions, Index.StorageDataType storageDataType)
- Load a previously constructed index from the provided file location.
StringIndex.QueryResults[] query(float[][] queryVectors, int numNeighbors, int numThreads, int ef)
- Query for multiple target vectors in parallel.
StringIndex.QueryResults query(float[] queryVector, int numNeighbors, int ef)
- Find the nearest neighbors of the provided embedding.
void resizeIndex(long newSize)
- Change the maximum number of elements currently storable by this Index.
void saveIndex(OutputStream indexOutputStream, OutputStream namesListOutputStream)
- Save the underlying HNSW index and JSON encoded names list to the provided output streams.
void saveIndex(String outputDirectory)
- Save the underlying index and JSON encoded name list to the provided output directory.
void saveIndex(String outputDirectory, String indexFilename, String nameListFilename)
-
Constructor Details
-
StringIndex
public StringIndex(Index.SpaceType spaceType, int numDimensions)
Instantiate a new empty index with the specified space type and dimensionality.
- Parameters:
spaceType
- Type of space and distance calculation used when determining distance between embeddings in the index. See com.spotify.voyager.jni.Index.SpaceType.
numDimensions
- Number of dimensions of each embedding stored in the underlying HNSW index
-
StringIndex
public StringIndex(Index.SpaceType spaceType, int numDimensions, long indexM, long efConstruction, long randomSeed, long maxElements, Index.StorageDataType storageDataType)
Instantiate an empty index with the specified index parameters.
- Parameters:
spaceType
- Type of space and distance calculation used when determining distance between embeddings in the index. See com.spotify.voyager.jni.Index.SpaceType.
numDimensions
- Number of dimensions of each embedding stored in the underlying HNSW index
indexM
- Number of connections made between nodes when inserting an element into the index. Increasing this value can improve recall at the expense of higher memory usage.
efConstruction
- Search depth when inserting elements into the index. Increasing this value can improve recall (up to a point) at the cost of increased indexing time.
randomSeed
- Random seed used during indexing
maxElements
- Initial size of the underlying HNSW index
storageDataType
- Type to store the embedding values as. See com.spotify.voyager.jni.Index.StorageDataType.
-
-
Method Details
-
load
public static StringIndex load(String indexFilename, String nameListFilename, Index.SpaceType spaceType, int numDimensions, Index.StorageDataType storageDataType)
Load a previously constructed index from the provided file location. The dimensions, space type, and storage data type provided must match those the index was constructed with.
- Parameters:
indexFilename
- Filename of the underlying HNSW index
nameListFilename
- Filename of the JSON encoded names list
spaceType
-
numDimensions
- Number of dimensions of each embedding stored in the underlying HNSW index
storageDataType
-
- Returns:
- reference to the loaded StringIndex
-
load
public static StringIndex load(InputStream indexInputStream, InputStream nameListInputStream, Index.SpaceType spaceType, int numDimensions, Index.StorageDataType storageDataType)
Load a previously constructed index from the provided input streams. The dimensions, space type, and storage data type provided must match those the index was constructed with.
- Parameters:
indexInputStream
- input stream pointing to the underlying HNSW index
nameListInputStream
- input stream pointing to the JSON encoded names list
spaceType
-
numDimensions
- Number of dimensions of each embedding stored in the underlying HNSW index
storageDataType
-
- Returns:
- reference to the loaded StringIndex
-
load
public static StringIndex load(String indexFilename, String nameListFilename)
Load a previously constructed index from the provided file location. The space type, dimensions, and storage data type are read from the file metadata.
- Parameters:
indexFilename
- Filename of the underlying HNSW index
nameListFilename
- Filename of the JSON encoded names list
- Returns:
- reference to the loaded StringIndex
-
load
public static StringIndex load(InputStream indexInputStream, InputStream nameListInputStream)
Load a previously constructed index from the provided input streams. The space type, dimensions, and storage data type are read from the file metadata.
- Parameters:
indexInputStream
- input stream pointing to the underlying HNSW index
nameListInputStream
- input stream pointing to the JSON encoded names list
- Returns:
- reference to the loaded StringIndex
-
saveIndex
public void saveIndex(String outputDirectory) throws IOException
Save the underlying index and JSON encoded name list to the provided output directory.
- Parameters:
outputDirectory
- directory to output files to
- Throws:
IOException
- when there is an error writing to JSON or saving to disk
-
saveIndex
public void saveIndex(String outputDirectory, String indexFilename, String nameListFilename) throws IOException
- Throws:
IOException
-
saveIndex
public void saveIndex(OutputStream indexOutputStream, OutputStream namesListOutputStream) throws IOException
Save the underlying HNSW index and JSON encoded names list to the provided output streams.
- Parameters:
indexOutputStream
- output stream pointing to the location to save the HNSW index
namesListOutputStream
- output stream pointing to the location to save the JSON names list
- Throws:
IOException
- when there is an error writing to JSON or the output streams
-
addItem
-
addItem
-
addItems
-
getNumElements
public long getNumElements()
-
getVector
-
query
public StringIndex.QueryResults query(float[] queryVector, int numNeighbors, int ef)
Find the nearest neighbors of the provided embedding.
- Parameters:
queryVector
- The vector to center the search around.
numNeighbors
- The number of neighbors to return. The number of results returned may be smaller than this value if the index does not contain enough items.
ef
- How many neighbors to explore during search when looking for nearest neighbors. Increasing this value can improve recall (up to a point) at the cost of increased search latency. The minimum value of this parameter is the requested number of neighbors, and the maximum value is the number of items in the index.
- Returns:
- a QueryResults object, containing the names of the neighbors and each neighbor's distance from the query vector, sorted in ascending order of distance
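The bounds on ef described above (no lower than the requested number of neighbors, no higher than the number of items in the index) can be sketched as a small clamping helper. The class and method names are hypothetical and not part of Voyager; whether the library itself clamps or rejects out-of-range values is not stated here, so this is only one way a caller might pre-validate the argument:

```java
/** Hypothetical helper, not part of Voyager: applies the documented bounds on ef. */
final class EfBounds {
    /**
     * Clamps a requested ef into [numNeighbors, numElements].
     * If the index holds fewer items than numNeighbors, the item count is used for both bounds.
     */
    static int clampEf(int requestedEf, int numNeighbors, long numElements) {
        long lower = Math.min((long) numNeighbors, numElements); // cannot explore more items than exist
        long upper = numElements;
        long clamped = Math.max(lower, Math.min((long) requestedEf, upper));
        return Math.toIntExact(clamped); // safe while numElements <= 2^31 - 1
    }
}
```

A caller might pass getNumElements() as numElements and use the result as the ef argument to query.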
-
query
public StringIndex.QueryResults[] query(float[][] queryVectors, int numNeighbors, int numThreads, int ef)
Query for multiple target vectors in parallel.
- Parameters:
queryVectors
- Array of query vectors to search around
numNeighbors
- Number of neighbors to get for each target
numThreads
- Number of threads to use for the underlying index search. -1 uses all available CPU cores.
ef
- Search depth in the graph
- Returns:
- Array of QueryResults, one for each target vector
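As a mental model for the batch contract (one result per query vector, in the same order, with -1 meaning all available cores), here is a self-contained sketch. It deliberately uses exact brute-force Euclidean search on a thread pool rather than Voyager's HNSW search, and every name in it is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical exact-search stand-in; this is not how Voyager searches internally. */
final class BatchBruteForce {
    /** Squared Euclidean distance between two vectors of equal length. */
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Names of the k closest stored vectors, nearest first (fewer if the store is smaller than k). */
    static String[] nearest(String[] names, float[][] vectors, float[] query, int k) {
        Integer[] order = new Integer[names.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(i -> squaredDistance(vectors[i], query)));
        int count = Math.min(k, names.length);
        String[] result = new String[count];
        for (int i = 0; i < count; i++) result[i] = names[order[i]];
        return result;
    }

    /** Runs one search per query on a fixed pool; results keep the order of the queries. */
    static String[][] queryAll(String[] names, float[][] vectors, float[][] queries, int k, int numThreads) {
        int threads = numThreads == -1 ? Runtime.getRuntime().availableProcessors() : numThreads;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String[]>> futures = new ArrayList<>();
            for (float[] q : queries) {
                futures.add(pool.submit(() -> nearest(names, vectors, q, k)));
            }
            String[][] results = new String[queries.length][];
            for (int i = 0; i < results.length; i++) results[i] = futures.get(i).get();
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("batch query failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order is what keeps each result aligned with its query vector, regardless of which thread finishes first.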
-
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
resizeIndex
public void resizeIndex(long newSize)
Change the maximum number of elements currently storable by this Index. This operation reallocates the memory used by the index and can be quite slow, so it may be useful to set the maximum number of elements in advance if that number is known.
- Parameters:
newSize
- The new maximum number of elements to resize this Index to.
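Because each resize reallocates memory, a caller who cannot know the final size in advance may want to grow in large steps rather than one element at a time. The following is a minimal sketch of a doubling policy; the class and method names are hypothetical and not part of Voyager, and doubling is an assumption about a reasonable policy rather than something the library prescribes:

```java
/** Hypothetical helper, not part of Voyager: picks a new capacity so that resizes stay rare. */
final class GrowthPolicy {
    /**
     * Returns a capacity of at least max(currentMax, 1) that is >= required,
     * doubling as needed and capping at 2^31 - 1 (the StringIndex limit).
     */
    static long nextCapacity(long currentMax, long required) {
        if (required < 0 || required > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("required out of range: " + required);
        }
        long capacity = Math.max(currentMax, 1L);
        while (capacity < required) {
            // capacity < required <= 2^31 - 1 here, so doubling cannot overflow a long
            capacity = Math.min(capacity * 2, (long) Integer.MAX_VALUE);
        }
        return capacity;
    }
}
```

A caller might pass getMaxElements() as currentMax and, if the result is larger, hand it to resizeIndex(long newSize) before a bulk insert.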
-
getMaxElements
public long getMaxElements()
Get the maximum number of elements currently storable by this Index. If more elements are added than getMaxElements(), the index will be automatically (but slowly) resized.
- Returns:
- The number of elements (vectors) that are currently storable in this Index.
-