ChunkDot, a library for multi-threaded matrix multiplication and cosine similarity, now supports sparse matrices. This makes it possible to scale similarity calculations for the sparse vector representations common in natural language processing, such as token counts and n-gram features. ChunkDot splits the embedding matrix into chunks and parallelizes the cosine similarity calculation across them, so peak memory stays bounded by the chunk size rather than the full similarity matrix. According to the article's benchmarks, the sparse implementation pays off for matrix densities below roughly 0.03, with speedups of up to ~100x at very low densities. Planned improvements include supporting similarity between two different input matrices and adding GPU support.
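To illustrate the chunking idea the article describes, here is a minimal single-threaded sketch in NumPy/SciPy. It is not ChunkDot's actual API: the function name `chunked_cosine_top_k` and its parameters are hypothetical, and the real library additionally parallelizes the per-chunk work with Numba. The sketch normalizes the rows, then computes one dense block of similarities per chunk and keeps only the top-k neighbors per row, which is what bounds the memory use.

```python
import numpy as np
import scipy.sparse as sp

def chunked_cosine_top_k(embeddings, top_k=2, chunk_size=1000):
    """For each row of a sparse matrix, return the indices of its top_k
    most similar rows by cosine similarity, processed chunk by chunk.
    (Hypothetical sketch of the chunking approach, not ChunkDot's API.)"""
    # Row L2 norms; guard zero rows to avoid division by zero.
    norms = np.sqrt(np.asarray(embeddings.multiply(embeddings).sum(axis=1))).ravel()
    norms[norms == 0] = 1.0
    # After normalization, a dot product equals cosine similarity.
    normalized = embeddings.multiply(1.0 / norms[:, None]).tocsr()
    n = normalized.shape[0]
    top_idx = np.empty((n, top_k), dtype=np.int64)
    for start in range(0, n, chunk_size):
        end = min(start + chunk_size, n)
        # Dense (chunk_size x n) block: memory scales with chunk_size * n
        # instead of the full n * n similarity matrix.
        sims = (normalized[start:end] @ normalized.T).toarray()
        # argpartition selects the top_k per row without a full sort...
        part = np.argpartition(sims, -top_k, axis=1)[:, -top_k:]
        # ...then only those k entries are sorted, descending by similarity.
        order = np.argsort(np.take_along_axis(sims, part, axis=1), axis=1)[:, ::-1]
        top_idx[start:end] = np.take_along_axis(part, order, axis=1)
    return top_idx
```

Because each row has cosine similarity 1.0 with itself, the first column of the result is the row's own index; downstream code would typically drop it when looking for nearest neighbors.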

Source: Scale Up Bulk Similarity Calculations for Sparse Embeddings – Towards AI

