Reimplement similarity search using Solr #1714

ffont · 2023-12-13T15:33:18Z

We've been discussing for a long time how we could get rid of our custom similarity server (based on gaia) and move its current functionality to another service. Since recent versions of Solr support nearest-neighbour (NN) search, we decided to move the similarity functionality to Solr. For this to happen, there are a number of considerations to take into account and steps to follow, some of which have already been done.
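For reference, Solr 9 exposes NN search over a dense vector field through the `knn` query parser, using the local-params syntax `{!knn f=FIELD topK=N}[v1, v2, ...]`. The following is a minimal sketch of building such a query from Python. The field name `sim_vector` and the 100-value dimension are hypothetical and would have to match whatever dense vector field is declared in the Solr schema.

```python
from typing import Sequence


def build_knn_query(vector: Sequence[float], field: str = "sim_vector",
                    top_k: int = 10, dims: int = 100) -> str:
    """Build a Solr `knn` query string for a dense vector field.

    `field` and `dims` are hypothetical defaults: they must match the
    dense vector field name and its declared dimension in the schema.
    """
    if len(vector) != dims:
        raise ValueError(f"expected {dims} values, got {len(vector)}")
    values = ", ".join(repr(float(v)) for v in vector)
    # Local-params syntax: {!knn f=<field> topK=<k>}[v1, v2, ...]
    return f"{{!knn f={field} topK={top_k}}}[{values}]"


# Example: parameters for a /select request returning the 10 nearest sounds.
params = {"q": build_knn_query([0.0] * 100), "fl": "id,score"}
```

Validating the length before sending the query keeps a dimension mismatch from surfacing only as a Solr error at request time.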

We mainly use gaia for similarity search, but we also use it for some advanced API features. Through these, users can implement complex search filters based on low-level audio descriptors (e.g., filter by pitch variance or BPM), and can also specify a set of descriptors that defines a custom similarity metric, so that query results are sorted using only those descriptors. When moving to Solr, some of these features will be lost, and users will need to change their app implementations to achieve similar results. Therefore, we will need to notify users about these changes and provide alternative ways to achieve similar results.
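To make the "custom similarity metric" idea concrete, here is a minimal sketch of ranking candidates by distance computed over only a user-chosen subset of descriptors. It assumes a plain Euclidean distance over already-normalised descriptor values; gaia's actual metric may differ, and the descriptor names (`bpm`, `pitch_var`) are hypothetical.

```python
import math
from typing import Mapping, Sequence, Tuple


def subset_distance(a: Mapping[str, float], b: Mapping[str, float],
                    descriptors: Sequence[str]) -> float:
    """Euclidean distance restricted to the selected descriptors."""
    return math.sqrt(sum((a[d] - b[d]) ** 2 for d in descriptors))


def rank_by_descriptors(target: Mapping[str, float],
                        candidates: Sequence[Tuple[str, Mapping[str, float]]],
                        descriptors: Sequence[str]) -> list:
    """Return candidate ids sorted from most to least similar to `target`,
    ignoring every descriptor not listed in `descriptors`."""
    ranked = sorted(candidates,
                    key=lambda item: subset_distance(target, item[1], descriptors))
    return [sound_id for sound_id, _ in ranked]
```

Because only the listed descriptors contribute, the same candidate set can be ordered differently depending on what the user asks for. This per-query choice of dimensions is the part that does not map directly onto a single precomputed vector field in Solr.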

Our current implementation of similarity search also uses gaia to apply some transformations to the audio descriptors that we calculate, converting them into a 100-dimensional normalised vector used for the NN queries. If we get rid of gaia, we will need to move this functionality somewhere else so that we can continue generating these sound vectors. We also want to take this opportunity to introduce audio embeddings computed with pre-trained deep learning models into our similarity system, so that we can test state-of-the-art approaches to similarity search.
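The exact gaia transformations are not listed here. As an illustration only, the sketch below assumes a common pipeline (per-descriptor standardisation, PCA down to 100 dimensions, then L2 normalisation) implemented with NumPy, so it could run outside gaia. The function names are hypothetical.

```python
import numpy as np


def fit_projection(descriptors: np.ndarray, dims: int = 100):
    """Fit standardisation + PCA on a (n_sounds, n_descriptors) matrix.

    Assumed pipeline, not necessarily what gaia does. Returns the mean,
    the std and the top `dims` principal directions (rows of `components`).
    """
    mean = descriptors.mean(axis=0)
    std = descriptors.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant descriptors
    z = (descriptors - mean) / std
    # Right singular vectors of the standardised data are the PCA directions.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    components = vt[:dims]
    return mean, std, components


def to_sound_vectors(descriptors: np.ndarray, mean: np.ndarray,
                     std: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project descriptors into the PCA space and scale each row to unit length."""
    projected = ((descriptors - mean) / std) @ components.T
    norms = np.linalg.norm(projected, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return projected / norms
```

Unit-length vectors make cosine similarity equal to the dot product, which fits Solr's dense vector similarity options (the `dot_product` function expects normalised vectors). Embeddings from a pre-trained model could go through the same final normalisation step before being indexed.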

Here are some steps to carry out for the reimplementation of the similarity service:


This means that documents can now be partially updated instead of always being completely replaced. This feature is not used anywhere yet, but it will be useful when adding similarity data to the search engine. #1714
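Partial updates in Solr are done with atomic updates, where a field value is wrapped in a modifier such as `set`. A minimal sketch of the JSON body for attaching a similarity vector to an existing document follows. It assumes the unique key field is `id`; the vector field name is hypothetical. Solr's atomic updates also require the document's other fields to be stored or to have docValues, so that Solr can rebuild the full document.

```python
import json
from typing import Sequence


def build_partial_update(sound_id: int, field: str,
                         vector: Sequence[float]) -> str:
    """JSON body for Solr's /update endpoint that sets one field on an
    existing document without resending the rest of it."""
    doc = {
        "id": str(sound_id),                        # assumed uniqueKey field
        field: {"set": [float(v) for v in vector]}  # atomic "set" modifier
    }
    return json.dumps([doc])
```

Only the changed field travels over the wire, so similarity vectors could be computed and indexed separately from the rest of the sound metadata.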

ffont added the New feature and Improvement labels on Dec 13, 2023

ffont mentioned this issue on Jan 24, 2024 in Solr-based similarity search #1753 (merged)
