Solr 9 #1614

Merged
merged 18 commits into from
Dec 19, 2023

alastair (Member) commented Sep 20, 2022

Description

Adds a new Solr 9 core and removes the old Solr 4 core.

The new point-based integer field type in Solr (IntPointField) can't be used
as the unique key field, so we switched id to a string. This requires
casting it back to an int in the few places where it is used to retrieve
sounds from the database.
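
A minimal sketch of that cast, assuming Solr result documents arrive as a list of dicts (for example, parsed from a JSON response). The helper name sound_ids_from_solr_docs is hypothetical and not taken from the PR:

```python
from typing import Any, Dict, Iterable, List


def sound_ids_from_solr_docs(docs: Iterable[Dict[str, Any]]) -> List[int]:
    """Convert the string "id" values of Solr result documents back to ints.

    The Solr 9 schema stores "id" as a string, while sound primary keys in the
    database are integers, so each value is cast with int() before lookup.
    """
    return [int(doc["id"]) for doc in docs]


# Hypothetical response fragment: only the "id" key is assumed to be present.
ids = sound_ids_from_solr_docs([{"id": "1234"}, {"id": "56"}])
```

The resulting integers can then be passed to whatever database query retrieves the sounds.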

Date fields can be a DatePointField or a DateRangeField. The range type is used
for filtering within a range (e.g. created:[from TO to]), but the
point type must be used for ordering, so we index the created field
with both types and choose one depending on the operation.
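
A minimal sketch of choosing between the two date types per operation. The field names created and created_range are hypothetical placeholders (the PR only says the created field is added with both types), and the helper names are illustrative:

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Union

# Hypothetical field names: the PR indexes "created" with both date types but
# does not say what the two fields are called, so these are placeholders.
CREATED_POINT_FIELD = "created"        # DatePointField copy, used for sorting
CREATED_RANGE_FIELD = "created_range"  # DateRangeField copy, used for range filters

Params = Dict[str, Union[str, List[str]]]


def solr_date(dt: datetime) -> str:
    """Format a datetime as a Solr UTC timestamp; naive values are assumed to be UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def created_params(
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    newest_first: bool = True,
) -> Params:
    """Use the range-typed field for filtering and the point-typed field for ordering."""
    params: Params = {}
    if start is not None or end is not None:
        lo = solr_date(start) if start is not None else "*"  # "*" = open-ended bound
        hi = solr_date(end) if end is not None else "*"
        params["fq"] = [f"{CREATED_RANGE_FIELD}:[{lo} TO {hi}]"]
    # Per the PR, ordering has to go through the point-typed field.
    params["sort"] = f"{CREATED_POINT_FIELD} {'desc' if newest_first else 'asc'}"
    return params


params_2023 = created_params(datetime(2023, 1, 1), datetime(2023, 12, 19))
```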

Some experiments remain to be done on the search results and on which similarity types to use.

Checklist:

  • Fix tests that use the hard-coded Solr 4.5 backend
  • Add a test in the test_search_engine_backend command for the get_user_tags method
  • Add a test in the test_search_engine_backend command for the get_pack_tags method
  • Update test_search_engine_backend to create a new core to run tests against
  • Add a test for get_stream_sounds (which also performs a search in Solr) and refactor parts of it if needed
  • Add a test in the test_search_engine_backend scripts for multiple-word queries
  • Integrate alastair's local Solr comparison script into the search engine tests (combinations of query + filter, including multiple filters and examples with OR in a filter)
  • Add a test in test_search_engine_backend for geo queries
  • Add tests for the date facet
  • Handle the difference between DatePoint and DateRange (created)
  • Review differences in results between 5 and 9 caused by scoring
  • Configure the default query operator to be AND instead of OR, to replicate the default behaviour of Solr < 5
  • Move from a specific query handler to plain /select with defaults set
  • See whether we should use commitAfter instead of always committing
  • Use BooleanSimilarity for matches (no ranking based on the number of times a word occurs in the document)
  • Simplify the SolrQuery builder
  • Update the search documentation to suggest the new style of geo search
  • Ensure that we use JSON for sending data to and retrieving data from Solr
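
The query-related items above (AND as the default operator, plain /select, JSON responses) could be sketched as an explicit parameter dict. This is an illustration under assumptions, not Freesound's SolrQuery builder: the helper name, the page size and the tag field in the example filter are hypothetical. q, q.op, wt, rows, start and fq are standard Solr request parameters:

```python
from typing import Dict, List, Optional, Union
from urllib.parse import urlencode

Params = Dict[str, Union[str, int, List[str]]]


def build_select_params(
    query: str,
    filters: Optional[List[str]] = None,
    rows: int = 15,
    start: int = 0,
) -> Params:
    """Parameters for a plain /select request, with every setting sent explicitly."""
    params: Params = {
        "q": query,
        "q.op": "AND",  # all terms must match instead of Solr's default OR
        "wt": "json",   # ask Solr for a JSON response
        "rows": rows,   # page size (15 is an arbitrary placeholder)
        "start": start,
    }
    if filters:
        # Each fq is applied as a separate filter; a document must pass all of them.
        params["fq"] = list(filters)
    return params


# Hypothetical filter on a "tag" field; OR inside one filter widens only that filter.
params = build_select_params("dog bark", ["tag:(field-recording OR foley)"])
query_string = urlencode(params, doseq=True)  # doseq=True emits one fq= per filter
```

Sending q.op on every request, rather than configuring it on a dedicated handler, keeps the matching behaviour visible in the client code.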

Some sounds have the same tag multiple times, in upper and lower case.
This ranks sounds with repeated tags higher in the search results,
because Solr sees a higher term frequency. To prevent this unfair boost,
tags are lower-cased and de-duplicated before being added to the index.
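
A minimal sketch of that normalization step; the function name is hypothetical, and keeping the first-seen order is an arbitrary choice since the PR does not say how order is handled:

```python
from typing import Iterable, List


def normalize_tags(tags: Iterable[str]) -> List[str]:
    """Lower-case tags and drop duplicates, keeping the first-seen order.

    A tag repeated in different cases ("Piano", "piano") is indexed once, so
    it cannot inflate the term frequency Solr sees for that sound.
    """
    # dict.fromkeys keeps insertion order (Python 3.7+) and removes duplicates.
    return list(dict.fromkeys(tag.lower() for tag in tags))


tags = normalize_tags(["Piano", "piano", "Field-Recording", "PIANO"])
```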

This won't prevent boosting when a word appears multiple times in other
search fields, such as the description (until we introduce BooleanSimilarity).

Specify all settings as query parameters instead of as handler defaults.

We're not interested in ranking based on field length or on the tf/idf of
the search term; we only want to know whether the term is in the field.
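
To illustrate why this removes the repetition boost: in Lucene, BooleanSimilarity gives each matching term a score equal to its query boost, so term frequency, document frequency and field length play no part. The sketch below is a pure-Python toy, not Lucene code; tf_score is a deliberately simplified frequency-based score standing in for TF-IDF/BM25, and both function names are hypothetical:

```python
from collections import Counter
from typing import Dict, List, Optional


def tf_score(query_terms: List[str], field_terms: List[str]) -> int:
    """Toy frequency-based score: every occurrence of a query term adds 1."""
    counts = Counter(field_terms)
    return sum(counts[term] for term in query_terms)


def boolean_score(
    query_terms: List[str],
    field_terms: List[str],
    boosts: Optional[Dict[str, float]] = None,
) -> float:
    """BooleanSimilarity-style score: a matching term adds only its query boost (default 1)."""
    present = set(field_terms)
    boosts = boosts or {}
    return sum(boosts.get(term, 1.0) for term in query_terms if term in present)


query = ["rain"]
repeated = "rain rain rain on a tin roof".split()
single = "rain on a tin roof".split()

# The frequency-based toy score rewards repetition; the boolean score does not.
print(tf_score(query, repeated), tf_score(query, single))
print(boolean_score(query, repeated), boolean_score(query, single))
```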

ffont merged commit daeba9e into master on Dec 19, 2023
1 check passed
ffont deleted the solr9 branch on December 19, 2023 13:46

2 participants