Releases · surfedushare/harvester

This release is the prerelease of the new search-client inside the harvester.

There are significant changes and these stand out especially:

If the data is invalid according to our new validation layer, a product/material will be considered "inactive". We'll need to hunt these down in the admin and correct the data at the source where possible (or decide to adjust the validation).
For certain queries (with "leenwoorden", i.e. loanwords), multilingual products are expected to rank higher without actually being more relevant. This is a drawback of the new index schema and has no solution at the moment.
Moving from multiple indices with different languages to a single index with multiple languages means that the single index takes on quite a bit of complexity. See for instance the new search fields variable. I’ve tried to cut down on all this repetition by introducing a shorthand notation for fields. This notation can be interpolated to all appropriate language fields.
We’re still experimenting with Pydantic, and although it looks very good on first use, we still need to change a lot of things to fully leverage its potential.
Once Harvester uses the new search-client on production and provides the new Metadata tree, it will be harder for people to update the filter translations. Translations for some values will need to be done twice.
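
To illustrate the idea of shorthand field notation mentioned above, here is a minimal sketch. It is not the harvester's actual implementation: the `:text` marker, the `<field>.<language>` naming, the language codes and the function name `expand_search_fields` are all hypothetical, and the `^2` boost suffix assumes an Elasticsearch/OpenSearch-style search backend.

```python
# Hypothetical sketch: the ":text" shorthand marker, the "<field>.<language>"
# layout and the language codes are assumptions, not the real index schema.
LANGUAGES = ["en", "nl"]


def expand_search_fields(shorthand_fields, languages=LANGUAGES):
    """Expand shorthand entries into one concrete field per language.

    An entry such as "title:text^2" becomes "title.en^2" and "title.nl^2".
    Entries without the ":text" marker are passed through unchanged.
    """
    expanded = []
    for field in shorthand_fields:
        # Split off an optional boost suffix such as "^2".
        name, _, boost = field.partition("^")
        if name.endswith(":text"):
            base = name[: -len(":text")]
            for language in languages:
                concrete = f"{base}.{language}"
                expanded.append(f"{concrete}^{boost}" if boost else concrete)
        else:
            expanded.append(field)
    return expanded


search_fields = expand_search_fields(["title:text^2", "description:text", "keywords"])
```

With a notation like this, the list of search fields only has to name each field once, and adding a language means changing a single list.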

Testing can be done on /api/v1/docs and needs to include the following.

Search with no entities parameter. This should return results identical to previous releases.
Search with entities=products:default. This should search using the new index.
- English search for study vocabulary
- English search for consortium
- English and Dutch search for disciplines
- Switching language in Sharekit should result in improved search (when switching to the correct language)
Metadata tree should return the same tree as previous releases when no entity has been specified.
Metadata tree should return a tree with different fields when entity=products:default has been given as a parameter.
- "study_vocabulary.keyword" field replaces "study_vocabulary"
- "disciplines_normalized.keyword" replaces "learning_material_disciplines_normalized"
- "language" replaces "language.keyword"
- "published_at" replaces "publisher_date" when trying to search alphabetically. The field "modified_at" can sort on last modified date.
- "licenses" replaces "copyright" and filters based on all licenses for all files that belong to a product/material.
- "technical_types" replaces "technical_type" and allows filtering on file type of all files that belong to a product/material.
Autocomplete as well as suggestions have not changed, but use entities=products:default as parameter to use the new index.
Stats has changed slightly. It will return counts per entity and you can test this by specifying entities=products:default as parameter.
The "find document" endpoints now require a SRN instead of an external_id.
There are a number of additional fields in API responses that can be used:
- "entity" contains a string with the entity type like: products or projects (NB: a material is a product)
- "score" contains the score given by the search engine to a result (default is 0.0).
- Authors contain an "is_external" boolean, but currently it's always set to false.
- For files "priority" has been added.
- For Publinova "types" contains all file types for a product and "licenses" contains all copyright licenses for the product.
Finally, there are minor API field updates. Double-check that these do not accidentally break functionality on Publinova:
- The "highlight" field might be null, but if "text" or "description" is set the other property will be null instead of undefined.
- The fields "published_at" and "modified_at" contain dates and no longer times.
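
For clients that build filter or sort parameters by field name, the renames listed above could be collected in one mapping. This is only a sketch: the old and new field names come from these notes, but the `FIELD_RENAMES` name, the `rename_filters` helper and the shape of the filter payload (a dict from field name to values) are hypothetical.

```python
# Old-index field name -> new-index field name, as listed in these notes.
FIELD_RENAMES = {
    "study_vocabulary": "study_vocabulary.keyword",
    "learning_material_disciplines_normalized": "disciplines_normalized.keyword",
    "language.keyword": "language",
    "publisher_date": "published_at",  # used for sorting
    "copyright": "licenses",
    "technical_type": "technical_types",
}


def rename_filters(filters):
    """Return a copy of `filters` with old field names replaced by new ones.

    `filters` is assumed to be a dict of field name -> list of values;
    fields without a known rename are kept as they are.
    """
    return {FIELD_RENAMES.get(field, field): values for field, values in filters.items()}


old_filters = {"copyright": ["cc-by"], "language.keyword": ["nl"]}
new_filters = rename_filters(old_filters)
```

Keeping the renames in a single table makes it easier to check a client against both the old behaviour (no entities parameter) and the new index (entities=products:default).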

Overwrite API

New search client

Sharekit new files structure

Django 4.2

New harvester (live)

New harvester

MBO educational level

Pipeline refactor v2

Pipeline refactor v1