Improve type search #641

wbazant · 2024-12-14T12:06:24Z

Closes #635

Different resolution:

show the synonyms when selecting

As Ethan points out, the synonyms can help find the match, but just using them in the background would produce a "matching but not sure why" experience. Meanwhile, they're quite interesting (a bit of plant trivia, such as okra also being known as Ladies' fingers) and might also help when browsing.

don't reorder search results

I proposed that solution originally, and from searching it seems to be borderline possible, but the library really didn't want me to do it, and it probably has drawbacks. Instead, match from the start only, and use common name + scientific name + all synonyms as possible starts.
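
A minimal sketch of the "match from the start only" idea, for illustration. The `PlantType` shape and the function names are hypothetical and not taken from the codebase; the point is only that results keep their original order and that any of the names may supply the matching prefix.

```typescript
// Hypothetical shape of a type entry; the real data model may differ.
interface PlantType {
  commonName: string;
  scientificName: string;
  synonyms: string[];
}

// Every name that is allowed to act as the start of a match.
function searchTerms(type: PlantType): string[] {
  return [type.commonName, type.scientificName].concat(type.synonyms);
}

// Filter without reordering: a type is kept if any of its terms
// starts with the (lowercased) query.
function prefixSearch(types: PlantType[], query: string): PlantType[] {
  const q = query.toLowerCase();
  return types.filter((type) =>
    searchTerms(type).some((term) => term.toLowerCase().startsWith(q))
  );
}

const exampleTypes: PlantType[] = [
  { commonName: "Okra", scientificName: "Abelmoschus esculentus", synonyms: ["Ladies' fingers"] },
  { commonName: "Lemon", scientificName: "Citrus limon", synonyms: [] },
];

// "ladies" finds Okra through its synonym; "fingers" finds nothing,
// because it is not at the start of any term.
const okraMatches = prefixSearch(exampleTypes, "ladies");
const noMatches = prefixSearch(exampleTypes, "fingers");
```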

ezwelty · 2025-01-26T13:22:10Z

@wbazant It's immediately fun to see the synonyms displayed. A reminder of all this data we have but aren't yet using. So thanks for bringing them to the fore!

The only request for change in this PR is fixing the design to handle many synonyms. Can we allow rows to expand?

I'm willing to accept that prefix matching will lead to the best result for most users/searches, although it may fail in some cases:

Search "mulberry" and expect to be able to choose from a list like "black mulberry", "red mulberry", "white mulberry" rather than just get "mulberry"
Search by cultivar name "Reinette ..." and find nothing. Could this be solved by parsing cultivar names from scientific names (or perhaps returning a list of cultivars from the API) and adding them to the search bucket?
Search for anything whose common name starts with "common ....". This could be solved by always including the second part as a synonym (e.g. "common yarrow", "yarrow").

During testing, I realized that pending types cannot be distinguished from approved ones, which could lead to confusion. It isn't so serious, because pending types will get merged, but it might be worth considering flagging them somehow. We decided to include them so that a user can add a new type and then use it for subsequent locations without having to wait for the type to be approved (which could take months).

I also realized that matching fails because synonyms cannot realistically capture all permutations of no-space, space, and hyphenated versions (e.g. little-leaf linden, littleleaf linden, little leaf linden, small-leaved linden, ... lime, ...), which is a challenge in English, French, Portuguese (especially), and probably many others. Would it be crazy or helpful to ignore space and dash for matching?

Finally, while we're on the topic, there is the option of diacritic-insensitive matching for languages that use the Latin script as a base. I have this function in JavaScript for the purpose:
https://github.com/ezwelty/opentrees-harvester/blob/57110ccd51e5078665639ea593f799a7d59f9889/lib/helpers.js#L659

ezwelty · 2025-01-26T13:47:06Z

One more little idea, maybe interpunct instead of comma-separated for legibility and consistency with other lists?

wbazant · 2025-01-31T16:28:00Z

Thanks for the detailed feedback! I did the following:

Can we allow rows to expand?

The list was virtualized because it's slow to render it all. I removed react-window and replaced it with a basic infinite list, like on the list page or activity page. Now the rows can have variable heights, and it looks better, thanks!

I've made the tokenizer more elaborate and added some rules:

Search "mulberry" and expect to be able to choose from a list like "black mulberry", "red mulberry", "white mulberry" rather than just get "mulberry"

Tried to generalize it as follows: if the parent's common name appears in the child's name, but not at the start (where we don't need to add it because it will show up during typing), then add parent's name to the search reference. BTW I noticed it fails for 'European plum/Prunus domestica' because the parent there is 'Stone fruit', and in general the taxonomy of plums isn't quite right. Not an issue for a regular user because Plum/Prunus is the third term and they'll probably go for that entry!
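
Roughly, the parent-name rule above could look like the sketch below. The function name and signature are made up for illustration; the actual tokenizer may be organized differently.

```typescript
// Returns the parent's common name when it should be added as an extra
// search term for the child, or undefined when the rule does not apply.
// Hypothetical helper, not the project's actual code.
function parentSearchTerm(
  childCommonName: string,
  parentCommonName: string | undefined
): string | undefined {
  if (!parentCommonName) return undefined;
  const index = childCommonName
    .toLowerCase()
    .indexOf(parentCommonName.toLowerCase());
  // index > 0: the parent's name appears, but not at the start.
  // index === 0 is skipped because prefix matching already covers it.
  // index === -1 means the names are unrelated.
  return index > 0 ? parentCommonName : undefined;
}

const fromMulberry = parentSearchTerm("Black mulberry", "Mulberry");
const fromStoneFruit = parentSearchTerm("European plum", "Stone fruit");
```

With these inputs, "Black mulberry" gains "Mulberry" as a term, while "European plum" gains nothing from "Stone fruit", which is the failure case mentioned above.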

Search by cultivar name "Reinette ..."

Added cultivars as search terms
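
This thread doesn't spell out how cultivars get into the search terms. One possibility, following the suggestion to parse them from scientific names, is sketched below. It assumes the cultivar is written in single quotes inside the scientific name, which is the usual botanical convention but an assumption about this project's data; the function name and the example name are hypothetical.

```typescript
// Extract a single-quoted cultivar name from a scientific name, if present.
// Assumes names shaped like "Genus species 'Cultivar Name'".
function extractCultivar(scientificName: string): string | undefined {
  const match = /'([^']+)'/.exec(scientificName);
  return match ? match[1] : undefined;
}

const cultivar = extractCultivar("Malus domestica 'Granny Smith'");
const noCultivar = extractCultivar("Malus domestica");
```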

Search for anything whose common name starts with "common ....".

If commonName.toLowerCase().startsWith('common '), copy, strip "[Cc]ommon\s+", and add to the reference
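
As a sketch, that rule might be written like this. The helper name is hypothetical, and the sketch uses a case-insensitive regex so that the strip always agrees with the lowercase `startsWith` check.

```typescript
// If the common name starts with "common ", return the remainder so it can
// be added as an extra search term; otherwise return undefined.
function withoutCommonPrefix(commonName: string): string | undefined {
  if (!commonName.toLowerCase().startsWith("common ")) return undefined;
  const stripped = commonName.replace(/^common\s+/i, "");
  // Guard against a name that is only "common" followed by whitespace.
  return stripped.length > 0 ? stripped : undefined;
}

const yarrow = withoutCommonPrefix("Common yarrow");
const unchanged = withoutCommonPrefix("Yarrow");
```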

pending types

Added (Pending Review) to the common name. It could be done in a fancier way, but this should be clear enough.

Would it be crazy or helpful to ignore space and dash for matching?

Good suggestion! I now ignore anything matching [^\w\s], so dashes, apostrophes, etc. I started out ignoring spaces too, until I realised I want the space as a feature: 'elm ' shouldn't match 'elmleaf blackberry'. So word ends are allowed in the input.
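
A small sketch of this behaviour, with hypothetical helper names. Note that in JavaScript regexes without the `u` flag, `\w` only covers ASCII letters, digits, and underscore, so this sketch assumes accented letters have already been folded to ASCII; otherwise they would be stripped as well.

```typescript
// Drop everything that is neither a word character nor whitespace
// (dashes, apostrophes, ...), but keep spaces.
function stripPunctuation(text: string): string {
  return text.replace(/[^\w\s]/g, "");
}

// Prefix match on the punctuation-free, lowercased forms.
function matchesFromStart(reference: string, input: string): boolean {
  return stripPunctuation(reference)
    .toLowerCase()
    .startsWith(stripPunctuation(input).toLowerCase());
}

// The hyphen is ignored, so the no-space spelling finds the hyphenated name.
const hyphenIgnored = matchesFromStart("Little-leaf linden", "littleleaf");
// The trailing space is kept, so "elm " marks a word end.
const wordEndRespected = matchesFromStart("Elmleaf blackberry", "elm ");
```

Because spaces are kept, the input "little leaf" would still not match "Little-leaf linden" in this sketch; that is the trade-off for letting a trailing space mark the end of a word.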

diacritic-insensitive matching

Thanks, I did that! Did toLowerCase and then toAscii on both input and reference.
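
For reference, one common way to approximate this in plain TypeScript is Unicode decomposition followed by removal of combining marks. This is a stand-in for illustration, not the `toAscii` helper itself (whose implementation isn't shown in this thread), and it does not handle letters that have no decomposition, such as "ø" or "ß".

```typescript
// Lowercase, decompose accented letters (NFD), then drop the combining
// diacritical marks in the U+0300 to U+036F range.
function foldForSearch(text: string): string {
  return text
    .toLowerCase()
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "");
}

// Apply the same folding to both sides before comparing.
function foldedStartsWith(reference: string, input: string): boolean {
  return foldForSearch(reference).startsWith(foldForSearch(input));
}

const folded = foldForSearch("Açaí");
const matchesAccented = foldedStartsWith("Pêssego", "pess");
```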

maybe interpunct

Thanks! The interpunct looks better.

wbazant · 2025-01-31T16:38:19Z

I'll merge this in since the feature is now completely gold-plated, but it's something we can keep tweaking, adding rules as we come up with them!

wbazant added 2 commits December 14, 2024 11:20

Display synonyms when selecting type

6c187e5

Match search terms from token start

78937de

wbazant mentioned this pull request Dec 15, 2024

Change flat type selection in form to hierarchical type filter #230

Closed

wbazant added 3 commits January 29, 2025 10:51

Allow variable heights in type selection

11d85ac

Switch to basic infinite list and remove react-window

More elaborate tokenization

bd5c3c4

Larger increment for scrolling

66a791e

wbazant merged commit 3a81a16 into falling-fruit:main Jan 31, 2025
1 check passed

ezwelty mentioned this pull request Feb 6, 2025

Revisit the ignoring of characters in type search #684

Closed
