Improve type search #641
@wbazant It's immediately fun to see the synonyms displayed. A reminder of all this data we have but aren't yet using. So thanks for bringing them to the fore! The only request for change in this PR is fixing the design to handle many synonyms. Can we allow rows to expand? I'm willing to accept that prefix matching will lead to the best result for most users/searches, although it may fail in some cases.
During testing, I realized that pending types cannot be distinguished from approved ones, which could lead to confusion. It isn't too serious, because pending types will eventually get merged, but it might be worth flagging them somehow. We decided to include them so that a user can add a new type and then use it for subsequent locations without having to wait for the type to be approved (which could take months).
I also realized that matching fails because synonyms cannot realistically capture all permutations of no-space, space, and hyphenated versions (e.g. little-leaf linden, littleleaf linden, little leaf linden, small-leaved linden, ... lime, ...), which is a challenge in English, French, Portuguese (especially), and probably many others. Would it be crazy or helpful to ignore spaces and dashes when matching?
Finally, while we're on the topic, there is the option of diacritic-insensitive matching for languages that use the Latin script as a base. I have this function in JavaScript for the purpose:
One more little idea: maybe use an interpunct instead of comma separation, for legibility and consistency with other lists?
Switch to basic infinite list and remove react-window
Thanks for the detailed feedback! I did the following:
The list was virtualized because it's slow to render it all. I removed react-window and replaced it with a basic infinite list, like on the list page or activity page. Now the rows can have variable heights, and it looks better, thanks! I've made the tokenizer more elaborate and added some rules:
Tried to generalize it as follows: if the parent's common name appears in the child's name, but not at the start (where we don't need to add it because it will show up during typing), then add the parent's name to the search reference. BTW I noticed it fails for 'European plum/Prunus domestica' because the parent there is 'Stone fruit', and in general the taxonomy of plums isn't quite right. Not an issue for a regular user, because Plum/Prunus is the third term and they'll probably go for that entry!
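For readers following along, the parent-name rule described above could be sketched roughly like this. This is one reading of the rule, not the code in the PR: the `commonName` field and the `extraSearchTerms` helper are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: decide whether the parent's common name should be
// added as an extra search term for a child type.
// Assumption: each type is an object with a `commonName` string.
function extraSearchTerms(child, parent) {
  if (!parent || !parent.commonName || !child.commonName) return []
  const childName = child.commonName.toLowerCase()
  const parentName = parent.commonName.toLowerCase()
  // indexOf returns -1 when absent and 0 when the child name already
  // starts with the parent name (typing finds it anyway), so only a
  // position greater than 0 qualifies.
  return childName.indexOf(parentName) > 0 ? [parent.commonName] : []
}

// The parent name appears inside the child name, but not at the start.
console.log(extraSearchTerms({ commonName: 'European plum' }, { commonName: 'Plum' }))
// The case mentioned above: the parent is 'Stone fruit', so nothing is added.
console.log(extraSearchTerms({ commonName: 'European plum' }, { commonName: 'Stone fruit' }))
```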
Added cultivars as search terms
If
Added
Good suggestion! I ignored [^\w\s], so dashes, apostrophes, etc. I started out ignoring spaces too, until I realised I want the space as a feature - 'elm ' shouldn't match 'elmleaf blackberry' - so we allow word ends in the input.
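A minimal sketch of that behaviour, assuming plain prefix matching on lowercased strings. The helper names are hypothetical, and appending a trailing space to the reference is an assumption about how a word end in the input could be honoured, not necessarily what the PR does.

```javascript
// Remove every character that is neither a word character nor whitespace,
// so dashes and apostrophes disappear but spaces survive.
// Note: without further handling, \w in JavaScript covers only
// [A-Za-z0-9_], so accented letters would be removed too unless they are
// folded to ASCII beforehand.
function stripPunctuation(text) {
  return text.replace(/[^\w\s]/g, '')
}

// Hypothetical matcher: does the reference start with the typed input?
function matchesStart(reference, input) {
  // Assumption: a trailing space on the reference lets an input such as
  // 'elm ' match a reference that is exactly 'Elm'.
  const ref = stripPunctuation(reference.toLowerCase()) + ' '
  return ref.startsWith(stripPunctuation(input.toLowerCase()))
}

console.log(matchesStart('Little-leaf linden', 'littleleaf')) // hyphen ignored
console.log(matchesStart('Elmleaf blackberry', 'elm '))       // space is significant
console.log(matchesStart('Elm', 'elm '))                      // word end allowed
```

Because spaces stay significant in this sketch, an input of 'little leaf' would still not match 'littleleaf linden'; that variant would need a synonym or a further rule.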
Thanks, I did that! Did toLowerCase and then toAscii on both input and reference.
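The `toAscii` step itself isn't shown in this thread. One common way to get diacritic-insensitive comparison in JavaScript, given here only as a sketch under that caveat, is to decompose the string with `String.prototype.normalize('NFD')` and drop the combining marks. The function name `normalizeForSearch` is hypothetical.

```javascript
// Lowercase, then fold accented Latin letters to their base letters.
// NFD splits e.g. 'é' into 'e' + U+0301; the regex then removes the
// combining marks in the U+0300..U+036F block.
// Limitation: letters without a canonical decomposition (such as 'ø' or
// 'ß') are left unchanged by this approach.
function normalizeForSearch(text) {
  return text
    .toLowerCase()
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')
}

// Apply the same normalization to both the input and the reference.
const reference = normalizeForSearch('Açaí')
const input = normalizeForSearch('acai')
console.log(reference.startsWith(input))
```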
Thanks! The interpunct looks better.
I'll merge this in since the feature is now completely gold-plated, but we can keep tweaking it and adding rules as we come up with them!
Closes #635
Different resolution:
As Ethan points out, the synonyms can help find the match, but just using them in the background would produce a "matching but not sure why" experience. Meanwhile, they're quite interesting - a bit of trivia about plants of the form: okra is also known as Ladies' fingers - and might help also when browsing.
I proposed that solution originally, and from searching it seems to be borderline possible, but the library really didn't want me to do it, and it probably has drawbacks. Instead, match from the start only, and use common name + scientific name + all synonyms as possible starts.
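To illustrate the "match from the start only" approach, here is a rough sketch that does not use the library mentioned above. The record shape (`commonName`, `scientificName`, `synonyms`) and the `searchTypes` function are assumptions made for this example; returning the matched name is one way to avoid the "matching but not sure why" experience.

```javascript
// Hypothetical type records; the field names are assumptions.
const types = [
  { commonName: 'Okra', scientificName: 'Abelmoschus esculentus', synonyms: ["Ladies' fingers"] },
  { commonName: 'Elm', scientificName: 'Ulmus', synonyms: [] },
]

// Return every type for which the common name, the scientific name, or a
// synonym starts with the query, together with the name that matched.
function searchTypes(allTypes, query) {
  const q = query.toLowerCase()
  const results = []
  for (const type of allTypes) {
    const names = [type.commonName, type.scientificName, ...type.synonyms]
    const matchedName = names.find((name) => name && name.toLowerCase().startsWith(q))
    if (matchedName !== undefined) results.push({ type, matchedName })
  }
  return results
}

// Typing the start of a synonym finds okra and reports which name matched.
console.log(searchTypes(types, 'lad'))
// Text from the middle of a name does not match, by design.
console.log(searchTypes(types, 'fingers'))
```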