Bug: False positives with partial matches #25
Hi HEmile, I have a question: I downloaded the files from GitHub and added them to the Obsidian plugin library, but when I opened the plugin from Obsidian, it said "Failed to load plugin obsidian-sidekick". I really don't know which files to add to the Obsidian plugin library. Please advise, thank you~
Thanks for raising this.
No issues at all. You are doing me a great service by testing as you are. I've been heads-down on supporting the selection of multiple words, single words, and "stemmed" single words in response to #19. This will change the behaviour you are seeing again, so I'll test the scenarios you've raised to make sure you don't get the false positives you shared.
I see, in that case it might be intentional. Ideally, I'd prefer an option to turn this off. Stemming for single-word (or final-word) replacements does sound very useful, though.
Can you try the latest version, 1.5.0, which I just released? I've made sure that "stop words" are never indexed, so it should resolve the issue you experienced.
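One way "stop words are never indexed" could look in practice is sketched below: a minimal TypeScript index that maps each non-stop word of a note title to the notes containing it. `STOP_WORDS`, `tokenize` and `buildPartialIndex` are hypothetical names chosen for illustration, not Sidekick's actual code, and the stop list is deliberately tiny.

```typescript
// Hypothetical, deliberately tiny stop-word list; a real list would be much longer.
const STOP_WORDS: Set<string> = new Set(["a", "an", "and", "for", "in", "of", "on", "the", "to"]);

// Lower-case the title and split it on anything that is not a letter or digit.
function tokenize(title: string): string[] {
  return title.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// Map every non-stop word to the titles of the notes whose name contains it.
function buildPartialIndex(titles: string[]): Map<string, string[]> {
  const index = new Map<string, string[]>();
  for (const title of titles) {
    for (const word of tokenize(title)) {
      if (STOP_WORDS.has(word)) continue; // stop words are never indexed
      const notes = index.get(word) ?? [];
      if (!notes.includes(title)) notes.push(title);
      index.set(word, notes);
    }
  }
  return index;
}

const index = buildPartialIndex(["Purcell Principle", "Theory of Mind", "On Writing"]);
console.log(Array.from(index.keys()));
```

With such an index, a stop word like "on" can never produce a partial match, because it never becomes a key.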
The partial match option gives me too many matches to be useful in version 1.5.0. Here's an example from a basic note in my vault. The only match that might have been useful was "principle" (it wasn't, but I can see where it could have been). "30" matches every entry in my daily notes that happens to fall on the 30th of a month. This might be more useful to me if partial matches were limited: in my vault, for example, if only one or two notes match a partial, that might be useful, but if a dozen match, it's just a common word and unlikely to be a meaningful link.
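The "only one or two notes" idea can be expressed as a cap on how many note titles a word may appear in before it stops being offered as a partial match. The sketch below assumes an index from word to note titles; `maxNotesPerPartial` and `usefulPartials` are hypothetical names, not an existing Sidekick option, and the daily-note titles are made-up example data.

```typescript
// Hypothetical setting (not an existing Sidekick option): surface a partial match
// only when the word appears in at most this many note titles.
const maxNotesPerPartial = 2;

// index maps a word to the titles of the notes whose name contains it.
function usefulPartials(index: Map<string, string[]>, maxNotes: number): Map<string, string[]> {
  const kept = new Map<string, string[]>();
  index.forEach((notes, word) => {
    // A word shared by many titles (e.g. a day of the month) is too common to be a useful link.
    if (notes.length <= maxNotes) kept.set(word, notes);
  });
  return kept;
}

// Made-up example data: daily notes named YYYY-MM-DD for every 30th of 2021.
const thirtieths: string[] = [];
for (let month = 1; month <= 12; month++) {
  if (month === 2) continue; // February has no 30th
  thirtieths.push(`2021-${month < 10 ? "0" : ""}${month}-30`);
}

const partialIndex = new Map<string, string[]>([
  ["30", thirtieths],
  ["purcell", ["Purcell Principle"]],
  ["principle", ["Purcell Principle", "Cognitive ease principle"]],
]);

const kept = usefulPartials(partialIndex, maxNotesPerPartial);
console.log(Array.from(kept.keys())); // "30" is dropped: it matches 11 daily notes
```

A count cap is vault-relative by design: the same word can be rare (and worth linking) in one vault and ubiquitous in another.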
Thanks for the feedback, @Jinnayah. It sounds like numbers in the text should never be highlighted and should be treated as stop words. That's something I can do. Do you have any suggestions for making partial matches more informed? People have said they want them, but how do you think we can identify and surface a relevant partial match? Alternatively, do you think this becomes less of an issue once #3 is implemented, since you could simply build your own ignore list for your vault?
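Treating numbers as stop words amounts to one extra check in whatever decides which tokens may be highlighted. A minimal sketch, assuming a digits-only rule; `isNumeric`, `isIgnored` and `highlightCandidates` are hypothetical helpers, not Sidekick functions.

```typescript
// Hypothetical stop-word list, kept tiny for the example.
const STOP_WORDS: Set<string> = new Set(["a", "an", "and", "in", "of", "on", "the", "to"]);

// A token made only of digits (e.g. "30" or "2021") is treated like a stop word.
function isNumeric(token: string): boolean {
  return /^\d+$/.test(token);
}

function isIgnored(token: string): boolean {
  return STOP_WORDS.has(token.toLowerCase()) || isNumeric(token);
}

// Split text on non-alphanumeric characters and keep only words that may be highlighted.
function highlightCandidates(text: string): string[] {
  return text.split(/[^A-Za-z0-9]+/).filter((w) => w.length > 0 && !isIgnored(w));
}

const candidates = highlightCandidates("Read 30 pages on the Purcell principle in 2021.");
console.log(candidates); // [ 'Read', 'pages', 'Purcell', 'principle' ]
```

Note that this digits-only rule still lets tokens such as "30th" through; whether ordinals should also be ignored is a separate choice.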
Being able to exclude certain notes wouldn't help me in this case, but being able to add my own stop words would. This might be the easiest and most flexible option for most people. Another idea might be a threshold for matches. For example, here are the matches for 'principle' in the same file. 'Principle' is one of the two words in "Purcell Principle" and in the name of a journal entry, so there's a good chance those are matches. It's one word out of 17 in the name of the article about the Copernican Principle and one of 14 in the note about W.H.O., so it's less likely to be a match there. A threshold where the partial must match at least X% of the words in the full name would filter out a lot of the false positives. (For my vault, a threshold of around 15% of words looks like it would get rid of most false positives while still surfacing the good potential links.) BTW, I just noticed that the note itself is being flagged as a possible link, which it probably shouldn't be. This note is named "Cognitive ease principle", and it comes up as an option both for that phrase and for "principle".
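The proposed threshold is a coverage ratio: matched title words divided by total title words, compared against a minimum such as 15%. With the numbers from the comment, "principle" covers 1/2 = 0.50 of "Purcell Principle" but only 1/17 ≈ 0.06 and 1/14 ≈ 0.07 of the two long names. The sketch below applies that ratio and also skips the note currently open; `Candidate`, `MIN_COVERAGE`, `coverage` and `filterCandidates` are hypothetical names, and the long titles are stand-ins with the same word counts.

```typescript
// Hypothetical shape of a link suggestion; not Sidekick's actual data model.
interface Candidate {
  notePath: string;     // the note that would be linked to
  titleWords: string[]; // words of that note's title
}

// Hypothetical setting: the matched text must cover at least 15% of the target title's words.
const MIN_COVERAGE = 0.15;

// Fraction of the distinct title words that appear among the matched words.
function coverage(matched: string[], candidate: Candidate): number {
  const title = new Set(candidate.titleWords.map((w) => w.toLowerCase()));
  if (title.size === 0) return 0;
  const hits = new Set(matched.map((w) => w.toLowerCase()).filter((w) => title.has(w)));
  return hits.size / title.size;
}

// Drop the note currently open and any candidate below the coverage threshold.
function filterCandidates(
  matched: string[],
  currentNote: string,
  candidates: Candidate[],
  minCoverage: number,
): Candidate[] {
  return candidates.filter((c) => c.notePath !== currentNote && coverage(matched, c) >= minCoverage);
}

// Stand-in titles with the same word counts as the examples in the comment (17 and 14 words).
const filler = (n: number): string[] => Array.from({ length: n }, (_, i) => `w${i}`);
const suggestions: Candidate[] = [
  { notePath: "Purcell Principle.md", titleWords: ["Purcell", "Principle"] },                 // 1/2  = 0.50
  { notePath: "Copernican Principle article.md", titleWords: ["Principle", ...filler(16)] },  // 1/17 ≈ 0.06
  { notePath: "W.H.O. note.md", titleWords: ["Principle", ...filler(13)] },                   // 1/14 ≈ 0.07
  { notePath: "Cognitive ease principle.md", titleWords: ["Cognitive", "ease", "principle"] }, // the open note
];

const kept = filterCandidates(["principle"], "Cognitive ease principle.md", suggestions, MIN_COVERAGE);
console.log(kept.map((c) => c.notePath)); // [ 'Purcell Principle.md' ]
```

The open note would pass the threshold on its own (1/3 ≈ 0.33), which is why the self-check is a separate condition rather than something the ratio can handle.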
Good call. This was a bug. Fixed in 1.5.1. |
That's not a bad idea. I will implement your suggestion, and then we can test how useful the change is. I'll let you know when the change is made.
I would honestly be most happy with an option to disable partial matches. My vault really isn't set up in a way that partial matches make sense, since they are mostly (pretty long) paper titles.
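An opt-out for partial matches can be as small as a boolean setting checked when matches are displayed. This is a hypothetical sketch: `MatchSettings`, `enablePartialMatches`, `Match` and `visibleMatches` are illustrative names, not Sidekick's real settings, and the paper titles are made up.

```typescript
// Hypothetical setting name; Sidekick's real settings may differ.
interface MatchSettings {
  enablePartialMatches: boolean;
}

interface Match {
  term: string;     // the text highlighted in the note
  target: string;   // the note it would link to
  partial: boolean; // true when the term is only part of the target's title
}

// Full-title matches are always shown; partial ones only if the user opts in.
function visibleMatches(matches: Match[], settings: MatchSettings): Match[] {
  return matches.filter((m) => !m.partial || settings.enablePartialMatches);
}

// Made-up paper titles for illustration.
const found: Match[] = [
  { term: "variational autoencoder", target: "Variational Autoencoder", partial: false },
  { term: "models", target: "Generative Models for Structured Data", partial: true },
];

const shown = visibleMatches(found, { enablePartialMatches: false });
console.log(shown.map((m) => m.target)); // [ 'Variational Autoencoder' ]
```

Filtering at display time rather than at indexing time means toggling the setting does not require rebuilding the index.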
Damn. That surely drives the point home. By any chance, are any of those suggestions remotely useful for your use case? I've just come across RAKE (Rapid Automatic Keyword Extraction), which I'm going to trial quickly to see whether it's an even better solution than what I have at the moment.
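For readers unfamiliar with RAKE: it splits text into candidate phrases at stop words and punctuation, gives each word the score deg(w)/freq(w) (deg counts the word's co-occurrences within candidate phrases, including itself), and scores a phrase as the sum of its word scores. The compact TypeScript sketch below implements that textbook scoring; the stop-word list is a tiny placeholder, and it says nothing about how Sidekick would actually integrate RAKE.

```typescript
// Tiny placeholder stop list; RAKE is normally run with a much larger one.
const RAKE_STOP_WORDS: Set<string> = new Set(["a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"]);

// Candidate phrases are maximal runs of words containing no stop word or punctuation.
function candidatePhrases(text: string): string[][] {
  const phrases: string[][] = [];
  for (const fragment of text.toLowerCase().split(/[.,;:!?()\n]+/)) {
    let current: string[] = [];
    for (const word of fragment.split(/[^a-z0-9]+/)) {
      if (word.length === 0) continue;
      if (RAKE_STOP_WORDS.has(word)) {
        if (current.length > 0) phrases.push(current);
        current = [];
      } else {
        current.push(word);
      }
    }
    if (current.length > 0) phrases.push(current);
  }
  return phrases;
}

// Score each distinct phrase as the sum of deg(w) / freq(w) over its words, highest first.
function rake(text: string): Array<[string, number]> {
  const phrases = candidatePhrases(text);
  const freq = new Map<string, number>();
  const degree = new Map<string, number>();
  for (const phrase of phrases) {
    for (const w of phrase) {
      freq.set(w, (freq.get(w) ?? 0) + 1);
      // each occurrence co-occurs with every word in its phrase, itself included
      degree.set(w, (degree.get(w) ?? 0) + phrase.length);
    }
  }
  const scores = new Map<string, number>();
  for (const phrase of phrases) {
    const score = phrase.reduce((sum, w) => sum + (degree.get(w) ?? 0) / (freq.get(w) ?? 1), 0);
    scores.set(phrase.join(" "), score);
  }
  const result: Array<[string, number]> = [];
  scores.forEach((score, phrase) => result.push([phrase, score]));
  return result.sort((a, b) => b[1] - a[1]);
}

const ranked = rake("Generative models are trained with variational inference. Variational inference is fast.");
console.log(ranked);
```

For that sample text, "generative models" and "variational inference" both score 4, while the single words "trained" and "fast" score 1, which is the kind of multi-word phrase preference that could help vaults built around long titles.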
Not really. I don't think it shows all the recommendations, since it filled my whole screen. There are probably some relevant recommendations like 'generative models', but I can't see them, probably because the list is ordered alphabetically. Also, the stemming is rather aggressive: it seems to use 'generalized' for 'generator'.
Hi, I'll chime in on this conversation rather than start a new one. I've just installed this for the first time, and the idea and the way you're approaching it are awesome! The first note I threw at it, though, gave matches for:
@laurastephsmith I created a fork with a setting to disable the rather aggressive stemming. You can install it from here: https://github.com/HEmile/obsidian-sidekick/releases/tag/1.1.0. Hopefully that solves the problem! (It does for me.)
@HEmile oo thanks, I'll give it a go!
I'm seeing some false positives in my vault. In this case, 'on' is highlighted with the following replacements:
Here, 'matter' is highlighted.
Same with 'zero'.
I think this started happening in 1.4.1; it wasn't like this in 1.4.0.
(BTW, apologies for all the issues I'm opening in this repo. I think this plugin will be super useful to me and of huge help to many in the community!)