Replies: 1 comment
-
Do you mean it should ignore duplicated words? `fuzz.token_set_ratio` is a scorer that ignores duplicated words.
-
It doesn't seem like any of the scorers consider token frequency (how many times a word/substring appears), making it difficult to match phrases with repeating terms. For my current need I can simply filter out the recurring word, but it would be nice to have a frequency-sensitive scorer.
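As a stopgap, a frequency-sensitive score can be approximated outside the library with a Dice coefficient over token *multisets*, so repeated words count once per occurrence. This is just a sketch, not part of rapidfuzz, and `token_dice` is a hypothetical helper name:

```python
from collections import Counter

def token_dice(s1: str, s2: str) -> float:
    """Dice similarity over token multisets: each occurrence of a
    word counts, so duplicated words do affect the score."""
    c1, c2 = Counter(s1.split()), Counter(s2.split())
    overlap = sum((c1 & c2).values())             # multiset intersection size
    total = sum(c1.values()) + sum(c2.values())   # total token count
    return 2 * overlap / total if total else 1.0

print(token_dice("big big dog", "big dog"))  # 0.8 -- the extra "big" lowers it
print(token_dice("big dog", "big dog"))      # 1.0
```

Unlike `token_set_ratio`, this drops character-level fuzziness; it only measures how well the token frequency distributions overlap, which is the part the built-in scorers ignore.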