Why do scorers return a different result than process.extract #165

maxbachmann (Maintainer) · 2021-11-23T11:27:22Z

This question was originally asked in #164

I am also a new user. I'm trying to figure out how to use this library with symbols inside the strings.

>>> from fuzzywuzzy import fuzz, process
>>> x = ['atk', 'atk%', 'atk!', 'atk$']

>>> process.extract('atk!', x)
[('atk', 100.0, 0), ('atk%', 100.0, 1), ('atk!', 100.0, 2), ('atk$', 100.0, 3)]

>>> fuzz.ratio('atk', 'atk%')
86

>>> process.extract('atk!', x, scorer=fuzz.ratio)
[('atk', 100.0, 0), ('atk%', 100.0, 1), ('atk!', 100.0, 2), ('atk$', 100.0, 3)]

@Dosx001 I moved this into a separate discussion, since it is an unrelated issue.

maxbachmann (Maintainer, Author) · 2021-11-23T11:31:33Z

process.extract preprocesses strings by default (see https://maxbachmann.github.io/RapidFuzz/process.html#extract)

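processor (Callable, optional) – Optional callable that reformats the strings. utils.default_process is used by default, which lowercases the strings and trims whitespace

utils.default_process also replaces non-alphanumeric characters with whitespace before trimming, so 'atk', 'atk%', 'atk!' and 'atk$' all become 'atk' and score 100 against the query.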

You can disable this behavior by passing processor=None:

process.extract('atk!', x, scorer=fuzz.ratio, processor=None)

or enable it for scorers by passing processor=utils.default_process:

fuzz.ratio('atk', 'atk%', processor=utils.default_process)
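
To make the effect of preprocessing concrete, here is a minimal standard-library sketch. It does not call RapidFuzz or FuzzyWuzzy: approx_default_process is a hypothetical, ASCII-only approximation of utils.default_process, and ratio is a stand-in scorer built on difflib, so exact scores from the real libraries may differ slightly.

```python
import re
from difflib import SequenceMatcher


def approx_default_process(s):
    """Hypothetical stand-in for utils.default_process (not the library code).

    Replaces runs of non-alphanumeric ASCII characters with a space,
    then trims whitespace and lowercases the result.
    """
    return re.sub(r"[^0-9a-zA-Z]+", " ", s).strip().lower()


def ratio(a, b):
    """Stand-in similarity score on a 0-100 scale, based on difflib."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


choices = ['atk', 'atk%', 'atk!', 'atk$']
query = 'atk!'

# Scores on the raw strings: the symbols count as ordinary characters.
raw_scores = [(c, ratio(query, c)) for c in choices]

# Scores after preprocessing: every string collapses to 'atk'.
processed_scores = [
    (c, ratio(approx_default_process(query), approx_default_process(c)))
    for c in choices
]

print(raw_scores)
print(processed_scores)
```

With preprocessing, the query and every choice reduce to the same string, which is why all four matches tie at 100. Without it, only the exact match 'atk!' reaches 100.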

Dosx001 · 2021-11-24T00:25:09Z

@maxbachmann Thanks! This was what I needed.

Also, out of curiosity, why do RapidFuzz and FuzzyWuzzy work like this? Why do I need scorer=fuzz.ratio, processor=None? Why couldn't it work like this out of the box?

1 reply

maxbachmann Nov 24, 2021
Maintainer Author

In RapidFuzz I want to stay compatible with FuzzyWuzzy. I selected the same defaults, so most of the time people only have to change the import. I do not know why SeatGeek selected this default behavior in FuzzyWuzzy.
