problem parsing two-word ingredients that begin with lower-case 'a' #931

Stuckyville · 2019-01-28T00:18:39Z

When entering a two-word ingredient whose first word begins with a lower-case 'a', the parser strips the leading 'a' and treats it as a quantity. For example, 'apple juice' becomes 'pple juice' with a quantity of '1'. Further detail is discussed at https://answers.launchpad.net/gourmet/+question/678095

saxon-s · 2019-01-28T00:55:46Z

Environment:
Gourmet 0.17.4 and master branch on Ubuntu and Windows.

Steps to reproduce:

Click "New" button for new recipe
Click "Ingredients" tab
Add each of the following ingredients individually to "Add ingredient" text field:
"apple juice"
"Apple juice"
"apricot"
"an avocado"
"a beet"
"a dozen eggs"
"a pair of Yubari King melons"

Expected Results:

Expect ingredients to be listed as:
"apple juice"
"Apple juice"
"apricot"
"1 avocado"
"1 beet"
"12 eggs"
"2 Yubari King melons"

Actual Results:

Instead, ingredients are listed as:
"1 pple juice"
"Apple juice"
"apricot"
"1 avocado"
"1 beet"
"12 eggs"
"2 Yubari King melons"

Analysis:
If an ingredient string has more than one word and its first word starts with a lower-case "a", that leading "a" is stripped off and replaced with a quantity of "1". The other inputs behave as expected: "a dozen" is replaced with a quantity of "12" and "a pair" with a quantity of "2".

Gourmet is designed to translate word numbers into equivalent numbers, for example:
"a" --> "1"
"an" --> "1"
"a couple" --> "2"
"a dozen" --> "12"
"twenty" --> "20"

Conclusion:

There appears to be a bug in the ingredient parser. The parser should translate "a" to "1" only when "a" is a standalone word, not when it is the first letter of a longer word.
In addition, the ingredient parser does not translate capitalized number words correctly, for example:
"A dozen" is not translated to a quantity of "12".

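The behavior asked for in the conclusion can be sketched as follows. This is a minimal illustration, not Gourmet's actual code: the `NUMBER_WORDS` subset, the `LEADING_NUMBER_WORD` pattern, and the helper `split_leading_number_word` are all hypothetical names chosen for this example. It only splits off a leading number word and does not handle a following "of".

```python
import re

# Hypothetical subset of the number words listed in the analysis.
NUMBER_WORDS = {
    "a": 1,
    "an": 1,
    "a couple": 2,
    "a pair": 2,
    "a dozen": 12,
    "twenty": 20,
}

# Longest phrases first, so "a dozen" is tried before the bare "a".
_alternatives = sorted(NUMBER_WORDS, key=len, reverse=True)
_pattern = "|".join(word.replace(" ", r"\s+") for word in _alternatives)

# The trailing \b rejects "a" when it is only the first letter of a longer
# word; re.IGNORECASE lets "A dozen" match as well as "a dozen".
LEADING_NUMBER_WORD = re.compile(r"^\s*(" + _pattern + r")\b\s*", re.IGNORECASE)


def split_leading_number_word(text):
    """Return (quantity, remainder); quantity is None if no number word leads."""
    match = LEADING_NUMBER_WORD.match(text)
    if match is None:
        return None, text
    # Normalize case and inner whitespace before the dictionary lookup.
    key = " ".join(match.group(1).lower().split())
    return NUMBER_WORDS[key], text[match.end():]
```

With this sketch, `split_leading_number_word("apple juice")` leaves the string untouched, while `"a beet"` and `"A dozen eggs"` yield quantities of 1 and 12.
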
martinp26 · 2020-06-13T23:41:05Z

There are multiple problems here:

NUMBER_WORD_REGEXP is missing word boundaries around the individual regex elements, so it finds 'a' in the middle of words. Not sure whether adding them would be enough.
The number words are also NOT put through translation: the German version still has "one" ... "ten" in the regex. As a side effect, the search terminates early on the translated minutes unit ("Minuten" -> "Minu"), which then does not parse. Re-editing recipes leads to losing time annotations.
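Both symptoms can be reproduced outside Gourmet with a small stand-in for the regex. The word list below is a hypothetical subset, and `UNANCHORED`, `ANCHORED`, and `first_match` are names made up for this sketch; the real `all_number_words` list is longer.

```python
import re

# Hypothetical subset of the number words, longest first.
number_words = ["a dozen", "ten", "one", "an", "a"]
alternation = "|".join(word.replace(" ", r"\s+") for word in number_words)

# Without word boundaries, as described above.
UNANCHORED = re.compile(alternation)
# With \b on both sides of the whole alternation.
ANCHORED = re.compile(r"\b(?:" + alternation + r")\b")


def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else None


print(first_match(UNANCHORED, "apple juice"))  # a
print(first_match(UNANCHORED, "12 Minuten"))   # ten
print(first_match(ANCHORED, "apple juice"))    # None
print(first_match(ANCHORED, "12 Minuten"))     # None
print(first_match(ANCHORED, "a dozen eggs"))   # a dozen
```

Word boundaries stop the partial matches, but they do not address the missing translation of the number words themselves.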

A simple workaround is this in gourmet/convert.py:

@@ -644,7 +648,7 @@ all_number_words.sort(
     lambda x,y: ((len(y)>len(x) and 1) or (len(x)>len(y) and -1) or 0)
     )
 
-NUMBER_WORD_REGEXP = '|'.join(all_number_words).replace(' ','\s+')
+NUMBER_WORD_REGEXP = None
 FRACTION_WORD_REGEXP = '|'.join(filter(lambda n: NUMBER_WORDS[n]<1.0,
                                        all_number_words)
                         ).replace(' ','\s+')

I believe NUMBER_FINDER.finditer(timestring) in timestring_to_seconds should not blindly look for the next number-like match. It should resume searching only after the non-number words that follow the previous match have been consumed.

"12 Minuten" is currently parsed as [12 Minu] [ten]

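One way to read that suggestion is to consume each number together with the whole unit word that follows it, so the tail of "Minuten" is never scanned again. The sketch below assumes this reading; it is not the real `timestring_to_seconds`, and the `UNIT_SECONDS` table, the `TOKEN` pattern, and the function name are hypothetical. It handles digits only, not number words.

```python
import re

# Hypothetical unit table; the real code would use translated unit names.
UNIT_SECONDS = {"Sekunden": 1, "Minuten": 60, "Stunden": 3600}

# One token = a run of digits followed by one complete word of letters.
# [^\W\d_]+ matches letters only, so the whole unit word is consumed.
TOKEN = re.compile(r"\s*(\d+)\s*([^\W\d_]+)")


def timestring_to_seconds_sketch(timestring):
    """Sum '<number> <unit>' pairs; return None on an unknown unit."""
    total = 0
    pos = 0
    while pos < len(timestring):
        # Match exactly at pos, i.e. right after the previous unit word.
        match = TOKEN.match(timestring, pos)
        if match is None:
            break
        amount, unit = match.groups()
        if unit not in UNIT_SECONDS:
            return None
        total += int(amount) * UNIT_SECONDS[unit]
        pos = match.end()
    return total


print(timestring_to_seconds_sketch("12 Minuten"))            # 720
print(timestring_to_seconds_sketch("1 Stunden 30 Minuten"))  # 5400
```

Because matching resumes at `match.end()`, "12 Minuten" is read as one pair rather than as [12 Minu] [ten].
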
saxon-s · 2020-06-17T05:48:02Z

@martinp26 Thank you for investigating the issue and the simple workaround.

Unit detection was not considering localization in two places. Fix the simple issue in find_errors_in_progress() by translating units to compare against. The second error is more complex, details are in thinkle#931. Disable broken parsing of number words for now. Signed-off-by: Martin Pohlack <[email protected]>

all_number_words is not working perfectly here. The number words need to go through localization and also need word boundaries, otherwise they match parts of other ingredients or time words in other languages. E.g., "ten" matches the tail of the German word for minutes (Minuten). Disable broken parsing of number words for now. Fixes thinkle#931. Signed-off-by: Martin Pohlack <[email protected]>

martinp26 linked a pull request Jun 21, 2020 that will close this issue

Address #931 and similar issue when parsing times #999

Open
