Simpler description of the matching algorithm #715

macchiati · 2024-03-09T05:12:21Z

I think we can have a simpler exposition of the matching algorithm. Here is my take:

Implementing the algorithm for matching requires two capabilities of functions. I think of these in terms of objects, but of course it can be done in a non-OO language.

Given a bound selector BS consisting of a value, a function, and a set of options, we need two operations.

BS.matches(Locale locale, String matchKey)
1. Returns FAILURE or SUCCESS.
2. “*” as a matchKey always succeeds.
3. Example: suppose BS is {1 :number maximumFractionDigits=0}, and the locale is ‘en’.
  1. If the matchKey is either 1 or one, there is a match. If it is any other plural value or numeric value, it fails.
  2. If the locale is ‘fr’, then there is a match if the matchKey is 0, 1, or one.

BS.compare(Locale locale, String matchKey1, String matchKey2)
1. Only to be called if both matchKey1 and matchKey2 have passed the ‘matches’ test.
2. Returns WORSE if matchKey1 is worse than matchKey2, SAME if they are equally good, and BETTER if matchKey1 is better.
3. Must be a total order.
4. Example:
  1. For the BS above, “*” is worse than “one”, which is worse than “1”.
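
A minimal sketch of the two operations, assuming a toy bound selector for integer values with hard-coded 'en' plural rules. The names `NumberSelector`, `Order`, and `PLURAL_RULES` are hypothetical and not part of any MF2 implementation; real plural rules would come from CLDR data.

```python
from dataclasses import dataclass, field
from enum import Enum


class Order(Enum):
    WORSE = -1
    SAME = 0
    BETTER = 1


# Assumption: toy cardinal rules for integers, 'en' only.
# Any other locale raises KeyError in this sketch.
PLURAL_RULES = {"en": lambda n: "one" if n == 1 else "other"}


@dataclass
class NumberSelector:
    """Hypothetical bound selector: a value plus options (options unused here)."""
    value: int
    options: dict = field(default_factory=dict)

    def _rank(self, locale: str, key: str) -> int:
        # 2 = exact numeric key, 1 = plural category, 0 = "*", -1 = no match
        if key == "*":
            return 0
        if key.isdecimal():
            return 2 if int(key) == self.value else -1
        return 1 if key == PLURAL_RULES[locale](self.value) else -1

    def matches(self, locale: str, key: str) -> bool:
        return self._rank(locale, key) >= 0

    def compare(self, locale: str, key1: str, key2: str) -> Order:
        # Only meaningful when both keys have already passed matches().
        r1, r2 = self._rank(locale, key1), self._rank(locale, key2)
        if r1 == r2:
            return Order.SAME
        return Order.BETTER if r1 > r2 else Order.WORSE
```

Ranking every key with one integer is a design shortcut: it makes `compare` a total order over the matching keys without writing out each pair.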

Here is how the matching can be performed. Each variant has a list of keys plus a message.

Let bestVariant be empty
For each variant V
1. For each bound selector X[i]
  1. If X[i].matches(locale, V.key[i]) == FAILURE, continue to the next variant
2. // Now we know that all of the keys match
3. If bestVariant is empty, let bestVariant be V, continue to the next variant
4. Let possibleBetter = false
5. For each bound selector X[i]
  1. Let comparison = X[i].compare(locale, V.key[i], bestVariant.key[i])
  2. If comparison == WORSE, continue to the next variant
  3. If comparison == BETTER, let possibleBetter = true
6. If possibleBetter == true, then let bestVariant be V
Return bestVariant

Note: there are various further optimizations that can be done without changing the result. For example, it is possible to combine the two calls on matches and compare (but at the expense of complicating the algorithm a bit).
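
The loop above can be sketched in Python roughly as follows. `ToyNumber` is a hypothetical stand-in selector ('en' integers only) included just so the example runs; `Variant` and `select_best` are illustrative names, not spec terms.

```python
from typing import NamedTuple, Optional, Sequence

WORSE, SAME, BETTER = -1, 0, 1


class ToyNumber:
    """Hypothetical 'en'-only :number selector; not the real function."""

    def __init__(self, value: int):
        self.value = value

    def _rank(self, key: str) -> Optional[int]:
        if key == "*":
            return 0
        if key == str(self.value):
            return 2
        if key == ("one" if self.value == 1 else "other"):
            return 1
        return None  # key does not match

    def matches(self, locale: str, key: str) -> bool:
        return self._rank(key) is not None

    def compare(self, locale: str, key1: str, key2: str) -> int:
        # Only valid for keys that already passed matches().
        r1, r2 = self._rank(key1), self._rank(key2)
        return (r1 > r2) - (r1 < r2)


class Variant(NamedTuple):
    keys: Sequence[str]
    message: str


def select_best(selectors, variants, locale="en"):
    best = None
    for v in variants:
        if not all(s.matches(locale, k) for s, k in zip(selectors, v.keys)):
            continue  # some key failed: next variant
        if best is None:
            best = v
            continue
        possible_better = False
        for s, k, best_k in zip(selectors, v.keys, best.keys):
            comparison = s.compare(locale, k, best_k)
            if comparison == WORSE:
                break  # worse on some selector: keep the current best
            if comparison == BETTER:
                possible_better = True
        else:
            # Reached only when no selector reported WORSE.
            if possible_better:
                best = v
    return best
```

The `for ... else` mirrors "continue to the next variant": the `else` branch runs only when the inner loop finishes without hitting WORSE.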

eemeli · 2024-03-09T08:41:17Z

This algorithm does not match exactly this part of our current solution:

message-format-wg/spec/formatting.md, lines 488 to 489 in 07f6309:

    The remaining _variants_ are sorted according to the _selector_'s _key_-ordering preference.
    Earlier _selectors_ in the _matcher_'s list of _selectors_ have a higher priority than later ones.

To demonstrate this, consider the message

.match {1 :number} {1 :number}
* 1   {{star + exact}}
1 one {{exact + cat}}
* *   {{star + star}}

Here, all three variants will match the selectors. The current algorithm will select exact + cat, while the proposed simpler description will select star + exact, because for the second selector one will be considered worse than 1.
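
To make the difference concrete, here is a small sketch that ranks each key and then picks a variant two ways. It assumes a toy 'en' ranking (exact number > plural category > "*"), and it models the current spec text as a lexicographic comparison in which earlier selectors take priority; both are simplifications for illustration, and `rank`, `column_priority`, and `pairwise` are hypothetical names.

```python
def rank(value: int, key: str):
    """Toy 'en' :number ranking: None = no match, higher = better."""
    if key == "*":
        return 0
    if key == str(value):
        return 2
    if key == ("one" if value == 1 else "other"):
        return 1
    return None


values = (1, 1)  # .match {1 :number} {1 :number}
variants = [
    (("*", "1"), "star + exact"),
    (("1", "one"), "exact + cat"),
    (("*", "*"), "star + star"),
]

# Keep only variants whose keys all match, together with their rank tuples.
ranked = []
for keys, msg in variants:
    r = tuple(rank(v, k) for v, k in zip(values, keys))
    if None not in r:
        ranked.append((r, msg))


def column_priority(ranked):
    # Earlier selectors win: compare the rank tuples lexicographically.
    return max(ranked, key=lambda item: item[0])[1]


def pairwise(ranked):
    # Proposed loop: replace best only if never worse and somewhere better.
    best = ranked[0]
    for cand in ranked[1:]:
        pairs = list(zip(cand[0], best[0]))
        if all(c >= b for c, b in pairs) and any(c > b for c, b in pairs):
            best = cand
    return best[1]
```

Under these assumptions `column_priority(ranked)` gives "exact + cat" and `pairwise(ranked)` gives "star + exact": the two candidates are incomparable pairwise, so the proposed loop keeps whichever it saw first.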

My preference here would be to simplify the algorithm further, and to encode a preferential relation like "1 one is better than * 1" in the order of the variants. Doing so would allow us to drop the BS.compare() method, and to do selection with:

For each variant V
1. For each bound selector X[i]
  1. If X[i].matches(locale, V.key[i]) == FAILURE, continue to the next variant
2. // Now we know that all of the keys match
3. Return V.
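
A minimal sketch of this first-match selection, assuming a toy 'en' matcher for integer values (`matches` and `select_first` are hypothetical names):

```python
def matches(value: int, key: str) -> bool:
    """Toy 'en' :number match: '*', the exact number, or its plural category."""
    category = "one" if value == 1 else "other"
    return key in ("*", str(value), category)


def select_first(values, variants):
    # Return the message of the first variant whose keys all match.
    for keys, message in variants:
        if all(matches(v, k) for v, k in zip(values, keys)):
            return message
    return None
```

With the variants in the order written in the message above, `select_first((1, 1), ...)` returns "star + exact", because that variant is listed first and both of its keys match.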

aphillips · 2024-03-09T14:57:55Z

@eemeli

I don't agree with your reasoning. star + exact is not a better match than exact + cat. You can see this if you switch the order of the selectors (and their columns). Wouldn't you be surprised that * selected above 1 in that case? The order of the selectors should be somewhat arbitrary: think of messages auto-migrated from MF1, for example. They might put the selectors in different orders depending on the tool in question.

What you're suggesting would reopen the old arguments that resulted in this design document.

eemeli · 2024-03-09T16:17:16Z

> I don't agree with your reasoning. star + exact is not a better match than exact + cat. You can see this if you switch the order of the selectors (and their columns).

That's not what I meant. I meant that with first-choice selection, replicating our current behaviour would require the message to be expressed as

.match {1 :number} {1 :number}
1 one {{exact + cat}}
* 1   {{star + exact}}
* *   {{star + star}}

> Wouldn't you be surprised that * selected above 1 in that case?

In all other programming and otherwise code-like languages that I regularly interact with, the switch and pattern-matching constructs select the first matching case. So to be honest, right now, the MF2 behaviour of not doing that is surprising.

> The order of the selectors should be somewhat arbitrary: think of messages auto-migrated from MF1, for example. They might put the selectors in different orders depending on the tool in question.

Right now, yes, they might, because we have a really complex selection algorithm that's allowing for that transform to be really lazy. But adding a requirement on message migration tools to order the variants is not onerous. For plural selectors, I've now written implementations that do that in JavaScript and Python. The former is probably easier to read, and it's five lines of code.
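
For illustration only (this is not the JavaScript or Python code referred to above), one way such an ordering step could look for plural-style keys. It assumes exact numeric keys should sort before category keys, which sort before "*", with earlier selector columns taking priority; `key_rank` and `order_for_first_match` are hypothetical names.

```python
import re

NUMERIC = re.compile(r"-?\d+(\.\d+)?")


def key_rank(key: str) -> int:
    # Lower sorts first: exact numeric keys, then plural categories, then "*".
    if key == "*":
        return 2
    return 0 if NUMERIC.fullmatch(key) else 1


def order_for_first_match(variants):
    # sorted() is stable, so variants with equal ranks keep their authored order.
    return sorted(variants, key=lambda v: tuple(key_rank(k) for k in v[0]))
```

Applied to the three variants of the example message, this yields the order 1 one, * 1, * *.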

> What you're suggesting would reopen the old arguments that resulted in this design document.

Yes. And I brought this up because I observe in #706 (comment) and here that the solution we ended up with is sufficiently complex that we ourselves are not all sharing the same understanding of it. If it's not clear to us, it won't be clear to our users.

macchiati · 2024-03-09T17:15:30Z

Noting first that this discussion is post v45. That is, nothing discussed here should impede the progress of implementing the C++, Java, and JavaScript tech preview implementations (having the spec jiggle around under the implementers' feet will do that).

It is certainly far simpler to take the first matching variant. And it is more likely that implementations will implement it the same way, and be interoperable. It is also faster at runtime. It does have implications:

It puts the burden on the message authors, and importantly, the translation software. The latter must be able to expand and contract the variants depending on the function and locale.
To that end, we should also expect functions to support a static .compare(literal, literal, options, locale) without runtime context (no dependency on input parameters), so that tooling can ensure that such expansion/contraction doesn't mess up ordering and cause inadvertent masking.
In addition, that allows for diagnostics for message authors, so that they find out whether there are problems and correct the ordering if there are any mistakes.
It is slightly less powerful, because it could be that a particular function, given one runtime context, would prefer matching A over B, but have a different preference between them given a different runtime context. However, I think that is enough of an edge case that it is less important than simplicity and speed.
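
As a rough sketch of the kind of diagnostic this would enable under first-match: a tool could flag a variant as masked when an earlier variant matches in every case where it does. The `covers` check below is a hypothetical placeholder that only knows "*" and identical keys; a real check would need the function's own static comparison, options, and locale.

```python
def covers(earlier: str, later: str) -> bool:
    """Hypothetical static check: does `earlier` match whenever `later` does?"""
    return earlier == "*" or earlier == later


def masked_variants(variants):
    """Return (masked message, masking message) pairs for first-match order."""
    masked = []
    for j, (keys_j, msg_j) in enumerate(variants):
        for keys_i, msg_i in variants[:j]:
            if all(covers(a, b) for a, b in zip(keys_i, keys_j)):
                masked.append((msg_j, msg_i))
                break  # one masking variant is enough to report
    return masked
```

For example, placing `* *` before `1 one` would be reported, since the catch-all variant would always be selected first.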

mihnita · 2024-03-09T17:49:21Z

> It puts the burden on the message authors, and importantly, the translation software

And that is exactly what bothers me most!
:-)

Note that we have custom selectors.
So nobody knows how to sort things for my selector, except my selector.
I might even fix bugs in time / improve my selector.
Would that require reordering of all my translations?

Most tools will not reorder, and will not expand.
At least in the beginning, and for a long time.
Adding such support requires big changes, at all kinds of levels (front-end, TMs & leverage, validation, UI, etc).
So most tools will wait for wide adoption of MF2 before acting.
But that will hamper adoption.

Please, let's not open this can of worms!

aphillips · 2024-03-09T21:50:04Z

Best-match (vs. first-match) is a consensus discussion we had (at great length) previously (initially settled in the 2023-03-27 call). Many of the arguments being made here are already in the design document I quoted above.

I'm open to considering changing the description of the algorithm, a la @macchiati's proposal. We should focus the discussion in this issue on that. The bar to re-opening consensus is higher and shouldn't be mixed with feedback on improving the current documentation. Please note that I am not saying that we don't want feedback on people's lived experience with the current matching algorithm.

aphillips · 2024-10-07T21:15:29Z

Duplicates #898. We're keeping the latter issue as it is more detailed.

macchiati added the Preview-Feedback label (Feedback gathered during the technical preview) Mar 9, 2024

macchiati mentioned this issue Mar 9, 2024

We need a list of possible error codes #706

Closed

aphillips added the formatting label Mar 9, 2024

aphillips mentioned this issue Oct 7, 2024

[FEEDBACK] Simpler formulation of Pattern Selection #898

Open

aphillips added the duplicate label (Duplicates another issue) Oct 7, 2024

aphillips closed this as completed Oct 7, 2024
