Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simpler description of the matching algorithm #715

Closed
macchiati opened this issue Mar 9, 2024 · 7 comments
Closed

Simpler description of the matching algorithm #715

macchiati opened this issue Mar 9, 2024 · 7 comments
Labels
duplicate Duplicates another issue formatting Preview-Feedback Feedback gathered during the technical preview

Comments

@macchiati
Copy link
Member

I think we can have a simpler exposition of the matching algorithm. Here is my take

Implementing the algorithm for matching requires two capabilities of functions. I think of these in terms of objects, but of course it can be done in a non-OO language.

Given a bound selector BS consisting of a value, a function, and a set of options, we need two operations.

  1. BS.matches(Locale locale, String matchKey)
    1. Returns FAILURE or SUCCESS
    2. “*” as a matchKey is always success
    3. Example: suppose BV is {1 :number maximumFractionDigits=0}, and the locale is ‘en’
      1. If the matchKey is either 1 or one, there is a match. If it is any other plural value or numeric value, it fails.
      2. If the locale is ‘fr’, then there is a match if the matchKey is 0, 1, or one.
  2. BS.compare(Locale locale, String matchKey1, matchKey2)
    1. Only to be called if both matchKey1 and matchKey2 have passed the ‘matches’ test.
    2. Returns WORSE if matchKey1 is worse than matchKey2, similarly for SAME, and BETTER.
    3. Must be a total order.
    4. Example:
      1. For the BV above, “*” is worse than “one” is worse than “1”

Here is how the matching can be performed. Each variant has a list of keys plus a message.

  1. Let bestVariant be empty
  2. For each variant V
    1. For each bound selector X[i]
      1. If X.matches(locale, V.key[i]) == FAILURE, continue to the next variant
    2. // Now we know that all of the keys match
    3. If bestVariant is empty, let bestVariant be V, continue to the next variant
    4. Let possibleBetter = false
    5. For each bound selector X[i]
      1. Let comparison = X.compare(locale, V.key[i], bestVariant.key[i])
      2. If comparison == WORSE, continue to the next variant
      3. If comparison == BETTER, let possibleBetter = true
    6. If possibleBetter == true, then let bestVariant be V
  3. Return bestVariant

Note: there are various further optimizations that can be done without changing the result. For example, it is possible to combine the two calls on matches and compare (but at the expense of complicating the algorithm a bit).

@macchiati macchiati added the Preview-Feedback Feedback gathered during the technical preview label Mar 9, 2024
@eemeli
Copy link
Collaborator

eemeli commented Mar 9, 2024

This algorithm does not match exactly this part of our current solution:

The remaining _variants_ are sorted according to the _selector_'s _key_-ordering preference.
Earlier _selectors_ in the _matcher_'s list of _selectors_ have a higher priority than later ones.

To demonstrate this, consider the message

.match {1 :number} {1 :number}
* 1   {{star + exact}}
1 one {{exact + cat}}
* *   {{star + star}}

Here, all three variants will match the selectors. The current algorithm will select exact + cat, while the proposed simpler description will select star + exact, because for the second selector one will be considered worse than 1.

My preference here would be to simplify the algorithm further, and to encode a preferential relation like "1 one is better than * 1" in the order of the variants. Doing so would allow us to drop the BS.compare() method, and to do selection with:

  1. For each variant V
    1. For each bound selector X[i]
      1. If X[i].matches(locale, V.key[i]) == FAILURE, continue to the next variant
    2. // Now we know that all of the keys match
    3. Return V.

@aphillips
Copy link
Member

@eemeli

I don't agree with your reasoning. star + exact is not a better match than exact + cat. You can see this if you switch the order of the selectors (and their columns). Wouldn't you be surprised that * selected above 1 in that case? The order of the selectors should be somewhat arbitrary: think of messages auto-migrated from MF1, for example. They might put the selectors in different orders depending on the tool in question.

What you're suggesting would reopen the old arguments that resulted in this design document.

@eemeli
Copy link
Collaborator

eemeli commented Mar 9, 2024

I don't agree with your reasoning. star + exact is not a better match than exact + cat. You can see this if you switch the order of the selectors (and their columns).

That's not what I meant. I meant that with first-choice selection, replicating our current behaviour would require the message to be expressed as

.match {1 :number} {1 :number}
1 one {{exact + cat}}
* 1   {{star + exact}}
* *   {{star + star}}

Wouldn't you be surprised that * selected above 1 in that case?

In all other programming and otherwise code-like languages that I regularly interact with, the switch and pattern-matching constructs select the first matching case. So to be honest, right now, the MF2 behaviour of not doing that is surprising.

The order of the selectors should be somewhat arbitrary: think of messages auto-migrated from MF1, for example. They might put the selectors in different orders depending on the tool in question.

Right now, yes, they might, because we have a really complex selection algorithm that's allowing for that transform to be really lazy. But adding a requirement on message migration tools to order the variants is not onerous. For plural selectors, I've now written implementations that do that in JavaScript and Python. The former is probably easier to read, and it's five lines of code.

What you're suggesting would reopen the old arguments that resulted in this design document.

Yes. And I brought this up because I observe in #706 (comment) and here that the solution we ended up with is sufficiently complex that we ourselves are not all sharing the same understanding of it. If it's not clear to us, it won't be clear to our users.

@macchiati
Copy link
Member Author

Noting first that this discussion is post v45. That is, nothing discussed here should impede the progress of implementing the C++, Java, and Javascript tech preview implementations (having the spec jiggle around under the implementer's feet will do that).

It is certainly far simpler to take first matching variant. And it is more likely that implementations will implement it the same way, and be interoperable. it is also faster at runtime. It does have implications:

  1. It puts the burden on the message authors, and importantly, the translation software. The latter must be able to expand and contract the variants depending on the function and locale.
  2. To that end, we should also expect functions to support a static .compare(literal, literal, options, locale) without runtime context (no dependency on input parameters), so that tooling can ensure that such expansion/contraction doesn't mess up ordering and cause inadvertent masking.
  3. In addition, that allows for diagnostics for message authors, so that they find out whether there are problems and correct the ordering if there are any mistakes.
  4. It is slightly less powerful, because it could be that a particular function, given a runtime context would prefer matching A over B, but have a different preference between them given a different runtime context. However, I think that is enough of an edge case that it is less important than simplicity and speed.

@mihnita
Copy link
Collaborator

mihnita commented Mar 9, 2024

It puts the burden on the message authors, and importantly, the translation software

And that is exactly what bothers me most!
:-)

Note that we have custom selectors.
So nobody knows how to sort things for my selector, except my selector.
I might even fix bugs in time / improve my selector.
Would that require reordering of all my translations?

Most tools will not reorder, and will not expand.
At least in the beginning, and for a long time.
Adding such support requires big changes, at all kinds of levels (front-end, TMs & leverage, validation, UI, etc).
So most tools will wait for wide adoption of MF2 before acting.
But that will hamper adoption.

Please, let's not open this can of worms!

@aphillips
Copy link
Member

Best-match (vs. first-match) is a consensus discussion we had (at great length) previously (initially settled in the 2023-03-27 call). Many of the arguments being made here are already in the design document I quoted above.

I'm open to considering changing the description of the algorithm, a la @macchiati's proposal. We should focus the discussion in this issue on that. The bar to re-opening consensus is higher and shouldn't be mixed with feedback on improving the current documentation. Please note that I am not saying that we don't want feedback on people's lived experience with the current matching algorithm.

@aphillips
Copy link
Member

Duplicates #898. We're keeping the latter issue as it is more detailed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate Duplicates another issue formatting Preview-Feedback Feedback gathered during the technical preview
Projects
None yet
Development

No branches or pull requests

4 participants