A case in spec/functions/string.json is implementation dependent #998

Open
mihnita opened this issue Feb 4, 2025 · 9 comments · May be fixed by #999
Labels
registry Issue pertains to the function registry

Comments

@mihnita
Collaborator

mihnita commented Feb 4, 2025

In the spec/functions/string.json file one test is implementation dependent:

    {
      "src": ".input {$foo :string} .match $foo 1 {{one}} * {{other}}",
      "params": [
        {
          "name": "foo",
          "value": 1
        }
      ],
      "exp": "one"
    },

The expected result seems to indicate that the numeric value 1 (from params) is somehow converted to a string.


Why do I think that is the case?

The test file is a JSON file, and in most cases a number in a JSON file will be parsed into some sort of numeric type in the target programming language.
So the 1 in params[0].value will become some kind of numeric value.
This is true in Java using gson, and even in JavaScript:

var testunit = {
      "src": ".input {$foo :string} .match $foo 1 {{one}} * {{other}}",
      "params": [
        {
          "name": "foo",
          "value": 1
        }
      ],
      "exp": "one"
    }
console.log(typeof testunit.params[0].value)
// shows "number" (without quotes)

The spec for the :string function (string.md) says this:

> The operand of :string is either any implementation-defined type
> that is a string or for which conversion to a string is supported,
> or any literal value.
> All other values produce a Bad Operand error.

So for the test to pass it means that this case is true:
"or for which conversion to a string is supported"
(it is not an "implementation-defined type that is a string" and it is not a literal value)

It is unclear what "conversion to a string is supported" means exactly in the above description
(does it happen automatically, as in JS, or does being able to call itoa in C or foo.toString() in Java mean "it is supported"?)

We already agreed (in the discussions about fallbacks) that we don't want to do something like the Java .toString(), because it risks disclosing private information (think SSN or employee ID in a Person object).
So in Java / C / C++ / others, a "conversion to a string" is NOT "supported".
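
To make the ambiguity concrete, here is a minimal sketch of two ways an implementation could read "conversion to a string is supported". It is not taken from any real implementation: `resolveStringOperand`, `BadOperandError`, `selectVariant`, and the `coerceNumbers` flag are all hypothetical names, and the `"bad-operand"` marker only stands in for whatever error handling an implementation actually does.

```javascript
// Hypothetical sketch: two possible readings of
// "conversion to a string is supported" for the :string operand.
class BadOperandError extends Error {}

function resolveStringOperand(value, { coerceNumbers }) {
  if (typeof value === "string") return value;
  // Lenient reading: the implementation chooses to serialize numbers.
  if (coerceNumbers && typeof value === "number") return String(value);
  // Strict reading: anything that is not already a string is rejected.
  throw new BadOperandError(`unsupported operand type: ${typeof value}`);
}

// Choose between the key "1" and the catch-all, as in the test case.
function selectVariant(value, options) {
  return resolveStringOperand(value, options) === "1" ? "one" : "other";
}

const lenient = selectVariant(1, { coerceNumbers: true });

let strict;
try {
  strict = selectVariant(1, { coerceNumbers: false });
} catch (e) {
  strict = e instanceof BadOperandError ? "bad-operand" : "unexpected";
}

console.log(lenient, strict);
```

With the same numeric input, only the lenient reading produces the "one" that the test expects.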

In fact, the spec text itself says "any implementation-defined type
that is a string or for which conversion to a string is supported"

Note "implementation-defined"


TLDR: the test is implementation dependent.
So it does not belong in an implementation independent test suite.

The fix should be as easy as removing the test case.

@eemeli
Collaborator

eemeli commented Feb 5, 2025

The smallest fix here would be to change the input to a string "1" rather than a number 1.

It's true that we don't specify any minimum for values "for which conversion to a string is supported", but it would be quite unfortunate if an implementation did not allow for an integer value to be serialized.

@aphillips
Member

@eemeli It is kind of weird to have a test of the :string function that depends on it, though. While your proposed fix is correct, it's not any different from making the value of foo be "a" and having the key be a. I kind of prefer the test the way that it is, but with a note about type coercion to a string.

@mihnita
Collaborator Author

mihnita commented Feb 5, 2025

There is already a "value": "1" test right above it, so to fix the test file we should just remove this case, agree.

> I kind of prefer the test the way that it is, but with a note about type coercion to a string.

Then it will fail for some languages.
We should not put something that is implementation dependent in the spec test suite.


On fixing the spec, I don't think that :string should accept any kind of numeric values.
A :string takes a string only.

Otherwise we will have to specify exactly what that serialization looks like and how it is going to treat extreme values (many decimals, 3.14E-12, and so on).
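
As a rough illustration of why the serialization would need to be pinned down, this is what JavaScript's built-in `String()` conversion produces for a few values. These strings are specific to JavaScript; other languages make different choices for the same inputs, and nothing here is defined by the MF2 spec.

```javascript
// JavaScript's default number-to-string conversion for a few edge cases.
const samples = [1, 1.0, 3.14e-12, 1e21, 0.1 + 0.2];
const serialized = samples.map((n) => String(n));

for (const s of serialized) {
  console.log(s);
}
// 1
// 1
// 3.14e-12
// 1e+21
// 0.30000000000000004
```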

Already :number taking a string is a pain, and :number making selection on strings is a pain (not to mention wrong? The plural selection in MF1 is done without conversion to string; it is done by looking at the plural rules and knowing the values of n, i, v and so on).

And if we allow SOME programming languages to do type coercion from number to string, then this is not portable anymore.

Options to make this portable:

  1. languages with type coercion should do the conversion to string before passing that as a parameter
  2. forcing ALL implementations of MF2 to also implement a number-to-string conversion, and define exactly what that looks like

Option 2 is way messier to define and to implement.
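
Option 1 can be sketched as a small caller-side helper. The `stringifyParams` name and the flat `{ name: value }` parameter shape are assumptions made for illustration; no particular MF2 API is implied. The point is that the caller, not the formatter, decides how a number becomes a string.

```javascript
// Hypothetical caller-side helper for option 1: convert numbers to strings
// before they reach the formatter, so :string only ever sees strings.
function stringifyParams(params) {
  const out = {};
  for (const [name, value] of Object.entries(params)) {
    out[name] = typeof value === "number" ? String(value) : value;
  }
  return out;
}

const prepared = stringifyParams({ foo: 1, bar: "baz" });
console.log(typeof prepared.foo, typeof prepared.bar);
```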


We already have "magical conversions" hidden in some places that I am unhappy with.

String to number

We have it, and there is no way around that if we want to support something like `{|2| :number} for the price of one`.

Number to string

We also have it for the selection on :number.
Which I think is unnecessary and wrong.
The selection is only defined on strings and nothing else.
Because we refused to have a numeric type.

But we in fact have it, only refuse to admit it...

@eemeli eemeli linked a pull request Feb 5, 2025 that will close this issue
@mihnita
Collaborator Author

mihnita commented Feb 6, 2025

> unfortunate if an implementation did not allow for an integer value to be serialized.

That is done by something like :number (for a locale sensitive "serialization")
And outside mf2 for :string


This is the same argument that was used to reject the idea that selection keys can be numeric.

Why can't my implementation deserialize a 1.0 key to a number?

It looks like we pick and choose when to do type coercion and we have no consistency.

@aphillips
Member

@mihnita wrote:

> This is the same argument that was used to reject the idea that selection keys can be numeric.

No. You have to keep the different levels of implementation concern apart. In the syntax, keys are always literals. The interpretation of keys as "values" is defined first by the selector function's specification and then by the implementation of that selector function. The literal 1 is not a number, but it is a permitted key value by the selectors :number and :integer, which try to exactly match any operand values with the numeric value of 1 against it. So such a "selection key" in these functions might be numeric in your implementation--for exact match (don't forget that plural keywords are also keys and are always enumerated string values).

But by the same token, the function :string can also have the key 1, which matches the string consisting of a single U+0031. There the string is not a number ever... because :string says so.

My point is that it is wrong to talk at the MF syntax level about keys being anything except literals. What your implementation's functions do with those literals is up to you (and the function spec, of course). Your implementation is free to interpret the key 1.0 as the numeric value 1 and match it to int x = 1. However, such behavior may not be portable because :number and :integer only define and require the integer key serialization and some different implementation might reject the key 1.0 or treat it as not matching integer 1 (because that behavior is implementation defined).
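
The distinction can be sketched with two hypothetical key matchers. Neither is taken from the spec or from a real implementation: `matchStringKey` compares the literal as a string, and `matchNumberKey` shows one implementation-defined choice, which is to parse the key literal as a number for exact matching.

```javascript
// The key is always a literal (a string) at the syntax level.
// What a selector does with it is up to the function.

// :string-style matching: the key 1 is the string "1" (a single U+0031).
function matchStringKey(key, operand) {
  return typeof operand === "string" && operand === key;
}

// One possible :number-style exact match: interpret the key numerically.
// Accepting keys such as "1.0" is an implementation choice, not portable.
function matchNumberKey(key, operand) {
  const keyValue = Number(key);
  return (
    typeof operand === "number" &&
    !Number.isNaN(keyValue) &&
    operand === keyValue
  );
}

console.log(matchStringKey("1", "1"));   // true
console.log(matchStringKey("1.0", "1")); // false
console.log(matchNumberKey("1", 1));     // true
console.log(matchNumberKey("1.0", 1));   // true
```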

> Why can't my implementation deserialize a 1.0 key to a number?

It can. We explicitly say:

> Otherwise, the serialized form of the numeric value is implementation-defined.

We allow it for operands too.

You'll recall that there is no WG consensus on how to support floating point number selection.

> It looks like we pick and choose when to do type coercion and we have no consistency.

Type coercion is left to implementations. We go out of our way to put type coercion into the implementers' hands. We do not pick and choose in MF2, because we are specifically typeless.

In short, you're right that the test was faulty... as a generic test. But users should expect that native number types work for formatting and selection (or what's the point?)

@aphillips
Member

I should add... I was wrong to say:

> I kind of prefer the test the way that it is, but with a note about type coercion to a string.

The type coercion to a string would be inside the :string function and thus is implementation defined according to :string:

> Other programming languages would define string and character sequence types or classes according to their local needs, including, where appropriate, coercion to string.

It doesn't belong in the :string generic test suite.

@macchiati
Member

macchiati commented Feb 7, 2025

I agree with Addison. A key of 1 or |1| in the syntax is just a string, and it is up to the function for that key's column to interpret it, and determine whether it matches. If a :number function were extended to handle rationals, it could also handle selection on |2/3|.

An implementation of MF (including the default functions) is free to pre-parse an MF message, and convert the keys in an internal data structure into real datatypes for all the functions it has control over. (If that weren't true, it would be a problem in the spec.)

So an implementation in Rust could, for example, support BigRational in :number, and preparse an MF message with |2/3| and convert it to a BigRational.
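
A minimal sketch of that kind of pre-parsing, written in JavaScript rather than Rust and with a hand-rolled rational instead of `BigRational`. The `preparseKeys` function, the `keyParsers` table, and the `{ num, den }` shape are all hypothetical; the sketch does not handle plural keywords or zero denominators.

```javascript
// Hypothetical pre-parse step: turn key literals into typed values once,
// based on which selector function owns that key's column.
function parseRational(literal) {
  const m = /^(-?\d+)(?:\/(\d+))?$/.exec(literal);
  if (!m) return null; // not a rational literal this sketch understands
  return { num: BigInt(m[1]), den: BigInt(m[2] ?? "1") };
}

const keyParsers = {
  string: (literal) => literal, // keys stay strings
  number: (literal) => parseRational(literal),
};

function preparseKeys(selectorName, keys) {
  const parse = keyParsers[selectorName];
  // The catch-all key "*" is kept as-is.
  return keys.map((k) => (k === "*" ? k : parse(k)));
}

const parsed = preparseKeys("number", ["2/3", "1", "*"]);
console.log(parsed[0].num, parsed[0].den);
```

The message source still contains only literals; the typed values exist solely in the implementation's internal structure.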

@mihnita
Collaborator Author

mihnita commented Feb 7, 2025

> The type coercion to a string would be inside the :string function and thus is implementation defined according to :string:

Then we can't say "the :string function takes a string"
It takes a string OR a numeric type.

> I agree with Addison. A key of 1 or |1| in the syntax is just a string

This answer from Addison is not about the keys, it is about the :string function.

> in the syntax is just a string

Everything is a string in a syntax, because we store the sources (for code, for mf2, etc) in text files.
But in any programming language we know of, once the syntax is parsed, these are not strings anymore.

Even in JavaScript (in the browser console):

var na = 1
var nb = 1.00
var nc = 0x01

var sa = '1'
var sb = '1.00'
var sc = '0x01'

typeof na
// < 'number'
typeof nb
// < 'number'
typeof nc
// < 'number'

typeof sa
// < 'string'
typeof sb
// < 'string'
typeof sc
// < 'string'

na == nb
// < true
na == nc
// < true
na === nb
// < true
na === nc
// < true

sa == sb
// < false
sa == sc
// < false
sa === sb
// < false
sa === sc
// < false

// type coercion done by JS
na == sa
// < true

// no type coercion, stricter equal
na === sa
// < false

> and it is up to the function for that key's column to interpret it

That is not how any programming language works.
Once we parse the sources the resulting variables have a type, any trace of what that looked like in sources is gone.
Not even JS can tell the difference between na and nb in the example above.
The variables are numeric now.

If we take a Java example:

void someFunction(int x) { ... }

// The 1.0 is a "number literal", in source
// But once parsed, the type of n is int (Java needs an explicit cast here)
int n = (int) 1.0;

// not an error
someFunction(n);

// the "1.0" looks like a string numeral, in source
// But it is not, because it is between quotes
// So `s` is (obviously) a String
String s = "1.0";

// this is an error
someFunction(s);

@macchiati

This is not about keys, not directly.
It is about the :string function.

What Addison / Eemeli are arguing here is that :string should accept both a numeric and a String.
In the Java example above someFunction should accept both a number and a String.
That thing is not possible in Java (and other strongly typed languages).

You have to define two different functions, and one would do explicit conversion:

void someFunction(String s) { someFunction(Integer.parseInt(s)); }
void someFunction(int n) { }

Or do some ugly stuff like void someFunction(Object obj) { instanceof + cast }
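
The runtime-check variant is easiest to show in a dynamically typed language. This is only a sketch in JavaScript: `someFunction` reuses the name from the Java example above, and the rule of accepting only plain integer literals as strings is an assumption made for the example.

```javascript
// Runtime dispatch on the argument type: one function accepts both a
// number and a string, converting the string explicitly.
function someFunction(x) {
  if (typeof x === "number") return x;
  if (typeof x === "string" && /^-?\d+$/.test(x)) {
    return Number.parseInt(x, 10);
  }
  throw new TypeError(`unsupported argument: ${String(x)}`);
}

console.log(someFunction(1));   // 1
console.log(someFunction("1")); // 1
```

Every caller now depends on checks that a typed signature would otherwise express at compile time.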

> An implementation of MF (including the default functions) is free to pre-parse an MF message, and convert the keys in an internal data structure into real datatypes for all the functions it has control over.

No, implementations are not free to do that.

Because the spec overreaches and dictates how this works internally, and what types are acceptable (nothing but strings).
And all algorithms are described in terms of strings.

In fact, I did that (I had a real numeric type in the data model, and a number literal).

But that is a big cause of friction now, when the spec went on to define how to do function chaining, resolved values (mentioned by Eemeli in an email as "small changes" since LDML 46).

I had function chaining (and still have it), and it worked.
It was an implementation detail how (but it worked even for custom functions).

By going on to specify how chaining works, the spec forces all implementations to throw away a type system and accept strings everywhere.

And (right now) (for :math) the spec forces one to do the equivalent of:

StringOrNumber mathAdd(StringOrNumber op1, StringOrNumber op2)

But in reality that is not legal "by the spec", because the data model (which is part of the spec) only has Literal with a type of string. There is no Numeric type.

Even JS does not do that.
It has a real numeric type.

@macchiati
Member

> But in reality that is not legal "by the spec", because the data model (which is part of the spec) only has Literal with a type of string. There is no Numeric type.

I do not understand what you mean. In https://cldr-smoke.unicode.org/spec/main/ldml/tr35-messageFormat.html#messageformat-2-0-data-model it says specifically:

MessageFormat 2.0 Data Model

This section defines a data model representation of MessageFormat 2 messages.

Implementations are not required to use this data model for their internal representation of messages.
Neither are they required to provide an interface that accepts or produces
representations of this data model.
(my bolding)

Now, I think the wording could have been clearer, but it doesn't require that all internal data types are strings. For clarity, it should be:

MessageFormat 2.0 Data Model Interchange Representation

This section defines an optional data model representation of MessageFormat 2 messages for interchange.

And we can consider having this remain Final Candidate, since it isn't in the main section of the spec, and is optional.

Can you point to some other place in the spec, aside from the functions and data model sections, that disallows an implementation from using a numeric type internally?

@aphillips aphillips added the registry Issue pertains to the function registry label Feb 8, 2025