Allow codeUnitAt to be constant for constant strings #4146

Open
HosseinYousefi opened this issue Oct 29, 2024 · 10 comments
Labels
feature Proposed language feature that solves one or more problems

Comments

@HosseinYousefi
Member

Let's take the example from https://dart.dev/language/branches#switch-expressions:

// Where slash, star, comma, semicolon, etc., are constant variables...
token = switch (charCode) {
  slash || star || plus || minus => operator(charCode),
  comma || semicolon => punctuation(charCode),
  >= digit0 && <= digit9 => number(),
  _ => throw FormatException('Invalid')
};

It assumes that slash, star, ... are constant variables, but how are they defined? I expect to be able to do:

const slash = '/'.codeUnitAt(0); // Error: Methods can't be invoked in constant expressions.
// ...

but I can't, and it's ugly and error-prone to look up the char code for each character and put the integer in instead.

@HosseinYousefi HosseinYousefi added the feature Proposed language feature that solves one or more problems label Oct 29, 2024
@julemand101

julemand101 commented Oct 29, 2024

@mateusfccp
Contributor

mateusfccp commented Oct 29, 2024

@HosseinYousefi

What would be the difference between

const slash = '/'.codeUnitAt(0);

and what we have now?

const slash = '/';

It seems you want a char type of some type?

I'm not sure if it's even possible, as JS doesn't support it.

@HosseinYousefi
Member Author

> What would be the difference between

With integer constants you can match ranges like >= digit0 && <= digit9 => number() in a switch, as in the code example above.
You can't write >= '0' && <= '9' for strings.

> It seems you want a char type of some type?

No need for an extra type; int is fine. I just want to be able to write:

const digit0 = '0'.codeUnitAt(0);

Instead of

const digit0 = 48;
assert(digit0 == '0'.codeUnitAt(0));
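
For reference, the workaround that works today can be sketched as below. This is only an illustration, not code from the linked documentation: the `classify` function and its string results are hypothetical stand-ins for `operator`, `punctuation`, and `number`, and the constants are hand-written ASCII values checked at runtime.

```dart
// Hand-written ASCII code units; the names mirror the docs example.
const slash = 0x2F; // '/'
const star = 0x2A; // '*'
const plus = 0x2B; // '+'
const minus = 0x2D; // '-'
const comma = 0x2C; // ','
const semicolon = 0x3B; // ';'
const digit0 = 0x30; // '0'
const digit9 = 0x39; // '9'

// Hypothetical classifier standing in for the docs' token constructors.
String classify(int charCode) => switch (charCode) {
      slash || star || plus || minus => 'operator',
      comma || semicolon => 'punctuation',
      >= digit0 && <= digit9 => 'number',
      _ => throw FormatException('Invalid'),
    };

void main() {
  // Runtime checks that the literals match the characters they stand for.
  assert(slash == '/'.codeUnitAt(0));
  assert(digit0 == '0'.codeUnitAt(0));
  assert(digit9 == '9'.codeUnitAt(0));

  print(classify('7'.codeUnitAt(0))); // number
  print(classify('+'.codeUnitAt(0))); // operator
}
```

The asserts only catch a wrong literal when the program runs with assertions enabled, which is the error-prone part this issue is about.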

@mateusfccp
Contributor

I would rather have a char type, it could even be backed by int, like an extension type on int, but with a literal constructor.

I don't think it's so feasible for Dart today, but there's an issue for this: #886.

@lrhn
Member

lrhn commented Oct 30, 2024

I made a package with character code points, package:charcode, since I couldn't get #886. That issue is from before I was on the language team.
The answer still stands: It would be nice, but it's unlikely to be high enough priority that the language will sacrifice the syntax for it.
(I recommend using the package to generate your char codes, rather than depending on it, but you can use its constants while developing. The package doesn't come with any support guarantee, but it also hasn't needed change, like, ever. It's just a bunch of top-level constant declarations.)

Making string.codeUnitAt constant seems more viable. It's still questionable to use code units. Maybe you should use .runes.first instead.
In practice it is almost always used for ASCII characters in parsers, which is a small use-case that can adequately be covered by a single helper library.
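
A small sketch of why code units are questionable outside ASCII, using only `dart:core`. The sample strings are arbitrary choices for illustration; the point is that `codeUnitAt(0)` and `runes.first` agree for ASCII but diverge for characters outside the Basic Multilingual Plane.

```dart
void main() {
  const ascii = '/';
  // For ASCII, the UTF-16 code unit and the code point are the same number.
  print(ascii.codeUnitAt(0) == ascii.runes.first); // true

  const emoji = '\u{1F600}'; // a single code point outside the BMP
  // The string holds a surrogate pair, so it has two code units.
  print(emoji.length); // 2
  // codeUnitAt(0) returns only the high surrogate, not the character.
  print(emoji.codeUnitAt(0).toRadixString(16)); // d83d
  // runes.first returns the full code point.
  print(emoji.runes.first.toRadixString(16)); // 1f600
}
```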

@HosseinYousefi
Member Author

> since I couldn't get #886.

Feel free to close this then.

> I made a package with character code points, package:charcode

Thanks, I didn't know about this package.

> It's still questionable to use code units.

It won't matter for ASCII characters, which is my use case. But yeah, making .runes constant is also viable.

@lrhn
Member

lrhn commented Oct 30, 2024

It's not that I haven't wanted "a".codeUnitAt(0) to be constant myself, often enough, since I write my share of small parsers.
I just know writing parsers is fairly niche, and most other people shouldn't be using (UTF-16) code units for anything.

There is no end to the extensions to the constant sub-language that could be added, but the language team generally chooses not to add small constant features. Every little thing is a little thing, but the cumulative complexity adds up. Anything that can be constant needs to be specified more precisely, including when it fails (because that can cause compile-time errors). The analyzer needs to be able to evaluate it, because it evaluates constants, but it doesn't otherwise compile or run code.

Allowing something to be constant might make some later changes harder, if those are not compatible with being constant. (I don't see what that could possibly be for codeUnitAt, but sometimes only the future can tell. Say we change strings to also be able to be UTF-8 strings, and codeUnitAt can now give you UTF-8 code units. That ... probably still just works.)
All in all, the barrier to entry for becoming a constant expression is pretty high in practice, even for things that look simple. More constant expressions are usually only added in concert with some other related feature (null safety introduced some new cases), not as a small feature by themselves.
(And we all dream of, some day, overhauling the constant subsystem for something "better", whatever that might be.)

@ykmnkmi

ykmnkmi commented Oct 30, 2024

Why not write an int constant constructor?

final class int {
  external const factory int.charCodeAt(String string, [int index = 0]);
}

const int b = int.charCodeAt('b');

@HosseinYousefi
Member Author

> Why not write an int constant constructor?

  • The concepts of int and the char code of a String are not connected closely enough for this to be a factory on int.
  • Why should the normal way of getting a char code and the constant way be totally different?
  • If both of these things have to be implemented, there is no difference between making this and making string.codeUnitAt constant. For example, string.length is already constant for constant strings.
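
To make the last point concrete, here is a minimal sketch of what the constant sub-language accepts today. The names `keyword` and `keywordLength` are made up for illustration; the rejected line is left commented out so the snippet compiles.

```dart
const keyword = 'switch';

// Accepted: `length` on a constant string is a constant expression.
const keywordLength = keyword.length;

// Rejected today ("Methods can't be invoked in constant expressions."):
// const firstUnit = keyword.codeUnitAt(0);

void main() {
  print(keywordLength); // 6
}
```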

@lrhn
Member

lrhn commented Oct 30, 2024

A magical external const int.codeUnitOf(String oneCodePointString); could work.

We generally try not to add that kind of constructor, which is not really constant: it just pretends to be and requires the compiler to do computation at compile time. (We used to have a const Symbol constructor that had to check if the input was a valid identifier or one of a few other forms. The VM never implemented the checking because they didn't special-case the constructor.)

It's definitely an easier "language feature" than a code unit/code point syntax like #'c'.
Possibly about the same complexity as making String.codeUnitAt be allowed in constants (both require compilers and analyzer to do computation at compile time).

I do agree that the connection between int and String is not strong enough for this to fit in int.
But we could do extension type CodeUnit(int _) implements int { external const CodeUnit.of(String s); }.
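
A runtime-only approximation of that extension type, assuming Dart 3.3 or later for extension types. This is a sketch, not the proposed feature: the `external const` constructor does not exist, so `CodeUnit.of` here is an ordinary non-const constructor, and the representation name `value` is an arbitrary choice.

```dart
/// Hypothetical wrapper around a UTF-16 code unit, usable wherever an int is.
extension type CodeUnit(int value) implements int {
  /// Runtime stand-in for the `external const CodeUnit.of` idea in the thread.
  /// Because it is not const, its results cannot appear in switch patterns.
  CodeUnit.of(String s) : value = s.codeUnitAt(0);
}

void main() {
  final digit0 = CodeUnit.of('0');
  final digit9 = CodeUnit.of('9');

  final c = '7'.codeUnitAt(0);
  // CodeUnit implements int, so ordinary int comparisons work.
  print(c >= digit0 && c <= digit9); // true
  print(digit0 == 48); // true
}
```

Without compiler support the values are computed at runtime, so this does not solve the original problem of using them as constants in patterns.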
