proper_types:char() and proper_types:string() may generate invalid Unicode #318

Open
mikpe opened this issue Jan 17, 2025 · 2 comments

@mikpe

mikpe commented Jan 17, 2025

I had a PropEr test fail because proper_types:string() returned [55296], i.e. [16#D800]. That is not a valid Unicode string, so unicode:characters_to_list/1 (and others) fail on it.

I believe the root cause is that proper_types:char() returns any integer between 0 and 16#10ffff, without taking care to avoid invalid Unicode code points.
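The failure can be reproduced without PropEr, using only the standard `unicode` module. The following is a minimal sketch; the module name `surrogate_repro` and the function `classify/1` are hypothetical names chosen for illustration.

```erlang
-module(surrogate_repro).
-export([classify/1]).

%% Classify a list of integers by what unicode:characters_to_list/1 makes of it.
%% Returns ok when the conversion succeeds, otherwise {invalid, Rest}, where
%% Rest is the part of the input that could not be converted.
classify(Chars) when is_list(Chars) ->
    case unicode:characters_to_list(Chars) of
        Converted when is_list(Converted) -> ok;
        {error, _Converted, Rest} -> {invalid, Rest};
        {incomplete, _Converted, Rest} -> {invalid, Rest}
    end.
```

With this, `surrogate_repro:classify("abc")` should return `ok`, while `surrogate_repro:classify([16#D800])` should return an `{invalid, _}` tuple, since 16#D800 lies in the surrogate range 16#D800..16#DFFF.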

@kostis
Collaborator

kostis commented Jan 17, 2025

Thanks for reporting this.

The main problem here is that the language of Erlang types specifies char() as 0..16#10ffff and string() as [char()], so the default generators for these (non-primitive) types are consistent with their official, but perhaps over-approximating, definitions. I am not sure it's OK to change the definition of the PropEr generators without also changing the definitions of the corresponding types in the Erlang specification.

One idea that comes to mind is to extend the proper_unicode module, which was contributed long ago by a user, to also include a string() generator for Unicode strings. (That module currently only provides generators for Unicode binaries.) Any thoughts (or some other idea) on this?
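As a rough sketch of what such a generator might look like (this is not existing PropEr code): the module name `unicode_string_gen` and the functions `unicode_char/0`, `unicode_string/0` and `valid_unicode_property/0` are hypothetical. It assumes that excluding the surrogate range 16#D800..16#DFFF is enough for the `unicode` module to accept the result, and it relies only on `integer/2`, `union/1` and `list/1` from proper_types.

```erlang
-module(unicode_string_gen).
-include_lib("proper/include/proper.hrl").
-export([unicode_char/0, unicode_string/0, valid_unicode_property/0]).

%% Hypothetical generator: an integer in 0..16#10FFFF that is not a
%% surrogate (16#D800..16#DFFF).
unicode_char() ->
    union([integer(0, 16#D7FF), integer(16#E000, 16#10FFFF)]).

%% Hypothetical generator: a list of such integers.
unicode_string() ->
    list(unicode_char()).

%% Property: every generated string is accepted by
%% unicode:characters_to_list/1, i.e. a list comes back rather than an
%% {error, _, _} or {incomplete, _, _} tuple.
valid_unicode_property() ->
    ?FORALL(S, unicode_string(),
            is_list(unicode:characters_to_list(S))).
```

The property can be run with `proper:quickcheck(unicode_string_gen:valid_unicode_property())`. Building the range from two intervals, rather than filtering `char()` with `?SUCHTHAT`, avoids discarding generated values.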

@mikpe
Author

mikpe commented Jan 17, 2025

Oh, right, if Erlang specifies string() to be a proper (no pun intended) superset of the set of Unicode strings, then client code (our test case in this instance) is in error for assuming the string() actually is valid Unicode. So we should probably filter out non-Unicode code points first.

Now, if proper_types had a unicode_string() generator, that would be awesome.
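Until such a generator exists, the client-side filtering mentioned above could look roughly like this. It is a sketch under the assumption that surrogates are the only values that need to be dropped; `unicode_filter`, `is_scalar_value/1` and `strip_surrogates/1` are hypothetical names.

```erlang
-module(unicode_filter).
-export([is_scalar_value/1, strip_surrogates/1]).

%% True for integers in 0..16#10FFFF outside the surrogate range
%% 16#D800..16#DFFF; false for anything else.
is_scalar_value(C) when is_integer(C), C >= 0, C =< 16#D7FF -> true;
is_scalar_value(C) when is_integer(C), C >= 16#E000, C =< 16#10FFFF -> true;
is_scalar_value(_) -> false.

%% Drop every element of Str that is not a Unicode scalar value.
strip_surrogates(Str) when is_list(Str) ->
    [C || C <- Str, is_scalar_value(C)].
```

For example, `unicode_filter:strip_surrogates([$a, 16#D800, $b])` returns `"ab"`. In a property, the output of `string()` could be passed through this function with `?LET` before being handed to the code under test.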
