proper_types:char() and proper_types:string() may generate invalid Unicode #318

mikpe · 2025-01-17T09:41:54Z

I had a PropEr test fail because proper_types:string() returned [55296], which is [16#D800]. That is not a valid Unicode string, so unicode:characters_to_list/1 (and other functions) fail on it.

I believe the root cause is that proper_types:char() returns any integer between 0 and 16#10ffff, without avoiding code points that are invalid in Unicode text, such as the surrogate range 16#D800..16#DFFF.
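For context, a Unicode scalar value is any integer in 0..16#10FFFF except the 2,048 surrogate code points 16#D800..16#DFFF, and 55296 is the first of those. The minimal sketch below checks exactly that condition. The module and function names are hypothetical and are not part of OTP or PropEr.

```erlang
-module(unicode_check).
-export([is_scalar_value/1, is_unicode_string/1]).

%% Hypothetical helper (not an OTP or PropEr function): true for integers
%% in 0..16#D7FF or 16#E000..16#10FFFF, i.e. every value char() can
%% produce except the surrogates 16#D800..16#DFFF.
is_scalar_value(C) when is_integer(C), C >= 0, C =< 16#D7FF -> true;
is_scalar_value(C) when is_integer(C), C >= 16#E000, C =< 16#10FFFF -> true;
is_scalar_value(_) -> false.

%% True if every element of the list is a scalar value; false for the
%% failing case [55296].
is_unicode_string(S) when is_list(S) ->
    lists:all(fun is_scalar_value/1, S).
```

With this predicate, unicode_check:is_unicode_string([55296]) returns false, while any list whose elements avoid the surrogate range (and stay within 0..16#10FFFF) returns true.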


kostis · 2025-01-17T12:55:46Z

Thanks for reporting this.

The main problem here is that the language of Erlang types specifies char() as 0..16#10ffff and string() as [char()], so the default generators for these (non-primitive) types are consistent with their official, but perhaps over-approximating, definitions. I am not sure it's OK to change the definition of the PropEr generators without also changing the definitions of the corresponding types in the Erlang specification.

One idea that comes to mind is to extend the proper_unicode module, which a user contributed long ago, to also include a string() generator for Unicode strings. (That module currently provides generators only for Unicode binaries.) Any thoughts on this, or other ideas?
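A sketch of what such a Unicode string generator could look like, assuming PropEr is on the code path. The names unicode_gen, unicode_char/0 and unicode_string/0 are hypothetical; they are not part of proper_types or the existing proper_unicode module. The sketch only calls proper_types:union/1, proper_types:integer/2 and proper_types:list/1.

```erlang
-module(unicode_gen).
-export([unicode_char/0, unicode_string/0]).

%% Hypothetical generator, not part of proper_types or proper_unicode.
%% Scalar values are 0..16#10FFFF minus the surrogates 16#D800..16#DFFF,
%% so draw from the two ranges on either side of that gap.
unicode_char() ->
    proper_types:union([proper_types:integer(0, 16#D7FF),
                        proper_types:integer(16#E000, 16#10FFFF)]).

%% Hypothetical: a list of scalar values, which should always be
%% accepted by unicode:characters_to_binary/1.
unicode_string() ->
    proper_types:list(unicode_char()).
```

Building the generator from the two ranges means it never produces a surrogate, so no instances have to be discarded. union/1 chooses between the two ranges rather than weighting them by size; weighted_union/1 could be used if a different distribution is wanted.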

mikpe · 2025-01-17T14:52:18Z

Oh, right: if Erlang specifies string() to be a proper (no pun intended) superset of the set of Unicode strings, then client code (our test case in this instance) is wrong to assume that a string() is actually valid Unicode. So we should probably filter out non-Unicode code points first.

Now, if proper_types had a unicode_string() generator, that would be awesome.
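The client-side filtering suggested above could look like the following PropEr sketch, assuming PropEr is installed. It keeps using proper_types:string() but removes the surrogates 16#D800..16#DFFF from each generated string before the property sees it. The module, the helper drop_surrogates/1, valid_string/0 and the property name are hypothetical; only ?LET, ?FORALL and the unicode functions come from PropEr and OTP.

```erlang
-module(prop_unicode_filter).
-include_lib("proper/include/proper.hrl").
-export([prop_utf8_roundtrip/0]).

%% Hypothetical helper: keep only Unicode scalar values, i.e. drop the
%% surrogates 16#D800..16#DFFF that proper_types:string() may produce.
drop_surrogates(S) ->
    [C || C <- S, C < 16#D800 orelse C > 16#DFFF].

%% Client-side fix: generate an ordinary string() and filter it, instead
%% of assuming every string() is valid Unicode.
valid_string() ->
    ?LET(S, proper_types:string(), drop_surrogates(S)).

%% With the filter in place, encoding to UTF-8 and decoding again
%% should give back the same list.
prop_utf8_roundtrip() ->
    ?FORALL(S, valid_string(),
            unicode:characters_to_list(unicode:characters_to_binary(S)) =:= S).
```

Filtering with ?LET keeps every generated string and only drops the offending characters; ?SUCHTHAT would instead reject any string containing a surrogate, which wastes some generation attempts.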
