Add map/filtermap functions to `sets`, `ordsets` and `gb_sets` #7183

Maria-12648430 · 2023-04-27T16:33:30Z

This PR provides mapping functionality for sets (sets, ordsets and gb_sets).

Often when I deal with sets, I have the need to map one to another. Currently, this is only possible by folding or by converting to a list, mapping over that, and converting back to a set.

github-actions · 2023-04-27T16:34:30Z

CT Test Results

      2 files     89 suites 42m 49s ⏱️
1 894 tests 1 845 ✔️ 48 💤 1 ❌
2 188 runs 2 137 ✔️ 50 💤 1 ❌

For more details on these failures, see this check.

Results for commit e557a17.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

bjorng · 2023-04-28T05:40:06Z

lib/stdlib/src/sets.erl

+map(F, #{}=D) when is_function(F, 1) ->
+    maps:from_keys(map_1(F, maps:iterator(D)), ?VALUE);
+map(F, #set{}=D) when is_function(F, 1) ->
+    map_set(F, D).
+
+map_1(Fun, Iter) ->
+    case maps:next(Iter) of
+        {K, _, NextIter} ->
+            [Fun(K) | map_1(Fun, NextIter)];
+        none ->
+            []
+    end.


This is simpler and should also be slightly more efficient:

Suggested change

map(F, #{}=D) when is_function(F, 1) ->

maps:from_keys(map_1(F, maps:iterator(D)), ?VALUE);

map(F, #set{}=D) when is_function(F, 1) ->

map_set(F, D).

map_1(Fun, Iter) ->

case maps:next(Iter) of

{K, _, NextIter} ->

[Fun(K) | map_1(Fun, NextIter)];

none ->

[]

end.

map(F, #{}=D) when is_function(F, 1) ->

maps:from_keys([F(K) || K := _ <- D], ?VALUE);

map(F, #set{}=D) when is_function(F, 1) ->

map_set(F, D).

Oh, yeah, those will be available then =)

But wouldn't it be even simpler (dunno about efficient) to write it like this?

map(F, #{}=D) when is_function(F, 1) -> #{F(K) => ?VALUE || K := _ <- D};

filter/2 could be rewritten in a similar way, like

filter(F, #{}=D) when is_function(F, 1) -> #{K => ?VALUE || K := _ <- D, F(K)};

Yes, simpler but less efficient. Basically, the compiler rewrites a map comprehension to a list comprehension followed by maps:to_list/1. For this example, it would be:

maps:from_list([{F(K),?VALUE} || K := _ <- D])

(Map generators, on the other hand, are more efficient than the corresponding calls to the iterator functions in the maps module because the compiler inlines most of the code for handling iterators.)

Here is the output from a run of erlperf showing that using maps:from_keys/1 is faster than using a map comprehension:

erlperf --init_runner_all '{fun(I) -> I end, sets:from_list(lists:seq(1, 10_000), [{version,2}])}.' \ 'r({F,D}) when is_function(F, 1) -> maps:from_keys([F(K) || K := _ <- D], []).' \ 'r({F,D}) when is_function(F, 1) -> #{F(K) => [] || K := _ <- D}.' Code || QPS Time Rel r({F,D}) when is_function(F, 1) -> maps:from_keys([F(K) || K := _ <- D], []). 1 916 1092 us 100% r({F,D}) when is_function(F, 1) -> #{F(K) => [] || K := _ <- D}. 1 817 1225 us 89%

Ok, thanks for the explanation, I'll change it as suggested (and add a short comment to explain why we're using maps:from_keys vs a map comprehension).

As for filter/2, this could still be simplified in a likewise manner, right?

filter(F, #{}=D) when is_function(F, 1) -> maps:from_keys([K || K := _ <- D, F(K)], ?VALUE).

If you agree, should we do it in this PR while we're at it, or open a new one?

EDIT: A difference with this approach is that if F returns anything but true for a given K, the element will not be included in the filtered set, whereas the current behavior is to crash if F returns a non-boolean.

The last commit uses a map generator as suggested.

Yes, filter/2 can be simplified in the same way. The simplified version is a little bit faster, especially when many elements are discarded:

erlperf --init_runner_all '{fun(I) -> false end, sets:from_list(lists:seq(1, 10_000), [{version,2}])}.' 'r({F,D}) -> sets:filter(F, D).' 'r({F,D}) -> t:filter(F, D).' Code || QPS Time Rel r({F,D}) -> t:filter(F, D). 1 9393 107 us 100% r({F,D}) -> sets:filter(F, D). 1 8441 119 us 90% $ erlperf --init_runner_all '{fun(I) -> true end, sets:from_list(lists:seq(1, 10_000), [{version,2}])}.' 'r({F,D}) -> sets:filter(F, D).' 'r({F,D}) -> t:filter(F, D).' Code || QPS Time Rel r({F,D}) -> sets:filter(F, D). 1 983 1017 us 100% r({F,D}) -> t:filter(F, D). 1 947 1057 us 96%

The two implementations actually have similar error behaviors:

1> sets:filter(fun(_) -> whatever end, sets:from_list([a,b,c], [{version,2}])). ** exception error: no case clause matching whatever in function sets:filter_1/2 (sets.erl, line 476) in call from sets:filter/2 (sets.erl, line 469) 2> t:filter(fun(_) -> whatever end, sets:from_list([a,b,c], [{version,2}])). ** exception error: bad filter whatever in function t:'-filter/2-lc$^0/1-0-'/2 (t.erl, line 6) in call from t:filter/2 (t.erl, line 6)

(For historical reasons, filters in comprehensions fail in different ways depending on whether they are guard expressions or non-guard expressions. This will be documented in OTP 26).

We prefer a separate pull request for the update of filter/2.

Ok, in that case, this PR is done as far as I can tell.

Yes, but you can squash the commits.

juhlig · 2023-05-03T05:05:18Z

@bjorng I still need a version for the since tags.

bjorng · 2023-05-03T06:17:49Z

OTP Technical Board will have to approve this PR, which will probably happen after the release of OTP 26. I will create the ticket when it has been approved.

juhlig · 2023-05-03T07:40:48Z

Squashed 👍

TD5 · 2023-05-15T13:21:58Z

Would it be possible to add a test to capture precisely the expectation around the semantics of a mapping where the mapped values result in a key clash?

juhlig · 2023-05-15T19:33:41Z

Would it be possible to add a test to capture precisely the expectation around the semantics of a mapping where the mapped values result in a key clash?

Not sure what you mean by "key clash". This is about sets, they don't have keys but elements. If multiple values in the input set map to an identical (sets) or equal (ordsets, gb_sets) value, that is fine, the output set will contain only that value.

For example, mapping a set like [a, b, c] with a function like fun (_) -> x end will result in a set containing just x, and just once.

For ordsets and gb_sets which work by testing for equality (==), if the function maps a value from the input set to, say, 1 (integer) and another to 1.0, only one of them will appear in the output set. Which one of the two is undefined, as the traversal order of sets is undefined. So, for example, if you are mapping an input set like [a, b] with a function like fun (a) -> 1; (b) -> 1.0 end, you will end up with an output set containing only either 1 or 1.0 (not both), but which one it is is undefined.
For sets, which works by testing for identity (=:=), your output set will contain both 1 and 1.0, however.

TD5 · 2023-05-16T09:53:56Z

@juhlig yep - precisely that. Thanks for clarifying that here. I was wondering whether it was something that could be neatly captured in some test cases.

juhlig · 2023-05-16T13:04:10Z

Updated commit adding filtermap/2 functions, thereby completing the usual family of filter, map and filtermap.

juhlig · 2023-05-16T13:20:12Z

@juhlig yep - precisely that. Thanks for clarifying that here. I was wondering whether it was something that could be neatly captured in some test cases.

@TD5 still not sure what kind of test you imagine, but anyway, the entire sets_SUITE could IMO use an overhaul or rewrite even... it tests all three sets modules in the same suite, mixes in randomness here and there, and relies on another helper module for god-knows-what. All that makes it pretty messy and haywire-ish, and I'm not sure that the test cases are all that convincing...

Maybe @Maria-12648430 would like to try her hand at a property-based suite one of these days 🙂

Maria-12648430 · 2023-05-17T15:13:54Z

Maybe @Maria-12648430 would like to try her hand at a property-based suite one of these days slightly_smiling_face

Presto, #7256 coming right up 😀

juhlig · 2023-05-17T15:17:49Z

Presto, #7256 coming right up 😀

Awesome 😳

bjorng · 2023-05-22T03:37:50Z

the entire sets_SUITE could IMO use an overhaul or rewrite even... it tests all three sets modules in the same suite, mixes in randomness here and there, and relies on another helper module for god-knows-what

Testing all three sets modules in one test suite was deliberate. I didn't want to write three test suites to test three set modules with many similarities. Therefore, I wrote one test suite with a helper module to hide the differences.

The helper module made more sense as a parameterized module. When parameterized modules were removed from the language, I had to rewrite the helper module to return funs. It would probably have been better if I had removed the module and included the helper function in the test suite itself. I should probably do that as a pull request, and perhaps rename the test suite to generic_sets_SUITE or something similar to make it clearer that it tests all set modules and not just sets.

Maria-12648430 · 2023-05-23T07:29:20Z

Testing all three sets modules in one test suite was deliberate. I didn't want to write three test suites to test three set modules with many similarities.

I can understand the motivation (I'm currently trying to extend #7256 to encompass all set implementations, too), but I must say that it makes it pretty hard to grasp what is going on 😅

lib/stdlib/src/gb_sets.erl

Maria-12648430 · 2023-05-30T08:46:27Z

@bjorng thanks, I applied your suggestions.

Maria-12648430 · 2023-05-30T08:47:07Z

... and will squash the commits again later 😉

Maria-12648430 · 2023-05-30T12:51:35Z

Commits squashed

bjorng

The OTP Technical Board has approved this PR. Use OTP-18622 for the since tags.

Co-authored-by: Jan Uhlig <[email protected]> Co-authored-by: Björn Gustavsson <[email protected]>

Maria-12648430 · 2023-06-05T08:27:14Z

@bjorng great, I just added the since tags 🙂

bjorng · 2023-06-07T11:17:47Z

Thanks for your pull request.

bjorng self-assigned this Apr 28, 2023

bjorng added team:VM Assigned to OTP team VM enhancement labels Apr 28, 2023

bjorng added this to the OTP-27.0 milestone Apr 28, 2023

bjorng reviewed Apr 28, 2023

View reviewed changes

juhlig force-pushed the add_sets_map branch from c43fc39 to 00a6a85 Compare May 3, 2023 07:40

juhlig mentioned this pull request May 3, 2023

Use map generator in sets:filter/2 for version 2 sets #7196

Merged

juhlig force-pushed the add_sets_map branch from 00a6a85 to 90cc550 Compare May 16, 2023 13:02

Maria-12648430 mentioned this pull request May 17, 2023

Property-based tests for sets, ordsets and gb_sets #7256

Merged

Maria-12648430 changed the title ~~Add map functions to sets, ordsets and gb_sets~~ Add map/filtermap functions to sets, ordsets and gb_sets May 22, 2023

bjorng reviewed May 30, 2023

View reviewed changes

lib/stdlib/src/gb_sets.erl Outdated Show resolved Hide resolved

lib/stdlib/src/gb_sets.erl Outdated Show resolved Hide resolved

Maria-12648430 force-pushed the add_sets_map branch from d0b79d4 to e557a17 Compare May 30, 2023 12:47

bjorng approved these changes Jun 2, 2023

View reviewed changes

bjorng mentioned this pull request Jun 2, 2023

Add equality check functions to sets, ordsets and gb_sets #7232

Merged

Add map/filtermap functions to sets, ordsets and gb_sets

be1d1f7

Co-authored-by: Jan Uhlig <[email protected]> Co-authored-by: Björn Gustavsson <[email protected]>

Maria-12648430 force-pushed the add_sets_map branch from e557a17 to be1d1f7 Compare June 5, 2023 08:25

bjorng added the testing currently being tested, tag is used by OTP internal CI label Jun 6, 2023

bjorng merged commit 2d62f71 into erlang:master Jun 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add map/filtermap functions to `sets`, `ordsets` and `gb_sets` #7183

Add map/filtermap functions to `sets`, `ordsets` and `gb_sets` #7183

Maria-12648430 commented Apr 27, 2023

github-actions bot commented Apr 27, 2023 •

edited

Loading

bjorng Apr 28, 2023

juhlig Apr 29, 2023

juhlig Apr 29, 2023

bjorng May 2, 2023

juhlig May 2, 2023

juhlig May 2, 2023 •

edited

Loading

juhlig May 2, 2023

bjorng May 2, 2023

juhlig May 2, 2023

bjorng May 3, 2023

juhlig commented May 3, 2023

bjorng commented May 3, 2023

juhlig commented May 3, 2023

TD5 commented May 15, 2023

juhlig commented May 15, 2023

TD5 commented May 16, 2023 •

edited

Loading

juhlig commented May 16, 2023

juhlig commented May 16, 2023

Maria-12648430 commented May 17, 2023

juhlig commented May 17, 2023

bjorng commented May 22, 2023

Maria-12648430 commented May 23, 2023

Maria-12648430 commented May 30, 2023

Maria-12648430 commented May 30, 2023

Maria-12648430 commented May 30, 2023

bjorng left a comment

Maria-12648430 commented Jun 5, 2023

bjorng commented Jun 7, 2023

Add map/filtermap functions to sets, ordsets and gb_sets #7183

Add map/filtermap functions to sets, ordsets and gb_sets #7183

Conversation

Maria-12648430 commented Apr 27, 2023

github-actions bot commented Apr 27, 2023 • edited Loading

CT Test Results

Artifacts

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

juhlig May 2, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

juhlig commented May 3, 2023

bjorng commented May 3, 2023

juhlig commented May 3, 2023

TD5 commented May 15, 2023

juhlig commented May 15, 2023

TD5 commented May 16, 2023 • edited Loading

juhlig commented May 16, 2023

juhlig commented May 16, 2023

Maria-12648430 commented May 17, 2023

juhlig commented May 17, 2023

bjorng commented May 22, 2023

Maria-12648430 commented May 23, 2023

Maria-12648430 commented May 30, 2023

Maria-12648430 commented May 30, 2023

Maria-12648430 commented May 30, 2023

bjorng left a comment

Choose a reason for hiding this comment

Maria-12648430 commented Jun 5, 2023

bjorng commented Jun 7, 2023

Add map/filtermap functions to `sets`, `ordsets` and `gb_sets` #7183

Add map/filtermap functions to `sets`, `ordsets` and `gb_sets` #7183

github-actions bot commented Apr 27, 2023 •

edited

Loading

juhlig May 2, 2023 •

edited

Loading

TD5 commented May 16, 2023 •

edited

Loading