-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add map/filtermap functions to sets
, ordsets
and gb_sets
#7183
Conversation
CT Test Results 2 files 89 suites 42m 49s ⏱️ For more details on these failures, see this check. Results for commit e557a17. ♻️ This comment has been updated with latest results. To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass. See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally. Artifacts// Erlang/OTP Github Action Bot |
lib/stdlib/src/sets.erl
Outdated
map(F, #{}=D) when is_function(F, 1) -> | ||
maps:from_keys(map_1(F, maps:iterator(D)), ?VALUE); | ||
map(F, #set{}=D) when is_function(F, 1) -> | ||
map_set(F, D). | ||
|
||
map_1(Fun, Iter) -> | ||
case maps:next(Iter) of | ||
{K, _, NextIter} -> | ||
[Fun(K) | map_1(Fun, NextIter)]; | ||
none -> | ||
[] | ||
end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is simpler and should also be slightly more efficient:
map(F, #{}=D) when is_function(F, 1) -> | |
maps:from_keys(map_1(F, maps:iterator(D)), ?VALUE); | |
map(F, #set{}=D) when is_function(F, 1) -> | |
map_set(F, D). | |
map_1(Fun, Iter) -> | |
case maps:next(Iter) of | |
{K, _, NextIter} -> | |
[Fun(K) | map_1(Fun, NextIter)]; | |
none -> | |
[] | |
end. | |
map(F, #{}=D) when is_function(F, 1) -> | |
maps:from_keys([F(K) || K := _ <- D], ?VALUE); | |
map(F, #set{}=D) when is_function(F, 1) -> | |
map_set(F, D). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, those will be available then =)
But wouldn't it be even simpler (dunno about efficient) to write it like this?
map(F, #{}=D) when is_function(F, 1) ->
#{F(K) => ?VALUE || K := _ <- D};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filter/2
could be rewritten in a similar way, like
filter(F, #{}=D) when is_function(F, 1) ->
#{K => ?VALUE || K := _ <- D, F(K)};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, simpler but less efficient. Basically, the compiler rewrites a map comprehension to a list comprehension followed by maps:to_list/1
. For this example, it would be:
maps:from_list([{F(K),?VALUE} || K := _ <- D])
(Map generators, on the other hand, are more efficient than the corresponding calls to the iterator functions in the maps
module because the compiler inlines most of the code for handling iterators.)
Here is the output from a run of erlperf
showing that using maps:from_keys/1
is faster than using a map comprehension:
erlperf --init_runner_all '{fun(I) -> I end, sets:from_list(lists:seq(1, 10_000), [{version,2}])}.' \
'r({F,D}) when is_function(F, 1) -> maps:from_keys([F(K) || K := _ <- D], []).' \
'r({F,D}) when is_function(F, 1) -> #{F(K) => [] || K := _ <- D}.'
Code || QPS Time Rel
r({F,D}) when is_function(F, 1) -> maps:from_keys([F(K) || K := _ <- D], []). 1 916 1092 us 100%
r({F,D}) when is_function(F, 1) -> #{F(K) => [] || K := _ <- D}. 1 817 1225 us 89%
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the explanation, I'll change it as suggested (and add a short comment to explain why we're using maps:from_keys
vs a map comprehension).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for filter/2
, this could still be simplified in a likewise manner, right?
filter(F, #{}=D) when is_function(F, 1) ->
maps:from_keys([K || K := _ <- D, F(K)], ?VALUE).
If you agree, should we do it in this PR while we're at it, or open a new one?
EDIT: A difference with this approach is that if F
returns anything but true
for a given K
, the element will not be included in the filtered set, whereas the current behavior is to crash if F
returns a non-boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last commit uses a map generator as suggested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, filter/2
can be simplified in the same way. The simplified version is a little bit faster, especially when many elements are discarded:
erlperf --init_runner_all '{fun(I) -> false end, sets:from_list(lists:seq(1, 10_000), [{version,2}])}.' 'r({F,D}) -> sets:filter(F, D).' 'r({F,D}) -> t:filter(F, D).'
Code || QPS Time Rel
r({F,D}) -> t:filter(F, D). 1 9393 107 us 100%
r({F,D}) -> sets:filter(F, D). 1 8441 119 us 90%
$ erlperf --init_runner_all '{fun(I) -> true end, sets:from_list(lists:seq(1, 10_000), [{version,2}])}.' 'r({F,D}) -> sets:filter(F, D).' 'r({F,D}) -> t:filter(F, D).'
Code || QPS Time Rel
r({F,D}) -> sets:filter(F, D). 1 983 1017 us 100%
r({F,D}) -> t:filter(F, D). 1 947 1057 us 96%
The two implementations actually have similar error behaviors:
1> sets:filter(fun(_) -> whatever end, sets:from_list([a,b,c], [{version,2}])).
** exception error: no case clause matching whatever
in function sets:filter_1/2 (sets.erl, line 476)
in call from sets:filter/2 (sets.erl, line 469)
2> t:filter(fun(_) -> whatever end, sets:from_list([a,b,c], [{version,2}])).
** exception error: bad filter whatever
in function t:'-filter/2-lc$^0/1-0-'/2 (t.erl, line 6)
in call from t:filter/2 (t.erl, line 6)
(For historical reasons, filters in comprehensions fail in different ways depending on whether they are guard expressions or non-guard expressions. This will be documented in OTP 26).
We prefer a separate pull request for the update of filter/2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, in that case, this PR is done as far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but you can squash the commits.
@bjorng I still need a version for the |
OTP Technical Board will have to approve this PR, which will probably happen after the release of OTP 26. I will create the ticket when it has been approved. |
Squashed 👍 |
Would it be possible to add a test to capture precisely the expectation around the semantics of a mapping where the mapped values result in a key clash? |
Not sure what you mean by "key clash". This is about sets, they don't have keys but elements. If multiple values in the input set map to an identical ( For example, mapping a set like For |
@juhlig yep - precisely that. Thanks for clarifying that here. I was wondering whether it was something that could be neatly captured in some test cases. |
Updated commit adding |
@TD5 still not sure what kind of test you imagine, but anyway, the entire Maybe @Maria-12648430 would like to try her hand at a property-based suite one of these days 🙂 |
Presto, #7256 coming right up 😀 |
Awesome 😳 |
Testing all three sets modules in one test suite was deliberate. I didn't want to write three test suites to test three set modules with many similarities. Therefore, I wrote one test suite with a helper module to hide the differences. The helper module made more sense as a parameterized module. When parameterized modules were removed from the language, I had to rewrite the helper module to return funs. It would probably have been better if I had removed the module and included the helper function in the test suite itself. I should probably do that as a pull request, and perhaps rename the test suite to |
sets
, ordsets
and gb_sets
sets
, ordsets
and gb_sets
I can understand the motivation (I'm currently trying to extend #7256 to encompass all set implementations, too), but I must say that it makes it pretty hard to grasp what is going on 😅 |
@bjorng thanks, I applied your suggestions. |
... and will squash the commits again later 😉 |
d0b79d4
to
e557a17
Compare
Commits squashed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OTP Technical Board has approved this PR. Use OTP-18622 for the since tags.
Co-authored-by: Jan Uhlig <[email protected]> Co-authored-by: Björn Gustavsson <[email protected]>
e557a17
to
be1d1f7
Compare
@bjorng great, I just added the since tags 🙂 |
Thanks for your pull request. |
This PR provides mapping functionality for sets (
sets
,ordsets
andgb_sets
).Often when I deal with sets, I have the need to map one to another. Currently, this is only possible by folding or by converting to a list, mapping over that, and converting back to a set.