Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add map/filtermap functions to sets, ordsets and gb_sets #7183

Merged
merged 1 commit into from
Jun 7, 2023

Conversation

Maria-12648430
Copy link
Contributor

This PR provides mapping functionality for sets (sets, ordsets and gb_sets).

Often when I deal with sets, I have the need to map one to another. Currently, this is only possible by folding or by converting to a list, mapping over that, and converting back to a set.

@github-actions
Copy link
Contributor

github-actions bot commented Apr 27, 2023

CT Test Results

       2 files       89 suites   42m 49s ⏱️
1 894 tests 1 845 ✔️ 48 💤 1
2 188 runs  2 137 ✔️ 50 💤 1

For more details on these failures, see this check.

Results for commit e557a17.

♻️ This comment has been updated with latest results.

To speed up review, make sure that you have read Contributing to Erlang/OTP and that all checks pass.

See the TESTING and DEVELOPMENT HowTo guides for details about how to run test locally.

Artifacts

// Erlang/OTP Github Action Bot

@bjorng bjorng self-assigned this Apr 28, 2023
@bjorng bjorng added team:VM Assigned to OTP team VM enhancement labels Apr 28, 2023
@bjorng bjorng added this to the OTP-27.0 milestone Apr 28, 2023
Comment on lines 492 to 503
map(F, #{}=D) when is_function(F, 1) ->
maps:from_keys(map_1(F, maps:iterator(D)), ?VALUE);
map(F, #set{}=D) when is_function(F, 1) ->
map_set(F, D).

map_1(Fun, Iter) ->
case maps:next(Iter) of
{K, _, NextIter} ->
[Fun(K) | map_1(Fun, NextIter)];
none ->
[]
end.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is simpler and should also be slightly more efficient:

Suggested change
map(F, #{}=D) when is_function(F, 1) ->
maps:from_keys(map_1(F, maps:iterator(D)), ?VALUE);
map(F, #set{}=D) when is_function(F, 1) ->
map_set(F, D).
map_1(Fun, Iter) ->
case maps:next(Iter) of
{K, _, NextIter} ->
[Fun(K) | map_1(Fun, NextIter)];
none ->
[]
end.
map(F, #{}=D) when is_function(F, 1) ->
maps:from_keys([F(K) || K := _ <- D], ?VALUE);
map(F, #set{}=D) when is_function(F, 1) ->
map_set(F, D).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah, those will be available then =)

But wouldn't it be even simpler (dunno about efficient) to write it like this?

map(F, #{}=D) when is_function(F, 1) ->
    #{F(K) => ?VALUE || K := _ <- D};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filter/2 could be rewritten in a similar way, like

filter(F, #{}=D) when is_function(F, 1) ->
    #{K => ?VALUE || K := _ <- D, F(K)};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, simpler but less efficient. Basically, the compiler rewrites a map comprehension to a list comprehension followed by maps:to_list/1. For this example, it would be:

maps:from_list([{F(K),?VALUE} || K := _ <- D])

(Map generators, on the other hand, are more efficient than the corresponding calls to the iterator functions in the maps module because the compiler inlines most of the code for handling iterators.)

Here is the output from a run of erlperf showing that using maps:from_keys/1 is faster than using a map comprehension:

erlperf --init_runner_all '{fun(I) -> I end, sets:from_list(lists:seq(1, 10_000), [{version,2}])}.' \
'r({F,D}) when is_function(F, 1) -> maps:from_keys([F(K) || K := _ <- D], []).' \
'r({F,D}) when is_function(F, 1) -> #{F(K) => [] || K := _ <- D}.' 
Code                                                                                  ||        QPS       Time   Rel
r({F,D}) when is_function(F, 1) -> maps:from_keys([F(K) || K := _ <- D], []).          1        916    1092 us  100%
r({F,D}) when is_function(F, 1) -> #{F(K) => [] || K := _ <- D}.                       1        817    1225 us   89%

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the explanation, I'll change it as suggested (and add a short comment to explain why we're using maps:from_keys vs a map comprehension).

Copy link
Contributor

@juhlig juhlig May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for filter/2, this could still be simplified in a likewise manner, right?

filter(F, #{}=D) when is_function(F, 1) ->
    maps:from_keys([K || K := _ <- D, F(K)], ?VALUE).

If you agree, should we do it in this PR while we're at it, or open a new one?

EDIT: A difference with this approach is that if F returns anything but true for a given K, the element will not be included in the filtered set, whereas the current behavior is to crash if F returns a non-boolean.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last commit uses a map generator as suggested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, filter/2 can be simplified in the same way. The simplified version is a little bit faster, especially when many elements are discarded:

erlperf --init_runner_all '{fun(I) -> false end, sets:from_list(lists:seq(1, 10_000), [{version,2}])}.' 'r({F,D}) -> sets:filter(F, D).' 'r({F,D}) -> t:filter(F, D).'
Code                                   ||        QPS       Time   Rel
r({F,D}) -> t:filter(F, D).             1       9393     107 us  100%
r({F,D}) -> sets:filter(F, D).          1       8441     119 us   90%
$ erlperf --init_runner_all '{fun(I) -> true end, sets:from_list(lists:seq(1, 10_000), [{version,2}])}.' 'r({F,D}) -> sets:filter(F, D).' 'r({F,D}) -> t:filter(F, D).'
Code                                   ||        QPS       Time   Rel
r({F,D}) -> sets:filter(F, D).          1        983    1017 us  100%
r({F,D}) -> t:filter(F, D).             1        947    1057 us   96%

The two implementations actually have similar error behaviors:

1> sets:filter(fun(_) -> whatever end, sets:from_list([a,b,c], [{version,2}])).
** exception error: no case clause matching whatever
     in function  sets:filter_1/2 (sets.erl, line 476)
     in call from sets:filter/2 (sets.erl, line 469)
2> t:filter(fun(_) -> whatever end, sets:from_list([a,b,c], [{version,2}])).   
** exception error: bad filter whatever
     in function  t:'-filter/2-lc$^0/1-0-'/2 (t.erl, line 6)
     in call from t:filter/2 (t.erl, line 6)

(For historical reasons, filters in comprehensions fail in different ways depending on whether they are guard expressions or non-guard expressions. This will be documented in OTP 26).

We prefer a separate pull request for the update of filter/2.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, in that case, this PR is done as far as I can tell.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but you can squash the commits.

@juhlig
Copy link
Contributor

juhlig commented May 3, 2023

@bjorng I still need a version for the since tags.

@bjorng
Copy link
Contributor

bjorng commented May 3, 2023

OTP Technical Board will have to approve this PR, which will probably happen after the release of OTP 26. I will create the ticket when it has been approved.

@juhlig
Copy link
Contributor

juhlig commented May 3, 2023

Squashed 👍

@TD5
Copy link
Contributor

TD5 commented May 15, 2023

Would it be possible to add a test to capture precisely the expectation around the semantics of a mapping where the mapped values result in a key clash?

@juhlig
Copy link
Contributor

juhlig commented May 15, 2023

Would it be possible to add a test to capture precisely the expectation around the semantics of a mapping where the mapped values result in a key clash?

Not sure what you mean by "key clash". This is about sets, they don't have keys but elements. If multiple values in the input set map to an identical (sets) or equal (ordsets, gb_sets) value, that is fine, the output set will contain only that value.

For example, mapping a set like [a, b, c] with a function like fun (_) -> x end will result in a set containing just x, and just once.

For ordsets and gb_sets which work by testing for equality (==), if the function maps a value from the input set to, say, 1 (integer) and another to 1.0, only one of them will appear in the output set. Which one of the two is undefined, as the traversal order of sets is undefined. So, for example, if you are mapping an input set like [a, b] with a function like fun (a) -> 1; (b) -> 1.0 end, you will end up with an output set containing only either 1 or 1.0 (not both), but which one it is is undefined.
For sets, which works by testing for identity (=:=), your output set will contain both 1 and 1.0, however.

@TD5
Copy link
Contributor

TD5 commented May 16, 2023

@juhlig yep - precisely that. Thanks for clarifying that here. I was wondering whether it was something that could be neatly captured in some test cases.

@juhlig
Copy link
Contributor

juhlig commented May 16, 2023

Updated commit adding filtermap/2 functions, thereby completing the usual family of filter, map and filtermap.

@juhlig
Copy link
Contributor

juhlig commented May 16, 2023

@juhlig yep - precisely that. Thanks for clarifying that here. I was wondering whether it was something that could be neatly captured in some test cases.

@TD5 still not sure what kind of test you imagine, but anyway, the entire sets_SUITE could IMO use an overhaul or rewrite even... it tests all three sets modules in the same suite, mixes in randomness here and there, and relies on another helper module for god-knows-what. All that makes it pretty messy and haywire-ish, and I'm not sure that the test cases are all that convincing...

Maybe @Maria-12648430 would like to try her hand at a property-based suite one of these days 🙂

@Maria-12648430
Copy link
Contributor Author

Maybe @Maria-12648430 would like to try her hand at a property-based suite one of these days slightly_smiling_face

Presto, #7256 coming right up 😀

@juhlig
Copy link
Contributor

juhlig commented May 17, 2023

Presto, #7256 coming right up 😀

Awesome 😳

@bjorng
Copy link
Contributor

bjorng commented May 22, 2023

the entire sets_SUITE could IMO use an overhaul or rewrite even... it tests all three sets modules in the same suite, mixes in randomness here and there, and relies on another helper module for god-knows-what

Testing all three sets modules in one test suite was deliberate. I didn't want to write three test suites to test three set modules with many similarities. Therefore, I wrote one test suite with a helper module to hide the differences.

The helper module made more sense as a parameterized module. When parameterized modules were removed from the language, I had to rewrite the helper module to return funs. It would probably have been better if I had removed the module and included the helper function in the test suite itself. I should probably do that as a pull request, and perhaps rename the test suite to generic_sets_SUITE or something similar to make it clearer that it tests all set modules and not just sets.

@Maria-12648430 Maria-12648430 changed the title Add map functions to sets, ordsets and gb_sets Add map/filtermap functions to sets, ordsets and gb_sets May 22, 2023
@Maria-12648430
Copy link
Contributor Author

Testing all three sets modules in one test suite was deliberate. I didn't want to write three test suites to test three set modules with many similarities.

I can understand the motivation (I'm currently trying to extend #7256 to encompass all set implementations, too), but I must say that it makes it pretty hard to grasp what is going on 😅

lib/stdlib/src/gb_sets.erl Outdated Show resolved Hide resolved
lib/stdlib/src/gb_sets.erl Outdated Show resolved Hide resolved
@Maria-12648430
Copy link
Contributor Author

@bjorng thanks, I applied your suggestions.

@Maria-12648430
Copy link
Contributor Author

... and will squash the commits again later 😉

@Maria-12648430
Copy link
Contributor Author

Commits squashed

Copy link
Contributor

@bjorng bjorng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OTP Technical Board has approved this PR. Use OTP-18622 for the since tags.

Co-authored-by: Jan Uhlig <[email protected]>
Co-authored-by: Björn Gustavsson <[email protected]>
@Maria-12648430
Copy link
Contributor Author

@bjorng great, I just added the since tags 🙂

@bjorng bjorng added the testing currently being tested, tag is used by OTP internal CI label Jun 6, 2023
@bjorng bjorng merged commit 2d62f71 into erlang:master Jun 7, 2023
@bjorng
Copy link
Contributor

bjorng commented Jun 7, 2023

Thanks for your pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement team:VM Assigned to OTP team VM testing currently being tested, tag is used by OTP internal CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants