Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add additional Zip and Unzip methods #48

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

TyMick
Copy link
Contributor

@TyMick TyMick commented Oct 27, 2021

Happy Hackoberfest! 😅

I had the desire to unzip one sequence into three in a recent case, so I was wondering, would a three-way Zip and two- and three-way Unzips be worthy additions to EnumerableUtility?

A summary of new methods added:

IEnumerable<TResult> ZipTruncate<T1, T2, TResult>(
	this IEnumerable<T1> first,
	IEnumerable<T2> second,
	Func<T1, T2, TResult> resultSelector);

IEnumerable<(T1, T2, T3)> Zip<T1, T2, T3>(
	this IEnumerable<T1> first,
	IEnumerable<T2> second,
	IEnumerable<T3> third);

IEnumerable<TResult> Zip<T1, T2, T3, TResult>(
	this IEnumerable<T1> first,
	IEnumerable<T2> second,
	IEnumerable<T3> third,
	Func<T1, T2, T3, TResult> resultSelector);

(IEnumerable<T1>, IEnumerable<T2>) Unzip<T1, T2>(this IEnumerable<(T1, T2)> source);

(IEnumerable<T1>, IEnumerable<T2>, IEnumerable<T3>) Unzip<T1, T2, T3>(this IEnumerable<(T1, T2, T3)> source);

(IEnumerable<T1>, IEnumerable<T2>) Unzip<TSource, T1, T2>(
	this IEnumerable<TSource> source,
	Func<TSource, T1> firstSelector,
	Func<TSource, T2> secondSelector);

(IEnumerable<T1>, IEnumerable<T2>, IEnumerable<T3>) Unzip<TSource, T1, T2, T3>(
	this IEnumerable<TSource> source,
	Func<TSource, T1> firstSelector,
	Func<TSource, T2> secondSelector,
	Func<TSource, T3> thirdSelector);

I didn't add a two-way Zip with a resultSelector parameter because one of those was already included in .NET Standard 2.0.

/// <param name="secondSelector">A function for selecting the second return sequence's elements from the source sequence's elements.</param>
/// <returns>A tuple of sequences, separating the source sequence according to the selector arguments.</returns>
[SuppressMessage("StyleCop.CSharp.ReadabilityRules", "SA1414:Tuple types in signatures should have element names", Justification = "By design.")]
public static (IEnumerable<T1>, IEnumerable<T2>) Unzip<TSource, T1, T2>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I can imagine a "lazy" implementation that returns a tuple of two objects that only enumerate the source sequence as they are enumerated, it seems unlikely that we'd build that. I'm wondering whether it would just be worth returning (IReadOnlyList<T1>, IReadOnlyList<T2>) since the sequences will be materialised.

Copy link
Member

@bgrainger bgrainger Oct 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, a lazy implementation could be Unzip(source, firstSelector, secondSelector) => (source.Select(firstSelector), source.Select(secondSelector); I'm wondering if that should just be the implementation, since it adds the benefit of laziness (and just depending on framework code).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does enumerate source twice, which may be undesirable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overriding my earlier comments: Let's keep the signature (to allow for future laziness) and the current implementation (which enumerates source once).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that you mention it, that gives me an idea: what if we enumerate the source once at the beginning of the method, then return the .Select results as you mentioned? Then half of the process can be lazy, at least.

var enumeratedSource = source.ToArray();
return (enumeratedSource.Select(firstSelector), enumeratedSource.Select(secondSelector));

I've made that change in a1bfb97—let me know if that looks good. 👍🏼

In that case, would it be good to make the source parameter an IReadOnlyCollection instead, to make the required enumeration clearer to the caller? Or is keeping it an IEnumerable worth the flexibility? Maybe we could just add a remark to the doc comment that this enumerates source before the unzipping operation.

So that the unzip operation itself can be lazy.
Comment on lines +898 to +901
var enumeratedSource = source?.ToArray() ?? throw new ArgumentNullException(nameof(source));
return (
enumeratedSource.Select(firstSelector ?? throw new ArgumentNullException(nameof(firstSelector))),
enumeratedSource.Select(secondSelector ?? throw new ArgumentNullException(nameof(secondSelector)))
Copy link
Contributor Author

@TyMick TyMick Nov 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I remove these ?? throw new ArgumentNullExceptions since ToArray and Select already throw the same thing for null arguments? Or is it better to add the throw at this level, too?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants