Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Set for nub ord #1

Closed
wants to merge 8 commits into from

Conversation

MonoidMusician
Copy link

I haven't tested it yet but I'm pretty sure it works … I made separate commits to make it easy to follow but feel free to squash of course!

@milesfrain
Copy link
Owner

milesfrain commented Jan 13, 2021

@MonoidMusician, Thanks for the help with this.

I'll definitely incorporate your set-based approach for lazy lists.
I want to do some more benchmarking to determine if this is the best approach for strict lists though (even though I appreciate the compact code).

Here's some initial benchmarking results on each branch:
https://github.com/milesfrain/purescript-lists/blob/nub-ord-bench/ord-bench.txt
https://github.com/milesfrain/purescript-lists/blob/nub-ord-set-bench/ord-set-bench.txt

Turns out neither version is stack safe. The original sorting approach fails at 100k elements, and this set approach fails at 10k elements. The sorting version is also noticeably faster after 100 elements. These results are highly-dependent on the number of duplicates though, and I suspect the set-based algo will outperform sorting if there's a high percentage of duplicates. We can more carefully control the distribution of duplicates with something like arrayGen (proposed in another benchmarking PR).

My next step is going to be making the sorting version stack safe.

Edit: I wonder if there's any value in writing a more optimized variant of List.sortBy (e.g. collapseSortBy) that also lets you provide a "collapsing" function that decides which element to keep if the comparison function returns EQ. This would also probably only be useful in situations with lots of duplicates, which is also where the set-based approach shines too.

@milesfrain
Copy link
Owner

milesfrain commented Jan 13, 2021

Working on stack safety in this other branch. #2
Pretty confused why the composition of seemingly-stack-safe functions becomes not stack safe anymore. Do I need to use lists larger than 100k elements?

Edit: Turns out this is a problem with sortBy. Reported in purescript#192

@MonoidMusician
Copy link
Author

Ah yeah, I’m glad you’re benchmarking it. I realized it wasn’t stack safe, but I saw that nubEq wasn’t either so I wasn’t sure if we needed that.

It’s interesting that the sorting would be faster, it seems like that does a lot more work but idk.

@milesfrain
Copy link
Owner

I incorporated your changes into purescript#179.
Once we address some of the other stack safety issues in the library, I'd like to do some more benchmarking.
Let me know if you have any ideas for making lazy nub stack-safe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants