Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pure=true default to align with Base #76

Open
ParadaCarleton opened this issue Dec 5, 2021 · 12 comments
Open

pure=true default to align with Base #76

ParadaCarleton opened this issue Dec 5, 2021 · 12 comments

Comments

@ParadaCarleton
Copy link

I see why the pure keyword was added, but having pure=false contradicts Julia's behavior, which assumes mapped functions are pure. For instance:

julia> using SparseArrays

julia> A = sparse([1, 1, 2, 3], [1, 3, 2, 3], [3, 1, 2, 3])
3×3 SparseMatrixCSC{Int64, Int64} with 4 stored entries:
 3  ⋅  1
 ⋅  2  ⋅
 ⋅  ⋅  3

julia> map(x -> x^2 + rand(), A)
3×3 SparseMatrixCSC{Float64, Int64} with 9 stored entries:
 9.54687  0.85208  1.86548
 0.85208  4.31839  0.85208
 0.85208  0.85208  9.26578

This default behavior is also inconsistent with most programmers' expectations, as map is a functional construct and therefore tends to assume side-effect free functions. This could lead to bugs; it also means that any code using map on a vector of unknown type can't take advantage of the performance enhancements provided by PooledArrays, since pure is not a keyword for the map method in base. As a result, I propose defaulting to pure=true.

@bkamins
Copy link
Member

bkamins commented Dec 5, 2021

This was not something we added because of an abstract thinking about a problem.
It was the opposite - we changed it because users reported errors.

In particular your reasoning, while plausible is not consistent with map contract in Julia Base, which is:

Transform collection c by applying f to each element.

which clearly states that f must be applied to each element of c. If Julia Base changes this contract we can consider changing it here. Given this contract, I would say that your example from SparseArrays is a bug not a feature.

Additionally, I would expect most users of PooledArrays.jl to not be programmers but regular data scientists who are typically not aware of such subtleties unfortunately so we preferred to be on a safe side with the design.

@ParadaCarleton
Copy link
Author

It's worth noting that in addition to sparse arrays, FillArrays has been using the efficient/pure implementation for a long time now, and I'm definitely wary of a diverging standard from other, very commonly-used packages.

@bkamins
Copy link
Member

bkamins commented Feb 17, 2022

We could change this since in DataFrames.jl we are now safeguarded against this problem with other means.

@nalimilan - what do you think?

@nalimilan
Copy link
Member

I've commented at JuliaSparse/SparseArrays.jl#4 (comment).

TBH I doubt the pure=false default could create any bugs, as doing map(x -> x^2 + rand(), A) and assuming purity is a really weird thing to do -- especially given that this isn't clearly documented. But I agree we should use the same assumptions in all packages once we can clearly document the expectations in Base.

@bkamins
Copy link
Member

bkamins commented Feb 17, 2022

OK - let us wait for core devs and then decide.

@nalimilan
Copy link
Member

nalimilan commented Feb 24, 2022

I wonder whether the recent addition of various kinds of function purity (JuliaLang/julia#43852) could be used here, so that by default we would treat the function as not pure (as currently), but for functions declared as pure (technically, I guess :consistent) we would only run them once for each value in the pool. Or maybe that's too technically advanced so that users won't take advantage of it.

@bkamins
Copy link
Member

bkamins commented Feb 24, 2022

I think it makes sense, as I hope e.g. functions in Base and in most important packages can be annotated this way.

@ParadaCarleton
Copy link
Author

I've commented at JuliaSparse/SparseArrays.jl#4 (comment).

TBH I doubt the pure=false default could create any bugs, as doing map(x -> x^2 + rand(), A) and assuming purity is a really weird thing to do -- especially given that this isn't clearly documented. But I agree we should use the same assumptions in all packages once we can clearly document the expectations in Base.

I don't think it can cause any true bugs, but I first commented when I noticed a massive performance regression in a package of mine that made it essentially unusable with big PooledArrays.

I think it makes sense, as I hope e.g. functions in Base and in most important packages can be annotated this way.

That would be good.

@bkamins
Copy link
Member

bkamins commented Feb 25, 2022

I don't think it can cause any true bugs

We added this functionality because of the bugs reported by users.

However, I know that in many cases it causes a significant performance drop, so we need a good solution here.

@nalimilan
Copy link
Member

I don't think it can cause any true bugs, but I first commented when I noticed a massive performance regression in a package of mine that made it essentially unusable with big PooledArrays.

Do you think using the purity declarations added by JuliaLang/julia#43852) would be an option in that use case?

@ParadaCarleton
Copy link
Author

I don't think it can cause any true bugs

We added this functionality because of the bugs reported by users.

However, I know that in many cases it causes a significant performance drop, so we need a good solution here.

It looks like it is, indeed, intended behavior for map to assume purity. In that case, the current behavior should be considered a bug, since someone looking for map(x->rand()) might want to make sure map is producing identical outputs for identical inputs -- e.g. if mapping each participant's name to a randomly-generated ID.

@bkamins
Copy link
Member

bkamins commented Apr 3, 2022

When JuliaLang/julia#44233 in Base is decided can you please ping us here. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants