Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

type-mismatch problem when push! to data frame introduced by CSV.jl v.0.5.2 changes #1812

Closed
disleyland opened this issue May 17, 2019 · 19 comments

Comments

@disleyland
Copy link

disleyland commented May 17, 2019

in CSV.jl v.0.4.3, the eltype of a column in a data frame used to something like Array.Union{missing, string}, but in the latest CSV.jl update (0.5.2), the DataFrame column is now CSV.Column{String,String}. This means push!(df,["string","string"]) for example throws up a type-mismatch error. I've tried to set types created by the CSV.read/file process, but the new eltypes are all CSV.Column's. So I'm currently having to manually convert all columns. Am I missing a trick?

@bkamins
Copy link
Member

bkamins commented May 17, 2019

What is your question exactly? Most probably you should use DataFrame(CSV.File("filename")) to create your data frames that can be mutated. By default CSV.jl returns read-only columns now.

@disleyland
Copy link
Author

sorry hit enter before typing

@disleyland
Copy link
Author

ah I didn't know this trick! let me try

@disleyland
Copy link
Author

ok thanks. this now creates the eltype WeakRefStrings.StringArray. Is this eltype compatible with functions that worked with Array.Union{missing, string} or might I find some breaking in legacy code you think?

@bkamins
Copy link
Member

bkamins commented May 17, 2019

I think nothing should break as WeakRefStrings.StringArray{String,1} is subtype of AbstractVector{String}. I will open an issue in CSV.jl repo though as probably this requires clarification in the docs.

@disleyland
Copy link
Author

disleyland commented May 17, 2019 via email

@bkamins
Copy link
Member

bkamins commented May 17, 2019

This is the same.

CSV.File(file) |> DataFrame makes sure you can mutate the output but is slower while CSV.File(file) |> DataFrame! is faster (which probably is relevant only for very large files), but returns immutable columns.

@disleyland
Copy link
Author

disleyland commented May 17, 2019 via email

@bkamins
Copy link
Member

bkamins commented May 17, 2019

Yes - your general problem is that in your code you assume Vector{Union{String,Missing}} and you should assume AbstractVector{Union{String,Missing}}. But - as I noted in JuliaData/CSV.jl#435 I think we should provide the way to return "plain old vector" if the user wishes so.

@nalimilan
Copy link
Member

Please report if you encounter methods that work on Vector{Union{String,Missing}} but not on PooledArray{String} or StringArray{String}. These just need to be fixed.

@disleyland
Copy link
Author

disleyland commented May 17, 2019 via email

@bkamins
Copy link
Member

bkamins commented May 17, 2019

@quinnj - is it possible now in CSV.jl to simply make all columns accept Missing in union even if source data did not contain any missing values? (I think not, but please confirm).

@disleyland if what I ask above is not possible then use allowmissing! on your DataFrame.

@nalimilan
Copy link
Member

Yes AFAIK one needs to call allowmissing! manually. That's a bit annoying, but operations are much faster if you don't expect missing values to be present, so that's a good default.

@bkamins
Copy link
Member

bkamins commented May 17, 2019

Yes - but then the question is if this is a design we leave, or some kwarg to CSV.File should be added to handle this case.

@quinnj
Copy link
Member

quinnj commented May 17, 2019

fwiw, I'm trying to avoid unnecessary keyword arguments in CSV.jl that have pretty trivial solutions otherwise. It complicates the package surface area and makes large code changes much more painful trying to re-support everything.

@bkamins
Copy link
Member

bkamins commented May 17, 2019

OK

@nalimilan
Copy link
Member

Yes, I'm not sure an argument is really more convenient than calling allowmissing! in that case. Though the deprecation could point to allowmissing!/disallowmissing!.

@disleyland
Copy link
Author

disleyland commented May 17, 2019 via email

@disleyland
Copy link
Author

disleyland commented May 17, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants