-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
type-mismatch problem when push! to data frame introduced by CSV.jl v.0.5.2 changes #1812
Comments
What is your question exactly? Most probably you should use |
sorry hit enter before typing |
ah I didn't know this trick! let me try |
ok thanks. this now creates the eltype WeakRefStrings.StringArray. Is this eltype compatible with functions that worked with Array.Union{missing, string} or might I find some breaking in legacy code you think? |
I think nothing should break as |
Thanks for the help. I was using the CSV.jl instructions as well which didn’t work: CSV.File(file) |> DataFrame! . Bad that their approach to DataFrame creation differ from the DataFrame(CSV.File) approach
Richard
… On 17 May 2019, at 11:18, Bogumił Kamiński ***@***.***> wrote:
I think nothing should break as WeakRefStrings.StringArray{String,1} is subtype of AbstractVector{String}. I will open an issue in CSV.jl repo though as probably this requires clarification in the docs.
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub <#1812?email_source=notifications&email_token=ABDMHLJOMTNJ5QNVOB7EVLDPV2A6LA5CNFSM4HNT2P22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVULYOQ#issuecomment-493403194>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABDMHLNLYNFVHFX3GSBXG2TPV2A6LANCNFSM4HNT2P2Q>.
|
This is the same.
|
Ah I spoke too soon. There are possible breakages.
In the earlier version, some columns were converted as Array{Union{Missing, String},1}, are now being turned into PooledArrays.PooledArray{String,UInt32,1,Array{UInt32,1}}
… On 17 May 2019, at 11:24, Bogumił Kamiński ***@***.***> wrote:
This is the same.
CSV.File(file) |> DataFrame makes sure you can mutate the output but is slower while CSV.File(file) |> DataFrame! is faster (which probably is relevant only for very large files), but returns immutable columns.
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub <#1812?email_source=notifications&email_token=ABDMHLMN7PMY62UV2QL6E2DPV2BXFA5CNFSM4HNT2P22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVUMGZI#issuecomment-493405029>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABDMHLNZEXNOT7OHHSYGEJTPV2BXFANCNFSM4HNT2P2Q>.
|
Yes - your general problem is that in your code you assume |
Please report if you encounter methods that work on |
problem i am having with CSV.jl is that in v.0.4.3, the default type was always Union{Missing, String}, Unions{Missing, Int64} etc. Now when I turn CSVs into dataframes I am getting discrepancies in the types, and they all so far seem to be unhappy with missing and thus failing to add rows.
So this bit of code is constantly breaking, eg., push!(df,[“string”,”string”,missing])
… On 17 May 2019, at 15:26, Milan Bouchet-Valat ***@***.***> wrote:
Please report if you encounter methods that work on Vector{Union{String,Missing}} but not on PooledArray{String} or StringArray{String}. These just need to be fixed.
—
You are receiving this because you modified the open/close state.
Reply to this email directly, view it on GitHub <#1812?email_source=notifications&email_token=ABDMHLM6J4ZXQXMPKSVC2UDPV26AHA5CNFSM4HNT2P22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVU44JI#issuecomment-493473317>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABDMHLLCAWX52USVJPFYU7LPV26AHANCNFSM4HNT2P2Q>.
|
@quinnj - is it possible now in CSV.jl to simply make all columns accept @disleyland if what I ask above is not possible then use |
Yes AFAIK one needs to call |
Yes - but then the question is if this is a design we leave, or some kwarg to |
fwiw, I'm trying to avoid unnecessary keyword arguments in CSV.jl that have pretty trivial solutions otherwise. It complicates the package surface area and makes large code changes much more painful trying to re-support everything. |
OK |
Yes, I'm not sure an argument is really more convenient than calling |
Yes. I think splashing a notice on CSV.jl that there is a big change in 0.5.2 where the implication is not only that CSV.read generate a "read-only” file with its own eltype, but also that conversion to a DataFrame is no longer automatically setup to missing data if it wasn’t there already, so beware an use allowmissing!(df). I’ve just never known about these things because I’ve never had to. Now I do :)
And in my case, this is really a migration issue from Julia v.6 to 1.1 issue where old CSVs had NA in them (which is why CSV conversion is not auto-detecting a need for missing)
… On 17 May 2019, at 16:49, Jacob Quinn ***@***.***> wrote:
fwiw, I'm trying to avoid unnecessary keyword arguments in CSV.jl that have pretty trivial solutions otherwise. It complicates the package surface area and makes large code changes much more painful trying to re-support everything.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#1812?email_source=notifications&email_token=ABDMHLMM3WK7PUARP2P6IIDPV3HYZA5CNFSM4HNT2P22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVVEEXI#issuecomment-493503069>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABDMHLIXPQV6EO2TD3V6OTLPV3HYZANCNFSM4HNT2P2Q>.
|
+1 on the depreciation message idea
… On 17 May 2019, at 16:58, Milan Bouchet-Valat ***@***.***> wrote:
Yes, I'm not sure an argument is really more convenient than calling allowmissing! in that case. Though the deprecation could point to allowmissing!/disallowmissing!.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#1812?email_source=notifications&email_token=ABDMHLK6FGY5NAUDROQJZ6TPV3IZBA5CNFSM4HNT2P22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVVE4XA#issuecomment-493506140>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABDMHLJB7G3IN7NFQ7EIB7LPV3IZBANCNFSM4HNT2P2Q>.
|
in CSV.jl v.0.4.3, the eltype of a column in a data frame used to something like Array.Union{missing, string}, but in the latest CSV.jl update (0.5.2), the DataFrame column is now CSV.Column{String,String}. This means push!(df,["string","string"]) for example throws up a type-mismatch error. I've tried to set types created by the CSV.read/file process, but the new eltypes are all CSV.Column's. So I'm currently having to manually convert all columns. Am I missing a trick?
The text was updated successfully, but these errors were encountered: