-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DiskArrays for Variable
's
#205
Conversation
Hmm I dont get how your second example could ever work? Oh the 3x3 is a chunk? |
Thanks a lot for taking a look at this issue! :-) I have an old branch https://github.com/Alexander-Barth/NCDatasets.jl/tree/DiskArrays that might be useful. What do you think about these changes? diff --git a/src/NCDatasets.jl b/src/NCDatasets.jl
index 2140f5e..bd1da5a 100644
--- a/src/NCDatasets.jl
+++ b/src/NCDatasets.jl
@@ -38,7 +38,7 @@ import CommonDataModel: AbstractDataset, AbstractVariable,
groupnames, group,
dimnames, dim,
attribnames, attrib
-import DiskArrays: readblock!, writeblock!
+import DiskArrays: readblock!, writeblock!, AbstractDiskArray
function __init__()
NetCDF_jll.is_available() && init_certificate_authority()
diff --git a/src/types.jl b/src/types.jl
index c89fb29..a637ce2 100644
--- a/src/types.jl
+++ b/src/types.jl
@@ -61,7 +61,7 @@ end
# Variable (as stored in NetCDF file, without using
# add_offset, scale_factor and _FillValue)
-mutable struct Variable{NetCDFType,N,TDS<:AbstractNCDataset} <: AbstractNCVariable{NetCDFType, N}
+mutable struct Variable{NetCDFType,N,TDS<:AbstractNCDataset} <: AbstractDiskArray{NetCDFType, N}
ds::TDS
varid::Cint
dimids::NTuple{N,Cint}
diff --git a/src/variable.jl b/src/variable.jl
index 425df70..ba02c72 100644
--- a/src/variable.jl
+++ b/src/variable.jl
@@ -446,8 +446,8 @@ function readblock!(v::Variable{T,N}, aout, indexes::TR...) where {T,N} where TR
data = Array{T,N}(undef,jlshape)
datamode(v.ds)
- aout[indexes...] .= nc_get_vars!(v.ds.ncid,v.varid,start,count,stride,data)
- return data
+ nc_get_vars!(v.ds.ncid,v.varid,start,count,stride,aout)
+ return aout
end
function writeblock!(v::Variable{T,N},data::T,indexes::StepRange{Int,Int}...) where {T,N} (I just commit them to see). I am wondering if the changes are not rather limited to Variable (as other variable type just wrap the base NetCDF Variable type). |
Unfortunately, we still have an infinite recursion when loading a scalar: https://github.com/Alexander-Barth/NCDatasets.jl/blob/pr-DiskArray/test/test_scalar.jl#L16 |
Think I slightly misunderstood the example but indexing with a colon should work but should always return a vector? Not a scalar? If we dont follow base array indexing sharing code is difficult. |
the data in question is a zero-dimensional array. This does work in base julia: zero_dim_array = dropdims([1],dims=1)
zero_dim_array[:]
For some reasons my commits do not show in this PR, strange. |
Yes it should work, returning a vector. But youre testing that the result is a scalar? (Its also possible there are bugs for zero dimensions in DiskArrays.jl) |
You're probably right. At the moment,
- aout[indexes...] .= nc_get_vars!(v.ds.ncid,v.varid,start,count,stride,data)
- return data
+ nc_get_vars!(v.ds.ncid,v.varid,start,count,stride,aout)
+ return aout This would be better since we don't allocate the julia> var[2:4,1:5,1]
ERROR: TypeError: non-boolean (Missing) used in boolean context
Stacktrace:
[1] CFtransform_missing
@ ~/.julia/dev/CommonDataModel/src/cfvariable.jl:235 [inlined]
[2] CFtransform
@ ~/.julia/dev/CommonDataModel/src/cfvariable.jl:277 [inlined]
[3] macro expansion
@ ~/.julia/dev/CommonDataModel/src/cfvariable.jl:318 [inlined]
[4] macro expansion
@ ./simdloop.jl:77 [inlined]
[5] CFtransformdata!
@ ~/.julia/dev/CommonDataModel/src/cfvariable.jl:317 [inlined]
[6] CFtransformdata
@ ~/.julia/dev/CommonDataModel/src/cfvariable.jl:326 [inlined]
[7] readblock!(::CommonDataModel.CFVariable{Union{Missing, Float32}, 3, NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Float32, Tuple{Float32}, Nothing, Nothing, Nothing, Nothing, Nothing}}}, ::Array{Union{Missing, Float32}, 3}, ::UnitRange{Int64}, ::UnitRange{Int64}, ::Vararg{UnitRange{Int64}})
@ CommonDataModel ~/.julia/dev/CommonDataModel/src/cfvariable.jl:390
[8] getindex_disk(::CommonDataModel.CFVariable{Union{Missing, Float32}, 3, NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Float32, Tuple{Float32}, Nothing, Nothing, Nothing, Nothing, Nothing}}}, ::UnitRange{Int64}, ::Vararg{Any})
@ DiskArrays ~/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:31
[9] getindex(::CommonDataModel.CFVariable{Union{Missing, Float32}, 3, NCDatasets.Variable{Float32, 3, NCDataset{Nothing}}, NCDatasets.Attributes{NCDataset{Nothing}}, NamedTuple{(:fillvalue, :missing_values, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor), Tuple{Float32, Tuple{Float32}, Nothing, Nothing, Nothing, Nothing, Nothing}}}, ::UnitRange{Int64}, ::UnitRange{Int64}, ::Int64)
@ DiskArrays ~/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:177
[10] top-level scope
@ REPL[25]:1 Note that indexing directly on the variable works though: julia> var.var[2:4,1:5,1]
3×5 Matrix{Float32}:
1.0f20 1.0f20 1.0f20 1.0f20 1.0f20
1.0f20 1.0f20 1.0f20 1.0f20 1.0f20
1.0f20 1.0f20 1.0f20 1.0f20 1.0f20 I think it comes from the fact that when this method is called, Except from this, reading seems to work for this particular use case. I will try to get writing work, I hope it won't be too much trickier!
Strange indeed! Do you commit to my forked |
If you want to add diskarrays behavioir just to Variable, you can do that using the macros provided in DiskArrays.jl, rather than subtyping AbstractDiskArray. But Im not sure you would want any disk based array types to not have disk arrays behaviour, lazy chunked broadcasting is a pretty nice thing to have, instead if broadcasts just not working at all. |
I am wondering if (at least as a start) we just convert the base type NCDatasets.Variable to derive from DiskArray. Maybe the macros that @rafaqz mentioned can then be used for wrapped types like CFVariable without requiring that they derive from DiskArray? You guys have a better understand of DiskArrays works... Is there any documentation about the macros?
I used the commands from https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally to test the PR:
and I assumed that any commits with show up here, but maybe that work only for PR that I initiate. In any case, if things become simplier for you, if you are a dev on this repo just let me know. @rafaqz In general, yes some the test values should change. The issue of the recursion happened before but maybe it is related to this: I will already change the test in the master version. |
I tried to run the test and I spotted a few general issues:
|
See here for disk arrays macros. Above that application you can see all the sub macros. But you probably want the main one: https://github.com/meggart/DiskArrays.jl/blob/main/src/DiskArrays.jl#L37 So We could run your tests in CI so we can all see the failures and discuss them? I think thats the best way for me to contribute to the discussion.
Edit sorry I just realised this isn't the CommonDataModel thread so not JuliaGeo. @Alexander-Barth can you change the package settings so that @tcarion can run CI with these pushes without you clicking approve every time? I would like to help with resoloving the bugs. |
Concerning the error type "2. When infinite dimensions are implied" there are some changes API, for example when previously we could do
we now need to do:
which mimics the behaviour of NetCDF.jl. For the other cases, however, I don't think (or hope :-) ) that we need to break current API. Another point is that Thanks @rafaqz for the info!
To be certain that I understand: we can either derive Variable from AbstractDiskArrays or we can use the macro to inherit the behavior (chunking,...) . Is this correct? Or do we need to do both? What do you think about the approach to make just NCDatasets.Variable as a DiskArrays and all types wrapping NCDatasets.Variable would use method forwarding (for I changed the setting in github action; I hope it works now. (If not maybe @tcarion can make a trivial change the to README.md so that he is not considered as a first time contributor to NCDatasets for github.) |
I tried and I get a julia> @macroexpand(@implement_getindex(Variable))
quote
#= /home/tcarion/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:177 =#
(DiskArrays.Base).getindex(var"#574#a"::DiskArrays.Variable, var"#575#i"...) = begin
#= /home/tcarion/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:177 =#
DiskArrays.getindex_disk(var"#574#a", var"#575#i"...)
end
#= /home/tcarion/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:179 =#
function (DiskArrays.Base).getindex(var"#578#a"::DiskArrays.Variable, var"#579#i"::DiskArrays.ChunkIndex)
#= /home/tcarion/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:179 =#
#= /home/tcarion/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:180 =#
var"#576#cs" = DiskArrays.eachchunk(var"#578#a")
#= /home/tcarion/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:181 =#
var"#577#inds" = var"#576#cs"[(var"#579#i").I]
#= /home/tcarion/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:182 =#
return DiskArrays.wrapchunk((var"#579#i").chunktype, var"#578#a"[var"#577#inds"...], var"#577#inds")
end
#= /home/tcarion/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:184 =#
function (DiskArrays.DiskArrays).ChunkIndices(var"#580#a"::DiskArrays.Variable; offset = false)
#= /home/tcarion/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:184 =#
#= /home/tcarion/.julia/packages/DiskArrays/zaCXL/src/diskarray.jl:185 =#
return DiskArrays.ChunkIndices((DiskArrays.Base).OneTo.(DiskArrays.size(DiskArrays.eachchunk(var"#580#a"))), if offset
DiskArrays.OffsetChunks()
else
DiskArrays.OneBasedChunks()
end)
end
end I'm pretty unexperienced with macros, but I'd say it's something that should be fixed in DiskArrays, right?
Ok I'll keep that in mind when trying to make the tests work, thanks!
Thank you, I'll try after my next commit, I'll let you know if it didn't work :) |
@Alexander-Barth
Yes the macro is what we apply to
Honestly I think eventually every chunked disk-based array object in the julia ecosystem should use DiskArrays.jl so we have a consistent standard. But we should of course compromise on that to just get things working! that's why we have the macros. |
@tcarion yeah this macro hasn't been very widely used, it seems to not be escaped properly. Try using Otherwise I will fix it later on. |
I tried and the issue stays:
Thanks for looking at it! |
Im working on a fix now |
Turns oit there a few other fixes needed for this to work. |
I agree, we have already |
I've registered a patch version of DiskArrays.jl with fixed macros, hopefully that gets things working. |
@Alexander-Barth can we get this merged? |
Sorry I did not see your message (I got a lot of emails from github recently that I missed that you are ready with the PR). Thanks a lot for @tcarion for the hard work and @rafaqz and @meggart for their help! |
For the release notes:
|
This doesn't depend on CommonDataModel changes, as we cant do this for These are kind of demonstration PRs... ultimately we need to make It would fix issues like this and allow deleting But it will hit your I'm happy to do this work if its something you want. Otherwise merging this is useful as a stopgap so we can hack together temporary DiskArrays support elsewhere on top of |
I confirm that JuliaGeo/CommonDataModel.jl#9 is not needed for this PR to work. JuliaGeo/CommonDataModel.jl#9 was intended to try to make all The release notes seem nice and I think they cover everything that has changed, thank you for writing them! |
Conflicts: src/variable.jl
Unfortunately, I see some slow down for the benchmark on my machine. Current master runs in 181.193 ms while this PR 307.836 ms. Do you see this as well? I am using DiskArrays v0.3.14 (current version). I tried this PR with another package of our package (DIVAnd) and upgrade was quite smooth. For info: this commit 7edf68f disable the opendap tests which causes current test failures (opendap server seems to be down). Master
PR
|
function Base.getindex(v::Variable,indexes::Int...) | ||
# This method needs to be duplicated instead of using an Union. Otherwise a DiskArrays fallback is called instead which impacts performances | ||
# (see https://github.com/Alexander-Barth/NCDatasets.jl/pull/205#issuecomment-1589575041) | ||
function readblock!(v::Variable, aout, indexes::TI...) where TI <: Union{AbstractUnitRange,StepRange} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function readblock!(v::Variable, aout, indexes::TI...) where TI <: Union{AbstractUnitRange,StepRange} | |
function readblock!(v::Variable, aout, indexes::Union{<:AbstractUnitRange,<:StepRange}...) |
This dispatch was not correct so we were using the fallback for AbstractVector
.
I get roughly the same performance as master
after this change.
@Alexander-Barth see the fix above for your performance question. Took a few hours to track down 😅 - that's a very subtle bug in dispatch on Hopefully this is good to merge now after that change is accepted. |
Great! I confirm that the timing are now good:
It seems that I have now a new warning that I did not see before:
Any ideas where this can come from? I am getting them with DiskArrays v0.3.16 and v0.3.17. |
Great! I think those warnings are my fault form recent addition of methods, will fix. |
I'm not getting the other error |
@Alexander-Barth this shouldnt have a warning now |
With DiskArrays, v0.3.18, there seam to be macro issue: Line 71 on NCDatasets.jl is Any idea which is could be the case?
Wild guess: maybe on |
Feel free to PR your fix ;) |
Ok fixed, will be registered soon. That method is only for preventing ambiguity before throwing an error, so isn't caught in the DiskArrays tests because we don't use the macro on Anyway, there are some overheads to using these macros rather than inheritance, we were actually close to deleting the macros in DiskArrays.jl before this PR. |
@Alexander-Barth this should be good to merge |
I still see a method warning (DiskArrays v0.3.19):
But with these two commits 08a4888 and 7edf68f (unrelated to DiskArrays) all NCDatasets test pass on julia 1.9 and 1.6. Can this warning be fixed before I merge? Thanks a lot! |
Thank you both for your help! Sorry I'm not very available lastly... On 1.10, this becomes an error: _
_ _ _(_)_ | Documentation: https://docs.julialang.org
(_) | (_) (_) |
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.10.0-beta2 (2023-08-17)
_/ |\__'_|_|_|\__'_| | Official https://julialang.org/ release
|__/ |
julia> using NCDatasets
Precompiling NCDatasets
✗ NCDatasets
0 dependencies successfully precompiled in 7 seconds. 29 already precompiled.
ERROR: The following 1 direct dependency failed to precompile:
NCDatasets [85f8d34a-cbdd-5861-8df4-14fed0d494ab]
Failed to precompile NCDatasets [85f8d34a-cbdd-5861-8df4-14fed0d494ab] to "/home/tcarion/.julia/compiled/v1.10/NCDatasets/jl_c4pSNN".
WARNING: Method definition _reshape(DiskArrays.AbstractDiskArray{var"#s69", 1} where var"#s69", Tuple{Int64}) in module DiskArrays at /home/tcarion/.julia/packages/DiskArrays/fshK3/src/reshape.jl:83 overwritten in module NCDatasets on the same line (check for duplicate calls to `include`).
ERROR: LoadError: Method overwriting is not permitted during Module precompile. |
Weird, Im not getting that locally or in CI. Will try to reproduce |
Ok, this was there for an ambiguity method. Will be fixed soon. Unfortunately we cant fix the ambiguity for NCDatasets.Variable in the macro because we cant know the type parameters of arbitrary types. Using the macro makes these things difficult to handle, really we shouldn't use them. |
The last fix is registered, hopefully that does it |
Thanks a lot! Everything works now fine too on my side. |
Amazing so excited to have the basics of this working now. |
Can we get a new version tagged so we can work on rafaqz/Rasters.jl#416? |
In my tests, unfortunately the slow I traced it down how julia handles function specificity. Surprisingly we get:
The fix is to define the following function:
Unfortunately, even avoiding batchgetindex NCDatasets 0.13.1 with DiskArrays
NCDatasets 0.12.17 without DiskArrays
Any ideas how we can improve would be appreciated :-) Commit: b52a71a |
This is an attempt to make the subtypes of
AbstractVariable
behave likeDiskArrays
. See JuliaGeo/CommonDataModel.jl#8.I'm not perfectly sure of what I'm doing, but simple cases already work:
But of course it doesn't always work:
I will try to solve this in the coming days, any help is of course welcomed, and as I said above, I'm unsure about the approach I'm following :)
cc @rafaqz @Alexander-Barth