do not unnecessarily return a union in extract #790

Merged
merged 6 commits into from
Oct 12, 2024

tiemvanderdeure
Contributor

When extracting with skipmissing=false, the geometry type is currently always a Union{Missing, ...}, even if none of the provided geometries is missing.

This PR makes the geometry type of the output match the geometry type of the input.
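
As a rough illustration of the intended behaviour (a sketch in plain Julia, not the Rasters.jl code): the hypothetical helper `row_type` below builds a row type whose geometry field reuses the element type of the input vector, so `Missing` only shows up when the input could actually contain it. The field name `:test` and the value type are assumptions made for the example.

```julia
# Hypothetical helper, not part of Rasters.jl: the geometry field reuses the
# element type G of the input, so no Union{Missing, ...} is added on top.
row_type(::AbstractVector{G}, ::Type{V}) where {G,V} =
    NamedTuple{(:geometry, :test), Tuple{G, V}}

points_only  = [(9.0, 0.1), (9.0, 0.2)]   # Vector{Tuple{Float64, Float64}}
with_missing = [missing, (9.0, 0.1)]      # element type is a Union with Missing

R1 = row_type(points_only, Union{Missing,Int})
R2 = row_type(with_missing, Union{Missing,Int})

fieldtype(R1, :geometry)  # Tuple{Float64, Float64}
fieldtype(R2, :geometry)  # Union{Missing, Tuple{Float64, Float64}}
```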

I had to change _rowtype to do this, and changed it a little more to make it easier to reuse (which I want to do for Rasters.sample).

I also added some missing text to the docstring.

@tiemvanderdeure
Contributor Author

Not 100% sure if this is a bugfix or a slightly breaking change, though.

tiemvanderdeure and others added 3 commits October 11, 2024 16:13
Co-authored-by: Rafael Schouten <[email protected]>
Co-authored-by: Rafael Schouten <[email protected]>
@rafaqz rafaqz mentioned this pull request Oct 11, 2024
@tiemvanderdeure
Contributor Author

Just copied the _rowtype changes to this branch so we can figure out type stability in this PR and how Rasters.sample should work in the other.

@tiemvanderdeure
Contributor Author

Just to explain this latest optimization (I don't know if this is the optimal way to go about it): before this PR, we converted whatever we extracted from the Raster/RasterStack inside a for loop, in _maybe_add_fields. That generates an allocation for every single point that is extracted.

After this PR, each point is first extracted and stored in a Vector with the same element type as if skipmissing = false; the whole vector is then converted to the skipmissing type with a broadcast. That generates a single allocation.
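
The two patterns can be sketched in plain Julia as follows. This is only an illustration under assumptions, not the code from this PR: the row types, the `:test` field and both function names are hypothetical, and it makes no claim about reproducing the allocation counts reported below.

```julia
const Point     = Tuple{Float64,Float64}
# Row type as if skipmissing = false: the value may be missing.
const LooseRow  = NamedTuple{(:geometry, :test), Tuple{Point, Union{Missing,Int}}}
# Row type for skipmissing = true: the value is never missing.
const StrictRow = NamedTuple{(:geometry, :test), Tuple{Point, Int}}

rows = LooseRow[
    (geometry = (9.0, 0.1), test = 1),
    (geometry = (10.0, 0.3), test = missing),
    (geometry = (10.0, 0.2), test = 4),
]

# Old pattern: convert every row individually inside the loop.
function convert_in_loop(rows)
    out = StrictRow[]
    for r in rows
        ismissing(r.test) && continue
        push!(out, convert(StrictRow, r))
    end
    return out
end

# New pattern: keep the loosely typed vector, drop the missing rows,
# then convert everything that is left with one broadcast.
convert_at_end(rows) = convert.(StrictRow, filter(r -> !ismissing(r.test), rows))
```

Both functions return the same rows; the difference is only where the conversion happens.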

MWE:

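using Rasters  # X and Y are used here as exported by Rasters

# small 2×2 raster whose values may be missing
dimz = (X(9.0:1.0:10.0), Y(0.1:0.1:0.2))
rast = Raster(Union{Int,Missing}[1 2; 3 4], dimz; name=:test, missingval=missing)
# the input includes one missing geometry
mypoints = [missing, (9.0, 0.1), (9.0, 0.2), (10.0, 0.3), (10.0, 0.2)]
mymanypoints = repeat(mypoints, 500_000)
# @profview comes from the Julia VS Code extension or ProfileView.jl
@profview extract(rast, mymanypoints, skipmissing = true)
@time extract(rast, mymanypoints, skipmissing = true);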

Before:
[image]
0.268031 seconds (3.00 M allocations: 148.774 MiB, 39.32% gc time)

After:
[image]
0.156913 seconds (13 allocations: 129.700 MiB, 7.20% gc time)

@rafaqz
Owner

rafaqz commented Oct 12, 2024

3M allocations to 13 is pretty nice!

@rafaqz rafaqz merged commit 8cf4942 into rafaqz:main Oct 12, 2024
1 of 2 checks passed