-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rasterize issues #529
Comments
Ahh yes the problem is Try I guess we should not use threads for (sum should beat gdal easily or there is somethong else wrong) |
switching to Rasters.rasterize! takes 324.580515 seconds (1.60 M allocations: 336.273 GiB, 10.41% gc time, 1.50% compilation time) |
Any idea why |
There is something very wrong to still be 10x slower on No idea about reproject, you dont include a stack trace. Why not read the code matching your stack trace and see ;) Also side note Im not sure what |
Finally at a computer. Your raster is huuge!!!
Do you mean for it to be ? If so great, nice test case for performance of huge things. I haven't tried rasterizing into anything within 2 orders of magnitude of this size. I am guessing some performance heuristics will need to change for it to be fast. reprojectFor reprojecting the shape file you are trying to use glacier_rep = GeometryOps.reproject(glacier; target_crs=EPSG(3857)) Unfortunately Proj.jl also has a bug where it wont accept shapefiles rasterizeFirst, try using You may also want to use But after that, we are just using too much RAM. But this is fun, lets make it beat GDAL by 10x even at this scale. |
And finally, this already seems really fast for me at a quarter of your raster size: @time Rasters.rasterize!(count, A, glacier_rep; shape=:polygon) The full raster size kills my julia session 😅
|
Ok, problem solved!! its threading. Each thread needs a raster to write to so at this scale that means using all your RAM. Its faster to just not thread at all. @time Rasters.rasterize!(count, A, glacier_rep; threaded=false, shape=:polygon) For me this is >6x faster than GDAL. (which is kinda suspicious actually) We could look at memory and try to see how many threads we can safely use and maybe make this faster again. There is also a worrying warning that the polygons are not actually affecting pixels. Probably they are too small... you may need
|
Wow.. this is lightning fast now (9s vs 28s but a bit of apples to oranges) ... but you're right none of the plygons are intersecting which is strange cuz they do... The polygons are in lat lon projection and the grid is in UTM 09N ... I thought from reading the docs that this could be handled internally but maybe it's necessary to reproject the shapefile polygons prior to rasterizing? I see above that you reprojected to 3857 when we need to reproject to 32609... I'll try that now. |
Maybe an informative error message could be added. |
It doesn't handle crs conversion automatically - often there is no crs attached to polygons anyway. But we should in future when its possible and GeometryOps.jl is actually registered and useable. Its just slow getting all of these little checks and conversions working and perfect... And yeah, thats the kind of performance you should expect... 6x GDAL without threading. Normally with smaller files and threading its 30x faster |
Just wanted to point out that GDAL will just return zeros with no warning, but if you read the warnings Rasters checks that every polygon actually changes the raster and gives you a warning and a vector of misses in the raster metadata |
@rafaqz I got it to work with: epsg = GeoFormatTypes.EPSG("EPSG:32609")
glacier_rep = GeometryOps.reproject(glacier; target_crs=epsg); and foo = Rasters.rasterize!(count, A, glacier_rep.geom; threaded=false, shape=:polygon) sum(.!(foo.data .== 0))
23343762 Thanks a ton for all of the help and patches I noted that:
|
|
@rafaqz would it be more intuitive to have the user first convert the Shapefile.jl objects to GeoInterface then reproject... this would indicate to the user that to manipulate Shapefile.jl objects the first step is to convert it to a GeoInterface and abandon the Shapefile.jl object? |
Maybe... I've actually been making all the packages accept anything!! And just return their own types. LibGEOS.jl does this too. To me needing a conversion step everywhere is a bit annoying to write, and takes more RAM - the current method only makes the destination copy, there is no intermediate. If you use GeoInterface.jl geometry types this stuff barely matters - they can wrap any other geometry objects, even combining objects from different packages works. |
Ya, makes sense. |
I'm having issues rasterizing a shapefile using Rasters. The .shp file contains 18855 polygons. Part of the issue is out of memory errors and part of the issue is reprojection. Here's what I did:
Get and Unzip Data
the shapefiles can be downloaded from NSIDC: https://nsidc.org/data/nsidc-0770/versions/7
but you will require an Earth Data Login
for convenience I've temporarily put the files here
Load polygons
define a Raster to which the shapefile will be rasterized
rasterize polygons
This leads to an out of memory error
try a divide and concur approach to overcome memory limit
first let's reproject the polygons to the desired projection... once we do that then we loop over subsets of the full domain and rasterize
reutrns: ERROR: StackOverflowError:
Alternatively we can us GDAL
at the command line this takes about 10s
The text was updated successfully, but these errors were encountered: