Spatial Index for improving performance #120
If your polygon dataset fits in memory, build an in-memory quadtree index on the polygons using the Geometry API, by adapting the MapReduce sample in GIS-Tools-for-Hadoop for Spark.
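For anyone landing here later, a minimal sketch of that pattern, assuming the polygons already fit in memory as a `List<Geometry>` in WGS84. The class and method names below (`PolygonIndex`, `findContainingPolygon`) are illustrative, not part of this project; the world extent and tree height follow the GIS-Tools-for-Hadoop aggregation sample:

```java
import com.esri.core.geometry.*;
import java.util.List;

// Illustrative wrapper: quadtree over polygon envelopes, exact test on candidates.
class PolygonIndex {
    private final QuadTree quadTree;
    private final List<Geometry> polygons;
    private final SpatialReference sr = SpatialReference.create(4326); // WGS84, assumed

    PolygonIndex(List<Geometry> polygons) {
        this.polygons = polygons;
        // World extent in lon/lat; the GIS-Tools-for-Hadoop sample uses a height of 8.
        this.quadTree = new QuadTree(new Envelope2D(-180, -90, 180, 90), 8);
        Envelope2D bbox = new Envelope2D();
        for (int i = 0; i < polygons.size(); i++) {
            polygons.get(i).queryEnvelope2D(bbox);
            quadTree.insert(i, bbox); // store the list index as the tree element
        }
    }

    /** Returns the list index of the first polygon containing pt, or -1 if none does. */
    int findContainingPolygon(Point pt) {
        // The quadtree prunes to polygons whose envelope overlaps the point...
        QuadTree.QuadTreeIterator it = quadTree.getIterator(pt, 0);
        for (int handle = it.next(); handle >= 0; handle = it.next()) {
            int candidate = quadTree.getElement(handle);
            // ...then an exact contains test filters the candidates.
            if (GeometryEngine.contains(polygons.get(candidate), pt, sr)) {
                return candidate;
            }
        }
        return -1;
    }
}
```

The envelope pre-filter is what saves the work: instead of an exact point-in-polygon test against every polygon, only the handful whose bounding boxes cover the point are tested.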
Thanks for your reply. The sample using the quadtree index does help, and I will try to use the Geometry API with Spark.
@seamusdu How did you find running the Spatial Framework on Spark in the end? It is an option I'm looking at at the moment.
Cross-reference re Spark: #97 (works with JsonSerde as of v1.2)
@seamusdu I am doing the same thing and wrapped the spatial join query with an index in …
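A hedged sketch of one way that wrapping could look on Spark's RDD API, assuming Spark 2.x, a `JavaRDD<Point>` of already-parsed points, and the illustrative `PolygonIndex` class sketched above; none of these names come from the project. The polygon list is broadcast and the quadtree rebuilt once per partition, since the `QuadTree` itself may not be serializable:

```java
import com.esri.core.geometry.Geometry;
import com.esri.core.geometry.Point;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import scala.Tuple2;
import java.util.ArrayList;
import java.util.List;

class IndexedPointInPolygon {
    /** Counts points per polygon; keys are indexes into the polygons list. */
    static JavaPairRDD<Integer, Long> countPointsPerPolygon(
            JavaSparkContext jsc, JavaRDD<Point> pointsRDD, List<Geometry> polygons) {
        // Ship the raw polygons once per executor instead of once per task.
        Broadcast<ArrayList<Geometry>> polyBc = jsc.broadcast(new ArrayList<>(polygons));
        return pointsRDD
            .mapPartitionsToPair(iter -> {
                // Build the quadtree once per partition, then probe it per point.
                PolygonIndex index = new PolygonIndex(polyBc.value());
                List<Tuple2<Integer, Long>> out = new ArrayList<>();
                while (iter.hasNext()) {
                    int id = index.findContainingPolygon(iter.next());
                    if (id >= 0) out.add(new Tuple2<>(id, 1L));
                }
                return out.iterator();
            })
            .reduceByKey(Long::sum); // (polygon index, point count)
    }
}
```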
Has anyone tried benchmarking the number of points against the time it took to process them? Or even a comparison between Hive and MapReduce (with spatial indexing)?
@guillemfrancisco There is a little bit of info in a comment under https://stackoverflow.com/questions/38963487/how-to-optimize-scan-of-1-huge-file-table-in-hive-to-confirm-check-if-lat-long
I am trying to use HiveContext within Spark to use this spatial framework, and it does work. However, once I use a large dataset, the performance declines dramatically. I am trying to count points within polygons. Hence, I wonder whether you have done any performance testing that might explain the performance of this framework. Also, have you ever considered creating a spatial index, which might improve the performance of spatial operations?
Thanks.