Spatial Index for improving performance #120
If your polygon dataset fits in memory, build an in-memory quadtree index on the polygons using the Geometry API, by adapting the MapReduce sample in GIS-Tools-for-Hadoop for Spark.
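For anyone landing here later, a minimal sketch of that pattern, assuming the polygons already fit in memory as a `List<Geometry>` in WGS84. The class and method names below (`PolygonIndex`, `findContainingPolygon`) are illustrative, not part of this project; the world extent and tree height follow the GIS-Tools-for-Hadoop aggregation sample:

```java
import com.esri.core.geometry.*;
import java.util.List;

// Illustrative wrapper: quadtree over polygon envelopes, exact test on candidates.
class PolygonIndex {
    private final QuadTree quadTree;
    private final List<Geometry> polygons;
    private final SpatialReference sr = SpatialReference.create(4326); // WGS84, assumed

    PolygonIndex(List<Geometry> polygons) {
        this.polygons = polygons;
        // World extent in lon/lat; the GIS-Tools-for-Hadoop sample uses a height of 8.
        this.quadTree = new QuadTree(new Envelope2D(-180, -90, 180, 90), 8);
        Envelope2D bbox = new Envelope2D();
        for (int i = 0; i < polygons.size(); i++) {
            polygons.get(i).queryEnvelope2D(bbox);
            quadTree.insert(i, bbox); // store the list index as the tree element
        }
    }

    /** Returns the list index of the first polygon containing pt, or -1 if none does. */
    int findContainingPolygon(Point pt) {
        // The quadtree prunes to polygons whose envelope overlaps the point...
        QuadTree.QuadTreeIterator it = quadTree.getIterator(pt, 0);
        for (int handle = it.next(); handle >= 0; handle = it.next()) {
            int candidate = quadTree.getElement(handle);
            // ...then an exact contains test filters the candidates.
            if (GeometryEngine.contains(polygons.get(candidate), pt, sr)) {
                return candidate;
            }
        }
        return -1;
    }
}
```

The envelope pre-filter is what saves the work: instead of an exact point-in-polygon test against every polygon, only the handful whose bounding boxes cover the point are tested.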
Thanks for your reply. The sample using the quadtree index does help, and I will try to use the Geometry API with Spark.
@seamusdu How did you find running the Spatial Framework on Spark in the end? It is an option I'm looking at at the moment.
Cross-reference re Spark: #97 (works with JsonSerde as of v1.2)
@seamusdu I am doing the same thing and wrapped the spatial join query with an index in …
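A hedged sketch of one way that wrapping could look on Spark's RDD API, assuming Spark 2.x, a `JavaRDD<Point>` of already-parsed points, and the illustrative `PolygonIndex` class sketched above; none of these names come from the project. The polygon list is broadcast and the quadtree rebuilt once per partition, since the `QuadTree` itself may not be serializable:

```java
import com.esri.core.geometry.Geometry;
import com.esri.core.geometry.Point;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import scala.Tuple2;
import java.util.ArrayList;
import java.util.List;

class IndexedPointInPolygon {
    /** Counts points per polygon; keys are indexes into the polygons list. */
    static JavaPairRDD<Integer, Long> countPointsPerPolygon(
            JavaSparkContext jsc, JavaRDD<Point> pointsRDD, List<Geometry> polygons) {
        // Ship the raw polygons once per executor instead of once per task.
        Broadcast<ArrayList<Geometry>> polyBc = jsc.broadcast(new ArrayList<>(polygons));
        return pointsRDD
            .mapPartitionsToPair(iter -> {
                // Build the quadtree once per partition, then probe it per point.
                PolygonIndex index = new PolygonIndex(polyBc.value());
                List<Tuple2<Integer, Long>> out = new ArrayList<>();
                while (iter.hasNext()) {
                    int id = index.findContainingPolygon(iter.next());
                    if (id >= 0) out.add(new Tuple2<>(id, 1L));
                }
                return out.iterator();
            })
            .reduceByKey(Long::sum); // (polygon index, point count)
    }
}
```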
Has anyone tried benchmarking the number of points against the time it took to process them? Or even a comparison between Hive and MapReduce (with spatial indexing)?
@guillemfrancisco There is a little bit of info in a comment under https://stackoverflow.com/questions/38963487/how-to-optimize-scan-of-1-huge-file-table-in-hive-to-confirm-check-if-lat-long
I am trying to use HiveContext within Spark to use this spatial framework, and it does work. However, once I use a large dataset, the performance declines dramatically. I am trying to count points within polygons. Hence, I wonder whether you have done any performance testing that might explain the performance of this framework. Also, have you ever considered creating a spatial index, which might improve the performance of spatial operations?
Thanks.