-
Notifications
You must be signed in to change notification settings - Fork 828
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize _remove_polygon_holes function for improved performance with complex geometries #1200
Comments
Thanks for this suggestion. Could you run some more comprehensive tests to 1) provide before/after timings, and 2) ensure identical geometric result across many feature geometries in several different places? For example you could just adapt the code from #1157 (comment) |
Thank you for considering this change. I've prepared a Colab notebook comparing the existing implementation with the suggested approach. The results show that:
This is illustrated in the lower half of the notebook, where we test locations containing complex lake polygons. |
It doesn't look like your code can properly handle the island within a hole use case. For example: outer1 = Polygon(((0,0), (4,0), (4,4), (0,4)))
inner1 = Polygon(((1,1), (2,1), (2,3), (1,3)))
inner2 = Polygon(((2,1), (3,1), (3,3), (2,3)))
outer2 = Polygon(((1.5,1.5), (2.5,1.5), (2.5,2.5), (1.5,2.5)))
outer_polygons = [outer1, outer2]
inner_polygons = [inner1, inner2]
result1 = ox.features._remove_polygon_holes(outer_polygons, inner_polygons)
result2 = optimized_remove_polygon_holes(outer_polygons, inner_polygons)
assert result1.equals(result2) # raises AssertionError That's why that inner loop is there: to check which of the inner polygons are contained by which of the outer polygons. There may be an optimization opportunity here, but the current proposed implementation misses this use case. |
It's possible that preparing the def optimized_remove_polygon_holes(
outer_polygons: list[Polygon],
inner_polygons: list[Polygon],
) -> Polygon | MultiPolygon:
if len(inner_polygons) == 0:
# if there are no holes to remove, geom is the union of outer polygons
geometry = unary_union(outer_polygons)
else:
# otherwise, remove from each outer poly each inner poly it contains
polygons_with_holes = []
for outer in outer_polygons:
prepare(outer)
holes = [inner for inner in inner_polygons if outer.contains(inner)]
polygons_with_holes.append(outer.difference(unary_union(holes)))
geometry = unary_union(polygons_with_holes)
# ensure returned geometry is a Polygon or MultiPolygon
if isinstance(geometry, (Polygon, MultiPolygon)):
return geometry
return Polygon() Something along those lines may be worth testing for parity of timings and results. EDIT: from my quick initial timing test, it looks like this is indeed much faster for your "lakes" example. |
One other minor performance gain (that will also avoid PerformanceWarnings) would be to change these lines to something like: gdf = (
gpd.GeoDataFrame(
data=_process_features(elements, set(tags.keys())),
geometry="geometry",
crs=settings.default_crs,
)
.set_index(idx)
.sort_index()
) |
Nice pickup, I believe you are correct, I had not realized this was a valid situation, sorry about that.
I like this solution, I have tested this locally and it appears to work well for my test locations (showing similar performance to my initial suggestion), while also producing the same output as the existing approach on all examples including your islands within a hole example. |
Would you like to take these code snippets and open a PR? |
Sure, I would be happy to, I've implemented the changes as discussed and opened a new pull request (#1205) titled "Optimize _remove_polygon_holes and _create_gdf for performance". The PR includes the following changes:
I've tested these changes locally, and they appear to work well for various test cases, including the complex "lakes" example and the island-within-a-hole scenario you provided. The optimizations maintain the same output as the existing approach while significantly improving performance for complex geometries. Regarding the mypy failure on osmnx/plot.py:856, I've noted in the PR description that this is a pre-existing issue unrelated to the changes in this PR. I used --no-verify to bypass the GitHub precommit hooks for this reason. This is my first PR, so I apologize if I've done anything incorrectly. |
Closed by #1205 |
Contributing guidelines
Documentation
Existing issues
What problem does your feature proposal solve?
The proposed feature addresses a performance issue in the 'features._remove_polygon_holes' function. When dealing with complex outer polygons and numerous inner polygons, the current implementation can be extremely slow, sometimes taking hours to complete. By replacing the inner loop with a single difference operation using a multi-polygon, we can significantly improve the function's efficiency.
What is your proposed solution?
The proposed change is to optimize the '_remove_polygon_holes' function in OSMnx by replacing the current nested loop approach with a more efficient single-pass operation. The key modifications are:
Implementation plan:
This change aims to significantly reduce processing time for complex polygons with many holes, while maintaining the function's current behavior and output.
What alternatives have you considered?
An alternative solution considered was to implement a spatial index (such as R-tree) for the inner polygons. This approach aimed to quickly identify which inner polygons potentially intersect with each outer polygon, theoretically reducing the number of containment checks required. However, upon testing, this method did not yield a noticeable performance improvement in practice. Additionally, implementing this solution would have introduced an extra dependency to the project.
Additional context
This example demonstrates how the current version of _remove_polygon_holes can be slow when processing complex polygons with many holes. By replacing it with the optimized version, which uses a single difference operation with a MultiPolygon, the performance is significantly improved.
Try running this it should take a very long time to complete, as it intersects a complex polygon.
Now if we patch in the proposed solution it is much faster
The text was updated successfully, but these errors were encountered: