Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up bottleneck in save_graph_xml #645

Closed
parkertimmins opened this issue Jan 26, 2021 · 3 comments · Fixed by #646
Closed

Speed up bottleneck in save_graph_xml #645

parkertimmins opened this issue Jan 26, 2021 · 3 comments · Fixed by #646

Comments

@parkertimmins
Copy link
Contributor

parkertimmins commented Jan 26, 2021

save_graph_xml has an O(edges^2) operation that can be O(edges) when writing edges to the xml tree.

The following two lines are O(n^2) in the number of edges. The loop over the edges is O(n) and the creation of the boolean filter is O(n), where n is number of edges. I don't fully understand when pandas is/isn't using an index, but it seems that in the second line, at a minimum the boolean filter creation must be O(n).

osmnx/osmnx/osm_xml.py

Lines 298 to 299 in c2f55a3

for e in gdf_edges["id"].unique():
all_way_edges = gdf_edges[gdf_edges["id"] == e]

Is your feature proposal related to a problem?
Yes, save_graph_xml take a long time for large networks.

Describe the solution you'd like to propose
Group the edges data frame by 'id' and iterate over the group dataframes. Replace above two lines with:
for _, all_way_edges in gdf_edges.groupby("id"):

Describe alternatives you've considered
No alternatives considered.

Additional context
This provides a large speedup. Ran two tests to benchmark the change.

Code

ox.config(log_console=True, all_oneway=True)
osm_xml = ox.graph.graph_from_xml('input.osm', simplify=True)
ox.save_graph_xml(osm_xml, filepath='output.osm')

Dataset 1 (input is macedonia from http://download.geofabrik.de/europe.html)

  • 145028 nodes and 222518 edges after simplification
  • Time w/ unique/boolean filter: 38m56.033s
  • Time w/ groupby: 7m27.074s

Dataset 2

  • 294880 nodes and 462555 edges
  • Time w/ unique/boolean filter: >160m (ran this before planning to make change, so didn't get exact timing)
  • Time w/ groupby: 11m26.315s
@parkertimmins parkertimmins changed the title Speed up bottleneck in save_osm_xml Speed up bottleneck in save_graph_xml Jan 26, 2021
@gboeing
Copy link
Owner

gboeing commented Jan 27, 2021

@parkertimmins thanks, this looks like a great optimization. Would you like to open a PR for review? I might try to get @mxndrwgrdnr to take a look at the PR too, as he developed this original functionality and may have some clever thoughts.

@parkertimmins
Copy link
Contributor Author

@gboeing Great, yep, I'll open up a PR.
Osmnx is a really cool library by the way!

@gboeing
Copy link
Owner

gboeing commented Mar 2, 2024

See further performance enhancements in #1135

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants