Speed up bottleneck in save_graph_xml #645

parkertimmins · 2021-01-26T19:37:54Z

save_graph_xml has an O(edges^2) operation that can be O(edges) when writing edges to the xml tree.

The following two lines are O(n^2) in the number of edges. The loop over the edges is O(n) and the creation of the boolean filter is O(n), where n is number of edges. I don't fully understand when pandas is/isn't using an index, but it seems that in the second line, at a minimum the boolean filter creation must be O(n).

osmnx/osmnx/osm_xml.py

Lines 298 to 299 in c2f55a3

    
           for e in gdf_edges["id"].unique(): 
        
               all_way_edges = gdf_edges[gdf_edges["id"] == e]

Is your feature proposal related to a problem?
Yes, save_graph_xml take a long time for large networks.

Describe the solution you'd like to propose
Group the edges data frame by 'id' and iterate over the group dataframes. Replace above two lines with:
for _, all_way_edges in gdf_edges.groupby("id"):

Describe alternatives you've considered
No alternatives considered.

Additional context
This provides a large speedup. Ran two tests to benchmark the change.

Code

ox.config(log_console=True, all_oneway=True)
osm_xml = ox.graph.graph_from_xml('input.osm', simplify=True)
ox.save_graph_xml(osm_xml, filepath='output.osm')

Dataset 1 (input is macedonia from http://download.geofabrik.de/europe.html)

145028 nodes and 222518 edges after simplification
Time w/ unique/boolean filter: 38m56.033s
Time w/ groupby: 7m27.074s

Dataset 2

294880 nodes and 462555 edges
Time w/ unique/boolean filter: >160m (ran this before planning to make change, so didn't get exact timing)
Time w/ groupby: 11m26.315s

The text was updated successfully, but these errors were encountered:

gboeing · 2021-01-27T01:26:59Z

@parkertimmins thanks, this looks like a great optimization. Would you like to open a PR for review? I might try to get @mxndrwgrdnr to take a look at the PR too, as he developed this original functionality and may have some clever thoughts.

parkertimmins · 2021-01-27T01:31:32Z

@gboeing Great, yep, I'll open up a PR.
Osmnx is a really cool library by the way!

gboeing · 2024-03-02T16:02:50Z

See further performance enhancements in #1135

parkertimmins added the enhancement label Jan 26, 2021

parkertimmins changed the title ~~Speed up bottleneck in save_osm_xml~~ Speed up bottleneck in save_graph_xml Jan 26, 2021

parkertimmins mentioned this issue Jan 27, 2021

Replace slow pandas operation with faster groupby #646

Merged

gboeing closed this as completed in #646 Jan 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Speed up bottleneck in save_graph_xml #645

Speed up bottleneck in save_graph_xml #645

parkertimmins commented Jan 26, 2021 •

edited

Loading

gboeing commented Jan 27, 2021

parkertimmins commented Jan 27, 2021

gboeing commented Mar 2, 2024

Speed up bottleneck in save_graph_xml #645

Speed up bottleneck in save_graph_xml #645

Comments

parkertimmins commented Jan 26, 2021 • edited Loading

gboeing commented Jan 27, 2021

parkertimmins commented Jan 27, 2021

gboeing commented Mar 2, 2024

parkertimmins commented Jan 26, 2021 •

edited

Loading