@parkertimmins thanks, this looks like a great optimization. Would you like to open a PR for review? I might try to get @mxndrwgrdnr to take a look at the PR too, as he developed this original functionality and may have some clever thoughts.
save_graph_xml has an O(edges^2) operation that can be O(edges) when writing edges to the xml tree.
The following two lines are O(n^2) in the number of edges n: the loop over the edges runs O(n) times, and each iteration creates a boolean filter in O(n). I don't fully understand when pandas is or isn't using an index, but in the second line the boolean filter creation alone must be at least O(n).
osmnx/osmnx/osm_xml.py
Lines 298 to 299 in c2f55a3
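The referenced lines are not reproduced in this text, so the following is only a minimal sketch of the pattern described above, assuming the loop visits each unique way id and rebuilds a boolean mask over the whole edges frame each time. The toy `gdf_edges` frame, the `ways_by_mask` helper, and the `rows_scanned` counter are hypothetical names for illustration, not OSMnx code.

```python
import pandas as pd

# Toy stand-in for the edges frame: one row per edge, "id" is the way id.
gdf_edges = pd.DataFrame(
    {
        "id": [30, 10, 30, 20, 10, 30],
        "u": [1, 4, 2, 7, 5, 3],
        "v": [2, 5, 3, 8, 6, 9],
    }
)


def ways_by_mask(edges):
    """Collect each way's edges by building a fresh boolean mask per way id."""
    ways = {}
    rows_scanned = 0
    for way_id in edges["id"].unique():   # up to n iterations
        mask = edges["id"] == way_id      # compares all n rows on every pass
        rows_scanned += len(mask)
        ways[int(way_id)] = edges[mask]
    return ways, rows_scanned


ways, rows_scanned = ways_by_mask(gdf_edges)
print(rows_scanned)  # (number of unique ids) * (number of edges)
```

With 3 unique ids and 6 edges the mask comparisons touch 18 rows in total. When most ways contain only a few edges, the number of unique ids grows with n, which is where the quadratic cost comes from.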
Is your feature proposal related to a problem?
Yes, save_graph_xml takes a long time for large networks.
Describe the solution you'd like to propose
Group the edges data frame by 'id' and iterate over the resulting group data frames. Replace the above two lines with:
for _, all_way_edges in gdf_edges.groupby("id"):
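As a sanity check on the proposal, here is a small sketch, using a toy frame rather than real OSMnx data, showing that iterating over `groupby("id")` yields the same per-way sub-frames as filtering with a boolean mask. The `ways_by_groupby` helper and the toy `gdf_edges` are hypothetical. One thing worth noting: `groupby` sorts group keys by default, so passing `sort=False` is assumed here to keep ways in order of first appearance, matching what a loop over `unique()` would give.

```python
import pandas as pd

# Toy stand-in for the edges frame: one row per edge, "id" is the way id.
gdf_edges = pd.DataFrame(
    {
        "id": [30, 10, 30, 20, 10, 30],
        "u": [1, 4, 2, 7, 5, 3],
        "v": [2, 5, 3, 8, 6, 9],
    }
)


def ways_by_groupby(edges):
    """Collect each way's edges in a single grouping pass."""
    # sort=False keeps groups in order of first appearance of each id.
    return {int(way_id): group for way_id, group in edges.groupby("id", sort=False)}


grouped = ways_by_groupby(gdf_edges)

# Each group should equal the sub-frame selected by a per-id boolean mask.
same = all(
    group.equals(gdf_edges[gdf_edges["id"] == way_id])
    for way_id, group in grouped.items()
)
print(list(grouped), same)
```

If the order in which ways are written to the XML does not matter, the default `sort=True` would work just as well.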
Describe alternatives you've considered
No alternatives considered.
Additional context
This provides a large speedup. I ran two tests to benchmark the change.
Code
Dataset 1 (input is macedonia from http://download.geofabrik.de/europe.html)
Dataset 2