Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve OSM XML file saving #1135

Merged
merged 38 commits into from
Mar 2, 2024
Merged

Improve OSM XML file saving #1135

merged 38 commits into from
Mar 2, 2024

Conversation

gboeing
Copy link
Owner

@gboeing gboeing commented Feb 21, 2024

Targeting the forthcoming OSMnx v2.0.0 release (see also #1106), this PR wholly refactors the _osm_xml module and fixes several bugs, significantly speeds up the save_graph_xml function, and streamlines the function's signature.

The PR does the following:

  • Speed up the save_graph_xml function runtime by a factor of 5x or so, depending on graph size.
  • Fix a bug documented in save_graph_xml enumerates way IDs starting at 0 instead of using the actual OSM ID? #876 where true way OSM IDs are not being saved.
  • Fix a bug where edge merging grouped edges into ways on the "id" column, but this column was not the way OSM ID, so each group just contained a single edge.
  • Fix a bug where the old topological sort only worked when the entire way was a single cycle (such as is the case with a roundabout way), but could not sort other ways merely containing a cycle (or multiple cycles, such as here and here) within them. See also save_graph_xml fails if roundabout present #970.
  • Fix a bug by adding the missing "bounds" sub-element to the XML file and adding a "visible" attribute to each node and way sub-element (see this example).
  • Remove the node_tags, node_attrs, edge_tags, and edge_attrs parameters from the save_graph_xml function in favor of using the settings module directly.
  • Remove the oneway, api_version, and precision parameters from the save_graph_xml function in favor of hard-coded sensible defaults. See also Improvements to save_as_xml function #917 and improvements to osm_xml module #961
  • Remove the merge_edges parameter from the save_graph_xml function and instead always operate as though this were True.
  • Add an encoding parameter to the save_graph_xml function to configure file encoding and make it consistent with the other file saving functions
  • Change the save_graph_xml function's data parameter to G and its argument type to nx.MultiDiGraph instead of nx.MultiDiGraph | tuple[gpd.GeoDataFrame, gpd.GeoDataFrame (that is, the function no longer accepts a tuple of GeoDataFrames: you can only pass a MultiDiGraph).
  • Change the save_graph_xml function's edge_tag_aggs parameter to way_tag_aggs and its argument type to dict instead of tuple.
  • Require that the graph be unsimplified when saving as OSM XML, because ways have a one-to-many cardinality with edges but graph simplification makes the cardinality many-to-many instead, which breaks the grouping of edges into ways.
  • Remove the settings module's osm_xml_node_attrs, osm_xml_node_tags, osm_xml_way_attrs, and osm_xml_way_tags settings because they are either unnecessary or redundant (see Improve OSM XML file saving #1135 (comment))
  • Create an XSD file to validate the XML output against (see schema).

Deprecate these now-removed function params and settings (with FutureWarning) for v1 in #1138.

@gboeing
Copy link
Owner Author

gboeing commented Feb 21, 2024

As a tangential aside, it would be nice someday to replace all the OSMnx XML read/write functionality with PBF read/write functionality instead and outsource it to a dedicated third-party library. See also pyrosm/pyrosm#228.

@gboeing gboeing mentioned this pull request Feb 21, 2024
13 tasks
Copy link

codecov bot commented Feb 21, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.01%. Comparing base (6f89cd6) to head (9cbbf68).

Additional details and impacted files
@@            Coverage Diff             @@
##               v2    #1135      +/-   ##
==========================================
+ Coverage   97.93%   98.01%   +0.07%     
==========================================
  Files          25       25              
  Lines        2427     2422       -5     
==========================================
- Hits         2377     2374       -3     
+ Misses         50       48       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@stl-maxgardner
Copy link

@gboeing I think its safe (and advisable) to hardcode oneway=False, and to operate as if merge_edges=True. Both settings were parameterized to maximize flexibility but they're probably not necessary. You might consider just defining a ONEWAY = False constant at the top of the module and using that within _save_graph_xml() rather than hardcoding the value into the method, just to make the assumption more explicit.

@gboeing
Copy link
Owner Author

gboeing commented Feb 21, 2024

Thanks @stl-maxgardner . Did this change above also make sense:

Require that the graph be unsimplified when saving with merge_edges=True, because ways have a one-to-many cardinality with edges but graph simplification makes the cardinality many-to-many instead, which breaks the grouping of edges into ways.

If we make the function always operate as though merge_edges=True, users moving forward will be unable to save a simplified graph as XML. But I believe this would be appropriate given the nature of the data.

@stl-maxgardner
Copy link

Yes I think so!

@gboeing
Copy link
Owner Author

gboeing commented Feb 22, 2024

@stl-maxgardner one more question for you.

In 54170cd I proposed that we remove the settings module's osm_xml_node_attrs, osm_xml_node_tags, osm_xml_way_attrs, and osm_xml_way_tags settings. The reasoning is as follows...

The osm_xml_node_attrs and osm_xml_way_attrs settings are unnecessary because the node and way XML elements should have a consistent set of attributes as defined in the OSM XML spec. That is, the same attributes are needed every time so this shouldn't be configurable. These attributes are "changeset", "timestamp", "uid", "user", "version", "visible" and "id" for both nodes/ways, plus "lat" and "lon" for nodes.

The osm_xml_node_tags and osm_xml_way_tags settings are unnecessary because they are redundant with the settings modules useful_tags_node and useful_tags_way settings. That is, the latter are used to retain desired tags from the raw Overpass data and lets the user specify what data the graph nodes and edges should have. OSMnx should just save to XML the graph that the user has created, rather than offering another set of settings for changing what gets saved. If the user wants to only save a subset of these tags, they can just change the useful_tags_node and useful_tags_way settings, or just edit their graph with 1-2 lines of code prior to calling the save_graph_xml function.

Does that all make sense and seem reasonable to you?

@gboeing
Copy link
Owner Author

gboeing commented Feb 26, 2024

Some timings comparing this xml branch to the main branch with a small graph and a large graph... Given a code snippet like:

import osmnx as ox
ox.settings.all_oneway = True
G = ox.graph.graph_from_place(place, network_type="drive", simplify=False)

%%timeit
ox.io.save_graph_xml(G, filepath="./test.xml")

The %%timeit magic gives the following timings:

# place = "Piedmont, CA, USA"
# main branch: 1100 ms ± 4.94 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# xml branch:   167 ms ± 411 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# ~6.6x speed-up

# place = "Los Angeles, CA, USA"
# main branch: 113.0 s ± 790 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# xml branch:   26.2 s ± 82.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# ~4.3x speed-up

@gboeing gboeing changed the title [WIP] Improve OSM XML file saving Improve OSM XML file saving Feb 26, 2024
@gboeing
Copy link
Owner Author

gboeing commented Feb 26, 2024

An XSD schema file was added in 06edb3d and successfully validates a variety of test OSM XML files saved by OSMnx, exported from Overpass turbo, and exported from OpenStreetMap directly.

@gboeing
Copy link
Owner Author

gboeing commented Feb 26, 2024

@mxndrwgrdnr and @davidmurray you have both contributed to OSMnx's XML-saving functionality in the past. Would you be willing to review this PR before it's merged later this week?

Short story: the save_graph_xml function is much faster now, its signature is greatly streamlined, several bugs are fixed, and the saved XML is now validated against an XSD schema file in the test suite. I appreciate your help!

@gboeing
Copy link
Owner Author

gboeing commented Mar 2, 2024

Added quality-of-life improvements, warning, and additional tests for saving projected/consolidated graphs as OSM XML in ca9744b.

@gboeing
Copy link
Owner Author

gboeing commented May 3, 2024

The first pre-release OSMnx v2 beta has been released. Testers needed! See #1123 for details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants