-
Notifications
You must be signed in to change notification settings - Fork 830
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Customizable Graph Simplification #625
Comments
Hi @rohanaras, thanks for the detailed proposal. Can you identify what specific changes will need to be made? It's a bit tough to pick out individual changes from all the code you've provided. They key thing I'm trying to figure out is if this is proposal can be small, narrowly targeted, and have clear re-use potential for many other folks.
Note that the |
Yeah sure—what I'm proposing essentially boils down to the following changes:
def _get_paths_to_simplify(G, strict=True, endpoint_func=_is_endpoint, **kwargs):
if endpoint_func == _is_endpoint:
kwargs['strict'] = strict def _get_paths_to_simplify(G, endpoint_func=_is_endpoint, **kwargs):
if endpoint_func == _is_endpoint and 'strict' not in kwargs:
kwargs['strict'] = True endpoints = set([n for n in G.nodes() if endpoint_func(G, n, **kwargs)])
def simplify_graph(G, endpoint_func=_is_endpoint, remove_rings=True, **kwargs):
if endpoint_func == _is_endpoint and 'strict' not in kwargs:
kwargs['strict'] = True for path in _get_paths_to_simplify(G, endpoint_func=endpoint_func, **kwargs):
|
Thanks @rohanaras. This is a bit niche for incorporating into the project so I'm going to close this for now. If this ends up being something that several other users also need, we can revisit it. |
Hi @gboeing, Thank you so much for this immensely useful library ! I believe that the feature proposed by @rohanaras is relevant and is close to what I wanted to submit on my side. I am working with others on bicycle networks using OSMnx, and we have observed that a lot of bicycle-related attributes can end abruptly. Because categorical attributes are merged when the network is simplified, it's hard to know where does the bicycle network start and end. Visually, we need a function that can do this: And while this is critical for bicycle attributes because of their sparsity, this can be true for most attributes. I heard other peers in NetSci/GIS needing the same feature for their own work, and for instance issue #846 shows in another way the broader need to keep some attributes or nodes. We created some time ago a working (but now deprecated) custom set of functions by tuning the OSMnx simplification (in our custom fork used in this article) but I feel that this might be a useful feature for everyone. An easy and general way to discriminate attributes while simplifying that I found is by adding another rule to the
If the selected attributes are different between the edges connected to a node, then the node is an endpoint. Then, by adding an additional optional argument attributes as |
@csebastiao I'm open to considering this, as it sounds like several people may have found it useful. Looking back at the original issue, @rohanaras had suggested accepting a user-defined Your proposal of adding a single argument @csebastiao let me confirm first that I understand the issue fully. Current OSMnx "strict" simplification is purely topological: it merges adjacent edges based on incident node degrees (or self-looping). And current OSMnx "non-strict" simplification incorporates a little bit of attribute information (osmid values) when determining whether or not to merge adjacent edges. The problem is that given how bike infrastructure is digitized on OSM, additional attributes are necessary to create an endpoint between two edges denoting where the bike infrastructure begins/ends. Otherwise, you end up with a long merged edge with bike infrastructure ending somewhere in the middle of it. Does this understanding correctly represent both your and @rohanaras's issues? |
Hi @gboeing, thanks for your fast reply ! That is correct. To give an example, if I extract the street network of Copenhagen (and Frederiksberg), I have without simplification 123241 nodes. With the strict simplification, I am left with 47758 nodes. With the non-strict simplification, I am left with 56804 nodes. With the simplification discriminating for an arbitrary attribute based on cycling OSM tags, I am left with 48275 nodes. As far as I have tested, the set of nodes of the last graph is a subset of the nodes that are in the simplified graph in non-strict mode, which make sense as edges on the same road have different osmids because they have different attributes, so the different cycling attributes are part of them. The non-strict mode is doing the discrimination of edges/finding endpoints with the osmid attribute, which is not the relevant attribute for our use case. We want to be able to do the same thing but for arbitrary attributes, so some kind of generalization of the non-strict mode. That way we can only keep edges and nodes that are relevant to us based on what we want to study, whatever that is. |
fully endorsed, that would be very useful. @martinfleis @jGaboardi @dabreegster @lukasballo @Robinlovelace @MicheleTirico tagging you here since we all have discussed this at some point in time... (see the graph plot by @csebastiao above for a quick grasp of what the idea here is - keeping interstititial nodes at graph simplification IF there is an edge attribute change) |
@anastassiavybornova Thanks for the ping! Yes, I agree that this feature would be immensely useful for both the stuff we discussed earlier and some other projects I am involved with. |
I'm reopening this issue. @csebastiao thanks for your further details. @anastassiavybornova @jGaboardi thanks for your input -- it sounds like this would be useful for many people. Let's brainstorm what implementation might look like. One idea I had was that we could generalize the # current definition
strict: bool = True
# proposed redefinition
strict: str | None = None The proposed redefinition would accept an edge attribute name as a string. If If a change like this to the parameter would resolve the issue, we could target the v2 branch. See #1106. |
I like the idea to generalize rule 4 ! The only difference in behavior is that in the initial proposal I made one could input a list of edge attributes, in case we want to discriminate on multiple attributes. The way I wrote it could actually only handle list-like type as an input, but that can easily be avoided. To see why it might make sense to discriminate on multiple attributes, let's take as an example a graph This is a straight road of 6 nodes and 5 edges. We give to edges 3 different attributes If one could simplify while discriminating on A and B for instance, we would keep nodes {1, 2, 3, 4, 6}. And any combination of 2 attributes will give a different set of nodes. Discriminating on all attributes is the same as discriminating using osmid. And a simple "merge" of some kind of two attributes is not the same as discriminating separately on both. Until now I my use cases I usually give as input a single attribute for simplification, one created with the simplification in mind. But it might be useful to be more general, especially since it would not be more complex to the user to be able to give as an input either a string with the name of the attribute or a list of strings. On a side note, one should be careful to not give attributes that will (almost) always be different, like "geometry" and "length". It might consume a lot of time doing computations for not simplifying a single node. I feel that some warning if a name known to have this behavior is given as an input might be nice. |
Thanks @csebastiao. Yes your reasoning for accepting a list of edge attribute names makes sense. I was previously thinking it would be simpler to just redefine the existing # proposed parameter
attrs: list[str] | None = None If So then the new generalized rule 4 could look like: # proposed new rule 4
if attrs is not None:
for attr in attrs:
in_values = {v for _, _, v in G.in_edges(node, data=attr, keys=False)}
out_values = {v for _, _, v in G.out_edges(node, data=attr, keys=False)}
if len(in_values | out_values) > 1:
return True And a MWE to test it: import osmnx as ox
def test_rule4(G, node, attrs):
if attrs is not None:
for attr in attrs:
in_values = {v for _, _, v in G.in_edges(node, data=attr, keys=False)}
out_values = {v for _, _, v in G.out_edges(node, data=attr, keys=False)}
if len(in_values | out_values) > 1:
return True
return False
# test the rule
G = ox.graph_from_place("Piedmont, CA, USA", network_type="drive", simplify=False)
print(test_rule4(G, node=53021742, attrs=None)) # False
print(test_rule4(G, node=53021742, attrs=["osmid"])) # True
print(test_rule4(G, node=53021742, attrs=["highway"])) # False
print(test_rule4(G, node=53021742, attrs=["name"])) # True
print(test_rule4(G, node=53021742, attrs=["name", "highway"])) # True Comments or suggestions? |
This new rule 4 is solving every issue I had @gboeing, I have nothing more to add ! |
Any final comments or feedback from anyone else here? @rohanaras @jGaboardi @anastassiavybornova ? |
Comment from me coming at this with little context: how to get this working? Minimal reproducible example would help, happy to test. |
Nothing from me. That new |
@martinfleis Anything from you? |
This is definitely a lot more elegant than what I had originally proposed! I believe it would work for my old use case. My one concern is that |
Thinking about this a little bit more—is there perhaps a better name for A good enough name could be more easily interpretable than the current Another thing to consider is This check might also deserve a rethink—should a user be allowed to further simply a graph? (Assuming that there are nodes that could be further simplified under the current "strict" regime) if "simplified" in G.graph and G.graph["simplified"]: # pragma: no cover
msg = "This graph has already been simplified, cannot simplify it again."
raise GraphSimplificationError(msg) |
Perhaps |
|
Proposed resolution in #1117. |
|
@rohanaras the simplification algorithm only works once. It would have to be wholly rethought to allow a second simplification to occur on a simplified graph, and I'm not sure that would even be useful: especially with #1117, users should be able to have relatively fine control over how simplification proceeds. Hence the check is really just a guard against inscrutable errors during re-simplification, to let the user know in a more straightforward way that it can only happen once. |
This enhancement has been released in v1.9. |
Note that #1145 additionally adds a |
Is your feature proposal related to a problem?
I've been working on a side project with pedestrian/sidewalk networks recently (and some bike, but mostly ped) and have found myself in situations where it would be useful to simply the network down from the unsimplified OSM network but the existing simplification function is far too strict. In particular, I would like to be able to leave nodes in places where the infrastructure in an area switches from dedicated to non-dedicated. By doing this, nodes capture all of the places where someone walking along the graph has to make a decision—not only to go left or right at an intersection, but also the decision to turn around because (oh no) the sidewalk ends or the trail doesn't have a marked crosswalk across a busy road.
Describe the solution you'd like to propose
I would like to be able to be able to pass a custom
_is_endpoint
function tosimplification.simplify_graph
(andsimplification._get_paths_to_simplify
as a result) as a parameter. The existingstrict
argument could be left as is or appended to a kwargs dictionary if the chosen function is the default.These changes might also require a new
G.graph["simplified"]
flag andsimplification._is_simplified
function, though I have not fully thought through how to reimplement these. My initial impression is that this line could be modified by replacingTrue
with the function name and the second condition in this line could simply be removed.Describe alternatives you've considered
I do have the functionality I want as it is, but it's required me to fully duplicate the
simplify_graph
and_get_paths_to_simplify
functions in their near entirety. My current setup looks like this:This level of duplication seems obviously quite silly to me though, and I'd much rather not have to. And to be clear, I'm only proposing changes to the latter two functions—not the inclusion of my
custom_is_endpoint
function.Additional context
As a quick and dirty demonstration of what this network structure allows me to do, I used the following code to plot the dedicated/undedicated pedestrian network for a Seattle area suburb:
Green indicates a sidewalk or marked crosswalk, grey indicates unmarked crosswalk or pedestrian legal street without sidewalks. Note numerous two way intersections where pedestrian infrastructure transitions.
The text was updated successfully, but these errors were encountered: