`robot diff` always shows all axioms have changed for blank nodes #1243

paulmillar · 2025-02-08T09:31:22Z

I'd like to use robot diff to get a summary of the changes in a GitHub pull request, or during development.

For the PR, (in essence) the script runs robot on the input files from HEAD and HEAD^ to generate two outputs, and then runs robot diff on these two outputs.

Unfortunately, there are blank nodes, which are assigned random IRIs by the robot diff command. These blank node IRIs are different between the two versions, leading to a large number of "false positives", where robot diff has identified changed assertions that don't reflect changes in the input.

This looks a lot like #1032.

As a primitive work-around, I can filter out these generated IDs using grep; e.g.,

$ robot diff --left ontology_old.ttl --right ontology_new.ttl --labels true \
    | egrep -v '_:genid[0-9]{10}'

However, using grep results in confusing output: the robot diff command lists the number of axioms that are present in one side that are missing from the other:

64 axioms in right ontology but not in left ontology:

Unfortunately, this axiom count doesn't match the number of axioms that are listed.

Also, filtering the output would result in the output missing any real changes involving a blank node.
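To illustrate the count mismatch, here is a minimal sketch of a filter that drops the same lines as the grep above and then recomputes each section's count. It assumes the section headers and the "- " / "+ " line prefixes look exactly like the robot diff output quoted in this issue. The name filter_genid is hypothetical and is not part of ROBOT. It still hides real changes that involve a blank node, so it only addresses the confusing counts.

```python
import re

# Generated blank-node IDs as they appear in the output quoted in this issue.
GENID = re.compile(r"_:genid[0-9]+")
# Section header, e.g. "64 axioms in right ontology but not in left ontology:"
# (assumption: the header always has this shape).
HEADER = re.compile(r"^\d+ (axioms? in \w+ ontology but not in \w+ ontology:)\s*$")


def filter_genid(diff_text):
    """Drop lines mentioning a generated ID and recount each section."""
    sections = []  # each entry: [header text without the count, kept lines]
    for line in diff_text.splitlines():
        match = HEADER.match(line)
        if match:
            sections.append([match.group(1), []])
        elif GENID.search(line):
            continue  # same effect as: egrep -v '_:genid[0-9]{10}'
        else:
            if not sections:
                sections.append([None, []])  # text before the first header
            sections[-1][1].append(line)

    out = []
    for suffix, lines in sections:
        if suffix is not None:
            # Count only the axiom lines that survived the filter.
            kept = sum(1 for l in lines if l.startswith(("- ", "+ ")))
            out.append(f"{kept} {suffix}")
        out.extend(lines)
    return "\n".join(out)
```

One way to use it would be to pipe the robot diff output into a small script that calls print(filter_genid(sys.stdin.read())).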

Somewhat ironically, I'm actually getting better results from converting the ontology to ttl and using the diff command.


balhoff · 2025-02-08T13:19:14Z

@paulmillar can you provide sample input that shows the issue?

paulmillar · 2025-02-08T20:46:04Z

@balhoff Thanks for the quick reply.

Here's a simple example that illustrates the problem:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<https://metadata.example.org/2025/test>
  a owl:Ontology;
  dcterms:creator [
    foaf:name "Fred Bloggs";
  ];
  rdfs:comment "An example of a problem.";
  .

Here's an example of robot diff invoked with this input as both the left and right:

paul@monkeywrench:~$ robot diff --left test.ttl --right test.ttl 
2 axioms in left ontology but not in right ontology:
- Annotation(<http://purl.org/dc/terms/creator> _:genid2147483648)
- AnnotationAssertion(<http://xmlns.com/foaf/0.1/name> _:genid2147483648 "Fred Bloggs")

2 axioms in right ontology but not in left ontology:
+ Annotation(<http://purl.org/dc/terms/creator> _:genid2147483649)
+ AnnotationAssertion(<http://xmlns.com/foaf/0.1/name> _:genid2147483649 "Fred Bloggs")
paul@monkeywrench:~$

I realise that I forgot to give the version of robot I'm using. It's robot v1.8.1:

paul@monkeywrench:~$ robot --version
ROBOT version 1.8.1
paul@monkeywrench:~$

v1.8.1 is pretty old, so I downloaded the latest version, which is currently v1.9.7. I was able to reproduce the problem with that version:

paul@monkeywrench:~$ java -jar ~/Downloads/robot.jar --version
ROBOT version 1.9.7
paul@monkeywrench:~$ java -jar ~/Downloads/robot.jar diff --left test.ttl --right test.ttl 
2 axioms in left ontology but not in right ontology:
- Annotation(<http://purl.org/dc/terms/creator> _:genid2147483648)
- AnnotationAssertion(<http://xmlns.com/foaf/0.1/name> _:genid2147483648 "Fred Bloggs"^^xsd:string)

2 axioms in right ontology but not in left ontology:
+ Annotation(<http://purl.org/dc/terms/creator> _:genid2147483649)
+ AnnotationAssertion(<http://xmlns.com/foaf/0.1/name> _:genid2147483649 "Fred Bloggs"^^xsd:string)
paul@monkeywrench:~$

balhoff · 2025-02-10T17:09:34Z

Thanks @paulmillar, the example is helpful, since I wanted to make sure you were dealing with an anonymous individual and not blank nodes related to the RDF representation of class expressions. I think this is basically the same as #1032. As @jamesaoverton noted there, I guess it turns into a graph isomorphism problem, which is tricky. We could add an option to simply exclude axioms involving anonymous individuals from diffs, which isn't very satisfactory, or else try to come up with something more clever.

Here is how OWLAPI parses that ontology:

Prefix(:=<https://metadata.example.org/2025/test#>)
Prefix(owl:=<http://www.w3.org/2002/07/owl#>)
Prefix(rdf:=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
Prefix(xml:=<http://www.w3.org/XML/1998/namespace>)
Prefix(xsd:=<http://www.w3.org/2001/XMLSchema#>)
Prefix(foaf:=<http://xmlns.com/foaf/0.1/>)
Prefix(rdfs:=<http://www.w3.org/2000/01/rdf-schema#>)
Prefix(dcterms:=<http://purl.org/dc/terms/>)


Ontology(<https://metadata.example.org/2025/test>
Annotation(dcterms:creator _:genid2147483648)
Annotation(rdfs:comment "An example of a problem.")

Declaration(AnnotationProperty(dcterms:creator))
Declaration(AnnotationProperty(foaf:name))


AnnotationAssertion(foaf:name _:genid2147483648 "Fred Bloggs")
)

paulmillar · 2025-02-11T21:50:39Z

Hi @balhoff ,

As you might have already guessed, the example was made up: something simple that demonstrates the problem. For reference, the real file is here:

https://github.com/ExPaNDS-eu/ExPaNDS-experimental-techniques-ontology/blob/master/source/PaNET_metadata.ttl

While thinking about this, one (perhaps obvious) idea was to try to make the algorithm for generating the blank nodes' IRIs more deterministic. If the IRI for a blank node were the same (across some change to the ontology that leaves the blank node unmodified) then robot diff wouldn't show any changes.

One possible way to be more deterministic might be to take all predicate-object pairs for axioms with the blank node as the subject, sort them, hash the result and use this hash to generate the blank node's IRI.

Naturally, if a blank node were to have an axiom with a blank node as the object then generating the "parent" blank node's IRI would need to be deferred until the "child" blank node's IRI was generated. Since such blank nodes can't be referenced, they should form a simple graph, and the IRI generation should work following a simple depth-first algorithm.

There is the possibility of collisions: two blank nodes with the same set of axioms. This would need to be checked and accounted for (e.g., breaking the symmetry using document order, and adding a counter as a suffix to the IRI). That said, I guess most blank nodes will contain a unique set of axioms, so this outcome is unlikely.
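The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not how ROBOT or OWLAPI assign IDs: triples are plain (subject, predicate, object) string tuples, blank nodes are strings starting with "_:", and the function name relabel_blank_nodes and the label format are hypothetical. It assumes the blank nodes form a tree, as suggested above, and raises an error if it finds a cycle.

```python
import hashlib
from collections import defaultdict


def is_blank(term):
    return term.startswith("_:")


def relabel_blank_nodes(triples):
    """Rename blank nodes to labels derived from their own content."""
    outgoing = defaultdict(list)  # blank node -> [(predicate, object), ...]
    order = []                    # blank nodes in document order
    for s, p, o in triples:
        for term in (s, o):
            if is_blank(term) and term not in order:
                order.append(term)
        if is_blank(s):
            outgoing[s].append((p, o))

    digests = {}

    def digest(node, stack=()):
        # Depth-first: a child blank node is hashed before its parent.
        if node in digests:
            return digests[node]
        if node in stack:
            raise ValueError("cycle among blank nodes; not handled here")
        pairs = sorted(
            (p, "_:" + digest(o, stack + (node,)) if is_blank(o) else o)
            for p, o in outgoing[node]
        )
        value = hashlib.sha256(repr(pairs).encode("utf-8")).hexdigest()[:16]
        digests[node] = value
        return value

    # Break collisions with a counter assigned in document order.
    labels = {}
    seen = defaultdict(int)
    for node in order:
        value = digest(node)
        labels[node] = f"_:b{value}_{seen[value]}"
        seen[value] += 1

    return [(labels.get(s, s), p, labels.get(o, o)) for s, p, o in triples]
```

With labels derived this way, two parses of the same file should produce identical triples, so a set comparison would report no differences for unchanged blank nodes.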

While not perfect, it would be relatively simple and (I think) it would allow robot diff to refrain from showing blank node axioms if they haven't changed.

That said, the output if a blank node has changed would be rather sub-optimal: the derived IRI would change, so robot diff would show all axioms had changed, but I think this approach would still be an improvement over the current behaviour.
