robot diff always shows all axioms have changed for blank nodes #1243

Open
paulmillar opened this issue Feb 8, 2025 · 4 comments

@paulmillar

I'd like to use robot diff to get a summary of the changes in a GitHub pull request, or during development.

For the PR, the script (in essence) runs robot on the input files from HEAD and HEAD^ to generate two outputs, and then runs robot diff on those two outputs.

Unfortunately, there are blank nodes, which are assigned random IRIs by the robot diff command. These blank node IRIs are different between the two versions, leading to a large number of "false positives", where robot diff has identified changed assertions that don't reflect changes in the input.

This looks a lot like #1032.

As a primitive work-around, I can filter out these generated IDs using grep; e.g.,

$ robot diff --left ontology_old.ttl --right ontology_new.ttl --labels true \
    | egrep -v '_:genid[0-9]{10}'

However, using grep results in confusing output: for each side, the robot diff command reports the number of axioms that are present on that side but missing from the other:

64 axioms in right ontology but not in left ontology:

Unfortunately, once the output has been filtered, this axiom count no longer matches the number of axioms that are listed.

Also, filtering the output would hide any real changes involving a blank node.
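The count mismatch, at least, could be avoided by a filter that rewrites the section headers after dropping lines. The following is a minimal sketch only: `filter_diff` is a hypothetical helper (not part of ROBOT), the header and `- ` / `+ ` line formats are assumed from the output shown in this issue, and the pattern `_:genid[0-9]+` is slightly looser than the ten-digit pattern in the grep command above. It still hides real changes that involve a blank node.

```python
import re

# Generated blank-node IDs as they appear in the robot diff output in this issue.
GENID = re.compile(r"_:genid[0-9]+")
# Section header, e.g. "64 axioms in right ontology but not in left ontology:".
HEADER = re.compile(
    r"^(\d+)( axioms? in (?:left|right) ontology but not in (?:left|right) ontology:.*)$"
)


def filter_diff(text: str) -> str:
    """Drop axiom lines that mention a generated blank-node ID and
    rewrite each section header so its count matches the lines kept."""
    out = []
    header_at = None  # index in `out` of the current section header
    suffix = ""       # header text that follows the count
    kept = 0          # axiom lines kept in the current section
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            header_at, suffix, kept = len(out), match.group(2), 0
            out.append(f"0{suffix}")  # provisional count, updated below
        elif line.startswith(("- ", "+ ")):
            if GENID.search(line):
                continue  # axiom involves a generated blank-node ID
            kept += 1
            out.append(line)
            if header_at is not None:
                out[header_at] = f"{kept}{suffix}"
        else:
            out.append(line)
    return "\n".join(out)
```

To use this in a pipeline, the saved text of `robot diff` (or its standard output) would be passed to `filter_diff` and the returned string printed.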

Somewhat ironically, I'm actually getting better results from converting the ontology to ttl and using the diff command.

@balhoff
Contributor

balhoff commented Feb 8, 2025

@paulmillar can you provide sample input that shows the issue?

@paulmillar
Author

@balhoff Thanks for the quick reply.

Here's a simple example that illustrates the problem:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<https://metadata.example.org/2025/test>
  a owl:Ontology;
  dcterms:creator [
    foaf:name "Fred Bloggs";
  ];
  rdfs:comment "An example of a problem.";
  .

Here's an example of robot diff invoked with this input as both the left and right:

paul@monkeywrench:~$ robot diff --left test.ttl --right test.ttl 
2 axioms in left ontology but not in right ontology:
- Annotation(<http://purl.org/dc/terms/creator> _:genid2147483648)
- AnnotationAssertion(<http://xmlns.com/foaf/0.1/name> _:genid2147483648 "Fred Bloggs")

2 axioms in right ontology but not in left ontology:
+ Annotation(<http://purl.org/dc/terms/creator> _:genid2147483649)
+ AnnotationAssertion(<http://xmlns.com/foaf/0.1/name> _:genid2147483649 "Fred Bloggs")
paul@monkeywrench:~$ 

I realise that I forgot to give the version of robot I'm using. It's robot v1.8.1:

paul@monkeywrench:~$ robot --version
ROBOT version 1.8.1
paul@monkeywrench:~$ 

v1.8.1 is pretty old, so I downloaded the latest version, which is currently v1.9.7. I was able to reproduce the problem with that version:

paul@monkeywrench:~$ java -jar ~/Downloads/robot.jar --version
ROBOT version 1.9.7
paul@monkeywrench:~$ java -jar ~/Downloads/robot.jar diff --left test.ttl --right test.ttl 
2 axioms in left ontology but not in right ontology:
- Annotation(<http://purl.org/dc/terms/creator> _:genid2147483648)
- AnnotationAssertion(<http://xmlns.com/foaf/0.1/name> _:genid2147483648 "Fred Bloggs"^^xsd:string)

2 axioms in right ontology but not in left ontology:
+ Annotation(<http://purl.org/dc/terms/creator> _:genid2147483649)
+ AnnotationAssertion(<http://xmlns.com/foaf/0.1/name> _:genid2147483649 "Fred Bloggs"^^xsd:string)
paul@monkeywrench:~$ 

@balhoff
Contributor

balhoff commented Feb 10, 2025

Thanks @paulmillar, the example is helpful, since I wanted to make sure you were dealing with an anonymous individual and not blank nodes related to the RDF representation of class expressions. I think this is basically the same as #1032. As @jamesaoverton noted there, I guess it turns into a graph isomorphism problem, which is tricky. We could add an option to simply exclude axioms involving anonymous individuals from diffs, which isn't very satisfactory, or else try to come up with something more clever.

Here is how OWLAPI parses that ontology:

Prefix(:=<https://metadata.example.org/2025/test#>)
Prefix(owl:=<http://www.w3.org/2002/07/owl#>)
Prefix(rdf:=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
Prefix(xml:=<http://www.w3.org/XML/1998/namespace>)
Prefix(xsd:=<http://www.w3.org/2001/XMLSchema#>)
Prefix(foaf:=<http://xmlns.com/foaf/0.1/>)
Prefix(rdfs:=<http://www.w3.org/2000/01/rdf-schema#>)
Prefix(dcterms:=<http://purl.org/dc/terms/>)


Ontology(<https://metadata.example.org/2025/test>
Annotation(dcterms:creator _:genid2147483648)
Annotation(rdfs:comment "An example of a problem.")

Declaration(AnnotationProperty(dcterms:creator))
Declaration(AnnotationProperty(foaf:name))


AnnotationAssertion(foaf:name _:genid2147483648 "Fred Bloggs")
)

@paulmillar
Author

paulmillar commented Feb 11, 2025

Hi @balhoff ,

As you might have already guessed, the example was made up: something simple that demonstrates the problem. For reference, the real file is here:

https://github.com/ExPaNDS-eu/ExPaNDS-experimental-techniques-ontology/blob/master/source/PaNET_metadata.ttl

While thinking about this, one (perhaps obvious) idea was to try to make the algorithm for generating the blank nodes' IRIs more deterministic. If the IRI for a blank node were the same (across some change to the ontology that leaves the blank node unmodified) then robot diff wouldn't show any changes.

One possible way to be more deterministic might be to take all predicate-object pairs for axioms with the blank node as the subject, sort them, hash the result and use this hash to generate the blank node's IRI.

Naturally, if a blank node were to have an axiom with a blank node as the object then generating the "parent" blank node's IRI would need to be deferred until the "child" blank node's IRI was generated. Since such blank nodes can't be referenced, they should form a simple graph, and the IRI generation should work following a simple depth-first algorithm.

There is the possibility of collisions: two blank nodes with the same set of axioms. This would need to be checked and accounted for (e.g., breaking the symmetry using document order, and adding a counter as a suffix to the IRI). That said, I guess most blank nodes will contain a unique set of axioms, so this outcome is unlikely.
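To make the idea concrete, here is a rough sketch of such a labelling scheme. It is only an illustration under several assumptions: it works on plain RDF triples given as `(subject, predicate, object)` string tuples rather than on the OWL axioms that robot diff compares, blank nodes are recognised by a `_:` prefix, the names `canonical_labels` and `relabel` are made up for this example, and a cycle among blank nodes is simply rejected rather than handled.

```python
import hashlib


def is_blank(term: str) -> bool:
    return term.startswith("_:")


def canonical_labels(triples):
    """Map each blank-node ID to a label derived from its own content."""
    # Outgoing (predicate, object) pairs per blank node.
    # A dict keeps insertion order, i.e. document order.
    outgoing = {}
    for s, p, o in triples:
        for term in (s, o):
            if is_blank(term):
                outgoing.setdefault(term, [])
        if is_blank(s):
            outgoing[s].append((p, o))

    digests = {}

    def digest(node, path=()):
        # Depth-first: a "child" blank node is hashed before its "parent".
        if node in digests:
            return digests[node]
        if node in path:
            raise ValueError(f"cycle through blank node {node}")
        pairs = []
        for p, o in outgoing[node]:
            value = "_:" + digest(o, path + (node,)) if is_blank(o) else o
            pairs.append((p, value))
        pairs.sort()
        encoded = repr(pairs).encode("utf-8")
        digests[node] = hashlib.sha256(encoded).hexdigest()[:16]
        return digests[node]

    labels, seen = {}, {}
    for node in outgoing:  # document order breaks ties between collisions
        d = digest(node)
        n = seen.get(d, 0)
        seen[d] = n + 1
        labels[node] = f"_:b{d}" if n == 0 else f"_:b{d}-{n}"
    return labels


def relabel(triples):
    """Return the triples with blank-node IDs replaced by canonical labels."""
    labels = canonical_labels(triples)
    return [(labels.get(s, s), p, labels.get(o, o)) for s, p, o in triples]
```

With the example ontology from earlier in this thread, relabelling the triples that use `_:genid2147483648` and the triples that use `_:genid2147483649` should yield the same set, so a set difference between the two sides would be empty.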

While not perfect, it would be relatively simple and (I think) it would allow robot diff to refrain from showing blank node axioms if they haven't changed.

The output when a blank node has changed would still be rather sub-optimal: the derived IRI would change, so robot diff would show all of that node's axioms as changed. Even so, I think this approach would be an improvement over the current behaviour.
