Vocabulary Stability: How much is needed and how do we achieve it? #65

Open
sandhawke opened this issue Jan 25, 2016 · 13 comments

Comments

@sandhawke

This is splitting off a thread from another issue: solid/solid#35 (comment)

edit to clarify: we're interested in this issue as it applies to solid, not in general.

The meaning of an RDF graph depends on the meaning of the predicate URIs (aka property ids) used in that graph. If I say {<sandro> <http://example.org/likes> <salmon>}, and we assume for now the terms <sandro> and <salmon> have their conventional meanings, that statement might mean I like salmon, or I hate salmon, or I am a salmon, or I own some salmon, or ... practically anything. It totally depends on what the predicate <http://example.org/likes> is accepted to mean. That triple might mean I promise to pay $1000 to each person who walks up to me and says "spaghetti". If we all agreed that's what it meant, that would be okay. (Similar issues arise around the terms <sandro> and <salmon> but they're no harder to solve, so let's worry about them later.)

The problem is, how do we all come to agreement about what a predicate URI means? And what happens if that meaning changes over time?

If I was mistaken about the meaning of that term when I made that statement, I've ended up accidentally providing false information. If the meaning changes after I make the statement, and it's not clear the meaning has changed, I've been turned into a liar.

In general, at this point, the RDF community shrugs and doesn't worry too much about this. I suggest this is one of the reasons people who need their computers to do the right thing shrug and walk away from RDF. This github issue is a place for folks to talk about this a little, if they want.

There is vast history around this. I think it was most actively discussed in 2002-2003 as the RDF Core WG tried to decide what the new RDF specifications should say, under the heading "Social Meaning" (as opposed to "formal meaning", as in the formal model theory for RDF). Eventually they decided consensus was impossible and chose to remain silent.

Two bits of historical reading:

I'm sure there's lots more.

I don't know of any credible solution yet. It's become clear to me that dereference is of little use, because it doesn't guarantee stability. And a standards process is also of little use because it's just too slow and expensive. The best we can do today is a very slow and expensive combination of things: make a standard, have an active community that agrees about the meaning, and also make dereference work. And even that's not good enough for many application areas, I suspect.

I think the solution is going to be something where the text of the spec is provably frozen, and there's good mapping between versions, so meanings can nicely evolve, free from any confusion about which meaning was intended when a given document was written, but also usable when the meaning hasn't changed too much for a particular application. Two of my sketches in this direction are http://decentralyze.com/2014/06/30/growjson/ and http://www.w3.org/ns/mics .
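One way to picture "provably frozen" is to pin a cryptographic digest of the spec text next to the term URI. The following is only a minimal sketch of that idea, not a description of growjson or mics: the `spec-sha256` query parameter and the function names are hypothetical and chosen purely for illustration.

```python
import hashlib

def freeze(spec_text: str) -> str:
    """Return a SHA-256 digest that identifies one exact spec text."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()

def pinned_term(term_uri: str, spec_text: str) -> str:
    """Attach the digest to the term URI (hypothetical 'spec-sha256' parameter)."""
    return f"{term_uri}?spec-sha256={freeze(spec_text)}"

def is_unchanged(pinned: str, current_spec_text: str) -> bool:
    """Check whether the spec text seen now is the one the writer pinned."""
    pinned_digest = pinned.rsplit("=", 1)[1]
    return pinned_digest == freeze(current_spec_text)
```

A reader who later dereferences the term can call `is_unchanged` to detect that the definition has drifted since the statement was written; deciding whether the drift matters for a given application is the part that still needs the mapping between versions.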

@kidehen

kidehen commented Jan 25, 2016

@sandhawke as you know <#likes> != <http://ontologi.es/like#likes> . Thus, shouldn't we always expect terms functioning as statement/sentence predicates to be defined using Linked Open Data principles (which basically enable one to look up their meaning)?

{
<#this> schema:about <http://ontologi.es/like#likes> .
<#this> skos:related <http://www.wikihow.com/Differentiate-Between-a-Term-and-a-Word#this> .
<http://ontologi.es/like#likes> schema:mainEntityOfPage <http://linkeddata.uriburner.com/c/8EAUSQ> .
}

[1] http://linkeddata.uriburner.com/c/8EAUSQ -- a document that describes like:likes.
[2] http://www.wikihow.com/Differentiate-Between-a-Term-and-a-Word -- difference between a word and a term.

@bblfish

bblfish commented Jan 25, 2016

Sandro, the problem you are describing is known in philosophy as the metastability of language. David Lewis mentions it twice in his 1969 PhD thesis Convention, where he explains how language arises out of conventions. Metastability is a type of global stability which allows local change.

Language, social institutions and the web are metastable. They are based on trust, which can be abused. Every time you link to a web page you put out a bit of trust. Every time you get on a bus, too. Bracha Ettinger, a psychologist/artist, pushes that so far as to make it a key part of psychology in her Matrixial Borderspaces -- this type of writing is the absolute opposite of David Lewis' and you may find it incomprehensible; I think of it more as a poem. What is interesting is that all of these different philosophies converge: conventions are strengthened by use, which stabilises their meaning, because it allows cooperation between people to occur. Cooperation can be explained game-theoretically as a coordination problem. Psychologically, coordination does of course require that one is trying to work on a project together with others. Human civilisation could not have emerged without that in any case.

The question of how words get meaning is a complex one. Gareth Evans, in his 500-page work The Varieties of Reference, which gives an overview of the debates since Frege, looks at the notion of how we can grasp a concept. Some concepts are innate (e.g. a lot of concepts in vision), many are teachable (e.g. maths). Concepts have to be composable to form sentences so that limited minds can grasp them. There has to be a minimal element of a concept that allows the thinker to learn it and to judge sentences deploying it to be true.

Currently this is what we get by dereferencing a semantic web URI on the web: the description of the concept is the Pointed Named Graph referred to by the (#) URL. As you point out with http urls that graph can evolve. So it is actually a stream of graphs, pointing always to the latest version.

Now note that David Lewis, in Convention and in his article "Language and Languages", where he sets out to identify the structure of ALL possible languages, shows how a language specified completely mathematically can evolve too! Indeed a philosopher of language has to explain such change. In David Lewis's purely extensional philosophy of language, a (mathematically modelled) language consists of a vocabulary and a grammar that maps phrases built out of the vocabulary onto meanings construed as sets of possible worlds. Sets of possible worlds are mathematical objects that don't change. He integrates change into this model by showing it to be epistemological: we don't know which mathematical language we are speaking -- this can be modelled as us speaking an infinite set of overlapping languages. Sometimes a word may be redefined to make the whole language more precise, and this process can be thought of as a selection within the set of languages that we are speaking. When this is done correctly it minimises the number of true sentences that become false.

So if a mathematical model of any possible language has to integrate change, which each speaker of the language can affect but with nobody controlling the whole, then we should not be surprised to find the same thing happening in a technical implementation such as the semantic web. Change has to be taken into account in any language, and it will be in great part out of our control.

Nevertheless, let us consider what could be done if we had a URL that pointed to a particular representation, using a DHT URL such as ipfs://, or a URL+ETag, and so on. For convenience I'll just identify that group by one of its most famous members, and call these DHT URLs.

  1. One could use DHT urls that don't evolve. But then it would be difficult to adapt the vocabulary over time, and without evolution things would end up breaking.
  2. One could have DHT Urls that do evolve, pointing to the head of a stream: but then we'd essentially have the same feature as http URLs.
  3. One could have URLs pointing to a particular DHT, but with the convention that the user should follow links from that version forward to the latest version, allowing evolution whilst also grounding statements in a particular version. In natural language we do that by looking at the date of initial publication of a book and then if needed looking up definitions or usages of that word at that time to understand the context, even if we try to read it without doing that work - and usually succeed (that's what metastability is about).

I think there are actually use cases for each of these. An initial intuition would be: pure DHT URLs may be exactly right for signing contracts, evolvable URLs for my profile, and perhaps option 3 for ontologies.
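Option 3 can be sketched as follows. This is only an illustration under stated assumptions: the identifiers `id-a` and `id-b`, the two dictionaries, and the function name are all hypothetical. Note one design consequence: an immutable, content-addressed version cannot contain a link to a successor that did not exist when it was written, so the sketch keeps forward links in a separate, mutable index.

```python
from typing import Dict, List

# Hypothetical immutable store: content id -> frozen definition text.
DEFINITIONS: Dict[str, str] = {
    "id-a": "schoolHomepage: homepage of a school (British-English sense)",
    "id-b": "schoolHomepage: homepage of any school, college or university",
}

# Hypothetical mutable index of forward links: version id -> next version id.
SUCCESSOR: Dict[str, str] = {"id-a": "id-b"}

def versions_from(pinned_id: str, successor: Dict[str, str] = SUCCESSOR) -> List[str]:
    """Follow forward links from the pinned version to the current head."""
    chain = [pinned_id]
    while chain[-1] in successor:
        next_id = successor[chain[-1]]
        if next_id in chain:
            raise ValueError(f"cycle in version links at {next_id}")
        chain.append(next_id)
    return chain
```

A statement grounded in `id-a` keeps its original definition, while `versions_from("id-a")[-1]` gives the head for applications that prefer the latest meaning. Who is allowed to write to the forward-link index is exactly the "how is the head decided" question.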

Versioned URLs with a convention of moving to the latest version give us context, but there is still the problem of how the head of the tree is decided.

  • With http URLs the head is decided by the owner of the domain (in the case of FOAF it's @danbri) which requires trust. Trust is not something to sniff at, and has many advantages.
  • With DHT URLs with a convention there are a number of options:
    • the head of the stream could be something that is in the hands of the initial author in which case we are back at 1 with http(s) urls
    • or we may have vocabulary splits with many possible future versions. This would also be ok, but could be problematic for programs, requiring users to choose which future version they think is the most appropriate
    • or we could have a consensus mechanism for deciding the latest version, and perhaps here some blockchain algorithm -- see also the UK Government Office for Science's recent report "Distributed Ledger Technology: beyond block chain" -- could be the right tool. Note that such tools come with their own problems, are in a strong hype phase, and are evolving very fast.

Needless to say, this last option is a very interesting research topic.

What is clear is that we can make a lot of progress with http URLs while these latest features are explored. It should not be too difficult to integrate those changes if and when they come about. Indeed we could work with researchers on this topic to make sure a good solution ties in with web access control and LDP and the web as it is now, for a smooth transition - exactly the way one would expect in a metastable system.

Furthermore we should consider that the meaning of a term is not entirely specified by the description of a term. It is also specified by its use. Here we can pull on philosophical research from pragmatic/analytical philosophers such as Robert Brandom's Making it Explicit: Reasoning, Representing, and Discursive Commitment and Christopher Peacocke's work on A Study of Concepts.

The use of a term in the stack we are building here is strongly defined by the applications that use it. For example foaf:knows will be displayed in distributed address book type applications (such as the one I presented in 2008 at JavaOne) in a way that leads folks using such tools to have certain expectations. An ontology is alive (see Language, Thought and Other Biological Categories: New Foundations for Realism for a defence of that metaphor) if it is used by applications that allow folks to coordinate actions. Clearly a meeting ontology is alive if people use it to organise meetings. The tools that allow people to integrate that ontology will not be completely buildable from the ontology alone, as ontologies are often defined using natural language, which packs in a lot of complexity, and also as they have to tie into concepts instantiated in the users' brains, which the tools themselves help to define (see the work by Bernard Stiegler on the organology of technology, where he often starts his explanation from the very readable work Proust and the Squid: The Story and Science of the Reading Brain). The process of making a concept explicit is a long one (arguably rarely finished). The tools also influence the concepts people have, and the social expectations. So in a sense the meaning does go beyond the minimal definition of the ontology, and is to be found in its use.

The notion of the organological is one that tries to think of the individual, the technical tools they use, and society as a very complex system (or systems) where each part influences the others. For example, the education system that was built at the beginning of the 20th century was a system set in place to reformat the brains of the citizens to allow them access to reading, in order to allow the emergence of a highly technical society, regulated by laws and police, both depending on writing, and of course on older systems such as habits of politeness. This enabled the building of new technical systems such as highways, leading to driving codes, driving skills, and so on. In short, the technical tools shape the way we work, the laws, and the economy, creating revolutions which require legal and political changes - e.g. the development of the consumer society as a system of redistribution, to create the market able to absorb the goods produced mechanically at much higher speed (Roosevelt's New Deal).

The system is stable but in constant evolution. As we don't yet have technical solutions to the problem of evolving vocabularies in a decentralised and mechanical way based on some notion of consensus, we can use other existing tools to stabilise vocabularies. After all, someone publishing a vocabulary is asking those who use it for trust, and that trust is something that can be abused, where abuse has social and hence ultimately legal consequences. E.g. software using the foaf vocabulary is relying on @danbri to evolve the vocabulary in a reasonable fashion that respects its existing usage.

One can build more resilient systems that are compatible with the current one. But one has to be careful not to over-securitise a system, i.e. render it unusable. Again, this is why society creates spaces of trust in order to function.

@dmitrizagidulin
Member

Sandro and Henry, if I understand correctly, the issues that you're describing with the current tools and conventions are:

  1. No schema versioning. Specifically, no version number is readily apparent in the URL of the property, and the vocabulary itself is often not versioned. This introduces a dangerous level of ambiguity to the usage (and is one of the reasons that Schema.org is not really usable for our purposes).
  2. The burden of several levels of trust lies solely with the vocabulary owner. (The trust that the URL will persist, that they will keep evolving the vocabulary to keep up with the changing landscape, that they won't break something or veer off into a wild new direction). And there's no easy recourse for when something goes awry. (In other words, this is the too much stability problem.)
  3. No culture (or enforced/recommended tool-chain) of URL de-referencing and checking. This is the part that I was horrified to discover the other day at lunch: one can just point to an invalid or non-existent property of a remote vocabulary, and nothing breaks.

Proposed solutions

These problems are not unique to the world of ontologies. They also apply to the general problem of library and package management, in the world of open source software (the problems faced by Node's NPM and Ruby's Gems communities, and many others). And we can re-use very similar solutions, modified to fit the particulars of vocabulary building.

(Based on our earlier conversation with Sandro) a possible solution would be:

  1. Version the schemas. Specifically, let's put the version into the property's URL (using a semantic versioning scheme tweaked for vocabs). Versioning in a Link header may be more correct from a REST standpoint (and can be also done), but by including it in the URL, a developer can see at a glance which version is referenced.
  2. Include the short author (or owner organization) name in the URL (think Github repos). This minor detail allows for vocabularies to be forked (once there is appropriate social consensus), and so hosted under different orgs' namespace, but preserving the original vocab name.
  3. Encourage a culture of link dereferencing and checking. Choose (or develop, if not present) an easy link-checker tool (something like 'rdf-lint') that gets run at "compile time" (as a Make task or during the build process), and that makes sure the referenced URLs for vocabularies exist (and that the classes and properties referenced also exist). Advertise its use in project READMEs and the like.
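To make point 3 concrete, here is a rough sketch of the check such a tool might perform. No tool named 'rdf-lint' is assumed to exist; the function names are hypothetical, and the vocabularies are passed in as an in-memory mapping so the sketch runs offline, whereas a real checker would dereference each namespace URL and parse the result.

```python
from typing import Dict, Iterable, List, Set, Tuple

def split_term(uri: str) -> Tuple[str, str]:
    """Split a term URI into (namespace, local name) at the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]

def lint(used: Iterable[str], vocabularies: Dict[str, Set[str]]) -> List[str]:
    """Report every used term whose vocabulary or local name is not defined."""
    problems = []
    for uri in used:
        namespace, local = split_term(uri)
        if namespace not in vocabularies:
            problems.append(f"unknown vocabulary: {uri}")
        elif local not in vocabularies[namespace]:
            problems.append(f"undefined term: {uri}")
    return problems
```

Run as a build step, a non-empty result from `lint` would fail the build, so a misspelled property such as `foaf:knowz` is caught before it is published.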

@kidehen

kidehen commented Jan 25, 2016

@bblfish -- Also note the document about Language & Natural Logic by John F. Sowa at: http://www.jfsowa.com/talks/natlog.pdf

@danbri

danbri commented Jan 25, 2016

@dmitrizagidulin this is a long-running conversation, regarding RDF/S.

In https://www.w3.org/TR/1998/WD-rdf-schema-19980814/ I was responsible for the following (rather naive) text:

Versioning and URI references

The Resource Description Framework is intended to be flexible and easily extensible; 
this suggests that a great variety of schemas will be created and that new and 
improved versions of these schemas will be a common occurence on the Web. Since 
changing a schema risks breaking other RDF graphs which depend on that schema, 
this specification requires that a new URI is used whenever an RDF schema is 
changed. In effect, changing a schema creates a new one; new schemas namespaces 
should have their own URI to avoid ambiguity. Since an RDF Schema URI unambiguously 
identifies a single version of a schema, RDF processors (and Web caches) should be 
able to safely store copies of RDF schema graphs for an indefinite period. The 
problems of RDF schema evolution share many characteristics with XML DTD version 
management and the general problem of Web resource versioning. Is is expected that 
a general approach to these issues will presented in a future version of this 
document, in co-ordination with other W3C activities. Future versions of this 
document may also offer RDF specific guidelines: for example, describing how 
a schema could document its relationship to preceding versions.

In particular, "changing a schema creates a new one; new schemas namespaces should have their own URI to avoid ambiguity" soon proved itself essentially undeployable. The painful migration of Dublin Core from URIs containing /1.0/ to /1.1/ due to minor definitional tweaks was an example of this.

There is a lot to be said for the idea that the meaning of an RDF property is grounded also in its use, and not just in the assertions made by its creators / maintainers. For a concrete example, see http://xmlns.com/foaf/spec/#term_schoolHomepage, where we changed the definition to match how it was being used in practice by Americans who brought their own interpretation to the word "school":

The original application area for schoolHomepage was for 'schools' in the British-English sense; however American-English usage has dominated, and it is now perfectly reasonable to describe Universities, Colleges and post-graduate study using schoolHomepage.

@dmitrizagidulin
Member

@danbri - so what's the implication? (Of the phrase "changing a schema creates a new one; new schemas namespaces should have their own URI to avoid ambiguity"). What am I missing here?

I kind of assumed that it would, hence my proposal to explicitly include the schema version number in the URI.
The key, however, is to strictly encourage (in the cultural/community sense) the use of semantic versioning, both in the URIs themselves, and in the tooling/verification stack.

Given a version number MAJOR.MINOR.PATCH, increment the:

  1. MAJOR version when you make incompatible API changes,
  2. MINOR version when you add functionality in a backwards-compatible manner, and
  3. PATCH version when you make backwards-compatible bug fixes.

Specifically -- do not use the PATCH number in the URI, only the MAJOR and MINOR numbers.
The semantic versioning idea would be slightly refactored to serve ontologies:

  • Changing of meaning and usage of properties (like the schoolHomepage example you bring up) would only increment the PATCH number.
  • Adding fields (and other such non-breaking changes) would bump the MINOR number, and so on.
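The scheme above could look roughly like this in code. It is a sketch under assumptions: the base URL `https://vocab.example`, the owner name, the URI layout, and the change labels are hypothetical, and treating a "breaking" change as a MAJOR bump is carried over from the semantic versioning rules quoted above rather than stated in the proposal itself.

```python
from typing import NamedTuple

class Version(NamedTuple):
    major: int
    minor: int
    patch: int

def bump(v: Version, change: str) -> Version:
    """Apply the vocabulary-flavoured semantic versioning rules."""
    if change == "breaking":   # incompatible change (assumed to follow semver)
        return Version(v.major + 1, 0, 0)
    if change == "addition":   # new terms and other non-breaking additions
        return Version(v.major, v.minor + 1, 0)
    if change == "meaning":    # shift in meaning or usage of an existing term
        return Version(v.major, v.minor, v.patch + 1)
    raise ValueError(f"unknown change type: {change}")

def term_uri(base: str, owner: str, vocab: str, v: Version, term: str) -> str:
    """Build a term URI carrying owner, MAJOR and MINOR, but not PATCH."""
    return f"{base}/{owner}/{vocab}/{v.major}.{v.minor}#{term}"
```

Because PATCH is left out of the URI, a "meaning" change such as the schoolHomepage redefinition leaves every existing term URI untouched, while an "addition" produces a new URI prefix.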

@danbri

danbri commented Jan 25, 2016

The suggestion from experience is that these kinds of schema have literally [1] more in common with dictionary entries than with software library versioning.

[1] http://www.salon.com/2013/08/22/according_to_the_dictionary_literally_now_also_means_figuratively_newscred/

@dmitrizagidulin
Member

Sure. But dictionaries publish new editions when they add or remove entries, no?

@bblfish

bblfish commented Jan 25, 2016

@dmitrizagidulin software works in very different ways from the web. For one, most software works on closed-world models, whereas RDF works on an open-world model. So one can't really draw conclusions about the semantic web from software. The semantic web is much closer to language which is metastable.

@sandhawke
Author

Short meta-comment for now: I've modified the issue statement to include a line (in bold) that the focus here is solid. We're not trying to solve this for "the semantic web", whatever that might actually turn out to be. So comments like, "The semantic web is much closer to language which is metastable," are relevant only if you're trying to bring up the semweb as a point of comparison. Otherwise it's irrelevant.

Sorry for not thinking to make that clear in my first version of the issue statement.

@kidehen

kidehen commented Jan 25, 2016

@sandhawke : If possible, would you consider changing:
{ <sandro> <http://example.org/likes> <salmon> }
to:
@prefix like: <http://ontologi.es/like#> .
{ <#sandro> like:likes <#salmon> } .

Using an index-oriented sign (i.e., an indexical) is crucial to understanding the issues at hand. Unfortunately, http://example.org/likes doesn't convey the granularity required for understanding, and neither does http://example.org/likes#this, hence my use of like:likes, which one can look up en route to understanding the relationship type (relation) represented by the statement: { <#sandro> like:likes <#salmon> }.

We can even flesh this out further (I moved the braces around since our nanotation processor will be able to process the RDF statements as presented):

{
@prefix like: <http://ontologi.es/like#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

<#sandro> like:likes <#salmon> ;
a foaf:Person ;
foaf:nick "@sandhawke" .

<#salmon>
owl:sameAs <http://wikidata.org/entity/Q2796766> .
}

The statements above provide ample context for understanding the entity types, relationship types, and entity relationships they represent.

@bblfish

bblfish commented Jan 25, 2016

@sandhawke understood. Though LDP, which is the read-write API of the semweb (which includes the web), is part of SoLiD. So I think it's relevant. Also, the title of the issue is about stability, so the point that vocabularies are metastable, not stable, is also relevant. Finally, in my longer post above, I did point to some interesting research projects that we could sponsor to help map out the space. :-)

@sandhawke
Author

Yes, I appreciate the long explanation above -- been too busy today to write a proper reply.

@deiu deiu added the discussion label Feb 4, 2016
@csarven csarven transferred this issue from solid/solid Apr 28, 2022
6 participants