semantics Archives – Page 2 of 2 – Marc de Graauw

See also my previous posts on this issue.

So we’ve got backward and forward compatibility, and syntactical and semantical compatibility. (Quick recap: backward compatibility is the ability to accept data from older applications, forward compatibility the ability to accept data from newer applications. Syntactical compatibility is the ability to accept data from applications of other versions successfully (i.e. without raising errors), semantical compatibility the ability to understand data from applications of other versions.)

So what else is there?

Noah Mendelsohn made clear to me one has to distinguish language and application compatibility.

Let’s see what this means when looking at syntactical and semantical compatibility. A language L2 is syntactically backward compatible with an earlier language L1 if and only if every L1 document is also an L2 document. Or to rephrase it: if and only if an application built to accept L2 documents also accepts L1 documents. Or (the way I like it): if and only if the set of L1 documents is a subset of the set of L2 documents:

[Diagram: L1 is a subset of L2]

And of course L2 is forward compatible with respect to L1 if, and only if, every L2 document is also an L1 document:

[Diagram: L2 is a subset of L1]

This makes it quite clear that if L2 is both backward and forward compatible with respect to L1, both of the above diagrams apply, so L2 = L1:

[Diagram: L2 is L1]
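The subset argument can be checked with a toy model. The following is only a sketch under a strong simplifying assumption: a language is represented as a finite set of the documents it admits, and the sample documents and function names are made up for illustration.

```python
# Toy model: a "language" is just the set of documents it admits.
# The documents below are invented examples, not real language definitions.
L1 = frozenset({"<p>hi</p>", "<b>hi</b>"})
L2 = L1 | {"<em>hi</em>"}  # L2 admits everything L1 does, plus one new document


def backward_compatible(new, old):
    """new is backward compatible with old: every old document is a new document."""
    return old <= new


def forward_compatible(new, old):
    """new is forward compatible with old: every new document is an old document."""
    return new <= old


def both_ways_compatible(new, old):
    """Backward and forward compatible at the same time."""
    return backward_compatible(new, old) and forward_compatible(new, old)


print(backward_compatible(L2, L1))   # L1 is a subset of L2
print(forward_compatible(L2, L1))    # L2 is not a subset of L1
print(both_ways_compatible(L2, L1))  # only possible when the two sets are equal
```

Mutual subset inclusion is set equality, so in this model `both_ways_compatible` can only return true when the two languages are the same language.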

But this flies in the face of accepted wisdom! Of course having both backward and forward compatibility is possible! HTML is designed to be forward compatible, is it not, through the mechanism of ‘ignore unknown tags’. And two HTML versions can of course be backward compatible, if HTMLn+1 supports everything HTMLn does. Yet the above diagrams speak for themselves as well. The distinction between language and application compatibility offers the solution. The diagrams are only about syntactical language compatibility. The HTML forward compatibility mechanism is about applications as well: HTML instructs browsers to ignore unknown markup. So the HTML compatibility mechanism is about browser, ergo application, behavior.

HTML tells HTMLn browsers to accept all L2 (HTMLn+1) markup (and ignore it), and to accept all L1 (HTMLn) markup, and – not ignore, but – process it. (“If a user agent encounters an element it does not recognize, it should try to render the element’s content.” – HTML 4.01) Now this sounds familiar – that’s syntactical versus semantical compatibility, isn’t it? So HTML makes forward compatibility possible by instructing the application – the browser – to syntactically accept future versions, but semantically ignore them. The n-version browser renders n+1 element content, but has no idea what the tags around it mean (render bold and blue? render indigo and italic? render in reverse?).

Summing up: there is no such thing as two (different) languages L2 and L1 which are both back- and forward compatible. There is such a thing as two applications A1 (built for language L1) and A2 (built for L2) which are both back- and forward compatible: A1 must ignore unknown L2 syntax, and A2 must accept and process all L1 syntax and semantics:

[Diagram: A2 back- and forward compatible wrt A1]
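Application compatibility, as opposed to language compatibility, can be sketched as a toy “browser” that follows the ignore-unknown-tags rule. This is an illustration only: the class, the tag sets, and the bracketed output format are hypothetical, and Python’s standard `html.parser` stands in for a real rendering engine.

```python
from html.parser import HTMLParser


class ToyBrowser(HTMLParser):
    """Hypothetical application built for one language version (a set of known tags)."""

    def __init__(self, known_tags):
        super().__init__()
        self.known_tags = set(known_tags)
        self.output = []   # what the browser "renders"
        self.ignored = []  # tags accepted syntactically but not understood

    def handle_starttag(self, tag, attrs):
        if tag in self.known_tags:
            self.output.append(f"[{tag}]")  # known tag: process its semantics
        else:
            self.ignored.append(tag)        # unknown tag: accept, but ignore

    def handle_endtag(self, tag):
        if tag in self.known_tags:
            self.output.append(f"[/{tag}]")

    def handle_data(self, data):
        self.output.append(data)            # element content is always rendered


def render(known_tags, document):
    browser = ToyBrowser(known_tags)
    browser.feed(document)
    browser.close()
    return "".join(browser.output), browser.ignored


L1_TAGS = {"p", "b"}             # assumed tag set of the older language
L2_TAGS = L1_TAGS | {"blink"}    # assumed tag set of the newer language

# A1 reading an L2 document: forward compatible, the unknown tag is ignored.
print(render(L1_TAGS, "<p><blink>new</blink></p>"))
# A2 reading an L1 document: backward compatible, everything is processed.
print(render(L2_TAGS, "<p><b>old</b></p>"))
```

A1 raises no error on the L2 document and still renders its content, yet it never learns what the unknown tag means: syntactical acceptance without semantical understanding.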

(Yes, and to be complete, this is a rewrite of an email list submission of mine, but vastly improved through discussions with Noah Mendelsohn and David Orchard, who may or may not agree with what I’ve said here…)

Norman Walsh wrote: “(some) fall into the trap of thinking that an “http” URI is somehow an address and not a name”. It’s an opinion expressed more often, for instance in the TAG Finding “URNs, Namespaces and Registries”, where it says: “http: URIs are not locations”. However, for everyone who believes an http URI is not an address but instead a name, there is a paradox to solve.

Suppose I’ve copied the ‘Extensible Markup Language (XML) 1.0’ document from www.w3.org to my own website:
https://www.marcdegraauw.com/REC-xml-20060816

Now is the following identity statement true?

“http://www.w3.org/TR/2006/REC-xml-20060816 is https://www.marcdegraauw.com/REC-xml-20060816”

Certainly not, of course. If you and I were looking at https://www.marcdegraauw.com/REC-xml-20060816, and some paragraph struck you as strange, you might very well say: ‘I don’t trust https://www.marcdegraauw.com/REC-xml-20060816, I want to see http://www.w3.org/TR/2006/REC-xml-20060816 instead’. That’s a perfectly sensible remark. So of course the identity statement is not true: the document at www.w3.org is from an authoritative source. It’s the ultimate point of reference for any questions on the Extensible Markup Language (XML) 1.0. The copy at www.marcdegraauw.com is – at best – a copy. Even if the documents which are retrieved from the URI’s are really the same character-for-character, they carry a very different weight. Next time, you would consult http://www.w3.org/TR/2006/REC-xml-20060816, not https://www.marcdegraauw.com/REC-xml-20060816.

So for http URI’s which retrieve something over the web, two URI’s in different domains represent different information resources. Let’s take a look at names next. I might set up a vocabulary of names of specifications of languages (in a broad sense): Dutch, English, Esperanto, Sindarin, C, Python, XML. In line with current fashion I would use URI’s for the names of those languages, and https://www.marcdegraauw.com/REC-xml-20060816 would be the name for XML. If the W3C had made a similar vocabulary of languages, it would probably use “http://www.w3.org/TR/2006/REC-xml-20060816” as the name for XML. And for names the identity statement

“http://www.w3.org/TR/2006/REC-xml-20060816 is https://www.marcdegraauw.com/REC-xml-20060816”

is simply true: both expressions are names for XML. So the statement is as true as classical examples such as “The morning star is the evening star” or “Samuel Clemens is Mark Twain”. This shows we’ve introduced a synonym: http://www.w3.org/TR/2006/REC-xml-20060816 behaves differently when used to represent a (retrievable) information resource and when used as a name. In themselves, synonyms are not necessarily a problem (though I’d maintain they are a nuisance at all times). The problem can be solved when one knows which class a URI belongs to. If I choose to denote myself with https://www.marcdegraauw.com/, I can simply say in which sense I’m using https://www.marcdegraauw.com/:

https://www.marcdegraauw.com/ is of type foaf:Person

https://www.marcdegraauw.com/ is of type web:document

In the first sense, https://www.marcdegraauw.com/ may have curly hair and be 45 years of age and have three sons, in the second sense it may be retrieved as a sequence of bits over the web. Now, when we try to apply the same solution to the XML example, a paradox emerges. Of course we can talk about http://www.w3.org/TR/2006/REC-xml-20060816 as a name and qualify it: “http://www.w3.org/TR/2006/REC-xml-20060816 is of type spec:lang” or whatever. This is the sense in which the identity statement is true: http://www.w3.org/TR/2006/REC-xml-20060816 is a name for a language specification, and https://www.marcdegraauw.com/REC-xml-20060816 is another name for the same specification. Now try to come up with a proper classification for http://www.w3.org/TR/2006/REC-xml-20060816 for the sense where the identity statement is not true, and find a classification which does not secretly introduce the notion of address. I say it cannot be done. One can classify as “type address” or “type location” et cetera, but every classification which allows the identity statement to be false carries the notion of “address” within it. Whoever maintains that URI’s are not addresses or locations will have to admit the identity statement is both true and false at the same time.

URI’s are addresses or locations (though not in the simplistic sense of being files in directories under a web root on some computer on the Internet). And when URI’s are used as names as well, every information resource which is named (not addressed) with its own URI is a synonym with the URI used as an address of itself. The URI-as-name and the URI-as-address will behave differently and have different inferential consequences: for the URI-as-address, cross-domain identity statements will never be true, for the URI-as-name they may be true or may be false. If you want to avoid such synonyms, you’ll have to use URI’s such as http://www.w3.org/nameOf/TR/2006/REC-xml-20060816. IMO, it can’t get much uglier. If you accept the synonyms, you’ll have to accept a dual web where every http URI is an address and may be a name of the thing retrieved from this address as well – and those two are not the same.
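The two readings of the identity statement can be sketched as two different comparison functions over the same pair of URIs. This is a rough illustration only: the `NAMES` vocabulary and both functions are hypothetical, and treating “same address” as “same host and path” is an assumption made for the sketch, not a rule taken from any specification.

```python
from urllib.parse import urlsplit

W3C = "http://www.w3.org/TR/2006/REC-xml-20060816"
COPY = "https://www.marcdegraauw.com/REC-xml-20060816"

# Hypothetical vocabulary: URI-as-name mapped to the thing it names.
NAMES = {
    W3C: "XML 1.0 specification",
    COPY: "XML 1.0 specification",
}


def same_as_name(a, b):
    """Identity of URIs used as names: true when both name the same thing."""
    return a in NAMES and b in NAMES and NAMES[a] == NAMES[b]


def same_as_address(a, b):
    """Identity of URIs used as addresses (assumption: same host and same path).

    Under this reading a cross-domain identity statement can never be true.
    """
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.hostname, pa.path) == (pb.hostname, pb.path)


print(same_as_name(W3C, COPY))     # both are names for XML
print(same_as_address(W3C, COPY))  # different domains, different resources
```

The same pair of strings yields opposite answers depending on which function is applied, which is the dual behaviour described above.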

Update: Norman Walsh has convinced me in private email conversation this post contains several errors. I will post a newer version later.

Category: semantics

More Compatibility Flavours

The URI Identity Paradox