The URI Identity Paradox

Norman Walsh wrote: “(some) fall into the trap of thinking that an “http” URI is somehow an address and not a name”. This opinion is expressed more often, for instance in the TAG Finding “URNs, Namespaces and Registries”, which says: “http: URIs are not locations”. However, for everyone who believes an http URI is not an address but instead a name, there is a paradox to solve.

Suppose I’ve copied the ‘Extensible Markup Language (XML) 1.0’ document from www.w3.org to my own website:
https://www.marcdegraauw.com/REC-xml-20060816

Now, is the following identity statement true?

http://www.w3.org/TR/2006/REC-xml-20060816 is https://www.marcdegraauw.com/REC-xml-20060816

Certainly not, of course. If you and I were looking at https://www.marcdegraauw.com/REC-xml-20060816, and some paragraph struck you as strange, you might very well say: ‘I don’t trust https://www.marcdegraauw.com/REC-xml-20060816, I want to see http://www.w3.org/TR/2006/REC-xml-20060816 instead’. That’s a perfectly sensible remark. So of course the identity statement is not true: the document at www.w3.org is from an authoritative source. It’s the ultimate point of reference for any questions on the Extensible Markup Language (XML) 1.0. The document at www.marcdegraauw.com is, at best, a copy. Even if the documents retrieved from the two URIs are the same character for character, they carry very different weight. Next time, you would consult http://www.w3.org/TR/2006/REC-xml-20060816, not https://www.marcdegraauw.com/REC-xml-20060816.

So for http URIs which retrieve something over the web, two URIs in different domains represent different information resources. Let’s take a look at names next. I might set up a vocabulary of names of specifications of languages (in a broad sense): Dutch, English, Esperanto, Sindarin, C, Python, XML. In line with current fashion I would use URIs for the names of those languages, and https://www.marcdegraauw.com/REC-xml-20060816 would be the name for XML. If the W3C had made a similar vocabulary of languages, it would probably use “http://www.w3.org/TR/2006/REC-xml-20060816” as the name for XML. And for names the identity statement

http://www.w3.org/TR/2006/REC-xml-20060816 is https://www.marcdegraauw.com/REC-xml-20060816

is simply true: both expressions are names for XML. So the statement is as true as classical examples such as “The morning star is the evening star” or “Samuel Clemens is Mark Twain”. This shows we’ve introduced a synonym: http://www.w3.org/TR/2006/REC-xml-20060816 behaves differently when used to represent a (retrievable) information resource than when used as a name. In themselves, synonyms are not necessarily a problem (though I’d maintain they are a nuisance at all times). The problem can be solved when one knows which class a URI belongs to. If I choose to denote myself with https://www.marcdegraauw.com/, I can simply say in which sense I’m using https://www.marcdegraauw.com/:

https://www.marcdegraauw.com/ is of type foaf:Person

or

https://www.marcdegraauw.com/ is of type web:document

In the first sense, https://www.marcdegraauw.com/ may have curly hair and be 45 years of age and have three sons; in the second sense it may be retrieved as a sequence of bits over the web. Now, when we try to apply the same solution to the XML example, a paradox emerges. Of course we can talk about http://www.w3.org/TR/2006/REC-xml-20060816 as a name and qualify it: “http://www.w3.org/TR/2006/REC-xml-20060816 is of type spec:lang” or whatever. This is the sense in which the identity statement is true: http://www.w3.org/TR/2006/REC-xml-20060816 is a name for a language specification, and https://www.marcdegraauw.com/REC-xml-20060816 is another name for the same specification. Now try to come up with a proper classification for http://www.w3.org/TR/2006/REC-xml-20060816 for the sense in which the identity statement is not true, and find a classification which does not secretly introduce the notion of address. I say it cannot be done. One can classify as “type address” or “type location” et cetera, but every classification which allows the identity statement to be false carries the notion of “address” within it. Whoever maintains that URIs are not addresses or locations will have to admit the identity statement is both true and false at the same time.
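The two readings of the identity statement can be put side by side in a small Python sketch. Everything in it is an assumption made for illustration: the `NAMES` vocabulary, the function names `same_as_name` and `same_as_address`, and the choice to treat two addresses as identical only when host and path agree. It is a toy model of the argument, not a description of how any RDF or web tool works.

```python
from urllib.parse import urlsplit

W3C = "http://www.w3.org/TR/2006/REC-xml-20060816"
COPY = "https://www.marcdegraauw.com/REC-xml-20060816"

# Hypothetical vocabulary: each URI-as-name maps to the thing it names.
NAMES = {
    W3C: "XML 1.0 specification",
    COPY: "XML 1.0 specification",
}


def same_as_name(a: str, b: str) -> bool:
    """Two names are identical when both are known and denote the same thing."""
    return a in NAMES and b in NAMES and NAMES[a] == NAMES[b]


def same_as_address(a: str, b: str) -> bool:
    """Two addresses are identical only when host and path agree (assumed rule)."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.hostname, pa.path) == (pb.hostname, pb.path)


if __name__ == "__main__":
    print("identical as names:    ", same_as_name(W3C, COPY))
    print("identical as addresses:", same_as_address(W3C, COPY))
```

Note that the second function can only make the statement false by looking at the host, which is exactly the notion of address the classification was supposed to avoid.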

URIs are addresses or locations (though not in the simplistic sense of being files in directories under a web root on some computer on the Internet). And when URIs are used as names as well, every information resource which is named (not addressed) with its own URI is a synonym of the URI used as an address of itself. The URI-as-name and the URI-as-address will behave differently and have different inferential consequences: for the URI-as-address, cross-domain identity statements will never be true; for the URI-as-name they may be true or may be false. If you want to avoid such synonyms, you’ll have to use URIs such as http://www.w3.org/nameOf/TR/2006/REC-xml-20060816. IMO, it can’t get much uglier. If you accept the synonyms, you’ll have to accept a dual web where every http URI is an address and may be a name of the thing retrieved from this address as well, and those two are not the same.
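For what it’s worth, the “nameOf” convention mentioned above could be derived mechanically from an address. The sketch below is only an illustration: the function `name_uri` and its `prefix` parameter are hypothetical, and nothing suggests the W3C or anyone else mints URIs this way.

```python
from urllib.parse import urlsplit, urlunsplit


def name_uri(address_uri: str, prefix: str = "nameOf") -> str:
    """Build a separate URI-as-name by inserting a prefix segment before the path."""
    parts = urlsplit(address_uri)
    # parts.path starts with "/", so this yields "/nameOf/TR/2006/...".
    path = "/" + prefix + parts.path
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(name_uri("http://www.w3.org/TR/2006/REC-xml-20060816"))
```

With such a rule the address and the name are always distinct strings, so the synonym disappears, at the price of the ugliness already noted.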

Update: Norman Walsh has convinced me in a private email conversation that this post contains several errors. I will post a newer version later.