GET and POST aren’t verbs

Calling HTTP GET and POST ‘verbs’ is a gross misnomer; they really are URI metadata in disguise.

REST is centered around the idea that we should use the way the web works when we do things on the Web – fair enough – and that REST is the architectural style of the Web. RESTful applications – like HTTP – use POST, GET, PUT and DELETE to CREATE, READ, UPDATE and DELETE resources.

The problem is, this is not how the current Web works.Real verbs can be applied to many nouns, and a single noun can take many verbs: the cat walks, the cat talks, the cat whines, the cat shines. Some combinations make no sense, but a lot do. This is not the case for GET and POST and their friends. In principle, it is possible to apply both GET and POST to a single resource; GET an EMPLOYEE record, POST to the same record.

In practice GET and POST do not often apply to the same resource. POST on the Web is used in HTML forms. A form has a method (GET or POST) and an URI (which points to a resource). Usually, POST forms have unique URI’s; they don’t share them with GETS. Amazon uses artificial keys to make the POST URI’s unique. More surprising, they even do the same thing when I GET the URI instead of POST it (which the Firefox Web Developer toolbar supports with a single menu choice). Amazon doesn’t care whether I GET or POST a book to my shopping cart; it (understandably) lands in my shopping cart either way. The same applies to del.icio.us and numerous other – I suspect most – sites.

The Web does not work through applying a small set of verbs to a large number of resources: the resources do all the work. GET and POST aren’t real verbs; they just signal what the URI owner intends to do when the URI is dereferenced; as such, they are URI metadata. The ‘GET’ metadata in a HTML form’s method just informs your browser, and all intermediaries that this URI is supposed to have no side-effects on the server. Of course no browser can know what actually happens when a URI is dereferenced: maybe a document is returned, maybe the missiles are fired. GET and POST are just assertions made by the URI owner about the side effects of the URI.

The ‘old’ Web wouldn’t be one bit different in appearance if both GET and POST weren’t even allowed on the same URI. It’s possible to use them as real verbs, as REST advocates; but the merits of this approach do not derive from the way the current Web works.

Don’t get me wrong: GET and POST are brilliant – especially GET. Returning the GET and POST metadata back to the server with the URI is what makes the Web tick: it allows intermediaries to do smart caching stuff, based on the GET and POST metadata. But however good – GET and POST are not verbs, they are metadata.

The #referent Convention

Update: I learned from the TAG list that Dan Connolly already proposed using #it or #this for the same purpose, and  Tim Berners-Lee proposed using #i to refer to oneself in a similar way. My idea therefore was not very original, and since I regularly read the TAG list and similar sources, it’s even possible I read the idea somewhere and (much) later thought of it as one of my own – though if this happened it was certainly unintentional.

There is a very simple solution to the entire hash-versus-slash debate: whenever you would want to identify anything with a hashless URI, suffix it with #referent. The meaning of x#referent is: I identify whatever x is about. And x is simply an information resource (about x#referent).

The httpRange-14 debate is about what hashless URI’s (without a #) refer to: can they refer only to documents (information resources) or to anything, i.e. persons or cars or concepts. Is it meaningful to say https://www.marcdegraauw.com/marcdegraauw/ refers to ‘Marc de Graauw’. Or does it now identify both a web page and a person, and is this meaningful and/or desirable?

Hash URI’s aren’t thought to be much of a problem in this respect. They have some drawbacks however. It may be desirable to retrieve an entire information resource which describes what the referent of the URI is. And putting all identifiers in one large file makes the file large. Norman Walsh did this: http://norman.walsh.name/knows/who#norman-walsh identifies Norman Walsh, and the ‘who’ file got big. So Norm switched to hashless URI’s: http://norman.walsh.name/knows/who/norman-walsh identifies Norman Walsh. The httpRange-14 solution requires Norm to answer to a GET on this URI with a 303 redirect, in this case to http://norman.walsh.name/knows/who/norman-walsh.html, which does not identify Norman, but simply is an information resource.

If we use the #referent convention, I can say: https://www.marcdegraauw.com/marcdegraauw.html#referent identifies me. And https://www.marcdegraauw.com/marcdegraauw.html is simply an information resource, which is about me. Problem solved.

If I put https://www.marcdegraauw.com/marcdegraauw.html#referent in a browser, I will simply get the entire https://www.marcdegraauw.com/marcdegraauw.html resource, which is a human readable resource about https://www.marcdegraauw.com/marcdegraauw.html#referent. Semantic Web software which understands the #referent convention will know https://www.marcdegraauw.com/marcdegraauw.html#referent refers to a non-information resource (except when web pages are about other web pages) and https://www.marcdegraauw.com/marcdegraauw.html is simply an information resource. Chances of collision of the #referent fragment identifier are very small (Semantic Web jokers who do this intentionally apart) and even in the case of collision with existing #referent fragment identifiers the collision seems pretty harmless. The only thing the #referent convention does not solve is all the existing hashless URI’s out there which (are purported to) identify non-information resources.
In Semantic Web architecture, there is no need ever for hashless URI’s. The #referent convention is easier, more explicit about what is meant, retrieves a nice descriptive human-readable information resource in a browser, along with all necesssary rdf metadata for Semantic Web applications.

The trouble with PSI’s

Published Subject Identifiers face a couple of serious problems, as do all URI-based identifier schemes. A recent post of Lars Marius Garshol reminded me of the – pardon the pun – subject. I was pretty occupied with PSI’s some time ago, maybe now is the moment to write down some of my reservations about PSI’s. PSI’s are URI’s which uniquely identify something, which is – ideally – described on the web page you see when you browse to the URI – read Lars’ post for a real introduction.

First, PSI’s solve the wrong problem. The idea of using some schema to uniquely identify things is hardly novel. Any larger collection of names or id’s for whatever sooner or later faces the problem of telling whether two names point to the same whatever or not. So we’ve got ISBN numbers for books, codes for asteroids and social security numbers for people. They are supposed to designate a single individual or single concept. The problem with all those identifier schemes is not the idea, but the fact that they get polluted over time. Through software glitches and human error and – mostly – intentional fraud a single social security number may point to two or more persons and two or more social security numbers may point to a single person. I can’t speak for non-Dutch realms, but the problem here is very real, and I assume given human nature it is not much different elsewhere. So the real problem is not inventing a identification scheme, the problem is avoiding pollution. This may seem like unfair criticism – no, PSI’s don’t solve famine and war either – but it does set PSI’s in the right light – they are not a panacea for identity problems.

Second, PSI’s are supposed to help identification for both computers – they will compare URI’s, and conclude two things are the same if their URI’s are equivalent – and for humans, through an associated web page. The trouble is what to put on the web page. Let’s make a PSI for me, using my social securitiy number: http://www.sofinummer.nl/0123.456.789. Now what can we put on the web page? We could say “This page identifies the person with the Dutch social security number 0123.456.789” – but that is hardly additional information. If we elaborate – “This page identifies Marc de Graauw, the son of Joop de Graauw and Mieke Hoendervangers, who was born on the 6th of March 1961 in Tilburg, the Netherlands, the person with the Dutch social security number 0123.456.789” we get into trouble. I could find out for instance I was not actually born in Tilburg, but my that my parents for some reason falsely reported this as my birthplace to the authorities. Now even if this were the case, 0123.456.789 would still be my social security number, and it would identify me, not someone else. But if we look at the page, we have to conclude http://www.sofinummer.nl/0123.456.789 identifies nobody, since nobody fits all the criteria listed. The same goes for any other fact we could list – I could find out I was born on another day, to other parents et cetera. The only truly reliable information, the one piece we cannot change, is “This page identifies the person who has been given the Dutch social security number 0123.456.789 by the Dutch Tax Authority”, which hardly is no information at all beyond the social security number itself. All we’ve achieved is prepending my social security number with http://www.sofinummer.nl/, and this simple addition won’t solve any real-world problem. The problem is highlighted by Lars’ example of a PSI, the date page for my birthday, http://psi.semagia.com/iso8601/1961-03-06. This page has no information whatsoever which could not be conveyed with a simple standardized date format, such ‘1961-03-06’ in ISO 8601.

Third, in a real-world scenario, establishing identity statements across concepts from diverse ontologies is the problem to solve. Getting everybody to use the same single identifier for a concept is not feasible. Take an example such as Chimezie Ogbuji’s work on a Problem-Oriented Medical Record Ontology, where it says about ‘Person’:

cpr:person = foaf:Person and galen:Person and rim:EntityPerson and dol:rational-agent

In FOAF a person is defined with:”Something is a foaf:Person if it is a person. We don’t nitpic about whethet they’re alive, dead, real or imaginary.”

HL7’s RIM defines Person as:

“A subtype of LivingSubject representing a human being.”

and defines LivingSubject as:
“A subtype of Entity representing an organism or complex animal, alive or not.”

In FOAF persons can be imaginary and ‘not real’, in the RIM they cannot. Now Chimezie wisely uses and which is an intersection in OWL, so his cpr:Person does not include imaginary persons. And for patient records, which are his concern, it doesn’t matter: we don’t treat Donald Duck for bird flu, so for medical care the entire problem is theoretical. But what about PSI’s: could we ever reconcile the concepts behind foaf:Person and rim:EntityPerson? Probably not: there are a lot of contexts where imaginary persons make sense. So if we make two PSI’s, foaf:Person and rim:EntityPerson, our subjects won’t merge, even when – in a certain context such as medical care – they should. Or we could forbid the use of foaf:Person in the medical realm, but this seems to harsh: the FOAF approach to personal information is certainly useful in medical care.

Identity of concepts is context-dependent. The definitions behind the concepts don’t matter much. Trying to find a universal definition for any complex concept such as ‘person’ will only lead to endless semantic war. Usually natural language words will do for a definition (but you do need disambiguation for homonyms). Way more important than trying to establish a single new id system with new definitions, are ways to make sensible context-dependent equivalences between existing id systems.

15 Sep 2008: Comments are closed

The Semantics of Addresses

There has been a lot of discussion over the past 10-something years on URI’s: are they names or addresses? However, there does not appear to have been a lot of investigation into the semantics of addresses. This is important, since while there are several important theories on the semantics of names (Frege, Russell, Kripke/Donnellan/Putnam et. al.), there have been little classical accounts of the semantics of addresses. A shot.

What are addresses? Of course, first come the standard postal addresses we’re all accustomed to:

Tate Modern
Bankside
London SE1 9TG
England

 

Other addresses, in a broad sense, could be:

52°22’08.07” N 4°52’53.05” E (The dining table on my roof terrace, in case you ever want to drop by. I suggest however, for the outdoors dining table, to come in late spring or summer.)

e2, g4, a8 etc. on a chess board

The White House (if further unspecified, almost anyone would assume the residence of the President of the United States)

(3, 6) (in some x, y coordinate system)

Room 106 (if we are together in some building)

//Myserver/Theatre/Antigone.html

128.30.52.47

Addresses are a lot like names – they are words, or phrases, which point to things in the real world. They enable us to identify things, and to refer to things – like names. ‘I just went to the van Gogh Museum‘ – ‘I was in the Paulus Potterstraat 7 in Amsterdam‘ – pretty similar, isn’t it?
So what makes addresses different from names, semantically? The first thing which springs to mind is ordinary names are opaque, and addresses are not. Addresses contain a system of directions, often but not always, hierarchical. In other words: there is information in parts of addresses, whereas parts of names do not contain useful information. From my postal address you can derive the city where I live, the country, the street. From chess notations and (geo-)coordinates one can derive the position on two (or more) axes. So addresses contain useful information within them, and names for the most part do not.

This is not completely true – names do contain some informative parts – from ‘Marc de Graauw’ you can derive that I belong to the ‘de Graauw’ family, and am member ‘Marc’ of it, but this does not function the way addresses do – it is not: go to the collection ‘de Graauw’ and pick member ‘Marc’. On a side note, though ‘de Graauw’ is an uncommon last name even in the Netherlands, I know at least one other ‘Marc de Graauw’ exists, so my name is not unique (the situation could have been worse though). I don’t even know whether my namesake is part of my extended family or not, so ‘looking up’ the ‘de Graauw’ family is not even an option for me.

Unique names or identifiers are usually even more opaque than natural names – my social security number does identify me uniquely in the Dutch social security system, but nothing can be derived from its parts other than a very vague indication of when it was established. So even when names contain some information within their parts, it is not really useful in the sense that it doesn’t establish much – not part of the location, or identity, or reference. The parts of addresses do function as partial locators or identifiers, the parts of names provide anecdotal information at best.

Names and addresses are fundamentally different when it comes to opacity. What else? Ordinary names – certainly unique names – denote unmediated, they point directly to an individual. Addresses denote mediated, they use a system of coordination to move step-by-step to their endpoint. Addressing systems are set up in such a way they provide a drilling-down system to further and further refine a region in a space until a unique location is denoted. Addresses are usually unique in their context, names sometimes are, and sometimes not. So, e4 denotes a unique square on a chess board, and my postal address a unique dwelling on Earth. The name ‘Amsterdam’ does denote a unique city if the context is the Netherlands, but my name does not denote a unique individual. So addresses pertain to a certain space, where a certain system of directives applies.

Addresses do not denote things, they denote locations. My postal address does not denote my specific house: if we tear it down and build another, the address does not change. e4 does not denote the pawn which stands there, it denotes a square on a chess board, whatever piece is there. So addresses do not denote things, but slots for things. Addresses uniquely denote locations, in a non-opaque, mediated way. If we use ‘name’ in a broad sense, where names can be non-opaque, we could say: addresses are unique names for locations in a certain space.

Names Addresses
Can identify identify
Can refer refer
Denote directly mediated
Point into the world a space
Denote things slots
Are opaque not opaque

Where does this leave us with URI’s? It’s quite clear URL’s (locator URI’s) are addresses. Looking at a URL like http://www.w3.org/2001/tag/doc/URNsAndRegistries-50.html#loc_independent , this tells us a lot:

1) this is the http part of uri space we’re looking at,

2) this is on host www.w3.org

3) the path (on this host) to the resource is 2001/tag/doc/URNsAndRegistries-50.html

4) and within this, I’m pointing to fragment #loc_independent

So URL’s fulfill all conditions of addresses. They are not opaque. Their parts contain useful information. Their parts – schema, authority, path etc. – provide steps to the URL’s destiny – the resource it points to. The identify, they refer, like names. No, URL’s are not addresses of files on file systems on computers, not addresses in this naive sense. But URL’s are addresses in URI space. HTTP URI’s are names of locations in HTTP space. Semantically, URL’s are addresses – at least. Whether URL’s can be names too is another question.

The URI Identity Paradox

Norman Walsh wrote: “(some) fall into the trap of thinking that an “http” URI is somehow an address and not a name”. It’s an opinion expressed more often, for instance in the TAG Finding “URNs, Namespaces and Registries”, where is says: http: URIs are not locations“. URNs, Namespaces and Registries” However, for everyone who believes an http URI is not an address but instead a name, there is a paradox to solve.

Suppose I’ve copied the ‘ Extensible Markup Language (XML) 1.0’ document from www.w3.org to my own website:
https://www.marcdegraauw.com/REC-xml-20060816

Now is the following identity statement true?

http://www.w3.org/TR/2006/REC-xml-20060816 is https://www.marcdegraauw.com/REC-xml-20060816

Certainly not, of course. If you and I were looking at https://www.marcdegraauw.com/REC-xml-20060816, and some paragraph struck you as strange, you might very well say: ‘I don’t trust https://www.marcdegraauw.com/REC-xml-20060816, I want to see http://www.w3.org/TR/2006/REC-xml-20060816 instead’. That’s a perfectly sensible remark. So of course the identity statement is not true: the document at www.w3.org is from an authorative source. It’s the ultimate point of reference for any questions on the Extensible Markup Language (XML) 1.0. The copy at www.marcdegraauw.com is – at best – a copy. Even if the documents which are retrieved from the URI’s are really the same character-for-character, they carry a very different weight. Next time, you would consult http://www.w3.org/TR/2006/REC-xml-20060816, not https://www.marcdegraauw.com/REC-xml-20060816.

So for http URI’s which retrieve something over the web, two URI’s in different domains represent different information resources. Let’s take a look at names next. I might set up a vocabulary of names of specifications of languages (in a broad sense): Dutch, English, Esperanto, Sindarin, C, Python, XML. In line with current fashion I would use URI’s for the names of those languages, and https://www.marcdegraauw.com/REC-xml-20060816 would be the name for XML. If the W3C had made a similar vocabulary of languages, it would probably use “http://www.w3.org/TR/2006/REC-xml-20060816” as the name for XML. And for names the identity statement

http://www.w3.org/TR/2006/REC-xml-20060816 is https://www.marcdegraauw.com/REC-xml-20060816

is simply true: both expressions are names for XML. So the statement is as true as classical examples as “The morning star is the evening star” or “Samuel Clemens is Mark Twain”. This shows we’ve introduced a synonym: http://www.w3.org/TR/2006/REC-xml-20060816 behaves different when used to represent a (retrievable) information resource and when used as a name. In itself, synonyms are not necessarily a problem (though I’d maintain they are a nuisance at all times). The problem can be solved when one knows which class a URI belongs to. If I choose to denote myself with https://www.marcdegraauw.com/, I can simply say in which sense I’m using https://www.marcdegraauw.com/:

https://www.marcdegraauw.com/ is of type foaf:Person

or

https://www.marcdegraauw.com/ is of type web:document

In the first sense, https://www.marcdegraauw.com/ may have curly hair and be 45 years of age and have three sons, in the second sense it may be retrieved as a sequence of bits over the web. Now, when we try to apply the same solution to the XML example, a paradox emerges. Of course we can talk about http://www.w3.org/TR/2006/REC-xml-20060816 as a name and qualify it:”http://www.w3.org/TR/2006/REC-xml-20060816 is of type spec:lang” or whatever. This is the sense in which the identity statement is true: http://www.w3.org/TR/2006/REC-xml-20060816 is a name for a language specification, and https://www.marcdegraauw.com/REC-xml-20060816 is another name for the same specification. Now try to come up with a proper classification for http://www.w3.org/TR/2006/REC-xml-20060816 for the sense where the identity statement is not true, and find a classification which does not secretly introduce the notion of address. I say it cannot be done. One can classify as “type address” or “type location” et cetera, but every classification which allows the identity statement to be false carries the notion of “address” within it. Whoever maintains that URI’s are not addresses or locations, will have to admit the identity statement is both true and false at the same time.

URI’s are addresses or locations (though not in the simplistic sense of being files in directories under a web root on some computer on the Internet). And when URI’s are used as names as well, every information resource which is named (not addressed) with its own URI is a synonym with the URI used as an address of itself. The URI-as-name and the URI-as-address will behave different and have different inferential consequences: for the URI-as-address, cross-domain identity statements will never be true, for the URI-as-name they may be true or may be false. If you want to avoid such synonyms, you’ll have to use URI’s such as http://www.w3.org/nameOf/TR/2006/REC-xml-20060816. IMO, it can’t get much uglier. If you accept the synonyms, you’ll have to accept a dual web where every http URI is an address and may be a name of the thing retrieved from this address as well – and those two are not the same.

Update: Norman Walsh has convinced me in private email conversation this post contains several errors. I will post a newer version later.