Validate for Machines, not Humans

Mark Baker misses an important distinction in “Validation Considered Harmful” when he writes:

“Today’s sacred cow is document validation, such as is performed by technologies such as DTDs, and more recently XML Schema and RelaxNG.

Surprisingly though, we’re not picking on any one particular validation technology. XML Schema has been getting its fair share of bad press, and rightly so, but for different reasons than we’re going to talk about here. We believe that virtually all forms of validation, as commonly practiced, are harmful; an anathema to use at Web scale.”

Dare Obasanjo replied in “Versioning does not make validation irrelevant”:

“Let’s say we have a purchase order format which in v1 has a element which can have a value of "U.S. dollars" or "Canadian dollars" then in v2 we now support any valid currency. What happens if a v2 document is sent to a v1 client? Is it a good idea for such a client to muddle along even though it can't handle the specified currency format?”

to which Mark replied:

“No, of course not. As I say later in the post; ‘rule of thumb for software is to defer checking extension fields or values until you can’t any longer’”

With software, the most important question is whether the data ends up with a human or ends up in software, where it is either stored in a database for possible later retrieval or used to generate a reply message without human intervention. Humans can make sense of unexpected data: when they see “Euros” where “EUR” was expected, they’ll understand. Validating as little as possible makes sense there. When software does all the processing, stricter validation is necessary: trying to make software ‘intelligent’ by enabling it to process (not just store, but process) as-yet-unknown format deviations is a sure road to disaster. So in the latter case we accept “EUR” and “USD”, not “Euros”. And if we do that, the best thing for two parties who exchange messages is to make those agreements explicit in a schema.

If we “defer checking extension fields or values until you can’t any longer”, we end up with some application’s error message. You don’t want to return that to the partner who sent you a message. You’ll want to return “Your message does not validate against our agreed-upon schema”, so they know what to fix (though sometimes you’ll want your own people to look at it first, depending on the business case).
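To make the machine-to-machine case concrete, here is a minimal Python sketch of validating at the boundary. The dictionary-shaped order, the `currency` field name and the function names are assumptions for illustration only; the allowed values and the error text are the ones mentioned above.

```python
# Hypothetical agreed-upon value set; the text only names "EUR" and "USD".
AGREED_CURRENCIES = {"EUR", "USD"}

SCHEMA_ERROR = "Your message does not validate against our agreed-upon schema"


def validate_order(order):
    """Return a list of schema-level errors; an empty list means the order is valid."""
    errors = []
    currency = order.get("currency")
    if currency not in AGREED_CURRENCIES:
        errors.append(f"{SCHEMA_ERROR}: currency {currency!r} is not allowed")
    return errors


def handle_order(order):
    """Validate before any application logic runs."""
    errors = validate_order(order)
    if errors:
        # The partner gets a schema-level message, not an internal application error.
        return "; ".join(errors)
    return "accepted"
```

With this sketch, `handle_order({"currency": "Euros"})` is rejected with a message that points at the agreement, while `handle_order({"currency": "EUR"})` is accepted.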

Of course one should not include unnecessary constraints in schemas, but whether humans or machines will process the message is central in deciding what to validate and what not.

A further question is what to validate: the values in the content, or the structure. Uche Ogbuji realistically adds:

“Most forms of XML validation do us disservice by making us nit-pick every detail of what we can live with, rather than letting us make brief declarations of what we cannot live without.”

Yes, XML Schema and others make structural requirements which impose unnecessary constraints. Unexpected elements can often be ignored, and doing so enhances flexibility.
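A small sketch of what “declare what we cannot live without, ignore the rest” could look like in Python, using the standard library’s `xml.etree.ElementTree`. The element names `order`, `currency`, `amount` and `giftwrap` are invented for illustration; they do not come from any real purchase order format.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of elements the receiver cannot live without.
REQUIRED = ("currency", "amount")


def read_order(xml_text):
    """Extract the required child elements; any other element is simply ignored."""
    root = ET.fromstring(xml_text)
    order = {}
    for name in REQUIRED:
        element = root.find(name)
        if element is None or not (element.text or "").strip():
            raise ValueError(f"missing required element <{name}>")
        order[name] = element.text.strip()
    return order
```

A document such as `<order><currency>EUR</currency><amount>10</amount><giftwrap>yes</giftwrap></order>` is read without complaint: the unexpected `giftwrap` element is never looked at, while a document without `currency` raises an error.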

The Semantics of Addresses

There has been a lot of discussion over the past 10-something years on URIs: are they names or addresses? However, there does not appear to have been much investigation into the semantics of addresses. This matters, since while there are several important theories on the semantics of names (Frege, Russell, Kripke/Donnellan/Putnam et al.), there have been few classical accounts of the semantics of addresses. Here is an attempt.

What are addresses? Of course, first come the standard postal addresses we’re all accustomed to:

Tate Modern
Bankside
London SE1 9TG
England

 

Other addresses, in a broad sense, could be:

52°22’08.07” N 4°52’53.05” E (The dining table on my roof terrace, in case you ever want to drop by. Since the table is outdoors, I suggest you come in late spring or summer.)

e2, g4, a8 etc. on a chess board

The White House (if not further specified, almost anyone would assume the residence of the President of the United States)

(3, 6) (in some x, y coordinate system)

Room 106 (if we are together in some building)

//Myserver/Theatre/Antigone.html

128.30.52.47

Addresses are a lot like names: they are words, or phrases, which point to things in the real world. They enable us to identify things, and to refer to things, as names do. ‘I just went to the van Gogh Museum’ and ‘I was in the Paulus Potterstraat 7 in Amsterdam’ are pretty similar, aren’t they?

So what makes addresses different from names, semantically? The first thing which springs to mind is that ordinary names are opaque, and addresses are not. Addresses contain a system of directions, often, but not always, hierarchical. In other words: there is information in parts of addresses, whereas parts of names do not contain useful information. From my postal address you can derive the city where I live, the country, the street. From chess notations and (geo-)coordinates one can derive the position on two (or more) axes. So addresses contain useful information within them, and names for the most part do not.

This is not completely true: names do contain some informative parts. From ‘Marc de Graauw’ you can derive that I belong to the ‘de Graauw’ family, and am member ‘Marc’ of it, but this does not function the way addresses do. It does not mean: go to the collection ‘de Graauw’ and pick member ‘Marc’. On a side note, though ‘de Graauw’ is an uncommon last name even in the Netherlands, I know at least one other ‘Marc de Graauw’ exists, so my name is not unique (the situation could have been worse though). I don’t even know whether my namesake is part of my extended family or not, so ‘looking up’ the ‘de Graauw’ family is not even an option for me.

Unique names or identifiers are usually even more opaque than natural names: my social security number does identify me uniquely in the Dutch social security system, but nothing can be derived from its parts other than a very vague indication of when it was established. So even when names contain some information within their parts, it is not really useful, in the sense that it doesn’t establish much: not part of the location, or identity, or reference. The parts of addresses do function as partial locators or identifiers; the parts of names provide anecdotal information at best.

Names and addresses are fundamentally different when it comes to opacity. What else? Ordinary names, and certainly unique names, denote without mediation: they point directly to an individual. Addresses denote through mediation: they use a system of coordinates to move step by step to their endpoint. Addressing systems are set up in such a way that they provide a drilling-down mechanism which refines a region in a space further and further until a unique location is denoted. Addresses are usually unique in their context; names sometimes are, and sometimes are not. So e4 denotes a unique square on a chess board, and my postal address a unique dwelling on Earth. The name ‘Amsterdam’ does denote a unique city if the context is the Netherlands, but my name does not denote a unique individual. So addresses pertain to a certain space, where a certain system of directives applies.

Addresses do not denote things, they denote locations. My postal address does not denote my specific house: if we tear it down and build another, the address does not change. e4 does not denote the pawn which stands there, it denotes a square on a chess board, whatever piece is there. So addresses do not denote things, but slots for things. Addresses uniquely denote locations, in a non-opaque, mediated way. If we use ‘name’ in a broad sense, where names can be non-opaque, we could say: addresses are unique names for locations in a certain space.
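The chess example can be sketched in a few lines of Python. This is only an illustration of the idea that the parts of an address carry information and that the address denotes a slot rather than its occupant; the function name and the dictionary used as a board are my own assumptions.

```python
def parse_square(square):
    """Split a chess address such as 'e4' into zero-based (file, rank) indices."""
    if len(square) != 2 or square[0] not in "abcdefgh" or square[1] not in "12345678":
        raise ValueError(f"not a chess square: {square!r}")
    return "abcdefgh".index(square[0]), int(square[1]) - 1


# The board maps slots to whatever currently occupies them.
board = {}
board[parse_square("e4")] = "white pawn"
# The address stays the same; only the occupant of the slot changes.
board[parse_square("e4")] = "black knight"
```

Each character of the address is a step on one axis, and `board` shows that e4 keeps denoting the same square whichever piece stands on it.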

|            | Names     | Addresses  |
|------------|-----------|------------|
| Can        | identify  | identify   |
| Can        | refer     | refer      |
| Denote     | directly  | mediated   |
| Point into | the world | a space    |
| Denote     | things    | slots      |
| Are        | opaque    | not opaque |

Where does this leave us with URIs? It’s quite clear that URLs (locator URIs) are addresses. A URL like http://www.w3.org/2001/tag/doc/URNsAndRegistries-50.html#loc_independent tells us a lot:

1) this is the http part of URI space we’re looking at,

2) this is on host www.w3.org,

3) the path (on this host) to the resource is 2001/tag/doc/URNsAndRegistries-50.html,

4) and within this, I’m pointing to fragment #loc_independent.
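The four steps above can be reproduced with Python’s standard `urllib.parse.urlsplit`, which splits a URL into exactly these parts. This is just an illustration of the non-opacity of URLs, not a claim about how any particular client resolves them.

```python
from urllib.parse import urlsplit

url = "http://www.w3.org/2001/tag/doc/URNsAndRegistries-50.html#loc_independent"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.netloc)    # www.w3.org
print(parts.path)      # /2001/tag/doc/URNsAndRegistries-50.html
print(parts.fragment)  # loc_independent
```

Each component narrows down the region of URI space, much as country, city and street do in a postal address.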

So URLs fulfill all conditions of addresses. They are not opaque. Their parts contain useful information. Their parts (scheme, authority, path etc.) provide steps to the URL’s destination: the resource it points to. They identify, they refer, like names. No, URLs are not addresses of files on file systems on computers; they are not addresses in this naive sense. But URLs are addresses in URI space. HTTP URIs are names of locations in HTTP space. Semantically, URLs are addresses, at least. Whether URLs can be names too is another question.

Do we have to know we know to know?

John Cowan wrote ‘Knowing knowledge’ a while ago, about what it means to know something. His definition (derived from Nozick) is:

‘The following four rules explain what it is to know something. X knows the proposition p if and only if:

  1. X believes p;
  2. p is true;
  3. if p weren’t true, X wouldn’t believe it;
  4. if p were true, X would believe it.’

This raises an interesting question. A common position of religious people (or at least religious philosophers) is: ‘I believe in the existence of God, but I cannot know whether God exists’. God’s existence is a matter of faith, not proof. I don’t hold such a position myself, but would be very reluctant to denounce it on purely epistemological grounds.

Now suppose, for the sake of argument, that God does in fact exist, and that the religious philosopher, X, would not have believed in the existence of God had God not existed (quite coherently, since typically in such views nothing would have existed without God, so no one would have believed anything). Our philosopher’s belief would then satisfy the above four criteria. Yet could we say ‘X knows p’ when X himself assures us he does not know whether p is true? In other words: doesn’t knowing something presuppose that the knower would be willing to assert that he or she knows it?