Mark Baker misses an important distinction in “Validation Considered Harmful” when he writes:
“Today’s sacred cow is document validation, such as is performed by technologies such as DTDs, and more recently XML Schema and RelaxNG.
Surprisingly though, we’re not picking on any one particular validation technology. XML Schema has been getting its fair share of bad press, and rightly so, but for different reasons than we’re going to talk about here. We believe that virtually all forms of validation, as commonly practiced, are harmful; an anathema to use at Web scale.”
Dare Obasanjo replied in “Versioning does not make validation irrelevant“:
“Let’s say we have a purchase order format which in v1 has a element which can have a value of "U.S. dollars" or "Canadian dollars" then in v2 we now support any valid currency. What happens if a v2 document is sent to a v1 client? Is it a good idea for such a client to muddle along even though it can't handle the specified currency format?"
to which Mark replied:
“No, of course not. As I say later in the post; ‘rule of thumb for software is to defer checking extension fields or values until you can’t any longer'”
With software the most important point is whether the data sent ends up with a human, or ends up in software – either to be stored in a database for possible later retrieval, or is used to generate a reply message without human intervention. Humans can make sense of unexpected data: when they see “Euros” where “EUR” was expected, they’ll understand. Validating as little as possible makes sense there. When software does all the processing, stricter validation is necessary – trying to make software ‘intelligent’ by enabling it to process (not just store, but process) as-yet-unknown format deviations is a road to sure disaster. So in the latter case stricter validation makes a lot of sense – we accept “EUR” and “USD”, not “Euros”. And if we do that, the best thing for two parties who exchange anything is to make those agreements explicit in a schema. If we “defer checking extension fields or values until you can’t any longer” we end up with some application’s error message. You don’t want to return that to the partner who sent you a message – you’ll want to return “Your message does not validate against our agreed-upon schema”, so they know what to fix (though sometimes you’ll want your own people to look at it first, depending on the business case).
Of course one should not include unnecessary constraints in schema’s – but whether humans or machines will process the message is central in deciding what to validate and what not.
Another point is what to validate – values in content or structure, and Uche Ogbuji realistically adds:
“Most forms of XML validation do us disservice by making us nit-pick every detail of what we can live with, rather than letting us make brief declarations of what we cannot live without.”
Yes, XML Schema and others make structural requirements which impose unnecessary constraints. Unexpected elements often can be ignored, and this enhances flexibility.