Syntactical and Semantical Compatibility

In a previous post I summarized some concepts from David Orchard’s W3C TAG Finding ‘Extending and Versioning Languages’. Now I’ll make things more complicated and talk about syntactical and semantical compatibility.

When I send you a message, we can distinguish syntactical compatibility from semantical compatibility. We have syntactical compatibility when you can parse the message I send you: there is nothing in the content which makes you think ‘I cannot read this text’. Semantical compatibility is about more than just reading: you need to understand the meaning of what I’m sending you. Without syntactical compatibility, semantical compatibility is impossible. With syntactical compatibility, semantical compatibility is not guaranteed; it comes as an extra on top.

Semantical compatibility is kind of the Holy Grail in data exchanges. Whenever two parties exchange data, there is bound to be a moment when they find they haven’t truly understood each other. To give just one example from real life: two parties exchanged data on disabled employees who (temporarily) could not work. (In the Netherlands, by law this involves labour physicians as well as insurance companies.) After exchanging data for quite a while, they found out that when they exchanged the date marking the end of the period of disability, one party sent the last day of disability, while the other expected the first working day. Just one day off, but the consequences can add up significantly when insurance is involved…
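To make the one-day difference concrete, here is a minimal sketch. The dates and function names are made up for illustration; they are not taken from the actual exchange. Both parties parse the same date string without any problem (syntactical compatibility), yet they count a different number of disability days (no semantical compatibility).

```python
from datetime import date


def parse_date(text: str) -> date:
    """Syntactical step: both parties can read the ISO date string."""
    return date.fromisoformat(text)


def days_as_sender_means_it(start: date, end: date) -> int:
    # Sender's reading: 'end' is the last day of disability (inclusive).
    return (end - start).days + 1


def days_as_receiver_reads_it(start: date, end: date) -> int:
    # Receiver's reading: 'end' is the first working day (exclusive).
    return (end - start).days


# Hypothetical period, chosen only for the example.
start = parse_date("2024-03-01")
end = parse_date("2024-03-15")

print(days_as_sender_means_it(start, end))    # 15
print(days_as_receiver_reads_it(start, end))  # 14
```

Nothing in the message itself reveals which reading is intended; that agreement has to live outside the syntax.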

There is something funny about the relation between syntactical/semantical compatibility and backward/forward compatibility. Remember, backward compatibility is about your V2 word processor being able to handle V1 documents, or your HTML 8.0 aware browser being able to read HTML 7.0 documents. Now if this new application reads and presents the old document, we expect everything to be exactly as it was in the older application. So an HTML 7.0 <b> tag should render text bold; if the HTML 8.0 browser does not display it this way, we do not consider HTML 8.0 (or the browser) truly backward compatible. In other words, of backward compatible applications we expect both syntactical and semantical backward compatibility: we expect the newer application not just to read the old documents, but to understand their meaning as well.
Forward compatibility is different. Forward compatibility is the ability of a version n application to read version n+1 documents. So an HTML 7.0 browser, when rendering an HTML 8.0 document, should not crash or show an error, but show the HTML 8.0 document as far as possible. Of course, no one can expect HTML 8.0 tags to be processed by the HTML 7.0 browser, but all HTML 7.0 tags should be displayed as before, and HTML 8.0 tags should be ignored. In other words, of forward compatible applications we expect syntactical, but not semantical forward compatibility.
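A rough sketch of such an ‘old browser’, using Python’s standard html.parser module. The renderer class, the <sparkle> tag and the choice to show bold text as **…** are my own assumptions for the example; no real browser works this simply.

```python
from html.parser import HTMLParser


class OldRenderer(HTMLParser):
    """Toy 'older version' renderer: it only knows the <b> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.output: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.output.append("**")  # known tag: apply its semantics
        # unknown tags: accept the syntax, ignore the semantics

    def handle_endtag(self, tag):
        if tag == "b":
            self.output.append("**")

    def handle_data(self, data):
        # Text is always shown, also when it sits inside an unknown tag.
        self.output.append(data)


def render(document: str) -> str:
    renderer = OldRenderer()
    renderer.feed(document)
    renderer.close()
    return "".join(renderer.output)


# <sparkle> is a hypothetical tag from a newer version of the language.
print(render("<b>bold</b> and <sparkle>new</sparkle>"))  # **bold** and new
```

The old tag keeps its old meaning, and the new tag neither crashes the renderer nor gets any meaning at all.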

This brings to light the key characteristic of forward compatibility: it is the ability to accept unknown syntax and ignore its semantics. It is reflected in the rule Must-Ignore-Unknown. There is a well-known counterpart to this: Must-Understand. Must-Understand flags are constructs which force an application to return an error when it does not understand the ‘flagged’ content. Where Must-Ignore-Unknown is a directive which forces the semantics of unknown constructs to be ignored, Must-Understand flags do the reverse: they force the receiver to either understand the semantics (get your meaning) or reject the message.
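The two rules can be sketched together as follows. The message layout (a plain dictionary with an optional ‘mustUnderstand’ list) and all field names are assumptions made up for this example; real formats mark such content in their own ways.

```python
# Fields this (older) consumer knows how to interpret. Hypothetical names.
KNOWN_FIELDS = {"name", "amount"}


class NotUnderstood(Exception):
    """Raised when a flagged field is unknown to this consumer."""


def consume(message: dict) -> dict:
    # Must-Understand: a flagged field we do not know means rejection.
    flagged = set(message.get("mustUnderstand", []))
    not_understood = flagged - KNOWN_FIELDS
    if not_understood:
        raise NotUnderstood(f"cannot process: {sorted(not_understood)}")

    # Must-Ignore-Unknown: keep what we know, silently drop the rest.
    return {key: value for key, value in message.items() if key in KNOWN_FIELDS}


# A newer producer added 'discount' without flagging it: it is ignored.
print(consume({"name": "bolt", "amount": 10, "discount": 5}))
# {'name': 'bolt', 'amount': 10}

# Here the producer insists that 'currency' is understood: the message is rejected.
try:
    consume({"name": "bolt", "amount": 10, "currency": "EUR",
             "mustUnderstand": ["currency"]})
except NotUnderstood as error:
    print(error)  # cannot process: ['currency']
```

The producer decides per construct which rule applies; the consumer only needs to know the flagging mechanism, not the new content itself.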

When we make applications which accept new syntax (to a degree) and ignore its semantics, we make forward compatible applications. Of backward compatible applications, we expect it all.

On Compatibility – Back and Forth

Compatibility in data exchanges is very different from, and much more complex than, compatibility in traditional software. David Orchard has written extensively about it in a W3C TAG Finding, ‘Extending and Versioning Languages’. I’ve given some thought to a few concepts in his piece.

First of all, David distinguishes producers and consumers of documents. This complicates the traditional compatibility picture. With, say, a word processor things are simple: if your V2 word processor can read documents created with your V1 word processor, the V2 processor is backward compatible with the V1 processor. If the V1 processor can read V2 documents, it is forward compatible with V2. Of course no one will buy a V2 word processor, write documents, and then buy a V1 word processor and try to access the V2 documents, so the forward case is uncommon in traditional software applications.

The producer/consumer distinction complicates things. All of a sudden forward compatibility matters a lot. There are two possible situations:

  • an old producer sends a document to a new consumer; if the consumer needs to read it, the consumer should be backward compatible with the older producer;
  • a new producer sends a document to an older consumer; if the consumer needs to read it, the consumer should be forward compatible with the newer producer.
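The two situations above boil down to a comparison of version numbers. A minimal sketch, assuming versions are plain integers (the function name is my own):

```python
def required_compatibility(producer_version: int, consumer_version: int) -> str:
    """Return which kind of compatibility the consumer needs to read the document."""
    if consumer_version > producer_version:
        return "backward"  # new consumer reads a document from an old producer
    if consumer_version < producer_version:
        return "forward"   # old consumer reads a document from a new producer
    return "none"          # same version: no compatibility question arises


print(required_compatibility(producer_version=1, consumer_version=2))  # backward
print(required_compatibility(producer_version=2, consumer_version=1))  # forward
```

In both cases the burden is on the consumer; the direction of the version gap only decides which kind of burden it is.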

Since there is no universal preferred viewpoint, backward and forward compatibility are equally important. By ‘no preferred viewpoint’ I mean that neither producers nor consumers are inherently more important than the other. In traditional software, one can assume consumer versions to be newer than or equal to producer versions: you may try to read old documents with new software, but the reverse is seldom the case as long as documents are not exchanged. However, sometimes there is a preferred viewpoint. In the automobile industry a few huge car makers dominate the market, with many smaller part suppliers competing for the orders of the few big buyers. The same holds in consumer goods, where a few large supermarket chains dominate a market with many more food producers. In such cases, the big ones can call the shots and dictate which versions are permissible and which are not. The suppliers can comply or go bust. Such inequality may make backward or forward compatibility more important, depending on whether messages go towards or away from the big shots.

The next question is how to achieve both forms of compatibility. Backward compatibility is relatively easy to achieve: new consumers should support all document features which older producers could make (and which older consumers could read). Forward compatibility is more difficult: the old consumers should be able to process content which they do not yet know about. It can be done by ignoring all unknown content (as in HTML), or by having special places in your documents where unknown (read: newer) content is allowed. In XML Schema this can be done with wildcards, special constructs which allow any kind of content where the XML Schema ‘any’ element occurs (it can be a bit more complicated than that in real life, though).

This summarizes the main problem area. In a few follow-up posts I’ll explore some of these notions in more detail.
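As a small appendix to the ‘special places’ idea mentioned above: the sketch below imitates an extension point in plain Python with the standard xml.etree module. It is only an analogy for what an XML Schema wildcard does, not schema validation, and the element names (order, name, amount, extension, currency) are invented for the example.

```python
import xml.etree.ElementTree as ET

KNOWN_ELEMENTS = {"name", "amount"}  # what this V1 consumer understands
EXTENSION_POINT = "extension"        # the one place where anything is allowed


def read_order(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        if child.tag in KNOWN_ELEMENTS:
            result[child.tag] = child.text
        elif child.tag == EXTENSION_POINT:
            continue  # unknown (newer) content lives here and is skipped
        else:
            # Unknown content outside the extension point is not accepted.
            raise ValueError(f"unexpected element <{child.tag}>")
    return result


v2_document = """
<order>
  <name>bolt</name>
  <amount>10</amount>
  <extension><currency>EUR</currency></extension>
</order>
"""

print(read_order(v2_document))  # {'name': 'bolt', 'amount': '10'}
```

Compared with ignoring unknown content everywhere, this keeps the old consumer strict about the parts it knows while still leaving room for newer producers.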