Axioms of Versioning 2

I’ve written a new version of ‘Axioms of Versioning’. I extended the formalization to get a grasp of the concept of ‘Semantic Backward Compatibility’ in HL7v3, which I believe is flawed (quote: “Objective of backward model compatibility is that a receiver expecting an ‘old’ version will not misinterpret content sent from a new version”). It seems to be the reverse of the position of the W3C TAG in ‘Extending and Versioning Languages: Terminology’, and of the position I would defend myself. Yet the interaction of new senders with old receivers was not sufficiently explored in my Axioms.

It turns out that exploration of this notion leads to quite natural definitions of ‘may ignore’ and ‘must understand’ semantics. The HL7v3 notion is probably best characterized by the concept of ‘partial semantical forward compatibility’ in my new Axioms. The concept is also close to, if not the same as, the TAG’s ‘Partial Understanding’.

It really thrilled me to see how helpful my formalisms were in exploring the notions in HL7v3, and in uncovering what I think is their hidden meaning.

Making All Changes Compatible over Multiple Versions – Part 2

In the previous article I showed how to add an optional element and make this a compatible change over two releases, so existing receivers aren’t broken by unknown elements. It is done by following two simple principles:

  1. have one or more intermediate releases;
  2. use different schemas for senders and receivers in intermediate releases.

This strategy works well in environments where ‘Big Bang’ upgrades are undesirable and there is control over the number of different releases ‘in the field’. This applies to the Dutch national EHR infrastructure, for which I work, and probably to most exchanges between companies and/or governments.

The same can be done for any kind of change. Consider making an optional element required: we have an existing

Order V1:

  • customer-id 1..1
  • name 0..1
  • order-line 1..n

with an optional ‘name’ element, and we want:

Order V2:

  • customer-id 1..1
  • name 1..1
  • order-line 1..n

Upgrading some receiving nodes to V2 would break them once they receive orders without names. If we use an intermediate release, there are no problems:

Sender 2 cannot break receiver 1. Once all parties are at least at V2, receivers can start to upgrade to V3. If the level of control is such that no more than two versions are allowed simultaneously, this is sufficient; otherwise we simply need more intermediate releases.
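To make this concrete, here is a minimal Python sketch that checks every sender/receiver combination for this change. It assumes the intermediate release V2 has senders that always include a name (1..1) while receivers still accept orders without one (0..1); only the ‘name’ element is modelled, and the names `RELEASES`, `accepts` and `compatible` are hypothetical.

```python
# Releases for making 'name' required, modelled for the 'name' element only.
# Cardinalities are (min, max) tuples. The table is an assumption: senders
# adopt 1..1 first (V2), receivers follow (V3).
RELEASES = {
    1: {"sender": (0, 1), "receiver": (0, 1)},
    2: {"sender": (1, 1), "receiver": (0, 1)},
    3: {"sender": (1, 1), "receiver": (1, 1)},
}


def accepts(receiver, sender):
    """True if every count the sender may produce lies within the receiver's range."""
    return receiver[0] <= sender[0] and sender[1] <= receiver[1]


def compatible(sender_release, receiver_release):
    """Can a sender of one release talk to a receiver of another release?"""
    return accepts(RELEASES[receiver_release]["receiver"],
                   RELEASES[sender_release]["sender"])


if __name__ == "__main__":
    for s in RELEASES:
        for r in RELEASES:
            status = "ok" if compatible(s, r) else "BREAKS"
            print(f"sender {s} -> receiver {r}: {status}")
```

Running it reports that only sender 1 to receiver 3 breaks, which is exactly the combination excluded when no more than two versions co-exist.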

There are more use cases. Consider code lists: in V1, we allow the Netherlands and the USA as countries. In the next release, we want to trade with Ireland as well:

If we upgrade in two steps, every subsequent pair of releases is fully compatible. No sender can break a receiver. The same holds if we remove old codes:
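A minimal sketch of how such a two-step plan could be derived for code lists, assuming code lists are modelled as Python sets; the function names and the codes "NL", "US" and "IE" are illustrative. Receivers first widen to the union of the old and new lists, then senders switch, then receivers narrow; steps that change nothing are dropped.

```python
def plan_code_list_change(old, new):
    """Return a list of (sender_codes, receiver_codes) releases from old to new."""
    union = old | new
    steps = [(old, old), (old, union), (new, union), (new, new)]
    releases = []
    for step in steps:
        if not releases or releases[-1] != step:  # skip steps that change nothing
            releases.append(step)
    return releases


def subsequent_pairs_compatible(releases):
    """Each sender must be accepted by receivers of its own and neighbouring releases."""
    for i, (sender, _) in enumerate(releases):
        for j in (i - 1, i, i + 1):
            if 0 <= j < len(releases) and not sender <= releases[j][1]:
                return False
    return True


if __name__ == "__main__":
    add_ireland = plan_code_list_change({"NL", "US"}, {"NL", "US", "IE"})
    for version, (sender, receiver) in enumerate(add_ireland, start=1):
        print(f"V{version}: senders {sorted(sender)}, receivers {sorted(receiver)}")
```

Adding Ireland yields three releases in which receivers accept IE one release before senders may send it; removing it again yields the same three releases in reverse order.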

It’s as easy to increase maximum cardinality:

Some changes take more than one intermediate release, such as adding a required element where there was none:

Sometimes we change senders first, other times receivers. It turns out there is a simple rule behind this: if the change from V1 to the final version is – by itself – backward compatible, we change receivers first. If the change is forward compatible, we change senders first.
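The same kind of plan, written for cardinalities, reproduces this rule without stating it explicitly. This is a sketch under my own assumptions: cardinalities are `(min, max)` tuples with `max=None` for n, and the hypothetical `plan` function first widens receivers to a range covering both versions before switching senders.

```python
from typing import List, Optional, Tuple

# A cardinality is (min, max); max=None stands for "n" (unbounded).
Card = Tuple[int, Optional[int]]
Release = Tuple[Card, Card]  # (sender cardinality, receiver cardinality)


def upper(card: Card) -> float:
    """Maximum as a number, treating "n" as infinity."""
    return float("inf") if card[1] is None else card[1]


def within(inner: Card, outer: Card) -> bool:
    """True if every count allowed by `inner` is also allowed by `outer`."""
    return outer[0] <= inner[0] and upper(inner) <= upper(outer)


def hull(a: Card, b: Card) -> Card:
    """Smallest cardinality range that covers both a and b."""
    high = None if a[1] is None or b[1] is None else max(a[1], b[1])
    return (min(a[0], b[0]), high)


def plan(old: Card, new: Card) -> List[Release]:
    """Releases from old to new: receivers widen, senders switch, receivers narrow."""
    wide = hull(old, new)
    releases: List[Release] = []
    for step in [(old, old), (old, wide), (new, wide), (new, new)]:
        if not releases or releases[-1] != step:  # skip steps that change nothing
            releases.append(step)
    return releases


def subsequent_releases_compatible(releases: List[Release]) -> bool:
    """Every sender is accepted by receivers of its own and neighbouring releases."""
    return all(
        within(sender, releases[j][1])
        for i, (sender, _) in enumerate(releases)
        for j in (i - 1, i, i + 1)
        if 0 <= j < len(releases)
    )
```

For a backward compatible change such as `plan((0, 1), (0, None))` (increasing maximum cardinality) the first new release changes only receivers; for a forward compatible change such as `plan((0, 1), (1, 1))` it changes only senders; adding a required element, `plan((0, 0), (1, 1))`, yields four releases, i.e. two intermediate ones.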

Backward compatible changes (where new receivers can handle content from old senders) are:

  • making required elements optional;
  • introducing new optional elements;
  • increasing maximum cardinality (from 1 to n, or 0 to n, or 1 to 2);
  • adding code values;

and in general all changes which allow more document instances.
Forward compatible changes (where old receivers can handle content from new senders) are:

  • making optional elements required;
  • removing optional elements;
  • decreasing maximum cardinality (from n to 1, or n to 0, or 2 to 1);
  • removing code values;

and in general all changes which allow fewer document instances.
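Since both lists come down to comparing the sets of document instances each version allows, the classification can be sketched as two inclusion tests. The helpers below are hypothetical and assume cardinalities as `(min, max)` tuples (`max=None` for n) and code lists as Python sets.

```python
def allows_subset(a, b):
    """True if everything version `a` allows is also allowed by version `b`."""
    if isinstance(a, (set, frozenset)):  # code lists
        return a <= b
    a_max = float("inf") if a[1] is None else a[1]  # cardinalities
    b_max = float("inf") if b[1] is None else b[1]
    return b[0] <= a[0] and a_max <= b_max


def classify(old, new):
    """Classify the change from `old` to `new` by instance-set inclusion."""
    backward = allows_subset(old, new)  # new receivers accept old content
    forward = allows_subset(new, old)   # old receivers accept new content
    if backward and forward:
        return "fully compatible"
    if backward:
        return "backward compatible"
    if forward:
        return "forward compatible"
    # e.g. adding a required element where there was none
    return "neither"


if __name__ == "__main__":
    print(classify((0, 1), (1, 1)))                       # making 'name' required
    print(classify({"NL", "US"}, {"NL", "US", "IE"}))     # adding a code value
```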

All patterns are reversible: if, read one way, a pattern describes a backward compatible change over multiple versions, then read the other way it describes a forward compatible change, and vice versa. In a future post I’ll look at some advantages and disadvantages of this approach and of other approaches, such as using <xs:any> schema elements.

Making All Changes Compatible over Multiple Versions – Part 1

This article shows a way to make any change in an XML-based exchange vocabulary compatible over several versions of the exchange language. This is highly useful in environments where there is some control over the number of versions ‘in the field’. In the Dutch Electronic Health Record exchange, for instance, we target a maximum of two subsequent versions in the field at any time, and so does the British NHS for their EHR. I suspect that, for various reasons, we will have to be a bit more lenient than two versions, but the environment is still under control. It’s not uncommon in messaging scenarios to have some control over the number of versions ‘out there’ (as opposed to publishing, where one often has little idea of which versions still exist).

Suppose we have a message version V1, a basic order with a customer id and one or more order lines. In V2 we want to add a name, since the customer may have different contacts for different orders, and needs a way to indicate this. So the V2 message looks like this:

Order V2:

  • customer-id 1..1
  • name 0..1
  • order-line 1..n

As you see, I indicate minimum and maximum cardinality on all elements. For instance, name occurs zero times minimum and once maximum; in other words, name is optional. The elements themselves may of course be complex. For this article I will add one convenient shorthand, zero maximum cardinality:

Order V1:

  • customer-id 1..1
  • name 0..0
  • order-line 1..n

V1 orders contain a minimum of zero names and a maximum of zero names. In other words, V1 orders cannot contain names. This is not a common construct in schema languages, but it helps this short exposition.

If we do nothing beyond this, V2 senders will break V1 receivers: their schemas cannot handle the unknown ‘name’ element, and either validation will fail or their receiving software might break. There is no way to know which without knowing all implementations, which is not feasible in a field with thousands of participants. So existing implementations will break: new senders will send unexpected content to old receivers.
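A small sketch of the cardinality notation used in this article, assuming a schema is a Python dict from element name to `(min, max)` (with `max=None` for n) and an order is reduced to how often each element occurs. `ORDER_V1`, `ORDER_V2` and `validate` are illustrative names, not an XML Schema API.

```python
# Schemas as element -> (min, max); max=None means "n". V1 uses the 0..0 shorthand.
ORDER_V1 = {"customer-id": (1, 1), "name": (0, 0), "order-line": (1, None)}
ORDER_V2 = {"customer-id": (1, 1), "name": (0, 1), "order-line": (1, None)}


def validate(schema, counts):
    """Return a list of violations; an empty list means the instance is valid."""
    errors = []
    for element in counts:
        if element not in schema:
            errors.append(f"unknown element '{element}'")
    for element, (low, high) in schema.items():
        n = counts.get(element, 0)
        if n < low or (high is not None and n > high):
            shown = "n" if high is None else high
            errors.append(f"'{element}': {n} occurrence(s), expected {low}..{shown}")
    return errors


# An order as a V2 sender might produce it: one customer-id, a name, three lines.
order = {"customer-id": 1, "name": 1, "order-line": 3}

if __name__ == "__main__":
    print("V2 receiver:", validate(ORDER_V2, order))
    print("V1 receiver:", validate(ORDER_V1, order))
```

`validate(ORDER_V2, order)` returns an empty list, while `validate(ORDER_V1, order)` reports the ‘name’ element: a V2 sender breaks a V1 receiver.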

One way of solving the problem is “ignoring unknown content”, as HTML does, but here we’ll look at another approach. Instead of moving from version 1 to version 2, we’ll define an intermediate version with different schemas for senders and receivers:

I’ve simplified the content, focusing on the important ‘name’ element. In version 2, we move to the intermediate situation: senders are unchanged, but receivers are upgraded to accept names. Only in version 3 can senders send names. And all of a sudden all subsequent versions can communicate with each other. If we combine this with the rule that no more than two versions co-exist at the same time, compatibility remains intact. If we allow three or more versions to co-exist at the same time, the principle remains; we’ll just need some more intermediate versions.
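The intermediate-version approach can be checked mechanically. The sketch below assumes the three releases just described, modelled only for the ‘name’ element as (sender, receiver) cardinality pairs, and a hypothetical `window` parameter for the number of subsequent versions allowed to co-exist.

```python
# (sender cardinality, receiver cardinality) of 'name' per release.
RELEASES = [
    ((0, 0), (0, 0)),  # V1: nobody knows about names
    ((0, 0), (0, 1)),  # V2: receivers accept names, senders do not send them
    ((0, 1), (0, 1)),  # V3: senders may send names
]


def accepts(receiver, sender):
    """True if every count the sender may produce lies within the receiver's range."""
    return receiver[0] <= sender[0] and sender[1] <= receiver[1]


def breaking_pairs(releases, window):
    """(sender_version, receiver_version) pairs that break when up to
    `window` subsequent versions co-exist in the field."""
    broken = []
    for s, (sender, _) in enumerate(releases, start=1):
        for r, (_, receiver) in enumerate(releases, start=1):
            if abs(s - r) < window and not accepts(receiver, sender):
                broken.append((s, r))
    return broken


if __name__ == "__main__":
    for window in (2, 3):
        print(f"window {window}: {breaking_pairs(RELEASES, window)}")
```

With a window of two, `breaking_pairs` finds nothing; with a window of three it returns `[(3, 1)]`: a V3 sender can break a V1 receiver, which is why more co-existing versions require more intermediate versions.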

In the next part we’ll look at how to use this approach for all changes, not just adding optional content.