Renting a coder

Dan Appleman and Jeff Atwood have both written blog posts titled “Can You Really Rent a Coder?” about coding auction sites such as Elance, RentACoder and guru.com, where coders worldwide bid on projects. Typically a lot of small, low-paid projects appear there, and those sites have had their share of criticism.

But they can work, just follow these rules:
– if you live in a rich country, be a buyer, not a bidder;
– subcontract parts of your projects, tell your customer, and tell them you can charge less because of this: they’ll appreciate you for it;
– subcontract technical parts only: a Russian will understand HTTP or XML, but not Dutch law;
– expect to spend a lot of time on specs and communication;
– use a few iterations to polish out parts where your specs were unclear or misinterpreted;
– test a lot and inspect the source code;
– never go for the lowest bids: they’re always crap;
– build a longer relationship with coders you have good experience with, give them bonuses and follow-up projects.

Eat Your Own Dogfood

The Dutch government [1] promotes accessible websites – a truly laudable initiative (basically a Dutch application of the Web Content Accessibility Guidelines 1.0). Yet their own website commits one of the worst web design crimes, right in the Web Guidelines themselves: the guidelines are available not in HTML or XHTML, as one would expect, but in PDF, MS Word and OpenOffice formats (there is a short summary in HTML, but it lacks all the details).

Let’s look up some authority, such as usability guru Jakob Nielsen’s Top Ten Mistakes in Web Design. Indeed, in 2007 “PDF Files for Online Reading” is still there as Mistake #2. Or to quote another authority, the Dutch Web Guidelines themselves: “Checkpoint 11.1: Use W3C technologies when they are available and appropriate for a task and use the latest versions when supported.” (Apologies for not being able to link to that checkpoint – PDF does not allow me to do that.) I don’t think PDF, MS Word or even OpenOffice qualify as W3C technologies. (X)HTML is perfectly adequate for this publication – one can always provide an alternative PDF for printing if desired. And HTML of course allows linking, bookmarking, consistent navigation, yada yada yada – you know, the stuff that made the Web great.

They do provide the Guidelines in both English and Dutch. But the Dutch version is only in MS Word – sigh…

The initiative is great, especially since it certifies websites and thus truly promotes accessible web sites. And they have a nice test tool. (My site, which is plain WordPress, scores a meagre 38/45. I’ll have to look into that.) But please, do it right next time. Eat your own dogfood and use HTML.

[1] Technically drempelvrij.nl is an independent foundation, in practice the Dutch government is a major stakeholder. For their internal guidelines the government does much better, there they have everything in HTML next to PDF.

Implementing Healthcare Messaging with XML

Last Monday I gave a presentation at XML 2007 titled “Implementing Healthcare Messaging with XML” for a very attentive and responsive audience, chaired by Tony Coates. David Orchard and Glen Daniels, both involved in multiple WS-* standards, were there, and I had an interesting chat with them afterwards about the layering problems of WSDL mentioned in my presentation. Jon Bosak inquired about ebXML – which we hadn’t used because it did not seem to get any traction from IBM and Microsoft at the time. With hindsight, looking at what ebXML (ebMS specifically) delivered years ago and the time the WS-* stack has taken, one wonders whether this was such a wise decision… Anyway, it was great to have such a responsive crowd.

Axioms of Versioning 2

I’ve written a new version of ‘Axioms of Versioning’. I extended the formalization to get a grasp of the concept of ‘Semantic Backward Compatibility’ in HL7v3, which I believe is flawed (quote: “Objective of backward model compatibility is that a receiver expecting an ‘old’ version will not misinterpret content sent from a new version”). It seems to be the reverse of the position of the W3C TAG in ‘Extending and Versioning Languages: Terminology’, and of the position I would defend myself. Yet the interaction of new senders with old receivers was not sufficiently explored in my Axioms.

It turns out that exploration of this notion leads to quite natural definitions of ‘may ignore’ and ‘must understand’ semantics. The HL7v3 notion is probably best characterized by the concept of ‘partial semantical forward compatibility’ in my new Axioms. The concept is also close to, if not the same as, the TAG’s ‘Partial Understanding’.

It really thrilled me to see how helpful my formalisms were in exploring the notions in HL7v3, and uncovering the – I think – hidden meaning in it.

Making All Changes Compatible over Multiple Versions – Part 2

In the previous article I showed how to add an optional element and make this a compatible change over two releases, so existing receivers aren’t broken by unknown elements. It is done by following two simple principles:

  1. have one or more intermediate releases;
  2. use different schemas in intermediate releases for senders and receivers.

This strategy works well in environments where ‘Big Bang’ upgrades are undesirable and there is control over the number of different releases ‘in the field’. This applies to the Dutch national EHR infrastructure, for which I work, and probably to most exchanges between companies and/or governments.

The same can be done for any kind of change. Consider making an optional element required: we have an existing

Order V1:

  • customer-id 1..1
  • name 0..1
  • order-line 1..n

with an optional ‘name’ element, and we want:

Order V2:

  • customer-id 1..1
  • name 1..1
  • order-line 1..n

Upgrading some receiving nodes to V2 would break them once they receive orders without names. If we use an intermediate release, there are no problems:

Sender 2 cannot break receiver 1. Once all parties are at least at V2, receivers can start to upgrade to V3. If the level of control is such that no more than two versions are allowed simultaneously, this is sufficient – otherwise we’ll just have to add some more intermediate releases.
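
This scheme can be sketched in a few lines of Python (a toy model of my own, not actual XML Schema): each version is reduced to the allowed cardinality of the ‘name’ element, kept separately for senders and receivers.

```python
# Toy model of the intermediate-release strategy for making the optional
# 'name' element required. A 'schema' is just a (min, max) cardinality.
SENDER = {1: (0, 1), 2: (1, 1), 3: (1, 1)}    # what each version may produce
RECEIVER = {1: (0, 1), 2: (0, 1), 3: (1, 1)}  # what each version accepts

def accepts(version, count):
    lo, hi = RECEIVER[version]
    return lo <= count <= hi

def compatible(sender_v, receiver_v):
    """True if every document the sender can produce is accepted."""
    lo, hi = SENDER[sender_v]
    return all(accepts(receiver_v, c) for c in range(lo, hi + 1))

# With at most two versions in the field, every adjacent pair works:
for s, r in [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3)]:
    assert compatible(s, r), (s, r)

# A direct jump would have broken things: a V1 sender omitting the name
# cannot talk to a receiver that already requires it.
assert not compatible(1, 3)
```

With at most two versions in the field at any time, every reachable sender/receiver pair is compatible, while the direct V1-to-V2 jump of the naive approach is not.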

There are more use cases. Consider code lists: in V1, we allow the Netherlands and the USA as countries. In the next release, we want to trade with Ireland as well:

If we upgrade in two steps, every subsequent pair of releases is fully compatible. No sender can break a receiver. The same if we remove old codes:

It’s as easy to increase maximum cardinality:

Some changes take more than one intermediate release, such as adding a required element where there was none:

Sometimes we change senders first, other times receivers. It turns out there is a simple rule behind this: if the change from V1 to the final version is – by itself – backward compatible, we change receivers first. If the change is forward compatible, we change senders first.

Backward compatible changes (where new receivers can handle content from old senders) are:

  • making required elements optional;
  • introducing new optional elements;
  • increasing maximum cardinality (from 1 to n, or 0 to n, or 1 to 2);
  • adding code values;

and in general all changes which allow more document instances.
Forward compatible changes (where old receivers can handle content from new senders) are:

  • making optional elements required;
  • removing optional elements;
  • decreasing maximum cardinality (from n to 1, or n to 0, or 2 to 1);
  • removing code values;

and in general all changes which allow fewer document instances.
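
This generalisation can be made concrete with a small sketch (the helper names are mine, not from any schema tool): model a message type as the set of instances it allows – here simply the allowed counts of one element – and classify a change by set inclusion.

```python
# Classify a schema change by comparing the sets of allowed instances.
def instances(min_occurs, max_occurs):
    return set(range(min_occurs, max_occurs + 1))

def classify(old, new):
    old_set, new_set = instances(*old), instances(*new)
    if old_set <= new_set:
        return "backward compatible"  # new receivers handle old senders
    if new_set <= old_set:
        return "forward compatible"   # old receivers handle new senders
    return "incompatible"

assert classify((1, 1), (0, 1)) == "backward compatible"  # required -> optional
assert classify((0, 1), (0, 3)) == "backward compatible"  # raise max cardinality
assert classify((0, 1), (1, 1)) == "forward compatible"   # optional -> required
assert classify((0, 1), (0, 0)) == "forward compatible"   # remove optional element
```

The reversibility noted below falls out of this model for free: swapping the arguments of classify turns every backward compatible change into a forward compatible one.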

All patterns are reversible – if in one direction they describe a backward compatible change over multiple versions, then in the other direction they describe a forward compatible change, and vice versa. In a next post I’ll look at some advantages and disadvantages of this approach compared with others, such as using <xs:any> schema elements.

Making All Changes Compatible over Multiple Versions – Part 1

This article will show a way to make any change in an XML-based exchange vocabulary compatible over several versions of the exchange language. In environments where there is some control over the number of versions ‘in the field’ this is highly useful. In the Dutch Electronic Health Record exchange, for instance, we target a maximum of two subsequent versions in the field at any time, and so does the British NHS for their EHR. I guess – for different reasons – we will have to be a bit more lenient than two versions, but the environment is still under control. It’s not uncommon in messaging scenarios to have some control over the number of versions ‘out there’ (as opposed to publishing, where one often has little idea of what versions still exist).

Suppose we have a message version V1, a basic order with a customer id and one or more order lines. In V2 we want to add a name, since the customer may have different contacts for different orders, and needs a way to indicate this. So the V2 message looks like this:

Order V2:

  • customer-id 1..1
  • name 0..1
  • order-line 1..n

As you see, I will indicate minimum and maximum cardinality on all elements. For instance, name occurs zero times minimum and once maximum – in other words, name is optional. The elements themselves may of course be complex types. I will add one convenient shorthand for this article: a maximum cardinality of zero:

Order V1:

  • customer-id 1..1
  • name 0..0
  • order-line 1..n

V1 orders contain a minimum of zero names and a maximum of zero names. In other words, V1 orders cannot contain names. This is not a common construct in schema languages, but it will help this small exposition.

If we do nothing beyond this, V2 senders will break V1 receivers: their schemas cannot handle the unknown ‘name’ element, and either validation will fail or the receiving software might break. There is no way to know without knowing all implementations, which is not doable in a field with thousands of participants. So the existing implementations will break: new senders will send unexpected content to old receivers.

One way of solving the problem is to ignore unknown content, like HTML does – but here we’ll look at another approach. Instead of moving from version 1 straight to version 2, we’ll define an intermediate version with different schemas for senders and receivers:

I’ve simplified the content, focusing on the important ‘name’ element. In version 2, we move to the intermediate situation: senders are unchanged, but receivers are upgraded to accept names. Only in version 3 can senders send names. And all of a sudden, all subsequent versions can communicate with each other. If we combine this with the rule that no more than two versions co-exist at the same time, compatibility remains intact. If we allow co-existence of three or more versions at the same time, the principle remains the same; we’ll just need some more intermediate versions.
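
As a runnable sketch (my own toy validator, not a schema language): in the intermediate version 2 only receivers learn about ‘name’, and only version 3 senders may emit it.

```python
# Toy model: (min, max) occurrences of <name> per version, split by role.
import xml.etree.ElementTree as ET

SENDER_NAME = {1: (0, 0), 2: (0, 0), 3: (0, 1)}
RECEIVER_NAME = {1: (0, 0), 2: (0, 1), 3: (0, 1)}

def make_order(with_name):
    order = ET.Element("order")
    ET.SubElement(order, "customer-id").text = "42"
    if with_name:
        ET.SubElement(order, "name").text = "J. Jansen"
    ET.SubElement(order, "order-line").text = "gadget x100000"
    return order

def accepts(version, order):
    lo, hi = RECEIVER_NAME[version]
    return lo <= len(order.findall("name")) <= hi

# A V3 sender emitting a name is fine for V2 and V3 receivers...
order_with_name = make_order(with_name=True)
assert accepts(2, order_with_name) and accepts(3, order_with_name)
# ...but would have broken an unprepared V1 receiver:
assert not accepts(1, order_with_name)
```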

In the next part we’ll look at how to use this approach for all changes, not just adding optional content.

Axioms of Versioning

Obsolete, please see the latest version

Version 1

An attempt to define syntactical and semantical compatibility of versions in a formal way. Much derives from the writings of David Orchard, especially the parts on syntactical forward and backward compatibility (though my terminology differs).

  1. Let U be a set of (specifications of) software processable languages {L1, L2, … Ln}
    1. This axiomatization does not concern itself with natural language
  2. The extension of a language Lx is a set of conforming document instances ELx = {Dx1, Dx2, … Dxn}
    1. Iff ELx = ELy then Lx and Ly are extensionally equivalent
      • Lx and Ly may still be different: they may define different semantics for the same syntactical construct
    2. Iff ELx ⊂ ELy then Lx is an extensional sublanguage of Ly
    3. Iff ELx ⊃ ELy then Lx is an extensional superlanguage of Ly
    4. D is the set of all possible documents; or the union of all ELx where Lx ∈ U
  3. For each Lx ∈ U there is a set of processors Px = {Px1, Px2, … Pxn} which implement Lx
    1. Each Pxy exhibits behaviour as defined in Lx
    2. Processors can produce and consume documents
    3. Each Pxy produces only documents it can consume itself
    4. At consumption, Pxy may accept or reject documents
  4. The behaviour of a processor Pxy of language Lx is a function Bxy
    1. The function Bxy takes as argument a document, and its value is a processor state
      • We assume a single processor state before each function execution, alternatively we could assume a <state, document> tuple as function argument
    2. If for two processors Pxy and Pxz for language Lx for a document d Bxy(d) = Bxz(d) then the two processors behave similar for d
      • Two processor states for language Lx are deemed equivalent if a human with thorough knowledge of language specification Lx considers the states equivalent. Details may vary insofar as the language specification allows it.
      • Processor equivalence is not intended to be formally or computably decidable; though in some cases it could be.
    3. If ∀d ( d ∈ ELx → Bxy(d) = Bxz(d) ) then Pxy and Pxz are behaviourally equivalent for Lx
      • If two processors behave the same for every document which belongs to a language Lx, the processors are behaviourally equivalent for Lx.
  5. An ideal language specifies all aspects of desired processor behaviour completely and unambiguously; assume all languages in U are ideal
  6. A processor Px is an exemplary processor of a language Lx if it fully implements language specification Lx; assume all processors for all languages in U are exemplary
    1. Because they are (defined to be) exemplary, every two processors for a language Lx are behaviourally equivalent
    2. ELx = { d is a document | Px accepts d }
    3. The complement of ELx is the set of everything (normally, every document) which is rejected by Px
    4. The make set MLx = { d is a document | Px can produce d }
  7. A language Lx is syntactically compatible with Ly iff MLx ⊆ ELy and MLy ⊆ ELx
    • Two languages are syntactically compatible if they accept the documents produced by each other.
    1. A language Ln+1 is syntactically backward compatible with Ln iff MLn ⊆ ELn+1 and Ln+1 is a successor of Ln
      • A language change is syntactically backward compatible if a new receiver accepts all documents produced by an older sender.
    2. A language Ln is syntactically forward compatible with Ln+1 iff MLn+1 ⊆ ELn and Ln+1 is a successor of Ln
      • A language change is syntactically forward compatible if an old receiver accepts all documents produced by a new sender.
  8. A document d can be a member of the extension of any number of languages
    1. Px is an (exemplary) processor of Lx, Py is an (exemplary) processor of language Ly
    2. Two languages Lx and Ly are semantically equivalent iff ELx = ELy ∧ ∀d ( d ∈ ELx → Bx(d) = By(d) )
      • If two languages Lx and Ly take the same documents as input, and their exemplary processors behave the same for every document, the languages are semantically equivalent.
      • Two languages can only be compared if their exemplary processors are similar enough to be compared.
      • Not every two languages can be compared.
      • “Semantic” should not be interpreted in the sense of “formal semantics”.
  9. The semantical equivalence set of a document d for Lx = { y ∈ ELx | Bx(d) = Bx(y) }
    1. Or: SLx,d = { y ∈ ELx | Bx(d) = Bx(y) }
      • The semantical equivalence set of a document d is the set of documents which make a processor behave the same as d
      • Semantical equivalence occurs for expressions which are semantically equivalent, such as i = i + 1 and i += 1 for C, or different attribute order in XML etc.
    2. d ∈ SLx,d
    3. Any two semantical equivalence sets of Lx are necessarily disjoint
      • If z ∈ SLx,e were also z ∈ SLx,d then every member of SLx,e would be in SLx,d and vice versa, and thus SLx,d = SLx,e
  10. A language Ly is a semantical superlanguage of Lx iff ∀d ( d ∈ MLx → By(d) = Bx(d) )
    1. For all documents produced by Px, Py behaves the same as Px
      • Equivalence in this case should be decided based on Lx; if Ly makes behavioural distinctions which are not mentioned in Lx, behaviour is still the same as far as Lx is concerned
    2. It follows: ∀d ( d ∈ MLx → ∃SLy,d ( SLy,d ⊆ ELy ∧ ( SLx,d ∩ MLx ) ⊆ SLy,d ∧ By(d) = Bx(d) ) )
      • For any document produced by Px, the part of its semantical equivalence set which Px can actually produce, is a subset of the semantical equivalence set of Py for this document
    3. For all d ∈ ELx ∧ d ∉ MLx there may be many equivalence sets in Ly for which By(d) ≠ Bx(d)
      • In other words: for documents accepted but not produced by Px, Ly may define additional behaviours
    4. Lx is a semantical sublanguage of Ly iff Ly is a semantical superlanguage of Lx
  11. A language Ln+1 is semantically backward compatible with Ln iff Ln+1 is a semantical superlanguage of Ln and Ln+1 is a successor of Ln
    1. An old sender may expect a newer, but semantically backward compatible, receiver to behave as the sender intended
    2. A language Ln is semantically forward compatible with Ln+1 iff Ln+1 is a semantical sublanguage of Ln and Ln+1 is a successor of Ln
    3. Semantic forward compatibility is only possible if a language loses semantics; i.e. its processors exhibit less functionality, and produce less diverse documents
    4. A processor cannot understand what it does not know about yet
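
Axiom 7 and its corollaries can be illustrated with a tiny sketch (the toy languages below are mine): reduce a language to its extension E (accepted documents) and make set M (producible documents), both plain sets, and the compatibility definitions become one-liners.

```python
# Illustrating axiom 7: syntactic compatibility as set inclusion.
def syntactically_compatible(Mx, Ey, My, Ex):
    # Axiom 7: MLx ⊆ ELy and MLy ⊆ ELx
    return Mx <= Ey and My <= Ex

def backward_compatible(M_old, E_new):
    # Axiom 7.1: a new receiver accepts everything an old sender produces
    return M_old <= E_new

def forward_compatible(M_new, E_old):
    # Axiom 7.2: an old receiver accepts everything a new sender produces
    return M_new <= E_old

# L1 makes and accepts {a, b}; L2 additionally accepts (but never makes) c.
M1, E1 = {"a", "b"}, {"a", "b"}
M2, E2 = {"a", "b"}, {"a", "b", "c"}
assert backward_compatible(M1, E2)  # L2 is syntactically backward compatible with L1
assert forward_compatible(M2, E1)   # and L1 is syntactically forward compatible with L2
assert syntactically_compatible(M1, E2, M2, E1)
```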

SOAP over REST

Suppose I want to order 100,000 pieces of your newest, ultra-sleek geek gadgets. We negotiate the price etc., and you send me a proposed contract. I agree, and return the contract. Blessed with a healthy skepticism towards all new technologies, we decide to transfer all documents on paper, and since the contract is very important to both of us, I return it using the most trusted courier service available, with parcel-tracking and armored trucks and all. Yet I do not sign the contract. Will you honor it and send me the goods? I doubt it. Yet this is the level of protection HTTPS offers.

With REST, based on the workings of the Web, HTTPS is the standard choice for secure transport. Yet HTTPS only secures the transport, the pipe. Once a message is delivered at the other end, it is plain text, or XML, or whatever format we chose, again. Of the signatures used to establish the secure session, nothing remains with the message. We can use client certificates, so both server and client authenticate themselves, but this still only secures the pipe, not the messages. What you want for real contracts are message signatures.

There are several options in REST to solve the problem. One of them is to simply hijack the WS-Security spec from the WS-* stack. Add a soap:Envelope element with the appropriate WS-Security headers to the contract message, and send the resulting XML in a RESTful way to the other party. Maybe this is not 100% WS-Security compliant, and there are some dependencies on SOAP or WSDL or other WS-* specs which we do not honor (or maybe not – I haven’t combed the spec for it), but hey, if we squint enough that shouldn’t be much of a practical problem.
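
To sketch the shape of the idea – emphatically not a compliant WS-Security implementation; real WS-Security uses XML Signature, and the HMAC scheme and element names below are my own simplification – the point is that the signature travels inside the envelope, so it survives after the HTTPS pipe is torn down:

```python
# Carry a signature of the payload inside the envelope itself.
import hashlib
import hmac
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"

def wrap_signed(contract_xml: str, secret: bytes) -> str:
    digest = hmac.new(secret, contract_xml.encode(), hashlib.sha256).hexdigest()
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, "Signature").text = digest  # stand-in for a wsse header
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    body.append(ET.fromstring(contract_xml))
    return ET.tostring(env, encoding="unicode")

def verify(envelope_xml: str, secret: bytes) -> bool:
    env = ET.fromstring(envelope_xml)
    sig = env.findtext(f"{{{SOAP_NS}}}Header/Signature")
    payload = ET.tostring(env.find(f"{{{SOAP_NS}}}Body/*"), encoding="unicode")
    expected = hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)

envelope = wrap_signed("<contract><qty>100000</qty></contract>", b"shared-secret")
assert verify(envelope, b"shared-secret")
```

The envelope can now be POSTed over plain HTTP or stored for years; the proof of who signed the contract stays with the contract, which is exactly what the pipe cannot give us.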

Such a coupling of REST and appropriate WS-* specs does seem promising – unless one is firmly in the WS-*-is-evil-by-default camp. It has an immediate consequence: there is almost nothing WS-* can do that REST cannot – apart from safe travel over non-HTTP connections and a few other things. Bill de hÓra wrote: “And do not be surprised to see specific WS-* technologies and ideas with technical merit, such as SAML and payload encryption, make an appearance while the process that generated them is discarded.”

There is a general lesson to be extracted as well: if something belongs with the payload, store it in the payload. HTTP headers are fine for transport – eh, transfer – headers, but not for anything which inherently belongs with the message payload. HTTP headers should be discardable after the HTTP method completes. Rule of thumb: if you want to keep it after reception, put it in a payload header. If not, an HTTP header.

When REST advantages weigh less…

There are two interesting posts, Stefan Tilkov’s ‘WS-* Advantages’ and Stuart Charlton’s ‘What are the benefits of WS-* or proprietary services?’, on when to use WS-* instead of REST.

Stu writes: “When you want a vendor independent MOM for stateful in-order, reliable, non-idempotent messages, and don’t have time or inclination to make your data easily reused… ”

We could reverse this argument: when do the advantages of REST (caching, linking and bookmarking, to name some) matter less? For one of my customers I design part of the Dutch national healthcare exchange, which is used to exchange patient data between care providers. Nearly all messages involved include the patient id: most messages are therefore pretty unique, and tied to a particular care context – say a patient visits his GP, or collects medication from his pharmacy. In such exchanges, caching doesn’t matter at all. It is possible some data (a patient’s medication history) is retrieved twice when the patient visits two doctors one after another, but in general in such an infrastructure it’s better to simply turn off caching, GET the data twice in the outlier cases, and not be bothered by the overhead involved in caching.

It seems to me a lot of business exchanges (say order/invoice exchanges such as UBL supports) share this property of mostly unique messages, whereas cases such as the Google or Amazon APIs clearly benefit a lot from caching. The distinction is between messaging (sending letters) and publishing (newspapers).

I’m not advocating REST or WS-* here for any particular application, but thinking about where the benefits of REST matter most is another way of thinking about choosing technologies. For publishing, REST with all the optimizations of GET is the option to look at first. For messaging it’s less obvious where to start.