SOAP over REST

Suppose I want to order 100,000 pieces of your newest, ultra-sleek geek gadgets. We negotiate the price etc., and you send me a proposed contract. I agree, and return the contract. Blessed with a healthy skepticism towards all new technologies, we decide to transfer all documents on paper, and since the contract is very important to both of us, I return it using the most trusted courier service available, with parcel-tracking and armored trucks and all. Yet I do not sign the contract. Will you honor it and send me the goods? I doubt it. Yet this is the level of protection HTTPS offers.

With REST, based on the workings of the Web, HTTPS is the standard choice for safe transport. Yet HTTPS only secures the transport, the pipe. Once a message is delivered at the other end, it is plain text, or XML, or whatever format we chose, again. Of the signatures used to establish the secure session, nothing remains with the message. We can use client certificates, so both server and client authenticate themselves, but that still only protects the pipe, not the messages. What you want for real contracts are message signatures.
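To make the difference concrete, here is a minimal sketch (Python, with a made-up shared secret and field names, not any particular standard) of a signature that travels with the message itself, so it is still there long after the HTTPS pipe has been torn down:

    import hashlib
    import hmac
    import json

    SHARED_SECRET = b"negotiated-out-of-band"   # hypothetical pre-shared key

    def sign_contract(contract: dict) -> dict:
        """Attach a signature to the contract itself, not to the connection."""
        body = json.dumps(contract, sort_keys=True).encode("utf-8")
        signature = hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()
        # The signature is part of the document we store and archive.
        return {"contract": contract, "signature": signature}

    def verify_contract(signed: dict) -> bool:
        """Anyone holding the secret can verify the archived document later."""
        body = json.dumps(signed["contract"], sort_keys=True).encode("utf-8")
        expected = hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signed["signature"])

    signed = sign_contract({"item": "geek gadget", "quantity": 100_000})
    assert verify_contract(signed)

A real contract would call for asymmetric signatures (so only I can sign, and you can prove to a third party that it was me), but the point stands: the proof of agreement lives in the message, not in the pipe.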

There are several options in REST to solve the problem. One of them is to simply hijack the WS-Security spec of the WS-* stack. Add a soap:Envelope element with the appropriate wsse headers to the contract message, and send the resulting XML in a RESTful way to the other party. Maybe this is not 100% WS-Security compliant and there are some dependencies on SOAP or WSDL or other WS-* specs which we do not honor (and maybe not, I haven’t combed the spec for it), but hey, if we squint enough that shouldn’t be much of a practical problem.
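A sketch of what such a message could look like, with a hypothetical endpoint URI and the actual XML-Signature details elided:

    from urllib import request

    # A bare-bones envelope with a WS-Security header; a real message would carry
    # a full ds:Signature block over the contract.
    envelope = """<?xml version="1.0" encoding="UTF-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
        xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">
      <soap:Header>
        <wsse:Security>
          <!-- ds:Signature over the contract goes here -->
        </wsse:Security>
      </soap:Header>
      <soap:Body>
        <contract><item>geek gadget</item><quantity>100000</quantity></contract>
      </soap:Body>
    </soap:Envelope>"""

    # POST it to a plain resource URI (hypothetical); nothing WS-* about the
    # transfer itself.
    req = request.Request(
        "https://example.com/orders/12345/contract",
        data=envelope.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )
    # request.urlopen(req)  # left commented out: example.com won't take our order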

Such a coupling of REST and appropriate WS-* specs does seem promising – unless one is firmly in the WS-*-is-evil-by-default camp. It has an immediate consequence: there is almost nothing WS-* can do that REST cannot do – safe transport over non-HTTP connections and a few other things excepted. Bill de hÓra wrote: “And do not be surprised to see specific WS-* technologies and ideas with technical merit, such as SAML and payload encryption, make an appearance while the process that generated them is discarded.”

There is a general lesson to be extracted as well: if something belongs with the payload, store it in the payload. HTTP headers are fine for transport – eh, transfer – concerns, but not for anything which inherently belongs with the message payload. HTTP headers should be discardable after the HTTP method completes. Rule of thumb: if you want to keep it after reception, payload header. If not, HTTP header.
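As an illustration (a sketch with made-up fields, not a prescription): the signature goes into the document we keep, while purely transfer-related data stays in the HTTP headers and can be thrown away once the request completes.

    # Transfer concerns: useful while the request is in flight, discardable afterwards.
    http_headers = {
        "Content-Type": "application/xml",
        "Content-Length": "1532",
        "Accept-Encoding": "gzip",
    }

    # Payload concerns: part of the document we archive, audit and sign.
    message = {
        "contract": {"item": "geek gadget", "quantity": 100_000},
        "signature": "a3f9...",                 # stays with the contract forever
        "signed_by": "cn=Buyer, o=Example BV",  # hypothetical signer identity
    }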

When REST advantages weigh less…

There are two interesting posts, Stefan Tilkov’s “WS-* Advantages” and Stuart Charlton’s “What are the benefits of WS-* or proprietary services?”, on when to use WS-* instead of REST.

Stu writes: “When you want a vendor independent MOM for stateful in-order, reliable, non-idempotent messages, and don’t have time or inclination to make your data easily reused… ”

We could reverse this argument: when do the advantages of REST (caching, linking and bookmarking, to name some) matter less? For one of my customers I design part of the Dutch national healthcare exchange, which is used to exchange patient data between care providers. Nearly all messages involved include the patient ID: therefore most messages are pretty unique, and tied to a particular care context: say a patient visits his GP, or collects medication from his pharmacy. In such exchanges, caching doesn’t matter at all. It is possible some data (a patient’s medication history) is retrieved twice when the patient visits two doctors one after the other, but in general in such an infrastructure it’s better to simply turn off caching, GET the data twice in the outlier cases, and not be bothered by the overhead involved in caching.
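Turning caching off is cheap. A minimal sketch of such a handler (WSGI here, with a made-up payload) that tells every intermediary not to keep a copy:

    def patient_record_app(environ, start_response):
        """Minimal WSGI sketch: serve patient data with caching switched off."""
        body = b'<medicationHistory patient="...">...</medicationHistory>'
        start_response("200 OK", [
            ("Content-Type", "application/xml"),
            # Tell every cache along the way not to keep a copy: the occasional
            # duplicate GET is cheaper than caching machinery for unique messages.
            ("Cache-Control", "no-store"),
            ("Content-Length", str(len(body))),
        ])
        return [body]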

It seems to me a lot of business exchanges (say the order/invoice exchanges UBL describes) share this property of mostly unique messages, whereas cases such as the Google or Amazon APIs will clearly benefit a lot from caching. The distinction is between messaging (sending letters) and publishing (newspapers).
I’m not advocating REST or WS-* here for any particular application, but thinking about where the benefits of REST matter most is another way of approaching the choice of technology. For publishing, REST with all the optimizations of GET is the option to look at first. For messaging it’s less obvious where to start.

GET and POST aren’t verbs

Calling HTTP GET and POST ‘verbs’ is a gross misnomer; they really are URI metadata in disguise.

REST is centered around the idea that we should use the way the Web works when we do things on the Web – fair enough – and that REST is the architectural style of the Web. RESTful applications – like HTTP – use POST, GET, PUT and DELETE to CREATE, READ, UPDATE and DELETE resources.

The problem is, this is not how the current Web works. Real verbs can be applied to many nouns, and a single noun can take many verbs: the cat walks, the cat talks, the cat whines, the cat shines. Some combinations make no sense, but a lot do. This is not the case for GET and POST and their friends. In principle, it is possible to apply both GET and POST to a single resource: GET an EMPLOYEE record, POST an update to the same record.

In practice GET and POST do not often apply to the same resource. POST on the Web is used in HTML forms. A form has a method (GET or POST) and a URI (which points to a resource). Usually, POST forms have unique URIs; they don’t share them with GETs. Amazon uses artificial keys to make the POST URIs unique. More surprisingly, the site even does the same thing when I GET the URI instead of POSTing it (which the Firefox Web Developer toolbar supports with a single menu choice). Amazon doesn’t care whether I GET or POST a book to my shopping cart; it (understandably) lands in my shopping cart either way. The same applies to del.icio.us and numerous other – I suspect most – sites.
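The experiment is easy to repeat. A sketch, against a hypothetical shopping-cart URI; many sites answer both requests the same way:

    from urllib import request

    # Hypothetical add-to-cart URI with an artificial key, Amazon-style.
    uri = "https://shop.example.com/cart/add/some-book-id?session=abc123"

    get_req = request.Request(uri, method="GET")
    post_req = request.Request(uri, data=b"", method="POST")

    # for req in (get_req, post_req):
    #     with request.urlopen(req) as resp:
    #         print(req.get_method(), resp.status)
    # If the site treats the method as mere metadata, both print the same status
    # and the book lands in the cart either way.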

The Web does not work through applying a small set of verbs to a large number of resources: the resources do all the work. GET and POST aren’t real verbs; they just signal what the URI owner intends to do when the URI is dereferenced; as such, they are URI metadata. The ‘GET’ metadata in an HTML form’s method just informs your browser, and all intermediaries, that this URI is supposed to have no side effects on the server. Of course no browser can know what actually happens when a URI is dereferenced: maybe a document is returned, maybe the missiles are fired. GET and POST are just assertions made by the URI owner about the side effects of the URI.

The ‘old’ Web wouldn’t be one bit different in appearance if GET and POST weren’t even allowed on the same URI. It’s possible to use them as real verbs, as the REST style advocates; but the merits of this approach do not derive from the way the current Web works.

Don’t get me wrong: GET and POST are brilliant – especially GET. Sending the GET or POST metadata to the server along with the URI is what makes the Web tick: it allows intermediaries to do smart caching, based on that metadata. But however good they are – GET and POST are not verbs, they are metadata.
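That is all an intermediary needs. A sketch of the decision a cache makes, based purely on the metadata the client sends along (simplified; real HTTP caches also look at Cache-Control, Vary and friends):

    cache: dict[str, bytes] = {}

    def handle(method: str, uri: str, fetch) -> bytes:
        """fetch is whatever actually dereferences the URI on a cache miss."""
        if method == "GET":
            # Declared side-effect free by the URI owner, so a stored copy may be reused.
            if uri not in cache:
                cache[uri] = fetch(uri)
            return cache[uri]
        # POST promises nothing, so the intermediary must pass it through
        # untouched and must never answer it from the cache.
        return fetch(uri)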