Friday, May 17, 2013

The role of media types in RESTful web services

One of the never-ending discussions in the REST community is that of custom and domain specific media types: should we, or should we not, create new media types - and if we should, for what reasons?

In this blog post I will discuss the role of media types in web services and illustrate it with an example media type. I will go through the requirements for this media type and from these I will build up the features it needs to support. Together with this I will show some example scenarios and sketch out the processing algorithm for the client side. Finally I compare this media type to other similar media types (HAL, Siren, JSON-API).

My goals for this blog post are:
  1. To improve my own understanding of the role of media types in RESTful web services - and share that with others.
  2. To define a new media type for what I call systems integration - and show how it facilitates loose coupling between the integration components.

By systems integration I mean the kind of background processing that takes place behind the scenes in almost any IT enabled business today; shuffling data from one system to another in a safe and durable way without any human interaction.

REST seems like a good fit for systems integration. It has a strong focus on loosely coupled systems where servers and clients can evolve independently of each other; if we can leverage that then the whole ecosystem of multiple servers and clients should be a lot easier to maintain, with much less downtime required for upgrading the various components.

There is an ongoing trend to include hyper media controls in newer web services; that is a good trend as it removes the client's dependency on specific URL structures. This in turn allows the server to evolve by adding new resources and linking to these - and it also makes it possible to use multiple servers without the clients ever noticing (since the client does not care about either URL path structures or host names).

But there is still a thing missing in the puzzle. In Roy Fielding's (in)famous rant "REST APIs must be hypertext-driven" he states:

... Any effort spent describing what methods to use on what URIs of interest should be entirely defined within the scope of the processing rules for a media type

... From that point on, all application state transitions must be driven by client selection of server-provided choices that are present in the received representations

Especially the last statement is interesting: "all application state transitions must be driven by client selection of server-provided choices". This means the client should not make any requests without first being instructed to do so (and how to do it). The client should not POST a new Tweet, bug report or similar without being instructed, on the fly, by some mechanism embedded in the server responses. Today's use of links in responses is on the right track, but links do not inform the client about what HTTP method to use (GET is assumed), nor do they say anything about the possible payload.

With this blog post I will try to explain how a media type, with a sufficient number of hyper media controls, together with some intelligent client side code, can enable what Fielding is describing. The downside of this approach is that client implementations become more complex - the upside is that the whole client/server application becomes much more loosely coupled which, in the end, hopefully will help us reach a maintenance Nirvana of loosely coupled systems integration :-)

By the way, I am not comparing REST with SOAP/WSDL and EDA (event driven architectures) - that is not the purpose here even though these are often found in systems integration projects. I would rather just explore what benefits we can get from REST.

Media type requirements and constraints

The primary driver for this new media type is loose coupling where the clients depend only on the media type and some out-of-band business specific data structures and identifiers. This means:

  • The client must not make any assumptions about URL structures.
  • The client must not make any assumptions about what concrete service implementation it is interacting with.
  • The client must not initiate any HTTP request without following instructions embedded in server responses (besides the initial request).
  • The client should not be given more than:
    • A root URL from which all other resources must be discovered at runtime.
    • A set of business specific data structures.
    • A set of well known identifiers for locating hyper media controls and business data.
The media type itself must be generic with respect to the business domain; it must not contain references to concepts like medical records, e-commerce and so on.

The media type must be rich enough in terms of hyper media affordances to enable all the operations needed for systems integration.

The media type does not need to include much, if anything, in terms of UI elements since it is intended for operations without human interaction. Neither is the media type intended for mobile use where bandwidth and message size are a concern.

The media type will be based on JSON. It could just as well be based on XML but, in my experience, JSON is a lot simpler to work with, fits the data needs I have met, and has a simple and easy-to-work-with patch format (application/json-patch+json) which will come in handy later on.

Armed with these constraints and requirements we are ready to build up our new media type.

Example business domain "BugMe"

Throughout this blog post I will use the imaginary open standard "BugMe" for interacting with bug tracking systems through the new media type. BugMe supports adding new bug reports, attaching documents to reports, adding comments to reports and similar features shown later on.

BugMe is not a part of the media type specification - it is only used to illustrate how the media type facilitates interaction with BugMe servers anywhere on the web.

Neither is BugMe a vendor specific "standard", it is strictly defined in terms of the generic media type and a set of bug reporting specific data structures and identifiers (more on that later on).

Compare this to APIs like Twitter and others; these are always defined in terms of vendor specific resources and explicit URL structures and were never designed to be implemented on servers anywhere else on the web.

To highlight the difference between a standard like BugMe and an actual implementation I will assume that some clever guy named Joe, who studies computer science 101 at Example.edu, has set up a BugMe server for some local study project. He is using an implementation whose vocabulary differs slightly from BugMe's - it talks about "issues" where BugMe talks about "bug reports". This fact is illustrated through the concrete URLs used in the examples. The root URL is http://example.edu/~joe/track/index.

Example 1 - Creating a bug report

The first thing we will try is to create a new bug report with BugMe. To do so we must supply our client with a few details about the operation:
  • The root URL: http://example.edu/~joe/track/index.
  • A "create bug report" identifier (as defined by BugMe): "http://bugme.org/names/create-bug-report".
  • Bug reporting data (as defined by BugMe)
    • Title: "Something bad happened",
    • Description: "I pressed ctrl-alt-del and all went black",
    • Severity: 5
We must also have an identifier for the media type. Let's call it "application/razor+json" for no specific reason.
Now we are ready to set our client loose and make it create the bug report. It will do so in the same manner as a human working with a web based UI: get a resource representation, look for well known identifiers that label data and hyper media controls, fill out data and activate hyper media controls.
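
To make the "few details" concrete, here is a minimal Python sketch of everything the client is handed up front. The class name ClientTask and its field names are made up for this illustration and are not part of the media type or of BugMe; the values are the ones listed above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ClientTask:
    """Everything the client is told up front; the rest is discovered at runtime."""

    root_url: str                              # the only URL the client knows
    control_name: str                          # well known identifier to look for
    data: dict = field(default_factory=dict)   # business data defined by BugMe
    media_type: str = "application/razor+json"


task = ClientTask(
    root_url="http://example.edu/~joe/track/index",
    control_name="http://bugme.org/names/create-bug-report",
    data={
        "Title": "Something bad happened",
        "Description": "I pressed ctrl-alt-del and all went black",
        "Severity": 5,
    },
)
print(task.root_url, task.media_type)
```

Note that nothing in this structure mentions "issues" or any URL below the root; those are left for the server to reveal.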

This interaction pattern, getting a resource representation and following instructions on the fly, has a price: it requires more complex client side logic than "normal RPC" patterns with design time binding of methods and it results in higher bandwidth use due to the embedded hyper media controls. The upside is a much looser coupling between clients and servers. But all of this is of course already discussed in Fielding's thesis on REST ;-)

GET initial resource

At the very beginning our client has nothing to do but GET the root URL in hope of finding something useful there:

Request
GET /~joe/track/index
Accept: application/razor+json

Response
Content-Type: application/razor+json

{
  curies:
  [
    { prefix: "bug", reference: "http://bugme.org/names/" }
  ],
  controls:
  [
    ...,
    {
      type: "link",
      name: "bug:create-bug-report",
      href: "http://example.edu/~joe/track/add-issue",
      title: "Add issue to issue tracker"
    },
    ...
  ]
}

The returned JSON data contains two top level properties defined by the media type: curies and controls. "curies" defines short names for URLs used as identifiers in the other elements (see http://www.w3.org/TR/curie/) and "controls" contains various hyper media controls. The use of curies should be optional - but it makes the responses easier to read in posts like this.
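
As a rough illustration of how a client might expand such short names, here is a small Python sketch. The function name expand_curie is made up for this post; only the "curies", "prefix" and "reference" properties come from the response above, and returning unknown names unchanged is an assumption on my part.

```python
def expand_curie(name, curies):
    """Expand a name such as 'bug:create-bug-report' to a full URL.

    `curies` is the list found in the top-level "curies" property.
    Names without a known prefix are returned unchanged.
    """
    prefix, separator, suffix = name.partition(":")
    if not separator:
        return name
    for curie in curies:
        if curie.get("prefix") == prefix:
            return curie["reference"] + suffix
    return name


curies = [{"prefix": "bug", "reference": "http://bugme.org/names/"}]
print(expand_curie("bug:create-bug-report", curies))
```

With the "bug" prefix from the example this gives back the full BugMe identifier the client was configured with.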

Now the client scans the "controls" element looking for the identifier "bug:create-bug-report". In this case it finds a "link" control which is equivalent to an Atom link. Since our client understands all the features of the media type it will know that a link should be "followed" by issuing an HTTP GET on the "href" value.

This little "algorithm" is equivalent to what a human would do: open up a webpage, look for instructions on how to perform the task at hand and then follow them.

You may have noticed the dots "..." in the example. Those are there for a reason: they illustrate how the client only cares about stuff that is relevant to its current task. Anything else in the response is ignored. The consequence is that the server is free to evolve the content of the resource over time without breaking any clients - as long as it only adds new stuff. Neither does the client care if the content is supposed to be a "link page", a service index, a medical record or have any other specific "type" - as long as it contains elements that will help the client getting closer to its goal.
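
The "ignore everything you do not need" rule can be sketched like this in Python. The function name find_control, the extra "bug:something-else" control and the "some-future-property" entry are invented here to stand in for the dots; treat it as one possible reading of the rule, not as part of the media type.

```python
def find_control(document, wanted_name):
    """Return the first control whose expanded name equals `wanted_name`.

    Everything else in the document is ignored, so the server can add
    new properties and controls without breaking this client.
    """
    prefixes = {c["prefix"]: c["reference"] for c in document.get("curies", [])}

    def expand(name):
        prefix, separator, suffix = name.partition(":")
        if separator and prefix in prefixes:
            return prefixes[prefix] + suffix
        return name

    for control in document.get("controls", []):
        if isinstance(control, dict) and expand(control.get("name", "")) == wanted_name:
            return control
    return None


root = {
    "curies": [{"prefix": "bug", "reference": "http://bugme.org/names/"}],
    "controls": [
        # Hypothetical unrelated control, standing in for the "..." above.
        {"type": "link", "name": "bug:something-else", "href": "http://example.edu/~joe/track/other"},
        {
            "type": "link",
            "name": "bug:create-bug-report",
            "href": "http://example.edu/~joe/track/add-issue",
            "title": "Add issue to issue tracker",
        },
    ],
    "some-future-property": {"ignored": True},  # unknown to the client, silently skipped
}

control = find_control(root, "http://bugme.org/names/create-bug-report")
print(control["type"], control["href"])
```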

Follow link

Here we have the next operation:

Request
GET /~joe/track/add-issue
Accept: application/razor+json

Response
200 OK
Content-Type: application/razor+json

{
  curies: ...,
  controls:
  [
    {
      type: "poe-factory",
      name: "bug:create-bug-report",
      href: "http://example.edu/~joe/track/add-issue",
      title: "Create new idempotent POE resource"
    }
  ]
}

Bingo! This time the client finds a "poe-factory" control with the right name "bug:create-bug-report" and now it's time to create the bug report. The control type "poe-factory" means "Post Once Exactly factory" and is a special action element that enables idempotent POST operations. If you do not know what "idempotent" means then take a look at this page: http://www.infoq.com/news/2013/04/idempotent.

The good thing about idempotent operations is that they can safely be repeated if anything goes wrong on the network. If an operation times out the client can simply retry it without the risk of creating the same entry multiple times. And since this new media type is for safe and durable "behind the scenes" work I find it rather important to include a mechanism for idempotent POST operations.
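
To show why this matters on the client side, here is a minimal retry helper in Python. The name retry_idempotent, the attempt count and the choice of which exceptions count as "network trouble" are all assumptions for the sake of the sketch; the flaky operation simply simulates two timeouts before succeeding.

```python
import time


def retry_idempotent(operation, attempts=3, delay_seconds=0.0):
    """Call `operation` until it succeeds or `attempts` is exhausted.

    Only safe for idempotent operations: a timed-out request may in
    fact have been processed by the server before the retry.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except (TimeoutError, ConnectionError) as error:
            last_error = error
            time.sleep(delay_seconds)
    raise last_error


calls = {"count": 0}


def flaky_post():
    """Simulated request that times out twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated network timeout")
    return "201 Created"


result = retry_idempotent(flaky_post)
print(result, "after", calls["count"], "attempts")
```

Without idempotency the same loop could create up to three bug reports, since the client cannot tell a lost request from a lost response.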

The implementation chosen here requires the client to do an empty POST first. This will create a new POE resource (thus the name "poe-factory") and point the client to it through a "Location" header. The client can then POST to the new resource as many times as it needs until the operation succeeds. The server returns "201 Created" the first time it completes the operation whereas it returns "303 See Other" on following requests. In either case the server includes a "Location" header pointing to the resource created by the operation.
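
Seen from the server, the bookkeeping could look roughly like the in-memory Python sketch below. The class PoeServer, its method names, the shortened URL paths and the 404 for an unknown POE id are assumptions for illustration; the Location values point at the created issue, as in the request/response example further down.

```python
import uuid


class PoeServer:
    """In-memory sketch of the Post Once Exactly behaviour."""

    def __init__(self):
        self.poe_resources = {}  # POE id -> created issue id, or None if unused
        self.issues = {}         # issue id -> stored bug report

    def post_factory(self):
        """Empty POST to the factory: create a fresh POE resource."""
        poe_id = uuid.uuid4().hex
        self.poe_resources[poe_id] = None
        return 201, f"/add-issue/{poe_id}"

    def post_action(self, poe_id, payload):
        """POST to the POE resource: act once, then only redirect."""
        if poe_id not in self.poe_resources:
            return 404, None  # assumption: unknown POE ids are rejected
        issue_id = self.poe_resources[poe_id]
        if issue_id is not None:
            return 303, f"/issues/{issue_id}"  # repeated request, nothing new created
        issue_id = len(self.issues) + 1
        self.issues[issue_id] = dict(payload)
        self.poe_resources[poe_id] = issue_id
        return 201, f"/issues/{issue_id}"


server = PoeServer()
_, poe_url = server.post_factory()
poe_id = poe_url.rsplit("/", 1)[-1]
report = {"Title": "Something bad happened", "Severity": 5}
first = server.post_action(poe_id, report)
second = server.post_action(poe_id, report)
print(first, second, len(server.issues))
```

The second POST carries the same payload but only yields a redirect, so a client that retries after a timeout ends up with exactly one issue.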

Subbu Allamaraju has a nice blog post on post once exactly techniques.

I chose this approach for the following reasons:
  • It has the simplest possible client side logic - at the cost of an extra round trip to the server. A similar solution could have required the client to create a GUID (message ID) and include it in the payload somehow, but that would make the protocol slightly more prone to client side errors.
  • It requires no special headers.
  • It adds no extra information to the payload.
  • URLs are opaque and the server gets to choose how the POE/message ID is encoded.

Create POE resource

In order to complete its task the client first issues an empty POST operation to the URL of the "href" attribute:

Request
POST /~joe/track/add-issue
Content-Length: 0

Response
201 Created
Location: http://example.edu/~joe/track/add-issue/bd925-ye174h

GET POE resource

It should be rather obvious now that the client has no choice but to follow the Location header of the response:

Request
GET /~joe/track/add-issue/bd925-ye174h
Accept: application/razor+json

Response
200 OK
Content-Type: application/razor+json

{
  curies: ...,
  controls:
  [
    {
      type: "poe-action",
      name: "bug:create-bug-report",
      documentation: ... some URL ...,
      method: "POST",
      href: "http://example.edu/~joe/track/add-issue/bd925-ye174h",
      target: "application/json",
      scaffold: ... any JSON object ...,
      title: "Add issue"
    }
  ]
}

Now the client gets a response with a "poe-action" control. This tells the client that it can safely POST to the "href" URL as many times as it needs. The actual payload is given by the BugMe specification (Title, Description, Severity).

Some comments on the above response:
  1. The payload is encoded in application/json as a trivial JSON object. Other formats may be included in the media type spec later on.
  2. This format is NOT intended for automatic creation of UIs and thus it contains no UI related list of field definitions or similar.
  3. It is NOT necessary to embed any kind of schema information - that sort of thing is given by the name of the control element.
  4. The optional "scaffold" value is the JSON payload equivalent of a URL template: it supplies default values to some properties and adds additional "hidden" properties the client can ignore (as long as they are sent back).
  5. POE-actions are not restricted to POST - a PATCH with JSON Patch (application/json-patch+json) would work as well (but then perhaps we need to change the action type name).
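
One way a client could combine the "scaffold" with its own data is sketched below in Python. The function name build_payload, the shallow merge and the example scaffold (a default "Severity" and a hidden "Token") are assumptions made up for this illustration; the real rules would have to be spelled out in the media type specification.

```python
import json


def build_payload(scaffold, business_data):
    """Start from the server's scaffold and overlay the client's data.

    Defaults and "hidden" properties from the scaffold are kept unless
    the client supplies its own value for the same property.
    """
    payload = dict(scaffold or {})  # copy, so the scaffold itself is untouched
    payload.update(business_data)
    return payload


# Hypothetical scaffold: the default "Severity" and the "Token" are made up here.
scaffold = {"Severity": 3, "Token": "opaque-server-value"}
report = {
    "Title": "Something bad happened",
    "Description": "I pressed ctrl-alt-del and all went black",
    "Severity": 5,
}
payload = build_payload(scaffold, report)
print(json.dumps(payload))
```

The client never needs to understand "Token"; it only has to send it back.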

Create bug report

Then the client issues a new request:

Request
POST /~joe/track/add-issue/bd925-ye174h
Accept: application/razor+json
Content-Type: application/json

{
  Title: "Something bad happened",
  Description: "I pressed ctrl-alt-del and all went black",
  Severity: 5
}

Response
201 Created
Location: http://example.edu/~joe/track/issues/32

GET created bug report

Now we are done unless we want to see the actual created bug report by following the Location header:

Request
GET /~joe/track/issues/32
Accept: application/razor+json

Response
Content-Type: application/razor+json

{
  curies: ...,
  controls: ...,
  payloads:
  [
    ...,
    {
      name: "bug:bug-report",
      data:
      {
        Id: 32,
        Title: "Something bad happened",
        Description: "I pressed ctrl-alt-del and all went black",
        Severity: 5,
        Created: "2012-04-23T18:25:43Z"
      }
    },
    ...
  ]
}

Now that the client can see the actual bug report it wanted to create it knows that the task is completed. Everyone is smiling and putting on their happy face :-)

Other hyper media controls

There are of course more scenarios to cover than this single "Create stuff" scenario and these scenarios will call for other kinds of hyper media controls, for instance URL templates, PATCH actions, binary file upload and more (I should cover these in some future blog posts ...)

Error handling

If the client receives a 4xx or 5xx status code it can inspect the JSON payload and look for a property named "error" together with the other "payloads" and "controls" properties. The "error" property should contain data according to my previous blog post on error handling.

Here is an example:

Request
POST /~joe/track/add-issue/bd925-ye174h
Accept: application/razor+json
Content-Type: application/json

{
  Title: "Something bad happened",
  Description: "I pressed ctrl-alt-del and all went black",
  Severity: 5
}

Response
503 Service Unavailable
Content-Type: application/razor+json

{
  error:
  {
    message: "Could not create new bug report; server is down for maintenance",
    ...
  }
}

In addition to this the client can try to use content negotiation to receive error information in the format of application/api-problem+json.
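
A client side check for the "error" property could be as small as the Python sketch below. The function name extract_error_message is invented for this post, and returning None for non-JSON or unexpected bodies is my own assumption about what a defensive client would do.

```python
import json


def extract_error_message(status_code, body_text):
    """Return the "error" message from a 4xx/5xx response, if any."""
    if status_code < 400:
        return None
    try:
        document = json.loads(body_text)
    except ValueError:
        return None  # body was not JSON at all
    if not isinstance(document, dict):
        return None
    error = document.get("error")
    if isinstance(error, dict):
        return error.get("message")
    return None


body = '{"error": {"message": "Could not create new bug report; server is down for maintenance"}}'
print(extract_error_message(503, body))
```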

Client side processing algorithm

Here is a simplified view of how the client should process the content:
  1. GET initial root resource.
  2. [LOOP:] Look for hyper media controls with appropriate names.
  3. Check the type of the found control element:
    1. If it is a "link" then follow that link and restart from [LOOP].
    2. If it is a "poe-factory" then issue an empty POST to the href value and restart from [LOOP].
    3. If it is a "poe-action" then issue a request with the specified method and data encoded according to the "target" media type. Then restart from [LOOP].
  4. Look for a payload with the appropriate name: If it exists then the task is complete - otherwise it has failed (actually I don't like this last step, but that is the only kind of "acknowledge" I can see the server responding with).
A consequence of this approach is that the service specification (BugMe in my example) should state nothing about how to find and update data since that is up to the server's actual implementation. The service specification should only consider what kind of data to look for or modify. The "how"-part is contained entirely in the returned hyper media controls.
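
The steps above can be sketched as a small Python loop. Everything here is a simplification under stated assumptions: run_task and fake_fetch are names made up for this post, names are compared literally (no curie expansion), the transport is a caller-supplied function that is assumed to follow Location headers and return the parsed document, and the canned responses use shortened, hypothetical URLs.

```python
def run_task(fetch, root_url, control_name, payload_name, data, max_steps=10):
    """Follow controls named `control_name` until `payload_name` shows up.

    `fetch(method, url, body)` returns the parsed response document.
    """
    document = fetch("GET", root_url, None)                      # step 1
    for _ in range(max_steps):
        control = next(                                          # step 2
            (c for c in document.get("controls", []) if c.get("name") == control_name),
            None,
        )
        if control is None:                                      # step 4
            for payload in document.get("payloads", []):
                if payload.get("name") == payload_name:
                    return payload["data"]
            raise RuntimeError("no matching control or payload; task failed")
        kind = control["type"]                                   # step 3
        if kind == "link":
            document = fetch("GET", control["href"], None)
        elif kind == "poe-factory":
            document = fetch("POST", control["href"], None)
        elif kind == "poe-action":
            document = fetch(control["method"], control["href"], data)
        else:
            raise RuntimeError(f"unsupported control type: {kind}")
    raise RuntimeError("gave up after too many steps")


def fake_fetch(method, url, body):
    """Canned responses mirroring the BugMe walkthrough (hypothetical short URLs)."""
    name = "bug:create-bug-report"
    if (method, url) == ("GET", "/index"):
        return {"controls": [{"type": "link", "name": name, "href": "/add-issue"}]}
    if (method, url) == ("GET", "/add-issue"):
        return {"controls": [{"type": "poe-factory", "name": name, "href": "/add-issue"}]}
    if (method, url) == ("POST", "/add-issue"):
        return {"controls": [{"type": "poe-action", "name": name,
                              "method": "POST", "href": "/add-issue/abc"}]}
    if (method, url) == ("POST", "/add-issue/abc"):
        return {"payloads": [{"name": "bug:bug-report", "data": dict(body, Id=32)}]}
    raise RuntimeError(f"unexpected request: {method} {url}")


created = run_task(fake_fetch, "/index", "bug:create-bug-report", "bug:bug-report",
                   {"Title": "Something bad happened", "Severity": 5})
print(created)
```

Notice that run_task never mentions "add-issue" or "issues"; swapping in a server with a completely different URL layout only changes fake_fetch.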

As the media type evolves and more types of hyper media controls are added the client(s) will grow more and more complex. This is one of the trade-offs that have to be accepted in order to keep clients and servers as loosely coupled as possible.

If the media type gets popular one could even expect to see the same scenario we see with today's web browsers: there will be multiple implementations of the client libraries and some will implement more of the final specification than others.

No profile needed

It may be tempting to allow for a "profile" parameter with the media type ID. But typically that would be used to ask for a specific "type" of a resource like for instance "application/razor+json;profile=user". As can be seen in the client side processing algorithm above there is no need for such a thing, so let's not introduce it.

Related work

Quite a few other people are trying to create new media types to reach similar goals, but none of them includes features such as POE semantics. The related media types I am aware of include HAL, Siren and JSON-API.
And then there is Jim Webber's fantastic "How to GET a cup of coffee" which has been a big inspiration for me over the years.

Reasons for creating a new media type

How many media types should we invent? Well, as many as needed, I would say. The media type described here includes some features not found in other media types (POE semantics for instance) and that should be sufficient argument for creating a new one.

I don't see anything wrong with creating many media types - eventually a few of them will be good enough and gain enough traction to become ubiquitous standards. That's called evolution.

Summary

In this blog post I have tried to explain one way of understanding the role of media types in RESTful web services and illustrated it by building up (parts of) a media type for systems integration. I have also touched upon the issue of "typed" resources and how to avoid it (by not assuming anything about the resource type and instead looking for certain identifiers in the response) ... there could be one more blog post to come on this issue.

So what do you think? Was this useful, understandable, totally overkill, outright naive or simply a pile of, well, rubbish? Feel free to add a comment, Tweet me or send me an e-mail. I would love to get some feedback.

Happy hacking, Jørn

UPDATE 2014-02-24: I have actually put much of this into a media type called Mason. See http://soabits.blogspot.dk/2014/02/implementing-hypermedia-apis-and-rest.html.

4 comments:

  1. Very useful stuff :) Keep more of this stuff coming! Our team is doing RESTish APIs that are based on Command Query Responsibility Segregation (CQRS) using Spray.io / Akka / Eventsourced - This is really relevant to us and I would love to see more posts like this.

    1. Thanks for the feedback. Glad you liked it :-)

  2. In your POE scenario, I don't understand why you'd need to initiate a GET request after this exchange:

    Request
    POST /~joe/track/add-issue
    Content-length: 0

    Response
    201 Created
    Location: http://example.edu/~joe/track/add-issue/bd925-ye174h

    Instead of re-POSTing, I'd simply do a PUT, which is idempotent:

    PUT /~joe/track/add-issue/bd925-ye174h
    Content-Type: application/json

    {
    Title: "Something bad happened",
    Description: "I pressed ctrl-alt-del and all went black",
    Severity: 5
    }

    Why attempt to force POST to be idempotent (esp. when the HTTP spec. clearly says that it isn't)?

  3. > Instead of re-POSTing, I'd simply do a PUT, which is idempotent:

    > Why attempt to force POST to be idempotent

    Because PUT requires the client to know the exact URL of the resource it creates. That is seldom a possibility in the scenarios I have encountered. In the above example, a PUT to /~joe/track/add-issue/bd925-ye174h would be wrong since that is not the location of the created issue - which is why we get a redirect afterwards to http://example.edu/~joe/track/issues/32. It MIGHT be that /~joe/track/add-issue/bd925-ye174h is the final location of the issue, but we cannot say so in general.
