Sunday, December 08, 2013

Media types for APIs

I have previously touched upon the concept of media types (see http://soabits.blogspot.no/2013/05/the-role-of-media-types-in-restful-web.html), but somehow it has always been difficult for me to really nail the concept down in a concise and useful article.

Now the latest discussion about the benefits of hypermedia (see http://soabits.blogspot.no/2013/12/selling-benefits-of-hypermedia.html) got me thinking about media types again - but this time in the perspective of unique service implementations with dedicated clients versus large scale ecosystems of mixed implementations.

As it turns out, media types don't mean sh*t on a small scale. That kind of explains why it has been so difficult to reach some kind of consensus about media types for APIs.

Background


When the discussion touches upon media types the arguments usually follow these lines:

  • Completely generic media types like JSON and XML should be avoided since they do not include any kind of hypermedia elements.
  • One school of thought argues that we should have very few (generic) media types. This is to avoid the need for clients to understand too many media types.
  • Another school of thought argues that we should have many different domain specific media types. Otherwise the client wouldn't know what kind of resource it was looking at.

But, as I said, it really doesn't matter. Both schools are right. At least when you look at unique service implementations with dedicated clients - like for instance dedicated Twitter clients.

Let me give you a concrete example from the Twitter API (see https://dev.twitter.com/discussions/5662): the return value from their oauth/request_token "endpoint" is key/value pairs encoded as application/x-www-form-urlencoded - but the server says it is "text/html" which is clearly wrong. Does that break any client implementations? No. Why? Because all clients are dedicated to the Twitter API; they KNOW about this little peculiarity and have been hard coded to work with it.
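A minimal Python sketch of what such a dedicated client effectively does: it ignores the declared content type and parses the body as form-encoded key/value pairs anyway. The function name and the response values are made up for illustration; they are not real Twitter output.

```python
from urllib.parse import parse_qs


def parse_request_token(body: str, declared_content_type: str) -> dict:
    # A dedicated client "knows" the body is form-urlencoded, so the
    # declared content type is deliberately ignored.
    pairs = parse_qs(body, strict_parsing=True)
    return {key: values[0] for key, values in pairs.items()}


# Hypothetical response body and header, for illustration only.
token = parse_request_token(
    "oauth_token=abc123&oauth_token_secret=xyz789&oauth_callback_confirmed=true",
    "text/html",
)
print(token["oauth_token"])
```

This only works because the client and the server were built for each other; a generic client that trusted "text/html" would try to parse the body as HTML.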

My point is:
Media types are irrelevant for unique service implementations with dedicated clients. In this world the client always knows exactly what it is doing and what kind of result to expect from the server (and it can safely ignore the media type).

Media types on a large scale


Let us broaden our view and look at the example of "Big corporation buys smaller companies and the result is a big unruly combination of customers, sales orders and other stuff living on different systems" which I introduced in my previous blog post (http://soabits.blogspot.no/2013/12/selling-benefits-of-hypermedia.html).

Now let's assume our fictional client is handed a link/URL to a customer resource in this mess of a heterogeneous mix of different company resources. The client can issue a GET on the URL and in return it will receive a stream of bytes. How does the client interpret those bytes? Obviously it will depend on the media type. But which kind of media type is useful for this purpose?

Let us assume the client understands a generic (hypermedia enabled) media type like HAL. Together with the GET request the client sends an Accept header "Accept: application/hal+json". Luckily the server knows how to serve the customer resource as HAL, so the client gets a HAL document in return.

Now what? We have integrated customer resources from three different organizations and each of these has been encoding customer records in HAL - but in different ways.

For instance: Company X has these customer properties:

{
  ID: 1234,
  Name: "John Larsson",
  Address: "Marienborg 1, 2830 Virum, Denmark"
}

while company Y uses these properties:

{
  ID: 1234,
  FirstName: "John",
  LastName: "Larsson",
  Address:
  {
    Address: "Marienborg 1",
    PostalCode: "2830",
    City: "Virum",
    Country: "Denmark"
  }
}

With nothing but this information our client must either give up or do some guessing like "If FirstName is present then assume format of company Y". So apparently we need a bit more information than we already have.
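To make the guessing concrete, here is a small Python sketch of a client that sniffs the shape of the record. The function names and the "company-x" / "company-y" labels are illustrative assumptions, not part of any real API.

```python
def guess_customer_format(customer: dict) -> str:
    # Pure guesswork based on which properties happen to be present.
    if "FirstName" in customer and "LastName" in customer:
        return "company-y"
    if "Name" in customer:
        return "company-x"
    return "unknown"


def display_name(customer: dict) -> str:
    fmt = guess_customer_format(customer)
    if fmt == "company-y":
        return f'{customer["FirstName"]} {customer["LastName"]}'
    if fmt == "company-x":
        return customer["Name"]
    raise ValueError("Cannot tell what kind of customer record this is")


customer_x = {"ID": 1234, "Name": "John Larsson"}
customer_y = {"ID": 1234, "FirstName": "John", "LastName": "Larsson"}
print(display_name(customer_x))
print(display_name(customer_y))
```

The heuristic breaks as soon as a third company also happens to use a "Name" property with a different meaning, which is why some explicit identification is needed.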

Now we can either choose to add some kind of profile to the representation - either as a header or in the payload - or we can use a domain specific media type.

1) A profile in the payload could be done like this:

{
  ID: 1234,
  profile: "http://company-x.com/profiles/customer-care",
  ... other properties ...
}

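2) The profile could also be part of the media type, as a parameter, so we would get application/hal+json;profile="http://company-x.com/profiles/customer-care" (the URL is quoted because it contains characters that are not allowed in an unquoted parameter value).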

3) A domain specific media type could be something like "application/company-x.customer-care.hal+json" or similar.
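The three options can be compared from the client's side with a short Python sketch. It uses the standard library's email.message.Message only as a convenient Content-Type parser; the function names are assumptions made for this illustration.

```python
from email.message import Message


def split_content_type(header_value: str):
    # Reuse the stdlib header parser to separate the media type
    # from an optional "profile" parameter.
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_type(), msg.get_param("profile")


def identify_profile(header_value: str, payload: dict):
    media_type, profile = split_content_type(header_value)
    if profile is not None:
        # Option 2: profile parameter on the media type.
        return media_type, profile
    if "profile" in payload:
        # Option 1: profile property inside the payload.
        return media_type, payload["profile"]
    # Option 3: no profile at all; a domain specific media type
    # has to carry the meaning by itself.
    return media_type, None


print(identify_profile(
    'application/hal+json;profile="http://company-x.com/profiles/customer-care"',
    {"ID": 1234},
))
```

Note that with option 3 the client must recognize every domain specific media type by name, whereas options 1 and 2 leave the media type generic.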

But which method should we choose? Let's take a look at how the client processes the server response before we answer that.

Processing a server response


There are three things the client must know in order to process a server response correctly:

  1. How to decode the byte stream (generic knowledge).
  2. What the data represents (domain specific knowledge).
  3. How to locate hypermedia elements in the response (generic knowledge).

The media type is obviously the key to decoding the byte stream - it will tell the client whether it is looking at XML, PDF, HTML, HAL, Siren and so on.

The media type should also be the key to locating hypermedia elements in the response.

But what about the domain specific knowledge - should we identify what a resource represents with a domain specific media type or with a profile? Both methods work, but there is one more thing to take into account: making the API explorable by client developers (see http://soabits.blogspot.no/2013/12/selling-benefits-of-hypermedia.html).

It is of course possible to implement a browser for any domain specific media type we can think of, but it would obviously be more practical if we could have one single API browser for all kinds of APIs. For this reason we should avoid domain specific media types. The domain knowledge can then be identified by a profile - either in the payload or in an HTTP header.

Wrapping it all up


As with the hypermedia problem: if you stick to unique service implementations with dedicated clients (like a dedicated Twitter client) then media types are utterly irrelevant. The client can safely assume that there will be one, and only one, representation of whatever kind of resource it is looking for.

But if you take a broader perspective and venture into a highly heterogeneous, loosely coupled, unorganized, incoherent and fragmented ecology (also called "The internet") - then you need more domain specific information about the resources - either through domain specific media types, or generic media types with profiles.

My recommendation is:

  1. Use generic media types that include hypermedia elements.
  2. Identify domain specific information through profiles.

The media type will tell the client HOW to decode the byte stream and HOW to interact with the resource. The profile will tell the client WHAT it is looking at.
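The HOW/WHAT split can be sketched in Python as a two-level lookup: the media type selects a decoder and the profile selects a domain handler. The handler names and the company Y profile URL are hypothetical, chosen only for this example.

```python
import json

# HOW: one decoder per generic media type.
DECODERS = {
    "application/hal+json": lambda body: json.loads(body.decode("utf-8")),
}


def name_from_company_x(doc: dict) -> str:
    return doc["Name"]


def name_from_company_y(doc: dict) -> str:
    return f'{doc["FirstName"]} {doc["LastName"]}'


# WHAT: one handler per profile (the company Y URL is hypothetical).
PROFILE_HANDLERS = {
    "http://company-x.com/profiles/customer-care": name_from_company_x,
    "http://company-y.com/profiles/customer": name_from_company_y,
}


def customer_name(media_type: str, profile: str, body: bytes) -> str:
    decode = DECODERS[media_type]          # how to read the bytes
    interpret = PROFILE_HANDLERS[profile]  # what the data means
    return interpret(decode(body))


print(customer_name(
    "application/hal+json",
    "http://company-x.com/profiles/customer-care",
    b'{"ID": 1234, "Name": "John Larsson"}',
))
```

Supporting a new company then means registering one more profile handler, while the decoder and any generic API browser stay untouched.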

Friday, December 06, 2013

Selling the benefits of hypermedia in APIs

Once more I have found myself deeply engaged in a discussion about REST on the api-craft mailing list (https://groups.google.com/forum/#!topic/api-craft/ZxnLD6q6w7w). This time it started with the question "How do I sell the benefits of hypermedia". It turned out to be harder to answer than one would expect, but after some time we came up with the list below. But before we get into that I had better explain "hypermedia" in a few sentences.

The most common use of hypermedia is embedding of links in representations returned from some service on the web. As an example we can look at the representation of a customer record containing a customer ID, customer name, customer contact information and related sales orders. Encoded in JSON we can get something like this:

// Customer record
// URL template: http://company-x.com/customers/{customer-id}
{
  ID: 1234,
  Name: "John Larsson",
  Address: "Marienborg 1, 2830 Virum, Denmark"
}

// List of sales orders
// URL template: http://company-x.com/customers/{customer-id}/sales-orders
{
  CustomerId: 1234,
  Orders:
  [
    {
      ID: 10,
      ItemNumber: 15,
      Quantity: 4
    }
  ]
}

These two resources can be found by expanding the customer ID into the URL templates. This requires the client to be hard coded with 1) the URL templates and 2) the knowledge of which values to use as parameters.
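In Python, the hard coded knowledge of such a client could look like the sketch below. The two templates are the ones shown above; the expand helper is an assumption made for illustration.

```python
# Both templates are baked into the client at build time.
CUSTOMER_TEMPLATE = "http://company-x.com/customers/{customer-id}"
SALES_ORDERS_TEMPLATE = "http://company-x.com/customers/{customer-id}/sales-orders"


def expand(template: str, customer_id: int) -> str:
    # The client must also know WHICH value goes into the template.
    return template.replace("{customer-id}", str(customer_id))


print(expand(CUSTOMER_TEMPLATE, 1234))
print(expand(SALES_ORDERS_TEMPLATE, 1234))
```

If the server ever changes one of these URL structures, every deployed client with the old template breaks.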

Now, if we embed links in the responses then we can remove the hard coded knowledge of at least the sales orders URL template:

// Customer record
// URL template: http://company-x.com/customers/{customer-id}
{
  ID: 1234,
  Name: "John Larsson",
  Address: "Marienborg 1, 2830 Virum, Denmark",
  _links:
  [
    {
      rel: "http://linkrels.company-x.com/sales-orders",
      href: "{link-to-sales-orders}",
      title: "Sales orders"
    }
  ]
}

// List of sales orders
// URL template unpublished
{
  CustomerId: 1234,
  Orders:
  [
    {
      ID: 10,
      ItemNumber: 15,
      Quantity: 4,
      _links:
      [
        {
          rel: "http://linkrels.company-x.com/order-details",
          href: "{link-to-sales-order-details}",
          title: "Sales order details"
        },
        {
          rel: "http://linkrels.company-x.com/item-details",
          href: "{link-to-item-details}",
          title: "Item order details (catalog)"
        }
      ]
    }
  ]
}

Notice how links are encoded:

  • Links are always found in collections named _links
  • A single link consists of a link relation identifier "rel", the hypermedia reference "href" and a human readable description "title".
  • Link relations are identified by URLs.
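Given these conventions, locating a link takes only a few lines of client code. The following Python sketch assumes the response has already been decoded into a dict; find_link is a hypothetical helper, and the href is left as the same placeholder used in the example above.

```python
def find_link(resource: dict, rel: str):
    # Links always live in a collection named "_links".
    for link in resource.get("_links", []):
        if link.get("rel") == rel:
            return link["href"]
    return None


customer = {
    "ID": 1234,
    "Name": "John Larsson",
    "_links": [
        {
            "rel": "http://linkrels.company-x.com/sales-orders",
            "href": "{link-to-sales-orders}",  # placeholder, as in the example
            "title": "Sales orders",
        }
    ],
}

print(find_link(customer, "http://linkrels.company-x.com/sales-orders"))
```

The client only needs to know the link relation; whatever URL the server puts in "href" is followed as-is.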

The question is now: what is gained by adding such links?

Short term effects


1. Explorable API

It may sound trivial but do not underestimate the power of an explorable API. The ability to browse around the data makes it a lot easier for the client developers to build a mental model of the API and its data structures.

Think of it like this; traditionally, as a client developer, you would have to read through a pile of documentation before you sit down and write some test programs more or less blindfolded. After that you run your test program to see how the API behaves. Then you have to go back to the documentation and read some more - and then back to coding again. This exercise has three different mental context switches going back and forth between reading documentation, programming and trying out test programs.

With an explorable API you can simply try out the API and test your understanding of it without any programming. Any mental "what-if" hypothesis testing of the API can be carried out right there without any additional tools or programming. The data as well as the interaction tools are right there in front of you, reducing the mental hoops you have to go through to understand the API.

The immediate benefits of an explorable API are perhaps more social than technical. But mind you - a lower barrier of entry means happier client developers, higher API adoption rates and less support, which in the end means fewer annoying support calls to bug YOU at the most annoying times of your work.

2. Inline documentation

Did you notice how link relations are identified by URLs? These URLs can point to online documentation where the API elements can be explained.

The immediate benefits of this are also social just like the explorability of the API. It will lower the barrier of entry to understanding the API and improve API adoption by client developers.

3. Simple client logic

A client that simply follows URLs instead of constructing them itself should be easier to implement and maintain. It won't need logic to figure out which values to substitute into what URL templates. All it has to do is to identify links in the payload and extract the hypermedia reference URL.

Long term effects


4. The server takes ownership of URL structures

The use of hypermedia removes the client's hard coded knowledge of the URL structures used by the server. This means the server is free to change its URL structures over time when the API evolves without any need to upgrade the clients.

The benefit of this is obviously less coupling between the server and the client, removing the need to upgrade all clients in lock step with the server.

But, you may ask, why should the server change its URL structures? Once the server developers have decided that the URL is /customers/{customer-id} why should they then suddenly decide to change it? Well, I cannot tell you what will change in your API, but here are two examples:

- A resource grows too big. Over time it has been necessary to add more and more features to a single resource and one day it simply becomes too big to handle. So it is decided to split it into multiple sub-resources with new URL structures.

- It turns out that some resources require bits and pieces of information from other resources when the client accesses them. It can for instance be an access token of some kind that needs to be generated in one place and passed to another resource. With a traditional API the client has to be upgraded with this kind of business logic. With a hypermedia API the client can ignore this complexity and leave it to the server to add the desired parameters to the links it generates.

5. Off loading content to other services

Consider how APIs evolve: after some time you figure out that some of the content should be off-loaded to a Content Delivery Network (CDN). This means new URLs that point to completely different hosts all over the internet. The actual URL cannot be hard coded into the client since it may change over time or contain random pieces of server generated information for the CDN (like for instance some kind of access token). Now the server HAS to embed the URLs in the responses and the client HAS to follow them.

6. Versioning with links

With a hypermedia API it becomes trivial to implement new versions of the API resources without breaking existing clients: old clients will follow existing link relations to old-style resources whereas new clients will know how to follow new link relations to new resources - as long as the server response includes both the old as well as the new links.
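A rough Python sketch of this idea: the server response carries both an old and a new link relation, and each client picks the ones it understands. The "sales-orders-v2" relation and both href values are hypothetical, invented for this example.

```python
OLD_REL = "http://linkrels.company-x.com/sales-orders"
NEW_REL = "http://linkrels.company-x.com/sales-orders-v2"  # hypothetical

# One response serving both generations of clients (hrefs are made up).
customer = {
    "ID": 1234,
    "_links": [
        {"rel": OLD_REL, "href": "http://company-x.com/customers/1234/sales-orders"},
        {"rel": NEW_REL, "href": "http://company-x.com/v2/customers/1234/orders"},
    ],
}


def href_for(resource: dict, preferred_rels: list):
    # Return the href of the first link relation the client understands.
    links = {link["rel"]: link["href"] for link in resource["_links"]}
    for rel in preferred_rels:
        if rel in links:
            return links[rel]
    return None


old_client_url = href_for(customer, [OLD_REL])           # knows only the old rel
new_client_url = href_for(customer, [NEW_REL, OLD_REL])  # prefers new, falls back
print(old_client_url)
print(new_client_url)
```

The fallback order also lets a new client keep working against an older server that only emits the old relation.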

Hypermedia also allows the server to re-implement an existing resource with a completely different technology stack, on a completely different server, without the client ever noticing it - given, of course, that the new implementation doesn't make any breaking changes.

If you want to read more about versioning then take a look at Mark Nottingham's "Web API versioning smackdown" at http://www.mnot.net/blog/2011/10/25/web_api_versioning_smackdown

Large scale effects


7. Multiple implementations of the same service

So far we have only looked at clients dedicated to a unique implementation of a single service. That could for instance be something like a dedicated Twitter client. But where hypermedia really excels is when we start to work with multiple independent implementations of the same service.

Let us try to broaden the scene: think of a big corporation that has engulfed and bought up a lot of smaller companies. All of these smaller companies have their own server setup with lists of inventory, sales orders, customers and so on ...

Now I give you the ID 4328 of customer John Burton ... how will you be able to find the resource that represents the contact information of this customer, when it can live on any one of a dozen servers?

Solution 1: We need a central indexing service that allows the client to search for customer with ID 4328. But how can the indexing service tell the client where the resulting customer record resides? The answer is simple; the response has to contain a link to the customer record resource.

Solution 2: Don't use IDs like 4328 at all. Always refer to resources with their full URLs.

Either way, the client won't know anything about the URL it gets in return - all it has to do is to trust the search result and follow the link.

And now that we have some opaque, meaningless, URL to our customer record information, how do we get to the sales orders placed by said customer? We could take the customer ID again and throw it into some other indexing service and get a new URL out of it - or we could follow a "sales-orders" link-relation embedded in the customer information.
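The whole flow can be simulated in Python with an in-memory stand-in for the web. Every URL, host name and short link relation name below is hypothetical; the point is only that the client never constructs or inspects a URL.

```python
# A pretend web: URL -> decoded representation (all URLs are made up).
WEB = {
    "http://index.example/customers?id=4328": {
        "_links": [{"rel": "customer", "href": "http://server-7.example/c/9f3a"}],
    },
    "http://server-7.example/c/9f3a": {
        "Name": "John Burton",
        "_links": [{"rel": "sales-orders", "href": "http://server-7.example/c/9f3a/so"}],
    },
    "http://server-7.example/c/9f3a/so": {"Orders": [], "_links": []},
}


def get(url: str) -> dict:
    # Stand-in for an HTTP GET plus decoding of the response.
    return WEB[url]


def follow(resource: dict, rel: str) -> dict:
    # Follow the first link with the given relation, treating the URL as opaque.
    for link in resource["_links"]:
        if link["rel"] == rel:
            return get(link["href"])
    raise LookupError(f"No link with relation {rel!r}")


search_result = get("http://index.example/customers?id=4328")
customer = follow(search_result, "customer")
orders = follow(customer, "sales-orders")
print(customer["Name"], orders["Orders"])
```

Only the entry point of the indexing service is known in advance; which of the dozen servers holds the customer is decided entirely by the links.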

The point is:
When you transcend from unique one-off service implementations with dedicated clients to multiple independent service implementations with a variety of clients then you simply have to use hypermedia elements.
This also means that it can be difficult to sell hypermedia to startup APIs since hypermedia won't add much benefit to one single API living on an isolated island without any requirement to co-exist and co-work seamlessly with other similar services.

Other large scale effects

Hypermedia solves some of the problems related to large scale service implementations as I have just argued. But there are a few more issues to be solved in order to decouple clients completely from specific server implementations; one is related to how the client understands the result (media types) and one is related to error handling.

I have already written about error handling in http://soabits.blogspot.no/2013/05/error-handling-considerations-and-best.html where the last section discusses error handling on a larger scale.

Unfortunately I have yet to write an article explaining my current view on media types - until then you can either check this post in api-craft https://groups.google.com/d/msg/api-craft/5N5SS0JMAJw/b0diFRzopY0J or read my ramblings about media types and type systems http://soabits.blogspot.no/2013/05/the-role-of-media-types-in-restful-web.html.

UPDATE (December 8th 2013): I have just added a blog post about media types: http://soabits.blogspot.no/2013/12/media-types-for-apis.html

Acknowledgments

Thanks to Mike Kelly for his initial blog post on this topic: http://blog.stateless.co/post/68259564511/the-case-for-hyperlinks-in-apis - and his work on hypermedia linking in JSON with HAL (http://stateless.co/hal_specification.html).