APP For Scalability

July 27th, 2007

One of the common first steps for scaling up an application is to move the database off to a dedicated server - often followed by running multiple application server instances to handle requests. With a standard SQL database that's pretty straightforward; with data stored in Amazon S3 it's not always as simple.

S3 obviously provides a network API, but it doesn't necessarily provide all the functionality you need from your data layer. For instance, if you need to update search indexes, you need a central server to track the changes and update the indexes. You may also need synchronization beyond what S3 provides. Whatever the reason, you need to provide a server to handle those data layer tasks and then pass the storage off to S3.
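To make the idea of a server sitting in front of the storage a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `DataLayer` class, its `put` and `search` methods, and the plain dict standing in for S3 are all hypothetical, and a real version would call the S3 API where the dict is used.

```python
import re
from collections import defaultdict


class DataLayer:
    """Hypothetical data-layer server: indexes each write, then hands storage to a backend."""

    def __init__(self, store):
        # `store` is any dict-like object; a real deployment would wrap S3 here.
        self.store = store
        self.index = defaultdict(set)  # word -> set of keys whose text contains it

    def _words(self, text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def put(self, key, text):
        # Drop index entries for the previous content of this key, if any.
        old = self.store.get(key)
        if old is not None:
            for word in self._words(old):
                self.index[word].discard(key)
        # Pass the storage off to the backend, then index the new content.
        self.store[key] = text
        for word in self._words(text):
            self.index[word].add(key)

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


layer = DataLayer(store={})
layer.put("posts/1", "Scaling with S3")
layer.put("posts/2", "Versioned resources in REST")
layer.put("posts/1", "Scaling with APP")  # overwrite: the index must follow
```

Because every write goes through the one `put` method, the index can never drift from what is stored, which is the reason for having the central server in the first place.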

So where does APP (Atom Publishing Protocol) fit in? It occurs to me that for a large number of applications it's probably a very good interface to use. Firstly, there are some libraries being created that make it easy to get up and running. More importantly, because it's a standard, other applications can also connect to your data store.

Of course it all depends on exactly what data your application deals with, and I still haven't worked out the best way to deal with versions, but it shows some interesting potential.

Versioned Resources In REST APIs

July 27th, 2007

I like the idea of resources being addressable by a simple URL, but I'm having some difficulty reconciling that with resources that are versioned. Getting at and working with the latest version of a document with REST APIs is all pretty straightforward, but how do you retrieve the document history or a specific version of the document? I'm sure this is something that people have already worked out, but all my searching for discussions of it leads to people talking about versioning the API, so that things don't break when you change the available operations or the data format returned, rather than versioning the resources themselves.

It seems to me that the version information should be available at basically the same place as the main document - it is after all essentially the same document, just older. So something like …/resource.html?version=2 makes sense for retrieving a specific version. However, that requires that you know which versions are available - even in a system that doesn't have branching, versions may go missing, for example when they are deleted to remove spam or to deal with copyright infringement. There's also no reason that versions should be simple numbers - they might be 1.1 or a, b, c or just the ETag for that version.
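A rough sketch of the `?version=` lookup, assuming version identifiers are opaque strings and that a deleted version simply stops resolving. The `VersionedResource` class, the `/resource.html` path, and the choice of 404 for a missing version are all assumptions made for the example, not an established convention.

```python
from urllib.parse import urlsplit, parse_qs


class VersionedResource:
    """Hypothetical store for one document and its history."""

    def __init__(self):
        self._versions = {}  # version id (any string) -> body, oldest first

    def add(self, version_id, body):
        self._versions[version_id] = body

    def remove(self, version_id):
        # e.g. a version deleted because it was spam
        del self._versions[version_id]

    def get(self, url):
        """Return (status, body) for a URL with an optional ?version= parameter."""
        if not self._versions:
            return 404, None
        query = parse_qs(urlsplit(url).query)
        if "version" not in query:
            latest = next(reversed(self._versions))  # no parameter: latest version
            return 200, self._versions[latest]
        wanted = query["version"][0]
        if wanted in self._versions:
            return 200, self._versions[wanted]
        return 404, None  # unknown or deleted version


doc = VersionedResource()
doc.add("1", "first draft")
doc.add("1.1", "spam")
doc.add("b", "third draft")
doc.remove("1.1")
```

Nothing here parses or orders the identifiers, so "1", "1.1", "b" or an ETag all work equally well; the only ordering is the order in which versions were added.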

Speaking of ETags, it would be interesting to see how they could be used - there are both If-Match and If-None-Match headers for ETags, and I've never quite grasped why both exist. It would be nice for one of them to turn out to mean "I want this exact version", but I doubt that's the case and I really doubt anything out there supports it.
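For reference, here is a simplified sketch of how the two headers are evaluated in HTTP as I understand the specification. Both are preconditions tested against the ETag of the current representation, so neither selects an older version. The function names are made up for the example, and the header parsing is deliberately naive: it ignores the weak versus strong comparison rules and assumes no commas inside a tag.

```python
def parse_etags(header):
    """Split a header such as '"a", "b"' into a set of tags ('*' is kept as-is)."""
    return {tag.strip() for tag in header.split(",") if tag.strip()}


def check_preconditions(method, current_etag, if_match=None, if_none_match=None):
    """Return the status to send instead of performing the request, or None to proceed.

    current_etag is the ETag of the latest stored version, or None if nothing exists.
    """
    if if_match is not None:
        tags = parse_etags(if_match)
        matched = current_etag is not None and ("*" in tags or current_etag in tags)
        if not matched:
            return 412  # Precondition Failed: resource is not what the client expected
    if if_none_match is not None:
        tags = parse_etags(if_none_match)
        matched = current_etag is not None and ("*" in tags or current_etag in tags)
        if matched:
            # Reads get "Not Modified"; writes are refused.
            return 304 if method in ("GET", "HEAD") else 412
    return None
```

In practice If-Match tends to guard writes ("only update if nobody changed it since I read it"), while If-None-Match serves cache validation on reads and, with `*`, create-only writes.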

Getting a list of available versions is also an issue - you could consider the list of versions a separate resource (say …/versions/resource.html or …/resource.html/versions) which returns a list of available versions and their metadata but doesn't allow PUT or POST operations. It seems odd though that versions are a separate resource. The other option is something like …/resource.html?history=all, which is okay but needs some way of limiting how much is returned, paging through the results and so on.
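Either way, the listing itself could look something like the sketch below: a read-only resource that returns version metadata one page at a time. The `versions_resource` function, the `limit` and `offset` parameters, and the sample metadata are all invented for illustration.

```python
def versions_resource(method, versions, limit=10, offset=0):
    """Return (status, payload) for a request against the version list."""
    if method != "GET":
        return 405, None  # the list is read-only: no PUT or POST
    page = versions[offset:offset + limit]
    more = offset + limit < len(versions)
    return 200, {
        "versions": page,
        "next_offset": offset + limit if more else None,  # None on the last page
    }


# Made-up metadata; identifiers need not be numeric or contiguous.
history = [
    {"id": "1", "author": "alice"},
    {"id": "1.1", "author": "bob"},
    {"id": "b", "author": "alice"},
]

status, first_page = versions_resource("GET", history, limit=2)
```

A client keeps requesting with the returned `next_offset` until it comes back as `None`, so the server never has to send the whole history in one response.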

How are other people handling versioned resources with REST style APIs?