So, you’ve published a web API? Well done! You’re serving it over HTTP(S), right? Most developers see no reason to distrust the protocol that’s been holding the web together for almost 30 years. HTTP is very performant, scalable and reliable – in fact, it has multiple nifty performance features to make sure developers can make the most out of the applications built upon it. In this article we explore a few of them, because even though most popular web servers support them, you may need to enable or configure them, and the first step is understanding them.
This article is not meant to give you step-by-step instructions on how to enable and configure these features in your particular web server, but rather to explain why they are needed, how they work, and in what situations you should use them. Here we go!
Caching might be the performance feature with the most impact, but it is also one of the most complex and error-prone ones. The basic idea is that the client should not need to re-download data that it has previously downloaded – the problem is that of deciding which data the client already has and whether it has changed since the client downloaded it.
There are a few ways of dealing with this. The most common approach is to include a response header that tells the client whether and how to cache the data. But since HTTP is an old and versatile protocol, there are several different cache headers and different ways to use them. Here is a quick overview of the most commonly used cache headers:
cache-control: private, max-age=0, no-cache
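To make the directives concrete, here is a small sketch of how a client might parse a Cache-Control header into individual directives. The parser is purely illustrative (it is not part of any particular library) and only handles the simple comma-separated form shown above:

```python
def parse_cache_control(header_value):
    """Split a Cache-Control header into a dict of directives.

    Flag directives (like no-cache) map to True; valued directives
    (like max-age=0) map to their string value."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            name, _, value = part.partition("=")
            directives[name.lower()] = value.strip('"')
        else:
            directives[part.lower()] = True
    return directives

cc = parse_cache_control("private, max-age=0, no-cache")
```

A caching client would then consult these directives – for instance, treating a stored copy as stale once max-age seconds have passed.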
The client can then, on subsequent requests, tell the server to only return the full resource if the ETag has changed, using the If-None-Match header:
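As a minimal sketch of that revalidation flow, a server might derive an ETag from the response body and answer 304 Not Modified when the client's If-None-Match value matches. The helper names here are hypothetical, and real servers often compute ETags from file metadata rather than hashing the body:

```python
import hashlib

def make_etag(body):
    # Illustrative content-hash ETag; any stable fingerprint works.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body, if_none_match):
    """Return (status, payload) for a GET carrying an optional If-None-Match."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b""   # Not Modified: the client's cached copy is current
    return 200, body      # Full response; client caches it alongside the ETag

body = b"hello, cached world"
status1, payload1 = respond(body, None)             # first request: full body
status2, payload2 = respond(body, make_etag(body))  # revalidation: empty 304
```

The win is the second exchange: only headers cross the wire, no matter how large the resource is.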
Remember to test your app extensively (duh) when using caching, since it is very easy to get it wrong. The more complex your setup is, the more pitfalls you’ll potentially build for yourself. But there is also potential for a great performance and scalability boost.
HTTP almost always uses TCP as an underlying protocol, and there is one property of TCP that can cause severe performance problems if not handled correctly. Well, there are several, but there is one specifically worth highlighting.
If a TCP packet sent by one party is lost, that party will retransmit the packet after a period of time. Modern network stacks cleverly use the round-trip times of successful packets to figure out an ideal timeout – for instance, if packets normally take 1 millisecond to arrive, the party can safely assume that any packet that has been in transit for 10 milliseconds is probably lost, so it can retransmit that packet and stop waiting for the original one to arrive.
But if your application is built in a way such that each HTTP request uses a new TCP connection (instead of reusing an existing one), that historical data is not available. In this case, most network stacks set the initial timeout to 3 seconds. That means that if a packet is lost, it will take 3 full seconds before it is retried! This can be a huge problem for clients with a shaky connection, for instance mobile users.
So what can be done? HTTP has a feature called Keep-Alive, which enables the client and server to maintain their TCP connection, even when the first HTTP request-response cycle has been completed. That way, subsequent requests will use the same TCP connection and any lost packets will be retransmitted much sooner. This is of course only useful if your application involves multiple HTTP requests from each client.
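To illustrate connection reuse, the sketch below (Python standard library only, with a throwaway local server) sends two requests over a single TCP connection. With protocol version HTTP/1.1, connections are kept alive by default, so the second request skips the TCP handshake entirely:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 implies keep-alive by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One TCP connection, two request-response cycles over it.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/a")
first = conn.getresponse().read()
conn.request("GET", "/b")  # reuses the live connection: no new handshake
second = conn.getresponse().read()
conn.close()
server.shutdown()
```

With keep-alive, the retransmission timers on that connection also stay warm, which is exactly the property the paragraph above is after.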
You can further optimize the performance of applications where clients send multiple requests by pipelining them. When request pipelining is enabled, the client and server agree that the client does not need to wait for a response before it sends the next request. This way, you can achieve much higher throughput. However, the responses will still come in the same order as the corresponding requests, so a particularly slow response will still hold up all the responses coming after it. This is called head-of-line blocking and is being addressed in the next version of HTTP, called HTTP/2. More on that below.
Note also that pipelining is only always safe for requests that do not change the state of the server, for instance GET or HEAD requests (shame on you if you change server state with these requests). Requests that do change state, such as PUT or DELETE, can be pipelined, provided that the client is sure that subsequent requests do not depend on the state left by previous ones. Otherwise, the client may see inconsistent server state between the responses, which may or may not break your app.
Non-idempotent requests (ones that cause a new, unique action each time they are made), such as POST, are generally not safe to pipeline. Most often these are treated as a barrier in the pipeline: the client waits for their response before sending more requests, to make sure not to screw up the state for any other requests that might depend on it.
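One way to picture this barrier behavior is a client-side planner that batches requests by idempotency. The idempotent method set below follows RFC 7231, but the planner itself is a purely illustrative sketch, not any real client's API:

```python
# Methods RFC 7231 defines as idempotent; a conservative client might
# pipeline only these and flush the pipeline before anything else.
IDEMPOTENT = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

def plan_pipeline(methods):
    """Group an ordered list of request methods into pipeline batches.

    Consecutive idempotent requests share a batch (sent back-to-back);
    each non-idempotent request (e.g. POST) gets a batch of its own,
    acting as a barrier whose response is awaited before continuing."""
    batches, current = [], []
    for method in methods:
        if method in IDEMPOTENT:
            current.append(method)
        else:
            if current:
                batches.append(current)
                current = []
            batches.append([method])  # barrier: sent alone
    if current:
        batches.append(current)
    return batches

plan = plan_pipeline(["GET", "GET", "POST", "GET", "HEAD"])
```

Here the POST splits the stream into three batches, so nothing after it can observe a half-applied state.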
One easy way to save time when transmitting data is to compress the data. HTTP supports multiple formats of compression, but the two most commonly used are GZIP and deflate. In theory they are similar (they use the same compression algorithm, but with different headers and checksums), but in practice there has been a lot of confusion around deflate. Because many browsers have implemented it incorrectly, the general consensus is to avoid deflate, even though it can be faster than GZIP. The effect has been that GZIP is the default compression format for most server software. However, there might still be clients that only support deflate, so the best thing is to make the server support both.
So how does it work? Typically, the client tells the server (via the Accept-Encoding header) that it supports some types of compression. The server then compresses the data payload (not the headers) of the HTTP response using one of those compression schemes and serves it to the client. Depending on the type of content, the data can be up to 90% smaller when compressed.
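A quick way to get a feel for the gains is to gzip a repetitive payload of the kind APIs typically return. The sample data here is made up, and the exact ratio will vary with your content:

```python
import gzip
import json

# A repetitive JSON payload, typical of list-style API responses.
payload = json.dumps(
    [{"id": i, "status": "active"} for i in range(200)]
).encode()

compressed = gzip.compress(payload)     # what the server does before sending
restored = gzip.decompress(compressed)  # what the client does on receipt

ratio = len(compressed) / len(payload)  # well below 1.0 for data like this
```

The round trip must be lossless, which is what makes compression a transparent transport optimization rather than a data format change.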
Client:
GET /some-resource HTTP/1.1
Host: www.domain.com
Accept-Encoding: gzip, deflate

Server:
HTTP/1.1 200 OK
Content-Length: 1337
Content-Encoding: gzip
[compressed data]
It is important to remember, though, that compression is not free. It will take some CPU resources on the server to compress the data, and some resources on the client to decompress it. That can lead to performance problems on the server because it uses up the CPU on compression of HTTP data, and if the data payload is small and the network is fast, the added compression delay might actually make the data take longer to reach the client than when no compression is used. As always, try it out with your scenario and see if it works for you.
You shouldn’t send data that has already been sent, or that should not be sent for other reasons (see the caching discussion above). HTTP gives you tools to optimize your app the way you want it. Here are some ways you can use HTTP to serve or accept partial content:
Client:
HEAD /some-resource HTTP/1.1
Host: www.domain.com

Server:
HTTP/1.1 200 OK
Last-Modified: Fri, 26 Aug 2016 21:31:11 GMT
ETag: "123abc"
Content-Length: 25876
Client:
POST /some-resource HTTP/1.1
Host: www.domain.com
Expect: 100-continue
Content-Length: 42424242
X-Authorization: whatever
[no data]

Server:
HTTP/1.1 100 Continue

Client:
POST /some-resource HTTP/1.1
Host: www.domain.com
Expect: 100-continue
Content-Length: 42424242
X-Authorization: whatever
[all that juicy data]
So how does the server know that the client expects this? Well, the client has to ask for it. One way is for the client to make a HEAD request to find out the size of the resource; in its response, the server can tell the client that it supports partial responses by including the Accept-Ranges header. The client can then simply make multiple requests, each with a different byte range in the Range header.
Client:
HEAD /some-resource HTTP/1.1
Host: www.domain.com

Server:
HTTP/1.1 200 OK
Last-Modified: Fri, 26 Aug 2016 21:31:11 GMT
ETag: "123abc"
Content-Length: 25876
Accept-Ranges: bytes

Client:
GET /some-resource HTTP/1.1
Host: www.domain.com
Range: bytes=0-100
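The server's side of that exchange can be sketched roughly as follows. This simplified handler covers only single, fully-specified byte ranges; real servers must also handle open-ended ranges (like bytes=100-), suffix ranges, and multipart range sets:

```python
import re

def serve_range(resource, range_header):
    """Return (status, headers, body) for a GET with an optional Range header."""
    if range_header is None:
        return 200, {"Content-Length": str(len(resource))}, resource
    m = re.fullmatch(r"bytes=(\d+)-(\d+)", range_header)
    if not m:
        return 416, {}, b""  # Range Not Satisfiable
    start, end = int(m.group(1)), int(m.group(2))
    chunk = resource[start:end + 1]  # HTTP byte ranges are inclusive
    headers = {
        "Content-Range": "bytes %d-%d/%d" % (start, end, len(resource)),
        "Content-Length": str(len(chunk)),
    }
    return 206, headers, chunk  # 206 Partial Content

data = bytes(range(256))
status, headers, body = serve_range(data, "bytes=0-100")
```

Note that bytes=0-100 yields 101 bytes, not 100 – forgetting the inclusive upper bound is a classic off-by-one when implementing range support.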
So what does the future have in store? Well, the next version of HTTP (not so surprisingly called HTTP/2) was released last year, after having started its life as a Google-developed protocol called SPDY. One of the major differences is that HTTP/2 is based on binary frames transmitted over TCP streams, as opposed to ASCII-based text messages. This change enables a bunch of new features, some of which bring performance enhancements, including:
So, what have we learned? Well, for one thing, HTTP is very complex and feature-rich, and configuring it correctly and optimizing your setup can significantly boost performance and reliability. Though the usage can vary, many of the optimizations that are tailor made for web browsing scenarios also carry over nicely to API design. But these optimizations don’t come for free — be sure to test under real world conditions so that they successfully increase the robustness of your service. Happy optimizing!
This post was authored by Joel Kall, co-founder and senior developer at Loop54.
Joel has an M. Sc. in Media Technology from KTH, where he did his master’s thesis programming ad systems on set-top boxes for Videoplaza and Ericsson. Since then he has founded three companies, worked as a consultant in web and systems programming, and is currently co-founder and senior developer at Loop54, a company providing on-site search as a service for e-tailers via a REST API. At Loop54, Joel works closely with his pet mathematician (and good friend) to develop new algorithms to make sure the search engine can understand what users are looking for.