Apache 304 and mod_deflate revisited

Last year I commented on how mod_deflate breaks the cache validation model. Essentially, the difficulty has been in addressing two issues at once: bugs #39727 and #45023.

So in January 2008, a change was committed to fix #39727, which introduced #45023. Now in April this year, it was reversed to fix #45023. I agree with the priorities here; no caching is much worse for web performance.

As Roy Fielding pointed out, the correct way to deal with this issue is to stop abusing Content-Encoding for performance compression and start using Transfer-Encoding; it’s a pity browsers and HTTP servers haven’t got there yet.

Twitter not sending 304s

The Twitter JSON API that I’ve been using for my status widget has a caching problem, which has caused the widget to be broken in Opera for a while now. Opera is quite aggressive in re-using its cache (which IMHO is a good thing). However, bad things happen when web services deviate from the HTTP cache validation model. Twitter recognises that the browser should be hitting its cache, but its response is broken.

Here’s how it goes:

  1. Load up the JSON document for the first time with an empty cache.
  2. Twitter sends a Last-Modified header and the expected JSON document.
  3. Refresh the document (in Opera, hit enter in the address bar as opposed to clicking the Reload button, since the latter forces a cache refresh). Opera sends an If-Modified-Since header.
  4. Twitter (presumably) recognises that the last status update was not after the browser’s cache timestamp. It sends a degenerate response entity: “[]”, an empty array in JavaScript, with a 200 OK status.

To test this from a shell:

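# Fetch once and turn the Last-Modified response header into an If-Modified-Since request header.
url='http://twitter.com/statuses/user_timeline/p00ya.json?count=1&callback=f'
lm="$(wget --debug -O /dev/null "$url" 2>&1 \
 | grep '^Last-Modified:' \
 | head -n 1 \
 | sed -e 's/^Last-Modified/If-Modified-Since/' \
 | tr -d '\n\r')"
# Repeat the request conditionally, printing the response headers and body.
[ -n "$lm" ] && \
  wget -nv --save-headers -O - --header="$lm" "$url"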

The brokenness comes from the half-baked response. A 200 OK status code would be fine if the full JSON object were written out. A 304 status code would be fine too, whatever entity came with it (strictly a 304 shouldn’t carry one, but the browser would fall back to its cached copy regardless). The empty array might even prevent breakage in user-agents that don’t handle 304s (but do send If-Modified-Since? wtf?). But sending it with a 200 status overwrites the correct cached entity, replacing it with the degenerate response.
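The three cases can be sketched as a small shell check. This is only an illustration: classify_response is a hypothetical helper, and treating “f([])” as degenerate assumes the callback=f parameter from the test URL above wraps the empty array.

```shell
# classify_response STATUS BODY
# Hypothetical helper: prints full, not-modified, degenerate or other.
classify_response() {
  status=$1
  # Strip whitespace so "[ ]" and "[]" followed by a newline both count as empty.
  body=$(printf '%s' "$2" | tr -d ' \t\r\n')
  case "$status" in
    304) echo not-modified ;;            # client keeps its cached copy
    200)
      case "$body" in
        ''|'[]'|'f([])'|'f([]);') echo degenerate ;;  # 200 that clobbers the cache
        *) echo full ;;                  # 200 with a real entity
      esac ;;
    *) echo other ;;
  esac
}
```

Feeding it the status line and body saved by the wget command above would be one way to spot the degenerate case without opening a browser.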

Firefox seems to be unaffected since it doesn’t cache the document at all and so doesn’t send the If-Modified-Since header.

One workaround is to use something like jQuery’s cache breaking capability (where it adds some random tokens to the URL each time). I refuse. Just remember the widget breakage wasn’t my fault!
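For the curious, the trick amounts to something like the following sketch. cache_bust is a made-up name, and it uses a timestamp rather than anything truly random; date +%s is not strictly POSIX but works with GNU and BSD date.

```shell
# cache_bust URL
# Hypothetical helper: append a throwaway "_=<seconds since epoch>" parameter
# so the URL differs on each request and never matches a cached entry.
cache_bust() {
  case "$1" in
    *\?*) sep='&' ;;   # URL already has a query string
    *)    sep='?' ;;
  esac
  printf '%s%s_=%s\n' "$1" "$sep" "$(date +%s)"
}
```

Every request then looks like a new resource to the browser, which is exactly why it throws away the benefit of caching.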

A separate issue is that it seemed to break Opera quite badly. Perhaps it was because I’m using jQuery’s ready event, but Opera hung as if the XHR were synchronous. The document wouldn’t receive any events (no mouse-wheel scrolling). I’ve got no idea why though; my callback functions are robust enough to handle being passed the empty array Twitter calls them with, and I wasn’t getting any exceptions.

Apache 304s and mod_deflate

The deflate output filter in Apache breaks Apache’s handling of the HTTP cache validation model. It won’t send an HTTP 304 status if mod_deflate is actively filtering the response, even if the ETag and Last-Modified allow it to. I asked, and apparently this is a known issue. I might have to wait until after exams before making a patch for this one.
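A rough way to see this for yourself is to send the same conditional request with and without gzip and compare the status codes. This is a sketch under assumptions: conditional_status and deflate_verdict are hypothetical helpers, curl is assumed to be installed, and the validator header is whatever an earlier response from your server gave you.

```shell
# conditional_status URL VALIDATOR [ENCODING]
# Prints the status code of a conditional GET. VALIDATOR is a full header
# line such as 'If-Modified-Since: <Last-Modified from an earlier response>'.
conditional_status() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "$2" -H "Accept-Encoding: ${3:-identity}" "$1"
}

# deflate_verdict PLAIN_STATUS GZIP_STATUS
# Compares the status without compression against the status with it.
deflate_verdict() {
  if [ "$1" = 304 ] && [ "$2" != 304 ]; then
    echo 'compression defeats validation'
  elif [ "$1" = 304 ]; then
    echo 'validation works with compression'
  else
    echo 'inconclusive'   # no 304 even without compression
  fi
}

# Usage (URL and validator are placeholders):
#   deflate_verdict "$(conditional_status "$url" "$validator")" \
#                   "$(conditional_status "$url" "$validator" gzip)"
```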

In its current state, this introduces a tradeoff for large, uncompressed static files (like CSS and JavaScript):

  • Gzip stuff, and you improve site performance the first time someone visits, but it never gets any faster.
  • Allow Apache to send 304 statuses, let the user have a slow load on their first visit, but celebrate as they hit their cache thereafter.

The second option looks more attractive.