XHTML fixes for the WordPress reCAPTCHA plugin

July 30th, 2010

The wp-recaptcha plugin for WordPress breaks when you’re serving pages as application/xhtml+xml. I inadvertently broke comments when I installed it (silly me for not testing!). I’ve written a patch that fixes it.


Duplicating ggplot axis labels

July 21st, 2010

I’ve been trying for a while to find an elegant way to duplicate axis ticks and labels in a ggplot chart. Hadley replied on the ggplot2 mailing list, but a working solution within ggplot2 seems some way off.

The situation is this: imagine you have a faceted plot that is tall enough that the x-axis ticks and labels end up out of view (e.g. when using a clipped viewport such as a browser window). This hurts most when you’re using an x-scale with manual breaks or a transformation, because the reader can no longer work out the values from the plot itself.

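library(ggplot2)

# Density histogram of carat, stacked by clarity, with one facet row per cut.
g <- ggplot(diamonds, aes(carat, ..density..)) +
   geom_histogram(aes(fill = clarity), binwidth = 0.2) +
   facet_grid(cut ~ .)  # the facet rows share one x-axis, drawn only at the bottom
print(g)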

There simply isn’t a way to repeat the x-axis labels in ggplot2 at the moment without discarding faceting and rendering each facet as a separate ggplot call. I’ve seen some examples of selective plotting used to good effect in combining multiple plots with common elements, but I can’t find anything applicable to keep consistent scales and binning without duplicating a lot of the (internal) facet and bin logic.


Aho-Corasick string matching in Haskell

July 18th, 2010

The Aho-Corasick string matching algorithm constructs an automaton for matching a dictionary of patterns. When applied to an input string, the automaton runs in time linear in the length of the input plus the number of matches (so at worst quadratic in the input). It’s been around since 1975, but it isn’t implemented in the Haskell stringsearch library, and I couldn’t even find a general trie data structure via Google. So I implemented the Aho-Corasick algorithm myself: take a look at the full Aho-Corasick module.
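To make the idea concrete, here is a minimal sketch of such an automaton. It is not the module linked above: the names Trie, Node, build, step and search are made up for this illustration, and it assumes Data.Map from the containers package. Patterns are first inserted into a plain trie, and the failure links are then filled in lazily, with each node referring to an already-defined shallower node.

```haskell
module AhoCorasickSketch (Node, build, step, search) where

import qualified Data.Map as M

-- A plain trie: the patterns that end at this node, plus its children.
data Trie = Trie [String] (M.Map Char Trie)

emptyTrie :: Trie
emptyTrie = Trie [] M.empty

insert :: String -> Trie -> Trie
insert pat = go pat
  where
    go []     (Trie o ks) = Trie (pat : o) ks
    go (c:cs) (Trie o ks) =
      Trie o (M.insert c (go cs (M.findWithDefault emptyTrie c ks)) ks)

-- An automaton state. nOut holds every pattern that ends at this state,
-- including those reached through the failure chain. nFail is Nothing
-- only for the root.
data Node = Node
  { nOut  :: [String]
  , nKids :: M.Map Char Node
  , nFail :: Maybe Node
  }

-- Build the automaton. Empty patterns are dropped (a choice made for this
-- sketch, since they would match at every position).
build :: [String] -> Node
build pats = root
  where
    Trie rootOut rootKids = foldr insert emptyTrie (filter (not . null) pats)
    root = Node rootOut (M.mapWithKey (mk root) rootKids) Nothing
    -- mk parent c t builds the state reached from parent on character c.
    mk parent c (Trie o ks) = node
      where
        node = Node (o ++ nOut f) (M.mapWithKey (mk node) ks) (Just f)
        f = case nFail parent of
              Nothing -> root       -- children of the root fail to the root
              Just pf -> step pf c  -- otherwise: parent's failure, then c

-- Transition function: follow failure links until some state accepts c.
step :: Node -> Char -> Node
step n c = case M.lookup c (nKids n) of
  Just k  -> k
  Nothing -> case nFail n of
    Nothing -> n          -- at the root: stay put
    Just f  -> step f c

-- All matches as (0-based start index, pattern), ordered by end position.
search :: Node -> String -> [(Int, String)]
search root = go root 0
  where
    go _ _ []     = []
    go n i (c:cs) =
      let n' = step n c
          i' = i + 1
      in [ (i' - length p, p) | p <- nOut n' ] ++ go n' i' cs
```

In GHCi, search (build ["he", "she", "his", "hers"]) "ushers" evaluates to [(1,"she"),(2,"he"),(2,"hers")]. The lazy fields of Node are what let the failure links be defined in terms of nodes that are still being built; a version with strict fields would need an explicit breadth-first pass instead.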

There was an interesting paper on deriving the algorithm by applying fully lazy evaluation and memoization to a more naive algorithm. Unfortunately, applying fully lazy evaluation and memoization to a function in Haskell is non-trivial (despite it being theoretically possible for the compiler to do so!).

It’s always interesting trying to find the functional equivalent to an imperative algorithm. I ended up using some cute Haskell tricks.


R and LaTeX PDF graphics

May 17th, 2010

When writing a document in LaTeX that makes use of figures from R, I want to produce a PDF with

  • vector graphics,
  • consistent fonts,
  • no messing around overlaying text in LaTeX,

and maybe typeset math in the R graphics. This post surveys the state of the art in how to achieve the best of all worlds when importing graphics generated by R into documents typeset to PDF with LaTeX. I look at PostScript and PDF figures generated by R’s X11, Cairo, and finally the new (and awesome) TikZ devices.
