Software

Duplicating ggplot axis labels

Update: the lemon package’s facet_rep_wrap gives the user control over repeated facet labels (thanks to Flore for pointing it out).

I’ve been trying for a while to find an elegant solution for duplicating axis ticks and labels in a ggplot chart. Hadley replied on the ggplot2 mailing list, but a working solution within ggplot2 seems a way off.

The situation is this: imagine you have a faceted plot that is tall enough that the x-axis ticks and labels become obscured (e.g. when using a clipped viewport such as a browser window). This hurts most when you’re using an x-scale with manual breaks or a transformation, because the tick values cannot be guessed from the visible facets.

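library(ggplot2)

# Stacked density histograms of carat, one facet row per cut.
# after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.3.0).
g <- ggplot(diamonds, aes(carat, after_stat(density))) +
   geom_histogram(aes(fill = clarity), binwidth = 0.2) +
   facet_grid(cut ~ .)
print(g)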

Figure: Faceted plot where the x-axis labels have been clipped out.

There simply isn’t a way to repeat the x-axis labels in ggplot2 at the moment without discarding faceting and rendering each facet as a separate ggplot call. I’ve seen selective plotting used to good effect for combining multiple plots with common elements, but I can’t find an approach that keeps scales and binning consistent across plots without duplicating a lot of the (internal) facet and bin logic.
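Following the update at the top of this post, here is a minimal sketch of the lemon approach. It assumes the lemon package is installed and that its facet_rep_wrap() accepts the repeat.tick.labels argument as documented; treat it as a starting point rather than a tested recipe for every ggplot2 version.

```r
library(ggplot2)
library(lemon)  # assumption: installed from CRAN; provides facet_rep_wrap()

# Same histogram as above, but laid out with facet_rep_wrap() in a single
# column so that every facet gets its own x-axis ticks and labels.
g_rep <- ggplot(diamonds, aes(carat, after_stat(density))) +
  geom_histogram(aes(fill = clarity), binwidth = 0.2) +
  facet_rep_wrap(~ cut, ncol = 1, repeat.tick.labels = TRUE)
print(g_rep)
```

Because this is still a single faceted plot, the scales and binning stay shared across facets. Note that the strip labels sit above each panel here rather than on the right as with facet_grid(cut ~ .).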


Aho-Corasick string matching in Haskell

The Aho-Corasick string matching algorithm constructs an automaton for matching a dictionary of patterns. Running the automaton over an input string takes time linear in the length of the input plus the number of matches reported (so at worst quadratic in the length of the input). The algorithm has been around since 1975, but it isn’t implemented in the Haskell stringsearch library, and I couldn’t even find a general trie data structure through Google. So I implemented the Aho-Corasick algorithm myself: take a look at the full Aho-Corasick module.
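To make the "at worst quadratic" remark concrete, here is a small worked case. The dictionary and input are chosen purely for illustration and are not taken from the module: the patterns are all runs of the letter a up to length n, and the input is a run of n copies of a.

```latex
% Dictionary: P = \{a, a^2, \dots, a^n\}; input string: a^n.
% Every pattern of length at most i ends at position i of the input,
% so position i reports exactly i matches.
\[
  \#\text{matches} \;=\; \sum_{i=1}^{n} i \;=\; \frac{n(n+1)}{2} \;=\; \Theta(n^2)
\]
```

No dictionary can do worse on an input of length n, since at most i distinct patterns can end at position i (one per possible length).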

There was an interesting paper on deriving the algorithm as a result of applying fully-lazy evaluation and memoization on a more naive algorithm. Unfortunately, applying fully-lazy evaluation and memoization to a function in Haskell is non-trivial (despite it being theoretically possible for the compiler to do so!).

It’s always interesting trying to find the functional equivalent to an imperative algorithm. I ended up using some cute Haskell tricks.

Update: I’ve written an improved version of Aho-Corasick, implemented with Data.Array and Data.Map.


R and LaTeX PDF graphics

When writing a document in LaTeX that makes use of figures from R, I want to produce a PDF with

vector graphics,
consistent fonts,
no messing around with overlaying text in LaTeX,

and maybe typeset math in the R graphics. This post surveys the state of the art in getting the best of all worlds when importing graphics generated by R into documents typeset to PDF with LaTeX. I look at PostScript and PDF figures generated by R’s X11, Cairo, and finally the new (and awesome) TikZ devices.
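As a taste of the TikZ route, here is a minimal sketch. It assumes the tikzDevice package and a working LaTeX distribution are installed (the device calls LaTeX to measure text); the file name scatter.tex is a hypothetical choice for this example.

```r
library(tikzDevice)  # assumption: installed, with a LaTeX distribution on the PATH

out_file <- "scatter.tex"  # hypothetical output name for this example

# Open a TikZ device. Width and height are in inches and should match the
# size the figure will have in the final document, so fonts are not rescaled.
tikz(out_file, width = 4, height = 3)

x <- seq(0, 2 * pi, length.out = 100)
# Axis labels are written to the file as LaTeX source, so math mode works directly.
plot(x, sin(x), type = "l",
     xlab = "$\\theta$", ylab = "$\\sin(\\theta)$")

dev.off()  # close the device and finish writing the file
```

In the LaTeX document, loading the tikz package in the preamble and calling \input{scatter.tex} inside a figure environment should then typeset the plot with the document's own fonts.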
