<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Defective Semantics</title>
	<atom:link href="http://scarff.id.au/feed/" rel="self" type="application/rss+xml" />
	<link>http://scarff.id.au</link>
	<description>Dean Scarff's perpetual struggle with technology, and other anecdotes</description>
	<lastBuildDate>Sat, 31 Jul 2010 02:44:44 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>XHTML fixes for the WordPress reCAPTCHA plugin</title>
		<link>http://scarff.id.au/blog/2010/xhtml-fixes-for-the-wordpress-recaptcha-plugin/</link>
		<comments>http://scarff.id.au/blog/2010/xhtml-fixes-for-the-wordpress-recaptcha-plugin/#comments</comments>
		<pubDate>Thu, 29 Jul 2010 16:01:39 +0000</pubDate>
		<dc:creator>Dean</dc:creator>
				<category><![CDATA[Problems]]></category>
		<category><![CDATA[wordpress xhtml]]></category>

		<guid isPermaLink="false">http://scarff.id.au/?p=456</guid>
		<description><![CDATA[<p>The wp-recaptcha plugin for WordPress breaks when you&#8217;re serving pages as application/xhtml+xml.  I inadvertently broke comments when I installed it (silly me for not testing!).  I&#8217;ve written a patch that fixes it.</p>
<p><span id="more-456"></span></p>
<p>Under Firefox, you get an error like:</p>
<pre>
Error: uncaught exception:
[Exception... "Operation is not supported"  code: "9"
 nsresult: "0x80530009 (NS_ERROR_DOM_NOT_SUPPORTED_ERR)"
 location: "http://www.google.com/recaptcha/api/challenge?k=xxx Line: 12"]
</pre>
<p>While under Chrome it&#8217;s</p>
<pre>
Uncaught TypeError: Object #&#60;a Document&#62; has no method 'write'
api.recaptcha.net/challenge?k=xxx
</pre>
<p>The default JavaScript API uses <code>document.write</code>, which belongs to the HTML DOM rather than the core DOM and hence is not available on true XML documents.  It&#8217;s not a new issue either: wp-recaptcha has a history of breaking XHTML.  The thing is, the WordPress plugin (which uses the PHP library by the reCAPTCHA people) has an option to &#8220;Be XHTML 1.0 Strict compliant&#8221;, but this only fixes the use of iframes!</p>
<p>The real solution is to use the reCAPTCHA&#8230; <a href="http://scarff.id.au/blog/2010/xhtml-fixes-for-the-wordpress-recaptcha-plugin/" class="read_more">more</a></p>]]></description>
			<content:encoded><![CDATA[<p>The wp-recaptcha plugin for WordPress breaks when you&#8217;re serving pages as application/xhtml+xml.  I inadvertently broke comments when I installed it (silly me for not testing!).  I&#8217;ve written a patch that fixes it.</p>
<p><span id="more-456"></span></p>
<p>Under Firefox, you get an error like:</p>
<pre>
Error: uncaught exception:
[Exception... "Operation is not supported"  code: "9"
 nsresult: "0x80530009 (NS_ERROR_DOM_NOT_SUPPORTED_ERR)"
 location: "http://www.google.com/recaptcha/api/challenge?k=xxx Line: 12"]
</pre>
<p>While under Chrome it&#8217;s</p>
<pre>
Uncaught TypeError: Object #&lt;a Document&gt; has no method 'write'
api.recaptcha.net/challenge?k=xxx
</pre>
<p>The default JavaScript API uses <code>document.write</code>, which belongs to the HTML DOM rather than the core DOM and hence is not available on true XML documents.  It&#8217;s not a new issue either: wp-recaptcha has a history of <a href="http://rickardandersson.com/recaptcha-and-xhtml">breaking XHTML</a>.  The thing is, the WordPress plugin (which uses the PHP library by the reCAPTCHA people) has an option to &#8220;Be XHTML 1.0 Strict compliant&#8221;, but this only fixes the use of iframes!</p>
<p>The real solution is to use the <a href="http://code.google.com/apis/recaptcha/docs/display.html#AJAX">reCAPTCHA AJAX API</a>, which for whatever reason isn&#8217;t exposed in the PHP library.  You can grab my <a href="http://github.com/p00ya/wp-recaptcha">XML fix for wp-recaptcha</a> from GitHub.</p>
<p><small class="postscript">Updated: pushed fork to GitHub</small></p>
]]></content:encoded>
			<wfw:commentRss>http://scarff.id.au/blog/2010/xhtml-fixes-for-the-wordpress-recaptcha-plugin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Duplicating ggplot axis labels</title>
		<link>http://scarff.id.au/blog/2010/duplicating-ggplot-axis-labels/</link>
		<comments>http://scarff.id.au/blog/2010/duplicating-ggplot-axis-labels/#comments</comments>
		<pubDate>Wed, 21 Jul 2010 14:39:36 +0000</pubDate>
		<dc:creator>Dean</dc:creator>
				<category><![CDATA[Problems]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[ggplot]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://scarff.id.au/?p=392</guid>
		<description><![CDATA[<p>I&#8217;ve been trying for a while to find an elegant solution for duplicating axis ticks and labels in a ggplot chart.  Hadley replied on the ggplot2 mailing list, but a working solution within ggplot2 seems a way off.</p>
<p>The situation is this: imagine you have a faceted plot that is tall enough that the x-axis ticks and labels become obscured (e.g. when using a clipped viewport such as a browser window).  This is particularly destructive when you&#8217;re using an x-scale with manual breaks or a transformation.</p>
<pre class="codeblock R">
library(ggplot2)
g &#60;- ggplot(diamonds, aes(carat, ..density..)) +
   geom_histogram(aes(fill = clarity), binwidth = 0.2) +
   facet_grid(cut ~ .)
print(g)
</pre>
<p>There simply isn&#8217;t a way to repeat the x-axis labels in ggplot2 at the moment without discarding faceting and rendering each facet as a separate ggplot call.  I&#8217;ve seen some examples of selective plotting used to good effect in combining multiple plots&#8230; <a href="http://scarff.id.au/blog/2010/duplicating-ggplot-axis-labels/" class="read_more">more</a></p>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been trying for a while to find an elegant solution for duplicating axis ticks and labels in a ggplot chart.  Hadley <a href="http://groups.google.com/group/ggplot2/browse_thread/thread/499e438a7c79994f">replied on the ggplot2 mailing list</a>, but a working solution within ggplot2 seems a way off.</p>
<p>The situation is this: imagine you have a faceted plot that is tall enough that the x-axis ticks and labels become obscured (e.g. when using a clipped viewport such as a browser window).  This is particularly destructive when you&#8217;re using an x-scale with manual breaks or a transformation.</p>
<pre class="codeblock R">
library(ggplot2)
g &lt;- ggplot(diamonds, aes(carat, ..density..)) +
   geom_histogram(aes(fill = clarity), binwidth = 0.2) +
   facet_grid(cut ~ .)
print(g)
</pre>
<p>There simply isn&#8217;t a way to repeat the x-axis labels in ggplot2 at the moment without discarding faceting and rendering each facet as a separate ggplot call.  I&#8217;ve seen some <a href="http://learnr.wordpress.com/2009/05/26/ggplot2-two-or-more-plots-sharing-the-same-legend/">examples of selective plotting</a> used to good effect in combining multiple plots with common elements, but I can&#8217;t find anything applicable to keep consistent scales and binning without duplicating a lot of the (internal) facet and bin logic.</p>
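<p>For reference, the brute-force alternative of drawing each facet as its own plot might be sketched as below.  This is only an illustration under my own assumptions: it plots counts rather than densities, fixes the shared x range by hand, and stacks the plots with a plain grid layout.</p>
<pre class="codeblock R">
library(ggplot2)
library(grid)
# one data frame per level of cut, i.e. per would-be facet
pieces &lt;- split(diamonds, diamonds$cut)
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow = length(pieces), ncol = 1)))
for (i in seq_along(pieces)) {
  p &lt;- ggplot(pieces[[i]], aes(carat)) +
    geom_histogram(aes(fill = clarity), binwidth = 0.2) +
    coord_cartesian(xlim = range(diamonds$carat)) +  # shared x range, set by hand
    labs(title = names(pieces)[i])
  # draw this plot in row i of the layout, so every row gets its own x-axis
  print(p, vp = viewport(layout.pos.row = i, layout.pos.col = 1))
}
popViewport()
</pre>
<p>Every plot gets its own x-axis this way, but each one also computes its own y scale and legend, which is exactly the bookkeeping that faceting normally takes care of.</p>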
<p><span id="more-392"></span></p>
<p>Instead my best shot was to clone some of the grob elements and redraw them at different locations:</p>
<pre class="codeblock R">
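library(grid)  # getGrob, editGrob and the viewport functions
grob &lt;- ggplotGrob(g)
# grab the x-axis text grob and shrink its font
xtext &lt;- getGrob(grob, "layout::axis_h::axis.text", grep = TRUE)
xtext &lt;- editGrob(xtext, gp = gpar(fontsize = 8))
# navigate into the already-drawn plot
downViewport("background::panels::layout::axis_h-13-3") # ids from grid.ls()
# redraw the labels higher up; 34 is a device-dependent magic number
pushViewport(viewport(y = unit(34, "npc"), name = "axis_h_rep-1"))
grid.draw(xtext)
popViewport()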
</pre>
<p>Unfortunately I couldn&#8217;t find a consistent way of querying the grid graphics internals for the measurements necessary to move the &#8220;mirrored&#8221; axis labels to the right place.  The 34 there is a magic number I found with <code>grid.locator()</code> and trial-and-error; it changes depending on the graphics device.  At one point I hoped I could clone the entire <code>axis_h</code> viewport, pry some vertical space from in between the facet panels, and paste the clones in between.  Unfortunately grid layouts don&#8217;t seem to be very mutable once they&#8217;ve been created, and redrawing the text grob seemed like the best I could do to reuse the output.</p>
<p>While looking at the <a href="http://stackoverflow.com/questions/1532535/showing-multiple-axis-labels-using-ggplot2-with-facet-wrap-in-r">stackoverflow answer for the same problem</a>, I came across Harlan&#8217;s assessment:</p>
<blockquote><p>
GGplot&#8217;s philosophy is about doing the right thing with a minimum of customization, which means, naturally, that you can&#8217;t customize things as much as other packages.
</p></blockquote>
<p>This is more significant when contrasted with the context of R itself; in R, <a href="http://www.r-project.org/about.html">the user retains full control</a>.  Coming up against ggplot&#8217;s choice of only exposing high-level primitives often leaves the user with the choice of:</p>
<ul>
<li>accepting The Way ggplot Does Things and not getting what they want</li>
<li>waiting for Hadley to write a patch (next summer, if you&#8217;re lucky?)</li>
<li>wading through ggplot internals so they can duplicate its functionality with <code>plyr</code> and <code>grid</code> calls</li>
<li>abandoning ggplot completely</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://scarff.id.au/blog/2010/duplicating-ggplot-axis-labels/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Aho-Corasick string matching in Haskell</title>
		<link>http://scarff.id.au/blog/2010/aho-corasick-string-matching-in-haskell/</link>
		<comments>http://scarff.id.au/blog/2010/aho-corasick-string-matching-in-haskell/#comments</comments>
		<pubDate>Sun, 18 Jul 2010 11:54:03 +0000</pubDate>
		<dc:creator>Dean</dc:creator>
				<category><![CDATA[Programs]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[haskell]]></category>

		<guid isPermaLink="false">http://scarff.id.au/?p=384</guid>
		<description><![CDATA[<p>The Aho-Corasick string matching algorithm constructs an automaton for matching a dictionary of patterns.  When applied to an input string, the automaton&#8217;s time complexity is linear in the length of the input, plus the number of matches (so at worst quadratic in the input).  It&#8217;s been around since 1975, but it isn&#8217;t implemented in the Haskell stringsearch library and I couldn&#8217;t even find a general trie data structure via Google.  So I implemented the Aho-Corasick algorithm myself: take a look at the full Aho-Corasick module.</p>
<p>There was an interesting paper on deriving the algorithm as a result of applying fully-lazy evaluation and memoization on a more naive algorithm.  Unfortunately, applying fully-lazy evaluation and memoization to a function in Haskell is non-trivial (despite it being theoretically possible for the compiler to do so!).</p>
<p>It&#8217;s always interesting trying to find the functional equivalent to an imperative algorithm.  I ended up using some&#8230; <a href="http://scarff.id.au/blog/2010/aho-corasick-string-matching-in-haskell/" class="read_more">more</a></p>]]></description>
			<content:encoded><![CDATA[<p>The Aho-Corasick string matching algorithm constructs an automaton for matching a dictionary of patterns.  When applied to an input string, the automaton&#8217;s time complexity is linear in the length of the input, plus the number of matches (so at worst quadratic in the input).  It&#8217;s been around since 1975, but it isn&#8217;t implemented in the <a href="http://hackage.haskell.org/package/stringsearch">Haskell stringsearch library</a> and I couldn&#8217;t even find a general trie data structure via Google.  So I implemented the Aho-Corasick algorithm myself: take a look at the <a href="/file/ahocorasick.hs">full Aho-Corasick module</a>.</p>
<p>There was an interesting paper on <a href="http://www.tuat.ac.jp/~k1kaneko/papers/j5.pdf">deriving the algorithm</a> as a result of applying fully-lazy evaluation and memoization on a more naive algorithm.  Unfortunately, applying <a href="http://www.haskell.org/haskellwiki/Maintaining_laziness">fully-lazy evaluation</a> and <a href="http://www.haskell.org/haskellwiki/Memoization">memoization</a> to a function in Haskell is non-trivial (despite it being theoretically possible for the compiler to do so!).</p>
<p>It&#8217;s always interesting trying to find the functional equivalent to an imperative algorithm.  I ended up using some cute Haskell tricks.</p>
<p><span id="more-384"></span></p>
<p>Instead of a <acronym title="breadth first search">BFS</acronym> to compute the failure function, I propagate a recursive function forward as the trie is constructed.  The separate <code>mkRoot</code> provides the base case with which to tie the knot.</p>
<pre class="codeblock haskell">
mkRoot xs = let root = Root (edge [] (sort xs) root) in root
mkTrie prefix f xs = Node goto prefix ((not.null) self) f
  where
    goto = edge prefix kids =&lt;&lt; (failTo f)
    (self, kids) = if null (head xs) then ([head xs], tail xs) else ([], xs)
</pre>
<p>Instead of using a list to implement the branches of a rose tree, I used partial application over <code>edge</code>.  This certainly looks elegant, but in fact it is the weak point, as <code>withPrefix</code> is a linear search; the imperative approach is an O(1) lookup (with small alphabets) or O(log <i class="math">m</i>) over <i class="math">m</i> branches.  Furthermore, the lazy evaluation of <code>edge</code> means that the trie is being constantly reconstructed as it is traversed by the automaton.</p>
<pre class="codeblock haskell">
data Trie = Node (Char -> Maybe Trie) String Bool Trie
          | Root (Char -> Maybe Trie)

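-- Edge function: given the next character c, return the sub-trie for the
-- patterns in xs (sorted, non-empty) that start with c, if there are any.
edge :: String -> [String] -> Trie -> Char -> Maybe Trie
edge prefix xs f c =
  if null (withPrefix c)
  then Nothing
  else Just (mkTrie (c:prefix) f (map tail (withPrefix c)))
  where
    -- linear scan over the sorted patterns
    withPrefix c = takeWhile ((c==) . head) . dropWhile ((c>) . head) $ xs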
</pre>
<p>Overall it doesn&#8217;t run too badly (25 seconds on <a href="/hosts#scud">scud</a> with 50 pathological patterns and 100K of input, compiled with <code>ghc -O2</code>).  Obviously it&#8217;s not generic over types or anything, but it should work fine with lists of types other than <code>Char</code>.</p>
]]></content:encoded>
			<wfw:commentRss>http://scarff.id.au/blog/2010/aho-corasick-string-matching-in-haskell/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>R and LaTeX PDF graphics</title>
		<link>http://scarff.id.au/blog/2010/r-and-latex-pdf-graphics/</link>
		<comments>http://scarff.id.au/blog/2010/r-and-latex-pdf-graphics/#comments</comments>
		<pubDate>Mon, 17 May 2010 08:09:16 +0000</pubDate>
		<dc:creator>Dean</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[fonts]]></category>
		<category><![CDATA[LaTeX]]></category>
		<category><![CDATA[pdf]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[TeX]]></category>

		<guid isPermaLink="false">http://scarff.id.au/?p=349</guid>
		<description><![CDATA[<p>When writing a document in LaTeX that makes use of figures from R, I want to produce a PDF with</p>
<ul>
<li>vector graphics,</li>
<li>consistent fonts,</li>
<li>no messing around overlaying text in LaTeX,</li>
</ul>
<p>and maybe typeset math in the R graphics.  This post surveys the state of the art in how to achieve the best of all worlds when importing graphics generated by R into documents typeset to PDF with LaTeX.  I look at postscript and PDF figures generated by R&#8217;s X11, Cairo, and finally the new (and awesome) TikZ devices.</p>
<p><span id="more-349"></span></p>
<p>If you&#8217;re using some TeX variant, you probably care a lot about the professional presentation of your document.  Accordingly, you probably cringe when you see figures in a PDF document that aren&#8217;t quite as crisp, or have inferior fonts and math typesetting.  Even some of the Use R books fail blatantly in this regard.  You&#8217;d expect the&#8230; <a href="http://scarff.id.au/blog/2010/r-and-latex-pdf-graphics/" class="read_more">more</a></p>]]></description>
			<content:encoded><![CDATA[<p>When writing a document in LaTeX that makes use of figures from R, I want to produce a PDF with</p>
<ul>
<li>vector graphics,</li>
<li>consistent fonts,</li>
<li>no messing around overlaying text in LaTeX,</li>
</ul>
<p>and maybe typeset math in the R graphics.  This post surveys the state of the art in how to achieve the best of all worlds when importing graphics generated by R into documents typeset to PDF with LaTeX.  I look at postscript and PDF figures generated by R&#8217;s X11, Cairo, and finally the new (and awesome) TikZ devices.</p>
<p><span id="more-349"></span></p>
<p>If you&#8217;re using some TeX variant, you probably care a lot about the professional presentation of your document.  Accordingly, you probably cringe when you see figures in a PDF document that aren&#8217;t quite as crisp, or have inferior fonts and math typesetting.  Even some of the <a href="http://www.springer.com/series/6991">Use R</a> books fail blatantly in this regard.  You&#8217;d expect the <a href="http://had.co.nz/ggplot2/book/qplot.pdf">ggplot2 book</a> to espouse the prettiness of its output, but it has raster figures and an ugly sans-serif font&mdash;a prudent optimisation for file size and rendering speed nevertheless.</p>
<p>The traditional way to get R graphics that are consistent with LaTeX fonts was with a LaTeX toolchain via DVI and postscript.  You can render R graphics to a postscript device that uses the appropriate TeX encoding, according to the instructions in <a href="http://bm2.genes.nig.ac.jp/RGM2/R_current/library/grDevices/html/postscript.html">the R manual for postscript {grDevices}</a>.  I usually run ps2eps on the result to fix the bounding box and other issues; you can also use <code>par(mar=)</code> from R according to the manual.  In the LaTeX preamble you add <code>\usepackage[T1]{fontenc}</code> and then after the document is typeset and it&#8217;s converted from DVI to PDF, the fonts are all magically correct.</p>
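<p>As a rough sketch of that recipe (the file name, plot and dimensions here are my own placeholders, not taken from the manual), a figure using the Computer Modern family and the TeX text encoding could be written like this:</p>
<pre class="codeblock R">
# EPS-style output: a single page sized to the figure, not rotated
postscript("fig.eps", family = "ComputerModern", encoding = "TeXtext.enc",
           width = 5, height = 4,
           onefile = FALSE, horizontal = FALSE, paper = "special")
plot(cars, main = "Stopping distances")
dev.off()
</pre>
<p>The resulting file can then go through ps2eps and be included in the LaTeX document as usual.</p>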
<p>With some of the new <a href="http://www.ctan.org/tex-archive/macros/latex/contrib/microtype/">typography features</a> (and the shorter toolchain that doesn&#8217;t require Ghostscript) available with pdfLaTeX, it&#8217;s understandable if you want to move away from plain LaTeX.  However, when you use pdfTeX, you can&#8217;t include EPS graphics.</p>
<p>R is no exception; I&#8217;ve always found integrating graphics with pdfLaTeX to be <acronym title="situation normal, all fouled up">SNAFU</acronym>.  I can personally attest to <a href="http://tclab.kaist.ac.kr/ipe/">ipe</a> (if you can get it to build) and <a href="http://asymptote.sourceforge.net/">asymptote</a> looks good although I&#8217;ve yet to try it.  Using a metapost intermediate format worked well from gnuplot.  xfig is horrible and I detest manually writing any metapost or pstricks.</p>
<p>R can produce PDF output, but R&#8217;s basic PDF device doesn&#8217;t have the same font family and encoding options as its postscript device: you will get the error <q>unknown family &#8216;ComputerModern&#8217;</q> if you try to supply the same parameters.  There are some <a href="http://www.mail-archive.com/r-help@r-project.org/msg01322.html">amusing suggestions</a> for dealing with this, but a lot of the material that turns up from a Google search is outdated.</p>
<p>One workaround I tried was to convert the EPS that worked with the previous LaTeX &#8594; DVI &#8594; PDF toolchain.  However:</p>
<ul>
<li>pstopdf results in the correct font, but weird kerning</li>
<li>ps2pdf results in the correct font, weird kerning, and weird scaling/bounding box issues</li>
<li>Apple&#8217;s Preview.app machinery lost the font information</li>
</ul>
<p>Instead of using the EPS route, we can just use a <em>similar</em> font that R does support exporting directly to PDF.  The <a href="http://cm-unicode.sourceforge.net/">Computer Modern &#8211; Unicode</a> project has some ports of the Computer Modern family to OpenType, which R can then utilise.</p>
<p>The X11 device (on Mac OS X) works perfectly with <code class="R">par(family="CMU Serif")</code>, but the pdf device again complains that it&#8217;s not a postscript font, and even if you try to use postscript versions of Computer Modern directly there is an encoding issue.  Paul Murrell has an excellent writeup of <a href="http://www.stat.auckland.ac.nz/~paul/R/CM/CMR.html">using cm-lgc</a> to create PDF output with an alternative encoding for the type 1 CM fonts.  I couldn&#8217;t get either CairoX11 or CairoPDF to produce output with CM, but I suspect Murrell&#8217;s method will fix Cairo too.  Quartz works out of the box if you happen to use Mac OS X.</p>
<pre class="codeblock R">
quartz(type="pdf", file="test-quartz.pdf")
par(family="CMU Serif")
</pre>
<p>Ligatures don&#8217;t work, but I can live with that.  Quartz even manages to typeset math made with <code>expression</code>, but in a way that will stand out as blindingly ugly in a TeX document.</p>
<p>This all brings us to the true state of the art (<acronym title="from what I can see">FWICS</acronym>): <a href="http://cran.r-project.org/web/packages/tikzDevice/index.html">tikzDevice</a>.  This project is still in beta, and their <a href="http://tikzdevice.r-forge.r-project.org/">project page</a> has the eerie bareness of archaic Alexandria.  Nevertheless, once I figured out how to install it, I was impressed.</p>
<pre class="codeblock R">
# from R
install.packages("filehash")
install.packages("tikzDevice", dependencies=TRUE,
  repos="http://R-Forge.R-project.org",
  type="source")
</pre>
<p>I took the opportunity to switch from MacPorts&#8217; TeX Live to the MacTeX-derived BasicTeX, and installed a couple of necessary packages:</p>
<pre class="codeblock sh">
# from sh
sudo /usr/texbin/tlmgr install pgf preview
</pre>
<p>It&#8217;s as easy as <code class="R">tikz("tikzfig.tex")</code> and your usual plotting commands to generate a figure, then:</p>
<pre class="codeblock latex">
% in the preamble:
\usepackage{tikz}
\usepackage{color}
% ...
\begin{document}
% ... then in the body:
\begin{figure}[p]
\input{tikzfig}
\caption{TikZ}
\end{figure}
\end{document}
</pre>
<p>It processes all the text in the figure with LaTeX, including math marked up with &#8216;$&#8217;s.  However, don&#8217;t bother using R <code>expression</code>s in your labels: symbols and accents are positioned by R as separate nodes, and would-be TeX specials aren&#8217;t even escaped.</p>
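<p>A minimal sketch of the R side, assuming tikzDevice is installed and a LaTeX installation is on the path (the device needs it to measure text); the data set, labels and dimensions are only illustrative:</p>
<pre class="codeblock R">
library(tikzDevice)
# writes TikZ code to tikzfig.tex, ready for \input{tikzfig}
tikz("tikzfig.tex", width = 4, height = 3)
# labels are passed to LaTeX as-is, so $...$ is typeset as math
plot(cars, xlab = "speed $v$", ylab = "stopping distance $d$")
dev.off()
</pre>
<p>Since the labels are handed to LaTeX unescaped, any literal TeX specials such as <code>%</code> or <code>_</code> have to be escaped by hand.</p>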
]]></content:encoded>
			<wfw:commentRss>http://scarff.id.au/blog/2010/r-and-latex-pdf-graphics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
