<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Defective Semantics &#187; haskell</title>
	<atom:link href="http://scarff.id.au/blog/tag/haskell/feed/" rel="self" type="application/rss+xml" />
	<link>http://scarff.id.au</link>
	<description>Dean Scarff's perpetual struggle with technology, and other anecdotes</description>
	<lastBuildDate>Thu, 03 Nov 2011 22:39:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Aho-Corasick string matching in Haskell</title>
		<link>http://scarff.id.au/blog/2010/aho-corasick-string-matching-in-haskell/</link>
		<comments>http://scarff.id.au/blog/2010/aho-corasick-string-matching-in-haskell/#comments</comments>
		<pubDate>Sun, 18 Jul 2010 11:54:03 +0000</pubDate>
		<dc:creator>Dean</dc:creator>
				<category><![CDATA[Programs]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[haskell]]></category>

		<guid isPermaLink="false">http://scarff.id.au/?p=384</guid>
		<description><![CDATA[<p>The Aho-Corasick string matching algorithm constructs an automaton for matching a dictionary of patterns.  When applied to an input string, the automaton&#8217;s running time is linear in the length of the input plus the number of matches (so at worst quadratic in the input).  The algorithm has been around since 1975, but it isn&#8217;t implemented in the Haskell stringsearch library, and I couldn&#8217;t even find a general trie data structure through Google.  So I implemented the Aho-Corasick algorithm myself: take a look at the full Aho-Corasick module.</p>
<p>There was an interesting paper on deriving the algorithm by applying fully-lazy evaluation and memoization to a more naive algorithm.  Unfortunately, applying fully-lazy evaluation and memoization to a function in Haskell is non-trivial (despite it being theoretically possible for the compiler to do so!).</p>
<p>It&#8217;s always interesting trying to find the functional equivalent to an imperative algorithm.  I ended up using some&#8230; <a href="http://scarff.id.au/blog/2010/aho-corasick-string-matching-in-haskell/" class="read_more">more</a></p>]]></description>
			<content:encoded><![CDATA[<p>The Aho-Corasick string matching algorithm constructs an automaton for matching a dictionary of patterns.  When applied to an input string, the automaton&#8217;s running time is linear in the length of the input plus the number of matches (so at worst quadratic in the input).  The algorithm has been around since 1975, but it isn&#8217;t implemented in the <a href="http://hackage.haskell.org/package/stringsearch">Haskell stringsearch library</a>, and I couldn&#8217;t even find a general trie data structure through Google.  So I implemented the Aho-Corasick algorithm myself: take a look at the <a href="/file/ahocorasick.hs">full Aho-Corasick module</a>.</p>
<p>There was an interesting paper on <a href="http://www.tuat.ac.jp/~k1kaneko/papers/j5.pdf">deriving the algorithm</a> by applying fully-lazy evaluation and memoization to a more naive algorithm.  Unfortunately, applying <a href="http://www.haskell.org/haskellwiki/Maintaining_laziness">fully-lazy evaluation</a> and <a href="http://www.haskell.org/haskellwiki/Memoization">memoization</a> to a function in Haskell is non-trivial (despite it being theoretically possible for the compiler to do so!).</p>
<p>It&#8217;s always interesting trying to find the functional equivalent to an imperative algorithm.  I ended up using some cute Haskell tricks.</p>
<p><small class="postscript">Update: I&#8217;ve written an improved version of <a href="/blog/2010/improved-aho-corasick-in-haskell/">Aho-Corasick implemented with Data.Array and Data.Map</a></small></p>
<p><span id="more-384"></span></p>
<p>Instead of running a <acronym title="breadth first search">BFS</acronym> to compute the failure function, I propagate a recursive function forward as the trie is constructed.  The separate <code>mkRoot</code> provides the base case with which to tie the knot.</p>
<pre class="codeblock haskell">
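-- The root's edges fail back to the root itself: this is the knot being tied.
mkRoot xs = let root = Root (edge [] (sort xs) root) in root
-- prefix is the (reversed) path to this node, f is the node to fail to, and
-- xs holds the sorted pattern suffixes that pass through this node.
mkTrie prefix f xs = Node goto prefix ((not.null) self) f
  where
    -- (=&lt;&lt;) on functions: goto c = edge prefix kids (failTo f c) c
    goto = edge prefix kids =&lt;&lt; (failTo f)
    (self, kids) = if null (head xs) then ([head xs], tail xs) else ([], xs)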
</pre>
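<p>The two tricks in that snippet can be seen in isolation in the following minimal sketch.  It is not code from the module: <code>Ring</code>, <code>twoRing</code>, <code>labels</code>, <code>step</code>, <code>pick</code> and <code>both</code> are hypothetical names used only for illustration.  Tying the knot means defining a value in terms of itself and letting lazy evaluation resolve the cycle, and <code>=&lt;&lt;</code> on functions feeds the same argument to both sides, so that <code>(g =&lt;&lt; h) c</code> equals <code>g (h c) c</code>.</p>
<pre class="codeblock haskell">
-- A cyclic structure with two cells, each pointing at the other.
data Ring = Ring Char Ring

-- Tying the knot: a and b refer to each other inside one lazy let.
twoRing :: Ring
twoRing = let a = Ring 'a' b
              b = Ring 'b' a
          in a

-- Walk n steps around the ring and collect the labels.
labels :: Int -> Ring -> String
labels 0 _ = []
labels n (Ring c next) = c : labels (n - 1) next

-- Two ordinary functions to combine with (=&lt;&lt;) in the function monad.
step :: Int -> Char -> String
step k c = replicate k c

pick :: Char -> Int
pick c = if c == 'a' then 1 else 2

-- both c == step (pick c) c
both :: Char -> String
both = step =&lt;&lt; pick
</pre>
<p>Here <code>labels 5 twoRing</code> walks the cycle without ever rebuilding a cell, and <code>both</code> plays the same role as <code>goto</code> above: the character is used once to choose a value and again as the argument to the resulting function.</p>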
<p>Instead of using a list to implement the branches of a rose tree, I used partial application of <code>edge</code>.  This certainly looks elegant, but it is in fact the weak point: <code>withPrefix</code> is a linear search over the remaining patterns, whereas the imperative approach is an O(1) lookup (with small alphabets) or O(log <i class="math">m</i>) over <i class="math">m</i> branches.  Furthermore, because <code>edge</code> is evaluated lazily and its results are not shared, the trie is constantly being reconstructed as the automaton traverses it.</p>
<pre class="codeblock haskell">
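-- Node: goto function, (reversed) prefix, whether a pattern ends here, failure node
data Trie = Node (Char -> Maybe Trie) String Bool Trie
          | Root (Char -> Maybe Trie)

-- edge prefix xs f c is the child reached by c from the node at prefix, if any;
-- xs is the sorted list of pattern suffixes below that node.
edge :: String -> [String] -> Trie -> Char -> Maybe Trie
edge prefix xs f c =
  if null (withPrefix c)
  then Nothing
  else Just (mkTrie (c:prefix) f (map tail (withPrefix c)))
  where
    -- xs is sorted, so the suffixes starting with c form one contiguous run
    withPrefix c = takeWhile ((c==) . head) . dropWhile ((c>) . head) $ xs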
</pre>
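<p>The signatures are fixed to <code>String</code>, so it isn&#8217;t generic as written, but the code only compares and sorts elements, so it should work fine with lists of any ordered type other than <code>Char</code> once the types are generalised.</p>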
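<p>One way to avoid the linear search in <code>withPrefix</code> is to index the children of each node by their next element.  The following is only a sketch under that assumption, in the spirit of the <code>Data.Map</code> approach mentioned in the update but not taken from it: <code>MapTrie</code>, <code>emptyTrie</code>, <code>insertPattern</code>, <code>fromPatterns</code>, <code>child</code> and <code>member</code> are hypothetical names, and failure links are left out to keep it short.  Each child lookup is then O(log <i class="math">m</i>), and the element type only needs an <code>Ord</code> instance.</p>
<pre class="codeblock haskell">
import qualified Data.Map as Map

-- A plain trie (no failure links): a flag marking the end of a pattern,
-- plus the children indexed by their next element.
data MapTrie a = MapTrie Bool (Map.Map a (MapTrie a))

emptyTrie :: MapTrie a
emptyTrie = MapTrie False Map.empty

-- Insert one pattern, creating child nodes as needed.
insertPattern :: Ord a => [a] -> MapTrie a -> MapTrie a
insertPattern [] (MapTrie _ kids) = MapTrie True kids
insertPattern (x:xs) (MapTrie end kids) =
  MapTrie end (Map.insert x (insertPattern xs kid) kids)
  where kid = Map.findWithDefault emptyTrie x kids

fromPatterns :: Ord a => [[a]] -> MapTrie a
fromPatterns = foldr insertPattern emptyTrie

-- O(log m) lookup of the child reached by x, replacing the linear scan.
child :: Ord a => a -> MapTrie a -> Maybe (MapTrie a)
child x (MapTrie _ kids) = Map.lookup x kids

-- Does the trie contain exactly this pattern?
member :: Ord a => [a] -> MapTrie a -> Bool
member [] (MapTrie end _) = end
member (x:xs) t = maybe False (member xs) (child x t)
</pre>
<p>Because the map is an ordinary data structure rather than a function, each child is built once and shared, so traversing the trie does not reconstruct it.</p>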
<p>The following pathological case didn&#8217;t run too badly: 25 seconds for <i class="math">m</i>=50, <i class="math">n</i>=100000 on <a href="/hosts#scud">scud</a>, compiled with <code>ghc -O2</code>.  Profiling revealed 20 million entries into <code>edge</code>, which easily dominate the timing.  Oddly enough, this seems to be just a large constant factor: other samples suggest the running time is linear in the product <i class="math">m n</i>.</p>
<pre class="codeblock haskell">
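import Data.List (tails)
import Numeric (readDec)
import System.Environment (getArgs)
-- findMatches comes from the Aho-Corasick module linked above

main = do
  args &lt;- getArgs
  let
    -- m patterns (suffixes of "abab...") and a haystack of n repetitions of "ab"
    (m:n:_) = map (fst . head . readDec) args
    patterns = (take m . tails . concat . take 25 . repeat) "ab"
    haystack = (concat . take n . repeat) "ab"
  putStr $ show (length (findMatches patterns haystack))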
</pre>
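<p>For checking results on small inputs, a naive reference matcher is handy.  This is a sketch and not part of the module: <code>naiveMatches</code> is a hypothetical name, and since the result type of <code>findMatches</code> isn&#8217;t shown here, reporting each match as an (offset, pattern) pair is an assumption.  It takes time proportional to the haystack length times the total length of the patterns, which is exactly the cost the automaton is meant to avoid.</p>
<pre class="codeblock haskell">
import Data.List (isPrefixOf, tails)

-- Try every pattern at every offset of the haystack.
naiveMatches :: Eq a => [[a]] -> [a] -> [(Int, [a])]
naiveMatches patterns haystack = concat (zipWith at [0 ..] (tails haystack))
  where
    -- all patterns that match at offset i, where rest is the haystack from i
    at i rest = map ((,) i) (filter (`isPrefixOf` rest) patterns)
</pre>
<p>For example, <code>naiveMatches ["ab", "b"] "abab"</code> finds four matches, at offsets 0 to 3.</p>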
]]></content:encoded>
			<wfw:commentRss>http://scarff.id.au/blog/2010/aho-corasick-string-matching-in-haskell/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>monadic parser combinators with parsec</title>
		<link>http://scarff.id.au/blog/2008/monadic-parser-combinators-with-parse/</link>
		<comments>http://scarff.id.au/blog/2008/monadic-parser-combinators-with-parse/#comments</comments>
		<pubDate>Thu, 22 May 2008 16:40:57 +0000</pubDate>
		<dc:creator>Dean</dc:creator>
				<category><![CDATA[Programs]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[haskell]]></category>
		<category><![CDATA[monads]]></category>
		<category><![CDATA[parsec]]></category>

		<guid isPermaLink="false">http://scarff.id.au/?p=19</guid>
		<description><![CDATA[<p>I decided to pull out Haskell for my Linguistics 1101 Phrase Structure Rules Assignment.  It seemed like a good opportunity to play with these monadic parser combinator things, which sound impressive if nothing else.  The result was pleasing, although I&#8217;m not sure if my tutor will appreciate it.</p>
<p>It was fun revisiting Haskell, and writing parsers directly using Parsec is certainly a novel alternative to using a Bison-style compiler-compiler.  Spirit was similar, but C++ can become so syntactically clunky some of the joy is lost.</p>
<p>I&#8217;m not sure whether it was something specific to the Parsec paradigm, my abuse of Parsec, or my ignorance of Haskell and monadic programming in general, but I kept finding myself on the wrong side of Monads and do-expressions.  It seems you have to use liftM a lot.</p>
<p>In looking for a generalisation of the liftM<i>n</i> functions I came up with:</p>
<pre class="codeblock haskell">
foldMLr</pre><p>&#8230; <a href="http://scarff.id.au/blog/2008/monadic-parser-combinators-with-parse/" class="read_more">more</a></p>]]></description>
			<content:encoded><![CDATA[<p>I decided to pull out Haskell for my Linguistics 1101 Phrase Structure Rules Assignment.  It seemed like a good opportunity to play with these monadic parser combinator things, which sound impressive if nothing else.  The result was <a href="http://ucc.asn.au/~dos/uni/ling1101/ass04/PhraseParser.hs">pleasing</a>, although I&#8217;m not sure if my tutor will appreciate it.</p>
<p>It was fun revisiting Haskell, and writing parsers directly using <a href="http://www.cs.uu.nl/people/daan/download/parsec/parsec.html">Parsec</a> is certainly a novel alternative to using a Bison-style compiler-compiler.  Spirit was similar, but C++ can become so syntactically clunky that some of the joy is lost.</p>
<p>I&#8217;m not sure whether it was something specific to the Parsec paradigm, my abuse of Parsec, or my ignorance of Haskell and monadic programming in general, but I kept finding myself on the wrong side of monads and do-expressions.  It seems you have to use <code class="haskell">liftM</code> a lot.</p>
<p>In looking for a generalisation of the liftM<i>n</i> functions I came up with:</p>
<pre class="codeblock haskell">
import Control.Monad (liftM2)  -- only needed for the equivalent definition below

-- foldMLr f u xs runs the monadic actions in xs from the head onwards, then
-- folds their results from the right with f, using u as the rightmost value.
foldMLr :: (Monad m) => (t -> a -> a) -> a -> [m t] -> m a
foldMLr _ u [] = return u
foldMLr f u (x:xs) = do { a &lt;- x ; b &lt;- foldMLr f u xs ; return (f a b) }
-- equivalently:
-- foldMLr f u (x:xs) = liftM2 f x (foldMLr f u xs)
</pre>
<p>which is not the same as <code class="haskell">foldM</code> but is a generalisation of <code class="haskell">sequence</code>, which can be defined as <code class="haskell">foldMLr (:) []</code>.  I didn&#8217;t end up using it in the final parser.</p>
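<p>As a quick sketch of these claims, the same function can be written with <code class="haskell">foldr</code>.  The primed names <code>foldMLr'</code> and <code>sequence'</code>, and the helper <code>sumAll</code>, are hypothetical and exist only for this illustration; the only library function used is <code class="haskell">liftM2</code> from <code>Control.Monad</code>.</p>
<pre class="codeblock haskell">
import Control.Monad (liftM2)

-- The same fold written with foldr: each action is combined with the
-- result of folding the rest of the list.
foldMLr' :: Monad m => (t -> a -> a) -> a -> [m t] -> m a
foldMLr' f u = foldr (liftM2 f) (return u)

-- sequence recovered as a special case.
sequence' :: Monad m => [m t] -> m [t]
sequence' = foldMLr' (:) []

-- Summing the results of several Maybe actions; a Nothing short-circuits.
sumAll :: [Maybe Int] -> Maybe Int
sumAll = foldMLr' (+) 0
</pre>
<p>By contrast, <code class="haskell">foldM</code> folds a monadic step function over a list of plain values, whereas here the list elements are themselves the monadic actions and the combining function is pure.</p>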
<p>Another issue was that constructing a parse tree (using Data.Tree types) was actually somewhat tedious.  I guess Parsec assumes that you want to fold up the result within the parsers.</p>
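<p>To give an idea of what building a <code>Data.Tree</code> value inside the parsers looks like, here is a small sketch.  It is not the parser from the assignment: it assumes the Parsec 3 module names (<code>Text.Parsec</code>, <code>Text.Parsec.String</code>), a made-up bracketed input format such as <code>(S (NP the dog) (VP barks))</code>, and the hypothetical names <code>lexeme</code>, <code>word</code>, <code>phrase</code>, <code>tree</code> and <code>parseTree</code>.</p>
<pre class="codeblock haskell">
import Control.Monad (liftM2)
import Data.Tree (Tree (Node))
import Text.Parsec (between, char, choice, letter, many, many1, parse, spaces)
import Text.Parsec.String (Parser)

-- Run a parser, then skip any trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p >>= \x -> spaces >> return x

-- A word such as "NP" or "dog".
word :: Parser String
word = lexeme (many1 letter)

-- A bracketed phrase: a label followed by its subtrees, e.g. (NP the dog).
phrase :: Parser (Tree String)
phrase = between (lexeme (char '(')) (lexeme (char ')'))
                 (liftM2 Node word (many tree))

-- A tree is either a bracketed phrase or a bare word (a leaf).
tree :: Parser (Tree String)
tree = choice [phrase, fmap (\w -> Node w []) word]

-- Parse one tree from the start of a string, discarding the error on failure.
parseTree :: String -> Maybe (Tree String)
parseTree = either (const Nothing) Just . parse tree "input"
</pre>
<p>Every production has to wrap its result in <code>Node</code> by hand, usually via <code class="haskell">liftM2</code> or <code class="haskell">fmap</code>, which is where the tedium comes from.</p>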
<p>Also watch out for the change in <code class="haskell">showErrorMessages</code>: in the version shipped with GHC it <a href="http://www.haskell.org/ghc/docs/latest/html/libraries/parsec/Text-ParserCombinators-Parsec-Error.html#v%253AshowErrorMessages">takes some extra initial string arguments</a> that weren&#8217;t there in <a href="http://legacy.cs.uu.nl/daan/download/parsec/parsec.html#showErrorMessages">the standalone release</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://scarff.id.au/blog/2008/monadic-parser-combinators-with-parse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

