Posted by Tom Moertel
Fri, 27 Jan 2012 04:24:00 GMT
This has been done a million times before, but if you
haven’t seen it, it’s pretty neat. Let’s say you have a simple
recursive data structure like a binary tree:
module Tree where
data Tree a = Empty
| Node a (Tree a) (Tree a)
deriving (Eq, Show)
t0 = Empty
t1 = Node 1 Empty Empty
t3 = Node 2 (Node 1 Empty Empty) (Node 3 Empty Empty)
Now say you want to traverse the nodes, in “preorder.” So you write a
traversal function. It folds a binary operator f through
the values in the tree, starting with an initial accumulator value of
z. First, it visits the current node, then the left subtree, and
finally the right subtree, combining each value it encounters
with the current accumulator value to get the next accumulator
value. The final accumulator value is the result.
Spelling out the steps, it looks like this:
preorder_traversal f z tree = go tree z
where
go Empty z = z
go (Node v l r) z = let z' = f v z
z'' = go l z'
z''' = go r z''
in z'''
But if you wanted an “inorder” traversal instead, you’d write a
slightly different function, visiting the left subtree before the
current node:
inorder_traversal f z tree = go tree z
where
go Empty z = z
go (Node v l r) z = let z' = go l z
z'' = f v z'
z''' = go r z''
in z'''
To see how the two traversal functions work, let’s use both to flatten
the 3-node tree t3 we defined earlier.
flatten traversal = reverse . traversal (:) []
test0i = flatten inorder_traversal t3
test0p = flatten preorder_traversal t3
Great.
But now you start thinking about post-order traversal. Do you want
to write a third traversal function that’s almost the same as the
first two?
So you go back to the inorder traversal and start staring real hard at that let statement.
Can you rewrite it? Yes:
inorder_traversal2 f z tree = go tree z
where
go Empty z = z
go (Node v l r) z = go r . f v . go l $ z
And now you’ve got it. Just by reordering the applications of (go
l), (f v), and (go r), you can control the
traversal.
Which leads to a general traversal function that lets some other function
control that ordering:
traverse step f z tree = go tree z
where
go Empty z = z
go (Node v l r) z = step (f v) (go l) (go r) z
Doesn’t that look beautiful?
Now you can just spell out your desired traversals:
preorder = traverse $ \n l r -> r . l . n
inorder = traverse $ \n l r -> r . n . l
postorder = traverse $ \n l r -> n . r . l
A test drive:
test1p = flatten preorder t3
test1i = flatten inorder t3
test1o = flatten postorder t3
And you’re happy.
But can you go further? What if your trees are binary search trees
and you just want to find the minimum or maximum value? Can you
just traverse to the left or right?
Yes:
leftorder = traverse $ \n l r -> l . n
rightorder = traverse $ \n l r -> r . n
treemin = leftorder min maxBound
treemax = rightorder max minBound
test2l = treemin t3 :: Int
test2r = treemax t3 :: Int
Isn’t that neat?
Posted in functional programming
Tags fun, haskell, recursive, refactoring, trees
5 comments
no trackbacks

Posted by Tom Moertel
Wed, 28 Mar 2007 19:40:00 GMT
This article is part three in a series on introductory Haskell
programming. In the first
article,
we wrote a small program to recursively scan file-system directories
and print their contents as ASCII-art trees. In the second
article,
we refactored the program to make its logic more reusable by separating
the directory-scanning logic from the tree-printing logic. In this
article, we will address a shortcoming of the refactored version: It
must scan directory hierarchies completely before printing their
trees, i.e., it must scan and then print,
when doing both simultaneously is both more efficient and
more user friendly.
Recall from the previous article that our directory-printing program
is factored into three pieces of logic:
- fsTraverse, which traverses a file-system hierarchy and returns a tree data structure;
- showTree, which converts a tree into lovingly crafted ASCII art; and
- traverseAndPrint, which prints the tree for a file-system hierarchy by
using the first two pieces of logic.
The types of the functions are as follows:
fsTraverse :: Path -> DentName -> IO DirTree
showTree :: Tree String -> String
traverseAndPrint :: Path -> IO ()
Note that showTree is a pure function, but the other two return
IO actions that may have side effects.
Within traverseAndPrint, fsTraverse and showTree are combined
into a composite IO action by the =<< combinator:
putStr . showTree =<< fsTraverse root path
The sequencing semantics of Haskell’s IO monad forces all of the
effects of fsTraverse to complete before any following
effects can begin. To better understand these sequencing semantics,
let’s consider a simple example.
The IO-monad code,
can loosely be interpreted as running the action a, which forces
its side effects to occur, and then running the action b, which forces
its side effects to occur.
In reality, a and b are not actions. They are
functions. Like all Haskell functions, they are pure and have no side
effects. It’s just that a and b return values that
represent actions, and those actions may have side effects, and the
semantics of the IO monad guarantee the ordering of those effects
(should the actions end up being connected to the runtime’s
top-level IO action and executed). If you think that’s weird, hold
that thought. For now, all that’s important is that, if the composite
action represented by the expression
(a >> b) is executed, the
effects of a, regardless of how complex, will be executed
before the effects of b.
Thus if a represents building a tree by recursively scanning
a file-system hierarchy, the entire tree must be built before
b ever gets a chance to do its thing. For our particular
application, however, that particular sequencing is suboptimal. We
know from our earlier, monolithic implementation that the
file-system hierarchy can be scanned and printed simultaneously, which
is more efficient. Ideally, then, our refactored
implementation should be just as efficient.
In this article, we will look at one way to maintain the clean,
logical separation of the a part from the b part
while allowing the parts’ effects to be interleaved for efficiency.
We will use an extension to the Haskell language to make the
directory-scanning action lazy so that it builds the tree as the tree
is consumed.
Ready? Let’s dive in.
Read more...
Posted in haskell
Tags directory_tree_series, haskell, io, laziness, lazy, trees
9 comments
no trackbacks

Posted by Tom Moertel
Wed, 07 Mar 2007 21:04:00 GMT
In my previous article on writing a simple directory-tree printer in
Haskell,
I wrote a small program to recursively scan file-system
directories and print their contents as ASCII-art trees. The
program made for an approachable example of how to use Haskell for
“imperative” tasks, but it has a few problems.
First, the directory-scanning logic and tree-printing logic are
intertwined. Neither is reusable. Second, both bits of logic are
rigid, specialized for this particular task. Even if you could
reuse them, you wouldn’t want to.
In this article, the second in a series, we will explore ways to make
our original code more reusable. We will separate the directory
scanning from the tree printing, harness the power of some old
friends from Haskell’s libraries, and think about the costs
and benefits of our changes.
The plan
Recall our original directory tree–listing solution, the
core of which I will reprint below:
tlist path =
visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path
visit path leader tie arm node = do
putStrLn (leader ++ arm ++ tie ++ node)
visitChildren (path ++ "/" ++ node) (leader ++ extension)
where
extension = case arm of "" -> ""; "`" -> " "; _ -> "| "
visitChildren path leader =
whenM (doesDirectoryExist path) $ do
contents <- getDirectoryContents path
`catch` (\e -> return [show e])
let visibles = sort . filter (`notElem` [".", ".."]) $ contents
arms = replicate (length visibles 1) "|" ++ ["`"]
zipWithM_ (visit path leader "-- ") arms visibles
The tlist function kicks off the process for a particular
file-system path, handing off to visit which recursively descends the
directory tree from the root node. The visit function calls
visitChildren to expand the subtree, if any, for each node visited.
The visitChildren function, in turn, calls back to visit to
repeat the process for each child in the subtree.
In effect, we are traversing the tree rooted at path, printing each
node in passing.
To separate the traversal part from the printing part, we will
introduce a tree data structure. The file system–traversal code
will emit a tree, and the tree-showing code will consume a tree. We
will rewrite our old tlist function, which we might as well rename to
the more descriptive traverseAndPrint, to glue the two pieces
together with the tree serving as glue:
traverseAndPrint :: Path -> IO ()
traverseAndPrint path = do
tree <- fsTraverse root path
putStrLn (showTree tree)
where
root = if "/" `isPrefixOf` path then "" else "."
That’s the plan. Now let’s carry it out.
Read more...
Posted in haskell
Tags directory_tree_series, haskell, io, refactoring, trees
8 comments
no trackbacks

Posted by Tom Moertel
Thu, 22 Feb 2007 21:30:00 GMT
At a recent gathering, my friend Casey
mentioned that he was learning a new programming language and, as a
learning exercise, had written a directory-tree printer. That’s a program that recursively scans directory hierarchies and
prints out a tree representation for each:
$ tlist cheating-hangman
cheating-hangman
|-- CVS
| |-- Entries
| |-- Repository
| `-- Root
|-- Makefile
|-- cheating-hangman.lhs
`-- cheating-hangman.pl
I thought the problem sounded like fun, and so I wrote a small
solution in Haskell. Let’s take a look.
Read more...
Posted in programming, functional programming, haskell
Tags directory_tree_series, haskell, io, trees
10 comments
no trackbacks
