The inner beauty of tree traversals

Posted by Tom Moertel Fri, 27 Jan 2012 04:24:00 GMT

This has been done a million times before, but if you haven’t seen it, it’s pretty neat. Let’s say you have a simple recursive data structure like a binary tree:

module Tree where

data Tree a = Empty
            | Node a (Tree a) (Tree a)
            deriving (Eq, Show)

-- some binary trees

t0 = Empty
t1 = Node 1 Empty Empty
t3 = Node 2 (Node 1 Empty Empty) (Node 3 Empty Empty)

Now say you want to traverse the nodes, in “preorder.” So you write a traversal function. It folds a binary operator f through the values in the tree, starting with an initial accumulator value of z. First, it visits the current node, then the left subtree, and finally the right subtree, combining each value it encounters with the current accumulator value to get the next accumulator value. The final accumulator value is the result.

Spelling out the steps, it looks like this:

preorder_traversal f z tree = go tree z
  where
    go Empty        z = z
    go (Node v l r) z = let z'   = f v z     -- current node
                            z''  = go l z'   -- left subtree
                            z''' = go r z''  -- right subtree
                        in z'''

But if you wanted an “inorder” traversal instead, you’d write a slightly different function, visiting the left subtree before the current node:

inorder_traversal f z tree = go tree z
  where
    go Empty        z = z
    go (Node v l r) z = let z'   = go l z    -- left subtree
                            z''  = f v z'    -- current node
                            z''' = go r z''  -- right subtree
                        in z'''

To see how the two traversal functions work, let’s use both to flatten the 3-node tree t3 we defined earlier.

flatten traversal = reverse . traversal (:) []

test0i = flatten inorder_traversal t3   -- [1,2,3]
test0p = flatten preorder_traversal t3  -- [2,1,3]

Great.

But now you start thinking about post-order traversal. Do you want to write a third traversal function that’s almost the same as the first two?

So you go back to the inorder traversal and start staring real hard at that let statement. Can you rewrite it? Yes:

inorder_traversal2 f z tree = go tree z
  where
    go Empty        z = z
    go (Node v l r) z = go r . f v . go l $ z

And now you’ve got it. Just by reordering the applications of (go l), (f v), and (go r), you can control the traversal.

Which leads to a general traversal function that lets some other function control that ordering:

traverse step f z tree = go tree z
  where
    go Empty        z = z
    go (Node v l r) z = step (f v) (go l) (go r) z

Doesn’t that look beautiful?

Now you can just spell out your desired traversals:

preorder   = traverse $ \n l r -> r . l . n
inorder    = traverse $ \n l r -> r . n . l
postorder  = traverse $ \n l r -> n . r . l

A test drive:

test1p = flatten preorder t3   -- [2,1,3]
test1i = flatten inorder t3    -- [1,2,3]
test1o = flatten postorder t3  -- [1,3,2]

And you’re happy.

But can you go further? What if your trees are binary search trees and you just want to find the minimum or maximum value? Can you just traverse to the left or right?

Yes:

leftorder  = traverse $ \n l r -> l . n
rightorder = traverse $ \n l r -> r . n

treemin = leftorder min maxBound
treemax = rightorder max minBound

test2l = treemin t3 :: Int  -- 1
test2r = treemax t3 :: Int  -- 3

Isn’t that neat?

Posted in
Tags , , , ,
5 comments
no trackbacks
Reddit Delicious

Directory-tree printing in Haskell, part three: lazy I/O

Posted by Tom Moertel Wed, 28 Mar 2007 19:40:00 GMT

This article is part three in a series on introductory Haskell programming. In the first article, we wrote a small program to recursively scan file-system directories and print their contents as ASCII-art trees. In the second article, we refactored the program to make its logic more reusable by separating the directory-scanning logic from the tree-printing logic. In this article, we will address a shortcoming of the refactored version: It must scan directory hierarchies completely before printing their trees, i.e., it must scan and then print, when doing both simultaneously is both more efficient and more user friendly.

Recall from the previous article that our directory-printing program is factored into three pieces of logic:

  1. fsTraverse, which traverses a file-system hierarchy and returns a tree data structure;
  2. showTree, which converts a tree into lovingly crafted ASCII art; and
  3. traverseAndPrint, which prints the tree for a file-system hierarchy by using the first two pieces of logic.

The types of the functions are as follows:

fsTraverse       :: Path -> DentName -> IO DirTree
showTree         :: Tree String -> String
traverseAndPrint :: Path -> IO ()

Note that showTree is a pure function, but the other two return IO actions that may have side effects.

Within traverseAndPrint, fsTraverse and showTree are combined into a composite IO action by the =<< combinator:

putStr . showTree =<< fsTraverse root path

The sequencing semantics of Haskell’s IO monad forces all of the effects of fsTraverse to complete before any following effects can begin. To better understand these sequencing semantics, let’s consider a simple example.

The IO-monad code,

a >> b

can loosely be interpreted as running the action a, which forces its side effects to occur, and then running the action b, which forces its side effects to occur.

In reality, a and b are not actions. They are functions. Like all Haskell functions, they are pure and have no side effects. It’s just that a and b return values that represent actions, and those actions may have side effects, and the semantics of the IO monad guarantee the ordering of those effects (should the actions end up being connected to the runtime’s top-level IO action and executed). If you think that’s weird, hold that thought. For now, all that’s important is that, if the composite action represented by the expression (a >> b) is executed, the effects of a, regardless of how complex, will be executed before the effects of b.

Thus if a represents building a tree by recursively scanning a file-system hierarchy, the entire tree must be built before b ever gets a chance to do its thing. For our particular application, however, that particular sequencing is suboptimal. We know from our earlier, monolithic implementation that the file-system hierarchy can be scanned and printed simultaneously, which is more efficient. Ideally, then, our refactored implementation should be just as efficient.

In this article, we will look at one way to maintain the clean, logical separation of the a part from the b part while allowing the parts’ effects to be interleaved for efficiency. We will use an extension to the Haskell language to make the directory-scanning action lazy so that it builds the tree as the tree is consumed.

Ready? Let’s dive in.

Read more...

Posted in
Tags , , , , ,
9 comments
no trackbacks
Reddit Delicious

Directory-tree printing in Haskell, part two: refactoring

Posted by Tom Moertel Wed, 07 Mar 2007 21:04:00 GMT

In my previous article on writing a simple directory-tree printer in Haskell, I wrote a small program to recursively scan file-system directories and print their contents as ASCII-art trees. The program made for an approachable example of how to use Haskell for “imperative” tasks, but it has a few problems.

First, the directory-scanning logic and tree-printing logic are intertwined. Neither is reusable. Second, both bits of logic are rigid, specialized for this particular task. Even if you could reuse them, you wouldn’t want to.

In this article, the second in a series, we will explore ways to make our original code more reusable. We will separate the directory scanning from the tree printing, harness the power of some old friends from Haskell’s libraries, and think about the costs and benefits of our changes.

The plan

Recall our original directory tree–listing solution, the core of which I will reprint below:

tlist path =
    visit (if "/" `isPrefixOf` path then "" else ".") "" "" "" path

visit path leader tie arm node = do
    putStrLn (leader ++ arm ++ tie ++ node)
    visitChildren (path ++ "/" ++ node) (leader ++ extension)
  where
    extension = case arm of ""  -> ""; "`" -> "    "; _   -> "|   "

visitChildren path leader =
    whenM (doesDirectoryExist path) $ do
        contents <- getDirectoryContents path
            `catch` (\e -> return [show e])
        let visibles = sort . filter (`notElem` [".", ".."]) $ contents
            arms = replicate (length visibles - 1) "|" ++ ["`"]
        zipWithM_ (visit path leader "-- ") arms visibles

The tlist function kicks off the process for a particular file-system path, handing off to visit which recursively descends the directory tree from the root node. The visit function calls visitChildren to expand the subtree, if any, for each node visited. The visitChildren function, in turn, calls back to visit to repeat the process for each child in the subtree. In effect, we are traversing the tree rooted at path, printing each node in passing.

To separate the traversal part from the printing part, we will introduce a tree data structure. The file system–traversal code will emit a tree, and the tree-showing code will consume a tree. We will rewrite our old tlist function, which we might as well rename to the more descriptive traverseAndPrint, to glue the two pieces together with the tree serving as glue:

traverseAndPrint :: Path -> IO ()
traverseAndPrint path = do
    tree <- fsTraverse root path
    putStrLn (showTree tree)
  where
    root = if "/" `isPrefixOf` path then "" else "."

That’s the plan. Now let’s carry it out.

Read more...

Posted in
Tags , , , ,
8 comments
no trackbacks
Reddit Delicious

A simple directory-tree printer in Haskell

Posted by Tom Moertel Thu, 22 Feb 2007 21:30:00 GMT

At a recent gathering, my friend Casey mentioned that he was learning a new programming language and, as a learning exercise, had written a directory-tree printer. That’s a program that recursively scans directory hierarchies and prints out a tree representation for each:

$ tlist cheating-hangman

cheating-hangman
|-- CVS
|   |-- Entries
|   |-- Repository
|   `-- Root
|-- Makefile
|-- cheating-hangman.lhs
`-- cheating-hangman.pl

I thought the problem sounded like fun, and so I wrote a small solution in Haskell. Let’s take a look.

Read more...

Posted in , ,
Tags , , ,
10 comments
no trackbacks
Reddit Delicious