Posted by Tom Moertel
Sun, 15 Apr 2012 22:28:00 GMT
In software, as in life, every problem you solve is actually part of a
larger problem, an optimization problem in which you are trying to
squeeze as much as you can from a single scarce resource: yourself.
Every hour you spend solving one problem is an hour you no longer have
to solve other problems. At the end of your life, when you receive
the final tally of how you spent your scant allotment of hours, you
probably don’t want to discover that you wasted most of them on
problems that didn’t matter.
Which brings us to David Heinemeier Hansson’s recent post
on Testing like the
TSA, in
which he advises against wasting your time writing tests that are
unlikely to find problems. He’s right.
Whether you realize it or not, every time you write a test, you are
placing a bet. You are betting that the test’s expected payoff will
outweigh the test’s expected opportunity cost. Instead of writing
that test, would you be better off doing something else? For every
test you write, you are betting that the answer is no.
But being right about this answer – reliably right – is tricky. It
requires that you have a good understanding of which bets are likely
to pay off and which are not, which are cheap to place and which are
expensive. In other words, it requires that your internal probability
distributions be well calibrated for the kind of code you write and
the environment in which you code.
New programmers, unfortunately, are poorly calibrated. They have
little experience to draw upon and end up imitating other people’s
probability distributions. These they pick up, for better or worse,
from folklore – best practices, development methods, expert opinion,
the guy down the hall. As a result, new programmers concentrate their
nascent probability distributions around the things they learn first
and, consequently, over-bet on those things.
But even the best programmers over-bet on the things they know.
That’s because they under-bet in their blind spots, and we all have
blind spots.
In general, we never bet on the things we don’t know. If all you
know about getting your code right is TDD, you’ll never bet on types
or proofs or constructive correctness because you don’t know how to
place those bets. But those bets are often dirt cheap and pay in
spades. If you’re not betting on them at least some of the time,
whatever you are betting on probably costs more and pays less. You
could be doing better.
But to do better, you have to know what other bets are out there. You
have to know what it costs to place them and how likely they are to
pay off. To gain this knowledge, you’ll have to go beyond what is
comfortable; you’ll have to explore those dark, distant parts of the
casino and place some trial bets.
If you’ve never written a unit test, write some. If you always write
your tests after your main logic, write them before. If you always
write them before, write them after. If all you ever do is unit testing, try integration and system tests, too. If you’ve never done
QuickCheck-like property checking, do it. If you don’t know how to
prove things about your code, write some proofs. If you don’t know
how to represent important properties as types so that your computer
will prove them for you, now’s the time to learn. If you don’t know
how to design APIs so that certain classes of problems become
impossible, start practicing.
And, after you place your bets, pay attention to what happens down
the road. Learn from what you observe. Learn which bets are likely pay off
now, pay off later, and pay off never.
The more bets you know how to place, and the more you know
about their expected payoffs, the better you’ll place your
bets. And the better you place your bets, the more problems you’ll solve,
problems that actually matter, before you run of out of time.
Remember: Your supply of hours is finite. If you care about making those hours matter, learn how to bet them wisely. Both in software, and in life.
Edited 2012-04-30 16:56 to add note about integration and system testing and also about paying attention to what happens with your bets so that you learn about expected payoffs. Thanks to
grauenwolf for the editorial feedback.
Posted in testing
Tags bets, probability, proofs, testing
7 comments
no trackbacks

Posted by Tom Moertel
Mon, 30 Jan 2012 04:48:00 GMT
I have finally done it! With my recent post on tree traversals, I have managed to write 100 thousand words for my blog:
>> Article.find(:all).inject(0) { |sum,a| sum +=
?> (a.body + a.extended.to_s).split(/\s+/).length }
=> 100334
That sounds impressive until you realize that my first blog post, Fun with Asterisk, was about nine years ago. So we’re only talking, on average, about 11 thousand words per year. And that’s not hard, if you stick with it.
For me, the trick has been sticking with it. I joined a startup at the end of 2007, and my blogging abruptly lost about four fifths of its pace:

So I need to discipline myself to blog more frequently. I hope the next 100 thousand words won’t take so long to write.
Finally, I’d like to take this opportunity to thank you for reading and commenting. You’re the reason I wrote those words in the first place. You made the first 100 thousand words fun.
Thank you!
Your pal,
Tom Moertel
Posted in site news
Tags 100k, blog, blogging, statistics, writing
1 comment
no trackbacks

Posted by Tom Moertel
Fri, 27 Jan 2012 04:24:00 GMT
This has been done a million times before, but if you
haven’t seen it, it’s pretty neat. Let’s say you have a simple
recursive data structure like a binary tree:
module Tree where
data Tree a = Empty
| Node a (Tree a) (Tree a)
deriving (Eq, Show)
t0 = Empty
t1 = Node 1 Empty Empty
t3 = Node 2 (Node 1 Empty Empty) (Node 3 Empty Empty)
Now say you want to traverse the nodes, in “preorder.” So you write a
traversal function. It folds a binary operator f through
the values in the tree, starting with an initial accumulator value of
z. First, it visits the current node, then the left subtree, and
finally the right subtree, combining each value it encounters
with the current accumulator value to get the next accumulator
value. The final accumulator value is the result.
Spelling out the steps, it looks like this:
preorder_traversal f z tree = go tree z
where
go Empty z = z
go (Node v l r) z = let z' = f v z
z'' = go l z'
z''' = go r z''
in z'''
But if you wanted an “inorder” traversal instead, you’d write a
slightly different function, visiting the left subtree before the
current node:
inorder_traversal f z tree = go tree z
where
go Empty z = z
go (Node v l r) z = let z' = go l z
z'' = f v z'
z''' = go r z''
in z'''
To see how the two traversal functions work, let’s use both to flatten
the 3-node tree t3 we defined earlier.
flatten traversal = reverse . traversal (:) []
test0i = flatten inorder_traversal t3
test0p = flatten preorder_traversal t3
Great.
But now you start thinking about post-order traversal. Do you want
to write a third traversal function that’s almost the same as the
first two?
So you go back to the inorder traversal and start staring real hard at that let expression.
Can you rewrite it? Yes:
inorder_traversal2 f z tree = go tree z
where
go Empty z = z
go (Node v l r) z = go r . f v . go l $ z
And now you’ve got it. Just by reordering the applications of (go
l), (f v), and (go r), you can control the
traversal.
Which leads to a general traversal function that lets some other function
control that ordering:
traverse step f z tree = go tree z
where
go Empty z = z
go (Node v l r) z = step (f v) (go l) (go r) z
Doesn’t that look beautiful?
Now you can just spell out your desired traversals:
preorder = traverse $ \n l r -> r . l . n
inorder = traverse $ \n l r -> r . n . l
postorder = traverse $ \n l r -> n . r . l
A test drive:
test1p = flatten preorder t3
test1i = flatten inorder t3
test1o = flatten postorder t3
And you’re happy.
But can you go further? What if your trees are binary search trees
and you just want to find the minimum or maximum value? Can you
just traverse to the left or right?
Yes:
leftorder = traverse $ \n l r -> l . n
rightorder = traverse $ \n l r -> r . n
treemin = leftorder min maxBound
treemax = rightorder max minBound
test2l = treemin t3 :: Int
test2r = treemax t3 :: Int
Isn’t that neat?
Posted in functional programming
Tags fun, haskell, recursive, refactoring, trees
5 comments
no trackbacks

Posted by Tom Moertel
Mon, 07 Nov 2011 05:05:00 GMT
Recently, I needed a lightweight mocking solution when adding unit tests to some existing Python code. I could have used one of the many Python mocking libraries, but I only had to replace
a few module functions during testing, so I just wrote a tiny
mocking helper rather than add a dependency to the project.
The helper is simple and surprisingly versatile:
import contextlib
import functools
@contextlib.contextmanager
def mocked(func,
expecting=None, returning=None, raising=None, # specs
replacement_logic=None, called=1):
"""Stub out and mock a function for a yield's duration."""
if (returning, raising, replacement_logic).count(None) < 2:
raise ValueError("returning, raising, and replacement_logic "
"are incompatible with each other")
# default logic for implementing mock fn: return or raise per specs
def default_logic(*_args, **_kwds):
if raising:
raise raising
return returning
# prepare wrapper to replace mocked function for duration of yield
invocations = [0]
@functools.wraps(func)
def replacement(*args, **kwds):
if expecting is not None:
assert expecting == (args, kwds) # did we get expected args?
invocations[0] += 1
return (replacement_logic or default_logic)(*args, **kwds)
# replace mocked function, yield to test, and then check & clean up
module = sys.modules.get(func.__module__)
setattr(module, func.__name__, replacement)
try:
yield # give control back to test for a while
assert invocations[0] == called # was mock called enough?
finally:
setattr(module, func.__name__, func)
def uncalled(func):
"""Require that a function not be called for a yield's duration."""
return mocked(func, called=0)
The idea is to wrap test code that requires mocked
external functions with a mocking helper. Here’s an example:
def test_ml_errors_must_be_reported(self):
"""When an error occurs, it must be reported; nothing must be sent."""
data = self._request_data()
exception = mls.MlError()
exception.exc = '[[[error description]]]'
with mocked(mls.subscriber_count, raising=exception):
with uncalled(sms.send):
resp = self.app.request("/mailings", method='POST', data=data)
self.assertEqual(resp.status, '200 OK')
self.assertIn(exception.exc, resp.data) # must be reported
In this test method, I’m making a simulated POST request to a web
service that’s supposed to check how many users are in a mailing
list and then, possibly, send some SMS messages. In this case,
however, I want to simulate that an error occurs when talking to the
mailing-list service. The test must verify that the error is reported
and, crucially, that no messages are sent.
The library modules for talking to the mailing-list service and the
SMS service are called mls and sms. So,
for the duration of the simulated request, I’m replacing two functions
in these modules with mock versions.
The mock version of mls.subscriber_count, when called,
will raise the exception I’m trying to simulate. The mock version of
sms.send, however, must not be called; if it is, the
uncalled mocking helper will alert me by raising an
exception.
So the mocking helpers not only temporarily install mock
implementations of functions but also assert that those mock
implementations are (or are not) called as expected. In the following
code, for example, I use this capability to make sure that the
mailing-list service is asked to get the subscriber count for the
mailing list having the right key. I also simulate the service
returning a subscriber count of 123.
def test_unconfirmed_msgs_must_be_confirmed(self):
"""An unconfirmed msg must be confirmed and not sent."""
mlkey = 'TEST_ML_KEY'
body = 'Test message!'
data = self._request_data(mlkey=mlkey, body=body)
with mocked(mls.subscriber_count, expecting=((mlkey,),{}), returning=123):
with uncalled(sms.send):
resp = self.app.request("/mailings", method='POST', data=data)
self.assertEqual(resp.status, '200 OK')
self.assertIn('Reply CONFIRM', resp.data) # we must ask for confirmation
self.assertIn('123 subscribers', resp.data) # and supply subscriber count
self.assertIn(body, resp.data) # and the msg to be sent
The mocked helper lets you do a few more things, too, but
you get the idea: Sometimes a short helper function can take you a
long way.
Tags mocks, python, testing
no comments
no trackbacks

Posted by Tom Moertel
Wed, 02 Nov 2011 02:55:00 GMT
Back in 2007, I repaired an aging and fairly obscure A/V receiver that had lost the ability to respond to its remote control. This I did by re-soldering some hard-to-find solder joints that had broken on its circuit board.
On the chance that someone else had a similar problem, I posted some instructions and photos on my blog. I didn’t think much of it at the time.
But since then, every week or so, another comment shows up, thanking me for writing it. Some typical examples:
Fixed my Kenwood V6030D 10 minutes ago. Life is good again, Thanks mate….
Ditto! Worked on my VR-209 like a champ! Thanks!
Thank you for this posting, which I stumbled upon when I was researching the problem of my remote control no longer working for this receiver (VR-507). These pictures were invaluable to locate the faulty pins. (They sure are small.) Re-soldering them restored full functionality to the receiver and the original remote control. Good job!
Thanks, Fixed my KRF-8010D with this, been 5 years fighting with remote working now and then.
There are now about 60 comments like that. I never would have imagined that 60 people would have read the post let alone get out a soldering iron because of it. But they did! And it helped them!
Now, every time I’m feeling down, I Google up that post and read the thank-yous. It cheers me up.
So here’s the lesson: Write it down. If you’ve figured something out, even if it seems unimportant, write it down. Maybe someone else will find it helpful. Maybe a lot of someone elses will find it helpful.
You never know. It might even cheer you up someday.
Posted in interesting stuff
Tags helping, lessons, surprising, writing
5 comments
no trackbacks

Posted by Tom Moertel
Thu, 21 Jul 2011 00:16:00 GMT
In 2007, I counted how many words I had written for my blog and was surprised to discover it was 60 thousand, about a short novel’s worth. Just now, I stumbled upon that earlier count and wondered what the count was today.
>> Article.find(:all).inject(0) { |sum,a| sum +=
?> (a.body + a.extended.to_s).split(/\s+/).length }
=> 98702
I’m about 1300 words shy of 100 thousand.
Now I’ve got to write a few more posts for the blog.
:-)
Posted in writing
Tags blog, writing
no comments
no trackbacks

Posted by Tom Moertel
Sun, 10 Jul 2011 22:06:00 GMT
Recently, I wrote a simple plagiarism detector as a fun programming exercise. Then, merely a few days later, some company gave me cause to use it.
This company, it seems, was looking to hire a programmer. So they posted a job ad that was more or less word-for-word copied from a job ad that I had written for the company where I work. The duplication, being so extensive, was hard to miss. (The offending company, to its credit, promptly removed the copied ad from its web site when we let them know about it.)
I had originally written the text used in that ad back in December 2010, when we were starting another round of hiring. I had hoped that when the right kind of programmers read it, they would discern that we were programmers just like them, programmers who cared for their craft enough and who cared for their team enough to take hiring other programmers seriously. I didn’t want our ads to seem anything like those spat out by people just mouthing the words that everyone else was mouthing to “get talent.” So I worked on getting the words right, thinking the investment would somehow help us stand apart, if just a bit, when hiring.
But I had failed to consider that authenticity in job ads can be faked by just copying what seems authentic. So, a few days ago, when Googling for statistically unlikely phrases from the text I had written, I was actually surprised to discover that a number of companies and recruiters were now using my words, more or less unchanged, to signal how “authentic” they were.
At first, I was annoyed. But, upon reflection, I realized that the plagiarism was telling me that I had written something worth stealing. That’s a good thing, right?
After all, in a society where all too many people are willing to claim your words as their own, which is worse: to write something and have it stolen or to write something and have it not?
P.S. For the pedants who would point out that nothing was actually “stolen” here, please understand that steal has a well-established sense that means basically to plagiarize, as in T.S. Eliot’s quip that “Mediocre writers borrow. Great writers steal.”
Posted in writing
Tags hiring, plagiarism, recruiters, weaselry, writing
5 comments
no trackbacks

Posted by Tom Moertel
Thu, 16 Jun 2011 04:13:00 GMT
Recently, I wrote a simple plagiarism detector in Haskell using
n-grams. The problem it solves is this: You have a document, the
suspect, which you believe may be derived from other documents, the
sources. You want to find regions in the suspect that are identical
to those in the sources.
One simple solution is to use n-grams, which are just the n-word
neighborhoods taken from some sequence of words. For example, the
4-grams of the sequence a b c d e f are a b c d, b c d e, and
c d e f.
The idea for plagiarism detection with n-grams is that, as you
increase n, the probability that a given n-gram will be
shared between two arbitrary documents rapidly vanishes. For example,
Google says, right now, that the 2-gram “rapidly vanishes” appears in
about 11,000 documents, but the 5-gram “two different documents
rapidly vanishes” occurs in no documents. (Once Google indexes this
post, however, that will change.)
But when one document is copied partly from another, they will
probably share many long sequences of words. So, the sharing of longer
n-grams is evidence of copying.
It’s not proof, however. If two papers about U.S. history both contain
the 5-gram “The United States of America,” that coincidence is not surprising. But
when lots of unlikely sharing occurs, it’s strong evidence of some
form of common ancestry.
Fortunately, when copying occurs, it’s often easy to see, once
you know where to look. So my detector just annotates the suspect
document to indicate which portions of it share n-grams with the
source documents. Shared regions are converted to ALL CAPS. For
example, looking for shared 4-grams, my detector finds ample evidence
that the following paragraph was copied from the text above:
$ ./Plag 4 suspect.txt blog-post-plagiarism-detector.txt
But it's not rock solid. If two documents ABOUT U.S. HISTORY
BOTH CONTAIN THE 5-GRAM "THE UNITED STATES OF AMERICA,"
that's not shocking. BUT WHEN LOTS OF sharing occurs, that's
pretty good EVIDENCE OF SOME FORM OF COMMON ANCESTRY.
In fact, it was copied: I duplicated an earlier paragraph and then
changed a few words.
Using n-grams to detect sharing is not a perfect system,
but it’s a simple system, and, because its output is
easily interpreted, it’s an effective system. When your eyeballs see
a lot of ALL CAPS TEXT, it’s easy to judge what
it represents.
Anyway, here’s the code.
module Main where
import Control.Applicative ((<$>))
import Data.Char (toLower, toUpper, isAlphaNum, isSpace)
import Data.List (tails)
import qualified Data.Set as S
import System (getArgs)
main = do
n:suspect:sources <- getArgs
putStrLn =<< findMatches (read n) suspect sources
findMatches n suspect sources = do
db <- S.fromList . concat <$> mapM (loadNgrams n) sources
stxt <- readFile suspect
return $ concat (match n db (nGrams n stxt) (whiteWords stxt))
match :: Int -> S.Set [String] -> [[String]] -> [String] -> [String]
match n db ngs wws = mapAt (map toUpper) matchlocs wws
where
matchlocs = uniq . concat $ zipWith isMatch [0..] ngs
isMatch loc ng | ng `S.member` db = [loc .. loc + n 1]
| otherwise = []
mapAt :: (a -> a) -> [Int] -> [a] -> [a]
mapAt f locs ws = go 0 locs ws
where
go _ [] ws = ws
go _ _ [] = []
go i (l:ls) (w:ws) | i < l = w : go (i + 1) (l:ls) ws
go i (l:ls) (w:ws) | otherwise = f w : go (i + 1) ls ws
uniq = S.toList . S.fromList
loadNgrams n file = nGrams n <$> readFile file
nGrams :: Int -> String -> [[String]]
nGrams n = takeWhile ((n ==) . length) . map (take n) .
tails . map normalize . words
normalize = map toLower . filter isAlphaNum
whiteWords :: String -> [String]
whiteWords s = case span isSpace s of
("", "") -> []
(s1, "") -> [s1]
(s1, s') -> (s1 ++ w) : whiteWords s''
where
(w, s'') = break isSpace s'
Posted in programming
Tags haskell, ngrams, plagiarism
9 comments
no trackbacks

Posted by Tom Moertel
Tue, 05 Apr 2011 16:50:00 GMT
Has anyone tried the Presso hand-powered espresso maker? It seems that some people have figured out how to get good espresso from it.
I’m wondering if it’s reliably good espresso. If it is, the Presso might make a good take-into-work espresso maker.
Any experience reports?
Posted in espresso
Tags espresso, presso
6 comments
no trackbacks

Posted by Tom Moertel
Sat, 26 Mar 2011 16:14:00 GMT
I recently started redirecting requests for my blog’s RSS and Atom feeds to a consolidated feed with FeedBurner. If you’re reading my blog via one of these feeds – and you should be since my blog’s design is sufficiently horrible to burst your eyeballs if you try to read it directly – please let me know if the feeds break for you.
Examples of breakage:
- the feeds no longer update
- the content of the feeds looks garbled or goofy
- images or links are broken
- the feeds literally catch on fire and burn
If you see any of the above, please let me know.
Thanks!
Posted in site news
Tags feedburner, feeds, redirect
4 comments
no trackbacks
