Over dinner tonight, my daughter told us a story about how they were working on a Word Find in school today and she found the word “ELF”. She was fascinated by the fact that the theme of the Word Find was supposed to be Thanksgiving and that the word was not listed among the words to find.
The story started me thinking.
What do you suppose is the probability that in an m x m Word Find (meaning m rows and m columns) you would find a given three letter word, like “ELF”?
To answer the question, you must first lay out some ground rules, or assumptions as we call them in mathematical modeling. First, let’s assume that the letters are distributed randomly and uniformly. Second, we’ll assume that there are three orientations to finding the word: vertical, horizontal and diagonal. As with most of these puzzles I’ve every done and especially the ones my daughter is now doing as a 10 year old, we’ll also expect to find the word forward or backward.
For a single 3 letter random string, the probability that it matches our given word would clearly be
because the likelihood of each letter matching is 1/26 and because we assumed random letters, a forward match has a probability of 1/26 * 1/26 * 1/26, as does a backward match.
The next question to answer would be, how many 3-letter sequences are there for an m x m matrix of letters?
In each row, there would (m-2) 3-letter sequences which means m*(m-2) horizontal sequences. The same goes for the columns. Now consider the diagonals. On the main diagonal there are m letters so there would be (m-2) 3-letter sequences. On the first super-diagonal there are (m-1) letters so there would be (m-3) 3-letter sequences. The same is true for the first sub-diagonal. If you keep marching up the diagonals, you’ll see (m-4), then (m-5), all the way down to 1.
So we’ll have a total of
for the number of forward diagonal 3-letter sequences. Since our word-find is assumed to be square, we’ll have the same number going on the back diagonal. So here is a total count on 3-letter sequences: horizontal + vertical + diagonal.
To simplify this, recall that if you have the sum of the first k integers,
So we can simplify the Total as first combining the first two terms,
Then we combine the last terms using the sum formula above,
Some minimal algebra finally gives us the formula for the number of 3-letter sequences as a function of m:
That’s surprisingly simple!
Back the main question now: we want the probability that at least one of these 3-letter sequences matches our given word. Turns out, under our assumptions, this is just a binomial distribution. Woohoo!
Recall that a binomial distribution with n trials and the probability, p, for a success in a single trial has the probability density function
Thus, we are looking for “at least one success” which means we calculate . We know that and
Now, you can plug and chug in your calculator or run out and stick a formula in Excel, Wolfram Alpha, Maple, etc. Either way, we now have an easy answer to a couple of interesting questions:
1. If we have 20x20 grid of randomly distributed letters, what is the probability that the word “ELF” appears?
IN EXCEL: =1-binomdist(0,4*(m-2)*(m-1),2/26^3,FALSE)
2. What size of a grid do we need to have a greater than 50% chance that the word “ELF” appears?
You could find this using “What-If Analysis” in Excel or with just a little algebra on the formulas above we can show that the m that solves the equation below, yields the answer (which is left as an exercise to the reader).
Just for fun, here’s a graph of the probability of at least one match as a function of m.
Well, now I can tell my daughter it’s not so special that she found the word “ELF” in her Thanksgiving puzzle. Or… I can just smile and nod, like a daddy should.
(And, by the way, to all you fellow math enthusiasts, I welcome your comments and critiques. If I missed something or messed something up, let me know what I did and we’ll find a better answer.)