The Locker Problems Solved
A few days ago, I posted a couple of problems that I had stumbled across over at Math Jokes for Mathy Folks. If you haven’t already, take a moment to read the puzzles: The Locker Problems.
Problem 1
To start with, we can simply begin walking through process by hand. Locker 1 will begin closed, then the 1st person will come in an open it. After that, no one touches the 1st locker so we know it stays open.
Locker 2 is opened by the 1st person, closed by the 2nd and it stays closed. Locker 3 is opened by the 1st person, left alone by the 2nd and then closed by the 3rd. This is going to get tedious if I keep explaining in words. Lets use a table. If a locker is open, I’ll put a “o”, if it’s closed I’ll put a “c”.
This is also going to get tedious if I want to do this for 1000 lockers and 1000 students. We definitely need to find a pattern.
My next step was put together a lazy little Matlab script to do the same exact thing as my table. (See here.) Using the “spy” function in MATLAB allows me to see the sparsity pattern of the matrix. The open lockers are represented by 1’s and the closed ones by 0’s. Here’s what it looks like for 100 lockers and 100 students.
There is definitely a pattern! Notice those open lockers form the bands you see in the picture. So why do some end up open and some end up closed?
The easiest way to see why is to consider what happens to a single locker. For example, think about locker 24. When does it change state? Obviously, person 1 opens the locker, person 2 closes it, person 3 opens it, person 4 closes, person 5 does nothing, etc. Notice that if the locker number, 24, is divisible by the person number, then the state changes:
The number 24 has 8 factors, that is, 8 numbers that divide evenly into it. So the key is that any locker number with an even number of factors will end up with closed and any with an odd number will end up open. So what numbers have an odd number of factors?
At first thought, it seems that all numbers should have an even number of divisors since they always occur in pairs:
24 = 1 x 24
24 = 2 x 12
24 = 3 x 8
24 = 4 x 6
Consider a number like 36, though.
36 = 1 x 36
36 = 2 x 18
36 = 3 x 12
36 = 4 x 9
36 = 6 x 6 <---- Ah ha!!
Perfect squares are the only numbers that can have an odd number of divisors. So locker numbers which are perfect squares (i.e., 1, 4, 9, 16, 25, …) will remain open while the rest stay closed!! That matches exactly with the table above and the graph from MATLAB.
All we need to know is how many perfect squares are less than 1000.

ANSWER: There will be 31 open lockers
Problem 2
In the second version of the problem, we have only 30 students and some are absent. In fact, we need to determine which ones were absent so that all the lockers except the first on remain closed.
Starting in a similar fashion as before, we could build a table by hand and determine who would have to be sick. For example, the 1st student is clearly present since the first locker remains open. After that, the 2nd and 3rd are both present since those lockers end up closed. The 4th person must be absent since if they were present then the 4th locker would be open.
So any locker number that has an odd number of factors (only counting factors if that person is present) must not be there. For example, consider the 8th locker. The number 8 has 4 factors but one of them is 4 so it actually only has 3 factors when counting those present. So 8 must be out sick.
Following this pattern, we eliminate all multiples of 4. Then when we get to the 9th locker, they will also absent. So will all multiples of 9. Continue this pattern until you reach 30.
The numbers that are not shaded are called squarefree since they have no repeated factors when you they are prime factored. (Read more about squarefree numbers.) The absent folk are the "non-squarefree” numbers, also known as squareful.
ANSWER: There are 11 squareful numbers less than 30 so 11 students are absent.
That was fun!
The Locker Problems
Problem 1
Every day, 1000 students enter a school that has 1000 lockers. All of the lockers are closed when they arrive. Student 1 opens every locker. Student 2 closes every other locker. Student 3 then “changes the state” of every third locker – that is, he opens it if it’s closed, and he closes it if it’s open. Student 4 then changes the state of every fourth locker, Student 5 changes the state of every fifth locker, and so on, so that Student n changes the state of everynth locker.
Which lockers are open after all 1000 students have finished opening and closing lockers?
Problem 2
Every day, 30 students enter a room with 30 lockers. All of the lockers are closed when they arrive. Student 1 opens every locker. Student 2 closes every locker. Student 3 then “changes the state” of every third locker — that is, he opens it if it’s closed, and he closes it if it’s open. Student 4 then changes the state of every fourth locker, Student 5 changes the state of every fifth locker, and so on, so that Student n changes the state of every nth locker.
One day, some students are out sick. Regardless, those present repeat the process and just skip the students who are absent — for instance, if Student 3 were absent, then no one would change the state of every third locker.
When they finish, only Locker #1 is open, and the other 29 lockers are all closed. How many students were absent?
HT: Math Jokes for Mathy People
Solutions have been posted: The Locker Problems Solved
Hunting for Simpson’s Paradox (part 2)
(If you haven’t already, you should read part 1 of this article which was posted on April 13, 2011)
Simpson’s paradox illustrates an intuitive error in reasoning that's hard to accept without credible examples. It occurs when a correlation or trend that is present in groups is reversed when the groups are combined. The example we’ve been working with deals with batting averages. A hypothetical situation was presented in the first part of the article in which we stated that one batter had a higher batting average against left-handed pitchers and against right-handed pitchers, separately. However, when the overall batting average was calculated, that batter had a lower average.
Here’s the table once more just for reference:
![]()
Certainly it was an interesting example but it was entirely made-up. I just tweaked the numbers a bit to get the table to work. I set my sights on finding some of my own examples of Simpson’s paradox based on REAL data.
Attempt 1
Yahoo Sports provides some easily accessible baseball team split stats, e.g., home vs. road, pre-all star break vs. post-all star break, turf vs. grass, indoor vs. outdoor (http://yhoo.it/hKAiLn). I started by pulling the overall team batting averages and then their averages split by home games versus road games (for the 2010 season). By pasting into Excel and sorting, then manually checking for a good 15 minutes, I was able to establish that there was no example of Simpson’s paradox in this data set. Bummer.
Attempt 2
This process had to be automated because there was no way I was going to spend 15+ minutes checking every split, every season, until I found an example. While this could be fairly easily done with a VBA script (e.g. macro) in Excel, I decided to move over to MATLAB simply because I’m more well-versed in scripting there. Algorithmically speaking, it really is a piece of cake to automate.
NOTE: If the details of the script don’t interest you, you can skip down to the results. And yes, I found some examples. WOOHOOO!
First, you need your data read into the program. In Excel, I had a table with the first column as the team names, the second column was the overall batting average and the third and fourth column were the split stats (i.e., team batting averages for road vs. home). To move this into MATLAB, I created a cell array called teams which would contain team names.
teams = {}
Then, after opening the teams variable in the variable editor window (double-click it in the Workspace), I pasted in the team names (right-click, “Paste Excel Data”). Next, I created a variable x and pasted in the three columns of batting average data.
At this point a very simple little script (simphunt.m) tests every pair of teams for Simpson’s paradox. The main step of the script is to compare the overall averages for each pair and the split stats for each pair. If the relation of the split is reversed in the overall, then we have an example.
if (x(i,1)>x(j,1) && x(i,2)<x(j,2) && x(i,3)<x(j,3)) || ...
(x(i,1)<x(j,1) && x(i,2)>x(j,2) && x(i,3)>x(j,3))
Results
The first split I tested was, again, home vs. road (to which I already knew the answer):
There are no cases of Simpson's Paradox
The next was day vs. night. Disappointed again.
There are no cases of Simpson's Paradox
Fortunately, it was only taking a minute or so to fetch the data and move it into MATLAB. Nevertheless, I was still feeling a little disheartened. I was beginning to expect to run through dozens of splits, maybe from multiple seasons, or even moving on to player vs. player stats instead of teams.
Then I went with indoor vs. outdoor. JACKPOT!!!
There are 5 cases of Simpson's Paradox
' Chicago Cubs' ' Oakland Athletics'
' Cleveland Indians' ' Tampa Bay Rays'
' Colorado Rockies' ' Milwaukee Brewers'
' Los Angeles Angels' ' Tampa Bay Rays'
' New York Mets' ' Toronto Blue Jays'
Five Cases!!! Looking back at the data we can confirm:
I could clearly keep hunting but I’ve satisfied my desire for finding my own examples. Next time I bring up Simpson’s paradox in any of my classes, I’ll start with my manufactured example but quickly move on to these REAL examples.
Read More…
Find out more about Simpson’s Paradox at the following links:
Simpson's paradox - Wikipedia, the free encyclopedia
http://www.furtadoworld.com/handouts/SimpsonParadox.pdf
When Combined Data Reveal the Flaw of Averages
Unboxing the Casio EX-ZR100
Here at the School of Math and Sciences at Wayland Baptist University, we just purchase a new point-and-shoot camera for integration into our curriculum. As part of a student project that I’m directing this term, we needed to capture some video filmed at a relatively high frame-rate. After much research on available options within our price range, we decided on the Casio EX-ZR100 model camera which provides not only 12.1 megapixels and a 12.5x optical zoom, it can also film lower resolution videos at 240, 480 or even 1000 frames per second.
I’ll take time later on this blog to document the project for which we used the camera, but in the meantime you might enjoy watching this. Actually, you should be forewarned, this may be the most disturbing footage of a mathematics professor anywhere on the internet:
Hunting for Simpson’s Paradox, part 1
Let’s say we happen to know the batting average for two baseball players. Overall, player 1 has a higher average that player 2. However, if you consider only how each player hits against left-handed pitchers we find that player 2 actually has a better average player 1. In this hypothetical scenario, it also turns out that player 2 also has a better average against right-handed pitchers. How is that possible?
Doesn’t it make intuitive sense that if player 2 is better than player 1 against left and right handed pitchers separately, that he must be better than player 1 against all pitchers? While that may be what our intuition tells us, it turns out that it’s not necessarily true.
Consider the following table. Note that batting average is simply the ratio of a players number of hits over the number of at-bats.
This table presents exactly the hypothetical scenario described above. Separately, player 2 had a higher average than player 1 against left and right handed pitchers, but over all player 1 has a higher average than player 2.
This phenomenon is commonly known in statistics as Simpson’s paradox. It demonstrates how our intuition can get us into trouble. Briefly stated, Simpson’s paradox occurs when a correlation or trend that is present in groups is reversed when the groups are combined.
I was recently reminded of Simpson’s paradox when @Math_Bits posted a link on twitter to an article, “Instances of Simpson’s Paradox” by Thomas R. Knapp. It got me thinking. Sure I can manufacture an example and I’ve seen a few examples in papers, texts and even wikipedia. But I want to find my own examples. And of course, manufactured examples like the table above don’t count. I need real data.
I figured the best place to start for real data that’s easy to find is in sports, say baseball, while we’re thinking of it. I started simple and pulled up the first split data set I found. Over on Yahoo sports, I pulled up team statistics for the full 2010 season, see http://yhoo.it/hKAiLn. I pulled the data over into excel and began (manually, ug…) hunting for an example of Simpson’s paradox. Of course, I would start the hardest way possible. I looked at batting averages (overall, home and on the road). I made three lists, one for each category: overall, home and road. Then, I sorted the teams from highest to lowest and began looking one-by-one for pairs of teams where one had a higher average overall but lower both at home and on the road.
… to no avail. I even reversed the process by sorting from lowest to highest.
I’m beginning to believe that Knapp was right when he claimed that examples of Simpson’s paradox are extremely rare.
The next step was to automate this process. As a programmer, I began devising a simple code that will take an overall list and lists for each group and identifies all those pairs that satisfy Simpson’s paradox.
In the next post, I’ll walk through the progress I made using Matlab to do my dirty work.
Read More…
Find out more about Simpson’s Paradox at the following links:
Simpson's paradox - Wikipedia, the free encyclopedia
http://www.furtadoworld.com/handouts/SimpsonParadox.pdf
When Combined Data Reveal the Flaw of Averages
God Rewards Fools
I'm currently reading "The Code Book" By Simon Singh. It's fascinating to read the back-story of the cryptographic algorithms I learned in graduate school.
“…the way to get to the top of the heap in terms of developing original research is to be a fool, because only fools keep trying. You have idea number 1, you get excited, and it flops. The you have idea number 2, you get excited, and it flops. Then you have idea number 99, you get excited, and it flops. Only a fool would be excited by the 100th idea, but it might take 100 ideas before one really pays off. Unless you’re foolish enough to be continually excited, you won’t have the motivation, you won’t have the energy to carry it through. God rewards fools.”
- Martin Hellman, quoted in The Code Book, p. 256
Whitfield Diffie and Martin Hellman, together with Ralph Merkle, developed what is considered by most to be the most significant cryptographic contribution of the modern era. They developed a method by which the key for encrypting information can be exchanged over a public channel. This breakthrough provides the means by which we use the public internet for private communication.
Want to know more? You should take my Cryptography and Personal Security class this spring.
The Googolplex
A recent question was posed to me on Facebook by a family member and because it’s related to one of my kids’ go-to big-numbers I thought I’d share the exchange. Here was the question:
How do you correctly notate a googleplex [sic] in scientific notation. I think it would be 10^10^100 but some of my co-workers disagree.
For those of you that might be unfamiliar, a googolplex is, in fact, one of the largest numbers that can be easily described to the non-mathematically oriented. You’ll obviously be aware that Google is the company that introduced a technology that changed the way we use the internet. They’ve become so ubiquitous that the very word is now a verb synonymous with internet search.
Not everyone is aware that the name “Google” is actually a corruption of googol which is the number 1 followed by 100 zeros, i.e.,
10,000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000,000,000,000,000,000,000,
000,000,000,000,000,000,000,000,000,000
It also has the name ten duotrigintillion, but googol has a nice ring to it. After a bit of research, using Google in fact, I learned a couple of interesting facts from “Mathematics and the Imagination” by Edward Kasner and James Newman. The name “googol” was first introduced by Dr. Kasner:
The name "googol" was invented by a child (Dr. Kasner's nine-year-old nephew) who was asked to think up a name for a very big number, namely 1 with one hundred zeroes after it. He was very certain that this number was not infinite, and therefore equally certain that it had to have a name.
At the same time, Dr. Kasner suggested an even larger number which he dubbed the googolplex which would be formed much like the googol number except instead of 1 followed by 100 zeros, it would be 1 followed by a googol of zeros. Take a minute to let that soak in. The very writing down of the number in standard decimal form would not be possible, according to Kasner, even “if you went to the farther star, touring all the nebulae and putting down zeros every inch of the way.”
The question above proposes an alternative to writing it down the “old-fashioned” way. A googol could easily be written with exponentiation,
.
So, we might also be able to write a googolplex as
,
which is 10 raised to a googol. Going back to the original question,
How do you correctly notate a googleplex [sic] in scientific notation. I think it would be 10^10^100 but some of my co-workers disagree.
My Response:
Yes, kind of. Technically, scientific notation is of the form, A x 10^N, where A is a number between 1 and 10 and N is any integer. I would say yours is the mathematical notation for a googolplex.
Also, there is not a general agreement as to which order you should do the exponentiation. For example,
is a googolplex (1 followed by a googol of zeros). However,

is not a googolplex. It turns out to be a 1 followed by 1000 zeros, much smaller than a googolplex.
Fortunately that answer was clear enough for him and to quote my family member, “The things tech geeks talk about at work... lol”.
Circle Eyes
Emily: Did you know that humans are actually animals, too?Zachary: Yeah, because they both have circle eyes.
Quote by David Hilbert
“Mathematics is a game played according to certain rules with meaningless marks on paper.”- David Hilbert
