The Devil Is in the Digits

http://www.washingtonpost.com/wp-dyn/content/article/2009/06/20/AR2009062000004.html

By Bernd Beber and Alexandra Scacco
Saturday, June 20, 2009; 12:02 AM

Since the declaration of Mahmoud Ahmadinejad's landslide victory in Iran's presidential election,
accusations of fraud have swelled. Against expectations from pollsters and pundits alike, Ahmadinejad
did surprisingly well in urban areas, including Tehran -- where he is thought to be highly unpopular --
and even Tabriz, the capital city of opposition candidate Mir Hussein Mousavi's native East Azarbaijan
province.

Others have pointed to the surprisingly poor performance of Mehdi Karroubi, another reform
candidate, particularly in his home province of Lorestan, where conservative candidates fared
poorly in 2005 but where Ahmadinejad allegedly captured 71 percent of the vote. Eyebrows have been
raised further by the relative consistency in Ahmadinejad's vote share across Iran's provinces, in spite
of wide provincial variation in past elections.

These pieces of the story point in the direction of fraud, to be sure. They have led experts to speculate
that the election results released by Iran's Ministry of the Interior had been altered behind closed
doors. But we don't have to rely on suggestive evidence alone. We can use statistics more
systematically to show that this is likely what happened. Here's how.

We'll concentrate on vote counts -- the number of votes received by different candidates in different
provinces -- and in particular the last and second-to-last digits of these numbers. For example, if a
candidate received 14,579 votes in a province (Mr. Karroubi's actual vote count in Isfahan), we'll focus
on digits 7 and 9.
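
As a minimal sketch of the digit extraction just described: the function name `last_two_digits` is hypothetical, and treating a count below 10 as having a second-to-last digit of 0 is an assumption made here, not something the analysis specifies.

```python
def last_two_digits(count: int) -> tuple[int, int]:
    """Return (second-to-last digit, last digit) of a non-negative vote count.

    Assumption: a count below 10 is treated as having a second-to-last digit of 0.
    """
    if count < 0:
        raise ValueError("vote counts cannot be negative")
    return (count // 10) % 10, count % 10


# Mr. Karroubi's reported vote count in Isfahan, as quoted in the text.
print(last_two_digits(14579))  # (7, 9)
```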


This may seem strange, because these digits usually don't change who wins. In fact, last digits in a fair
election don't tell us anything about the candidates, the make-up of the electorate or the context of the
election. They are random noise in the sense that a fair vote count is as likely to end in 1 as it is to end
in 2, 3, 4, or any other numeral. But that's exactly why they can serve as a litmus test for election fraud.
For example, an election in which a majority of provincial vote counts ended in 5 would surely raise
red flags.

Why would fraudulent numbers look any different? The reason is that humans are bad at making up
numbers. Cognitive psychologists have found that study participants in lab experiments asked to write
sequences of random digits will tend to select some digits more frequently than others.

So what can we make of Iran's election results? We used the results released by the Ministry of the
Interior and published on the web site of Press TV, a news channel funded by Iran's government. The
ministry provided data for 29 provinces, and we examined the number of votes each of the four main
candidates -- Ahmadinejad, Mousavi, Karroubi and Mohsen Rezai -- is reported to have received in
each of the provinces -- a total of 116 numbers.

The numbers look suspicious. We find too many 7s and not enough 5s in the last digit. We expect each
digit (0, 1, 2, and so on) to appear at the end of 10 percent of the vote counts. But in Iran's provincial
results, the digit 7 appears 17 percent of the time, and only 4 percent of the results end in the number
5. Two such departures from the average -- a spike of 17 percent or more in one digit and a drop to 4
percent or less in another -- are extremely unlikely. Fewer than four in a hundred non-fraudulent
elections would produce such numbers.
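
One way to check a claim of this kind is by simulation. The sketch below draws 116 last digits uniformly at random many times over and counts how often some digit reaches a 17 percent share while another falls to 4 percent or below. It is an illustration under stated assumptions, not a reconstruction of the calculation behind the figure above: the function names, the number of trials and the seed are hypothetical, and the thresholds are the rounded percentages quoted in the text, so the estimate is only indicative.

```python
import random
from collections import Counter


def last_digit_counts(vote_counts):
    """Count how often each digit 0-9 appears as the last digit of a vote count."""
    counts = Counter(v % 10 for v in vote_counts)
    return [counts.get(d, 0) for d in range(10)]


def simulate_extremes(n_counts=116, high=0.17, low=0.04, trials=20000, seed=0):
    """Estimate how often uniformly random last digits show both extremes.

    Assumptions: last digits in a fair election are independent and uniform on 0-9;
    `high` and `low` are the rounded shares quoted in the article.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        counts = Counter(rng.randrange(10) for _ in range(n_counts))
        shares = [counts.get(d, 0) / n_counts for d in range(10)]
        # A hit needs one digit at or above `high` AND another at or below `low`.
        if max(shares) >= high and min(shares) <= low:
            hits += 1
    return hits / trials


print(simulate_extremes(trials=2000))
```

Running the same function on the actual reported counts would require the Ministry of the Interior figures, which are not reproduced here.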

As a point of comparison, we can analyze the state-by-state vote counts for John McCain and Barack
Obama in last year's U.S. presidential election. The frequencies of last digits in these election returns
never rise above 14 percent or fall below 6 percent, a pattern we would expect to see in seventy out of
a hundred fair elections.

But that's not all. Psychologists have also found that humans have trouble generating non-adjacent
digits (such as 64 or 17, as opposed to 23) as frequently as one would expect in a sequence of random
numbers. To check for deviations of this type, we examined the pairs of last and second-to-last digits
in Iran's vote counts. On average, if the results had not been manipulated, 70 percent of these pairs
should consist of distinct, non-adjacent digits.
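
The 70 percent benchmark can be checked by enumerating all 100 possible two-digit endings. The sketch below assumes that 0 and 9 count as adjacent (so that 09 and 90 are treated like 23). That reading is an inference made here because it reproduces the 70 percent figure; without the wrap-around the share would be 72 percent. The function name `is_non_adjacent` is hypothetical.

```python
def is_non_adjacent(a: int, b: int) -> bool:
    """True if digits a and b are distinct and not neighbors.

    Assumption: digits wrap around, so 0 and 9 are counted as adjacent.
    """
    diff = abs(a - b)
    circular_distance = min(diff, 10 - diff)
    return circular_distance >= 2


# All 100 equally likely (second-to-last, last) digit pairs in a fair count.
pairs = [(a, b) for a in range(10) for b in range(10)]
share = sum(is_non_adjacent(a, b) for a, b in pairs) / len(pairs)
print(share)  # 0.7
```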

Not so in the data from Iran: Only 62 percent of the pairs contain non-adjacent digits. This may not
sound so different from 70 percent, but the probability that a fair election would produce a difference
this large is less than 4.2 percent. And while our first test -- variation in last-digit frequencies --
suggests that Rezai's vote counts are the most irregular, the lack of non-adjacent digits is most striking
in the results reported for Ahmadinejad.

Each of these two tests provides strong evidence that the numbers released by Iran's Ministry of the
Interior were manipulated. But taken together, they leave very little room for reasonable doubt. The
probability that a fair election would produce both too few non-adjacent digits and the suspicious
deviations in last-digit frequencies described earlier is less than .005. In other words, a bet that the
numbers are clean is a one-in-two-hundred long shot.

Bernd Beber and Alexandra Scacco, Ph.D. candidates in political science at Columbia University, will
be assistant professors in New York University's Wilf Family Department of Politics this fall.


mikemout wrote:
I'm sure someone will point out Benford's law, which also helps identify whether a set of numbers
was randomly created.
In general, the "law" says that the probability of the first digit being a "d" is

P(d) = (ln(1+1/d))/ln(10)


6/22/2009 11:03:21 PM

AlchemyToday wrote:
Martial - You're forgetting the obvious fact underlying any analysis of fraud in this election: the
Iranian clerical regime was so caught off guard and acted in such haste, and obviously lacks someone
capable of generating 116 random numbers on a believable distribution based on past results that the
fraud will be obvious in the reported numbers.

Also, there's not enough focus on the fact that the probability of the occurrence identified here is 0.15%
and not 0.5%... go simulate it yourself. It's also the product of the individual probabilities mentioned
here: 3.5% (a 17% digit and a 4% digit) * 4% (62% non-adjacent digits) = 0.15%

It's trivial to show that the adjacency of the penultimate digit is independent of the identity of the last
digit, so the probability of both conditions being met is the product of the individual probabilities.

I have no clue why no one pointed this out to the authors when they passed around the draft of the
article; presumably anyone smart enough to recognize it held their tongue because they'd have to point
out the folly of the whole thing.
6/22/2009 9:47:30 PM

wlockhar wrote:
What a crock!
6/22/2009 9:11:20 PM

Martial wrote:
The outcome variable is dichotomous--dishonestly reported v. honestly reported vote counts. The
predictor variable set comprises proposed indicators of bad behavior; e.g., predictor variable
1--deviation from a uniform distribution of the numerals 1,2,3,4,5,6,7,8,9, and 0 as they appear in last
digits; predictor variable 2--degree of increase above 30% of the numerals 1,2, and 3 as they appear in
last digits; predictor variable 3--degree of decrease below 8% of the duplicate last digit pairs
11,22,33,44,55,66,77,88,99,00; and predictor variable 4--increase of the number 7 as the first digit.

Since there are thousands of honest election return sets and hundreds of dishonest election return sets,
one could readily perform binomial logit regression or log negative binomial logit regression to
determine the value of each predictor variable with respect to the existence or lack thereof of
dishonest election returns.

Arguendo, assume constancy of specificity and sensitivity with respect to prevalence and a lack of
difference in alterations in counts with respect to the type of fraud (e.g., having voters from the
cemetery is the same as changing the names on the ballots), this still would be a very stupid method to
proceed because one is not dealing with a disease like schizophrenia, but with a criminal activity like
counterfeiting. Were uniform distribution of the last digit a super means of detecting fraud, one
might imagine the following Chicago conversation five years from now:

Boss: Charlie, come here.

Charlie: What's up boss?

Boss: Big, big trouble.

Charlie: Duh, waddaya mean, Boss?

Boss: Nimrod, the FBI is choking down our throats about the Alderman operation. Didn't I tell you
about that Beber Scacco garbage? You were supposed to use the Microsoft random function for that
last digit.

[whacks Charlie with glove ten times].

Charlie: Sorry Boss.

Boss: Don't let it happen again. Go tell Patty B. to work her magic with the FBI agent.


6/22/2009 9:07:19 PM

ekim53 wrote:
So, what is the Iranian version of ACORN called?
6/22/2009 9:03:45 PM

seanwf wrote:
I learned long ago to always look at the source when statistics are published in the news. Although the
authors are Ph.D. candidates, political science is nothing like psychology nor statistics and it would be
nice to be able to check their numbers.

Even if their math is flawless, p < .005 is still a possible, albeit quite unlikely, outcome. My intuition
tells me that since the authors were working with such a large sample size that this probability should
be lower.

On a personal note, I'm pretty convinced the election was a sham. It would still be nice to be able to
check for myself.
6/22/2009 8:31:19 PM