Tuesday’s election results were full of surprises. They indicated that almost all of the polls were wrong this time. I view polling as a form of survey statistics, and the election results exposed how badly survey statistics can go wrong.
Before the election, each poll can be considered an independent survey: it takes a sample from the overall population and then tries to make a prediction about the population. For a presidential election, we calculate the sample proportion (the proportion of polling subjects who will vote for a candidate) and then use it to predict the population proportion. Usually in statistics, when we sample to make a prediction about the population, we never know whether our prediction is correct, because the truth for the population is never revealed. It is different for the presidential election. After the election, we know the truth, and the truth verifies whether the polls were right or wrong. Unfortunately, the polls were mostly wrong this time.
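The sample-proportion step described above can be sketched in a few lines. This is only an illustration under the textbook assumption of simple random sampling; the function name and the poll numbers are hypothetical and do not come from any real poll.

```python
import math

def sample_proportion_ci(votes_for, sample_size, z=1.96):
    """Return the sample proportion and an approximate 95% confidence interval.

    Assumes simple random sampling and uses the normal approximation.
    """
    p_hat = votes_for / sample_size
    # Standard error of a sample proportion.
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    margin = z * se
    return p_hat, p_hat - margin, p_hat + margin

# Hypothetical poll: 480 of 1,000 respondents say they will vote for the candidate.
p_hat, low, high = sample_proportion_ci(480, 1000)
print(f"estimate = {p_hat:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

The interval reflects sampling error only. Nonresponse, an unrepresentative sample, or respondents who change their minds are not captured by it, which is one way a poll can miss by more than its stated margin of error.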
We still remember the famous statistician (even though he is actually not a statistician) Nate Silver and his website fivethirtyeight.com. He became famous after he correctly predicted 49 out of 50 states in the 2008 presidential election and 50 out of 50 states in the 2012 presidential election. In 2013, he was invited to give a keynote speech at the annual Joint Statistical Meetings (the largest conference in the field of statistics).
Predicting 49 out of 50 and 50 out of 50 states correctly sounds like a great feat. However, for the majority of states, anybody who pays a little attention to the presidential election will be able to predict the results correctly. For example, it is pretty safe to put Texas, Indiana, Kentucky,… into the category of red states and New York and California into the category of blue states. In probability terms, I am willing to bet that Hillary has a 100% chance of winning California and Trump has a 100% chance of winning Mississippi. There are actually fewer than 10 (or maybe even fewer than 5) states, the so-called battleground states, where the polling and prediction are critical. Predicting 49 out of 50 states correctly may essentially amount to predicting 4 out of 5 states correctly. For the 2016 election, the predictions came down to several battleground states such as Florida, North Carolina, Ohio, Michigan, Virginia,... and he got many of them wrong, especially Michigan, Wisconsin, and Pennsylvania. In the final prediction prior to the November 8 election, Nate Silver and fivethirtyeight.com predicted the following:
"giving Clinton a 71.4% chance of winning, and predicting the former Secretary of State would end up with 302 electoral votes (270 are required for victory) and a 3.6 percentage point margin–48.5% to 44.9%–in the popular vote."
Here is a comparison of the final predictions from fivethirtyeight.com and the final results for all 50 states. States marked with an asterisk (*) are discordant (i.e., the predicted probability of Trump winning was less than 50% but Trump won, or the predicted probability of Trump winning was greater than 50% but Trump lost):

| State (Abbr.) | Probability of Trump Winning | Actual Result of Trump Winning |
| --- | --- | --- |
| AL | greater than 99.9% | Yes |
| AK | 76.4% | Yes |
| AZ | 66.6% | Yes |
| AR | 99.6% | Yes |
| CA | less than 0.1% | No |
| CO | 22.4% | No |
| CT | 2.7% | No |
| DE | 8.5% | No |
| FL* | 44.9% | Yes |
| GA | 79.1% | Yes |
| HI | 1.1% | No |
| ID | 99% | Yes |
| IL | 1.7% | No |
| IN | 97.5% | Yes |
| IA | 69.8% | Yes |
| KS | 97.3% | Yes |
| KY | 99.6% | Yes |
| LA | 99.5% | Yes |
| ME | 17.3% | No |
| MD | less than 0.1% | No |
| MA | less than 0.1% | No |
| MI* | 21.1% | Yes |
| MN | 15.0% | No |
| MS | 97.8% | Yes |
| MO | 96.1% | Yes |
| MT | 95.9% | Yes |
| NE | 97.7% | Yes |
| NV | 41.7% | No |
| NH | 30.2% | No |
| NJ | 3.1% | No |
| NM | 17.2% | No |
| NY | 0.2% | No |
| NC* | 44.5% | Yes |
| ND | 97.7% | Yes |
| OH | 64.6% | Yes |
| OK | greater than 99.9% | Yes |
| OR | 6.3% | No |
| PA* | 23.0% | Yes |
| RI | 6.8% | No |
| SC | 89.7% | Yes |
| SD | 93.9% | Yes |
| TN | 97.3% | Yes |
| TX | 94.0% | Yes |
| UT | 83.2% | Yes |
| VT | 1.9% | No |
| VA | 14.5% | No |
| WA | 1.6% | No |
| WV | 99.7% | Yes |
| WI* | 16.5% | Yes |
| WY | 98.9% | Yes |

If we look only at the discordance: in 5 out of 50 states (10%), the predicted probability of Trump winning was less than 50% but Trump won; in 0 out of 50 states, the predicted probability of Trump winning was greater than 50% but Trump lost.

| Probability of Trump Winning According to Fivethirtyeight.com | Actual Result: Trump Won (Yes) | Actual Result: Trump Won (No) |
| --- | --- | --- |
| greater than 50% | 25 | 0 |
| less than 50% | 5 | 20 |

Nate Silver is a Democrat, and he has indicated that his party affiliation does not have any impact on his predictions. However, there may be unconscious bias in the predictions. At least that seems to be the case for this election: in all five discordant states, the prediction underestimated the probability of Trump winning.
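The one-sidedness of the misses noted above can be given a rough number with a simple sign test. The sketch below assumes that each miss is independent and equally likely to fall on either candidate's side. That assumption is mine and is surely too strong, because polling errors in different states tend to move together. The function name is hypothetical.

```python
def prob_all_same_side(n_misses, two_sided=True):
    """Probability that n independent, 50/50 misses all land on the same side.

    two_sided=True counts "all against Trump" or "all against Clinton";
    two_sided=False counts only one pre-specified direction.
    """
    if n_misses < 1:
        raise ValueError("n_misses must be at least 1")
    one_side = 0.5 ** n_misses
    return min(1.0, 2 * one_side) if two_sided else one_side

# Show how quickly a fully one-sided pattern becomes unlikely under these assumptions.
for n in range(1, 8):
    print(n, prob_all_same_side(n))
```

Because state-level errors are correlated, the true chance of a fully one-sided pattern is larger than this calculation suggests, so it should be read as a rough indication of a shared bias rather than as proof of one.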
I am not sure what model Nate Silver and fivethirtyeight.com use for their predictions. The predictions by fivethirtyeight.com were still better than those of other polls (even though this time they were not as good as in previous elections). I see similarities between his analysis and meta-analysis: the data analysis or modeling is based on polling data from various sources. The prediction is expressed as each candidate's probability of winning, based on the aggregate data from the meta-analysis.
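To illustrate the meta-analysis idea, here is a minimal sketch of fixed-effect (inverse-variance) pooling of several polls, followed by a normal-approximation probability that one candidate leads. This is not the fivethirtyeight.com model, which I do not know. The function names and the three polls are hypothetical.

```python
import math

def lead_and_variance(share_a, share_b, n):
    """Lead of candidate A over B in one poll and its sampling variance.

    Uses the multinomial variance of a difference of two proportions.
    """
    lead = share_a - share_b
    var = (share_a + share_b - lead ** 2) / n
    return lead, var

def pooled_win_probability(polls):
    """Inverse-variance pooled lead, its standard error, and P(lead > 0)."""
    total_weight = 0.0
    weighted_lead = 0.0
    for share_a, share_b, n in polls:
        lead, var = lead_and_variance(share_a, share_b, n)
        weight = 1.0 / var  # more precise polls get more weight
        total_weight += weight
        weighted_lead += weight * lead
    pooled_lead = weighted_lead / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Normal CDF evaluated at pooled_lead / pooled_se.
    z = pooled_lead / pooled_se
    win_prob = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pooled_lead, pooled_se, win_prob

# Hypothetical polls: (share for A, share for B, sample size).
hypothetical_polls = [
    (0.47, 0.45, 800),
    (0.46, 0.46, 1200),
    (0.48, 0.44, 600),
]
lead, se, prob = pooled_win_probability(hypothetical_polls)
print(f"pooled lead = {lead:.3f}, SE = {se:.3f}, P(A leads) = {prob:.2f}")
```

Fixed-effect pooling treats every difference between polls as sampling noise. If all the polls share the same bias, the pooled estimate inherits it and the resulting probability looks more confident than it should.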
Across three consecutive presidential elections (2008, 2012, and 2016), Nate Silver and fivethirtyeight.com got the first two right but the third one totally wrong. This reminds us that replication and verification are needed to tell whether a model is robustly correct. Remember what George Box said: "Essentially, all models are wrong, but some are useful."