## Saturday, November 12, 2016

### Presidential Election and Statistics

Tuesday’s election results were full of surprises. They indicated that almost all of the polls were wrong this time. I view polling as a form of survey statistics, and the election results exposed how badly survey statistics can go wrong.

Before the election, each poll can be considered an independent survey: it takes a sample from the overall population and then tries to make a prediction about that population. For a presidential election, we calculate the sample proportion (the proportion of polled subjects who say they will vote for a candidate) and then use it to predict the population proportion. Usually in statistics, when we use a sample to make a prediction about the population, we never learn whether the prediction was correct, because the truth about the population is never revealed.
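The step from sample proportion to population proportion can be sketched in a few lines. This is only a minimal illustration: the poll numbers are hypothetical, the function name `proportion_with_margin` is made up for this example, and it assumes a simple random sample, which real polls only approximate.

```python
import math

def proportion_with_margin(successes, n, z=1.96):
    """Return the sample proportion and its approximate 95% margin of error.

    Assumes a simple random sample of size n (an idealization).
    """
    p_hat = successes / n
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, z * standard_error

# Hypothetical poll: 480 of 1,000 respondents say they will vote for a candidate.
p_hat, margin = proportion_with_margin(480, 1000)
print(f"estimate: {p_hat:.3f} +/- {margin:.3f}")
```

The margin here reflects sampling error only. It says nothing about who declines to answer or who actually turns out to vote, and those are the kinds of problems that sampling theory alone cannot catch.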

A presidential election is different. After the election, we know the truth, and the truth shows whether the polls were right or wrong. Unfortunately, the polls were mostly wrong this time.

We still remember Nate Silver, the famous statistician (even though he is actually not a statistician), and his website fivethirtyeight.com. He became famous after he correctly predicted 49 out of 50 states in the 2008 presidential election and 50 out of 50 states in the 2012 presidential election. In 2013, he was invited to give a keynote speech at the annual Joint Statistical Meetings (the largest conference in the statistics field).

Predicting 49 out of 50 and 50 out of 50 states correctly sounds like a great feat. However, for the majority of states, anybody who pays a little attention to presidential elections can predict the results correctly. For example, it is pretty safe to put Texas, Indiana, Kentucky,… into the category of red states and New York and California into the category of blue states. In probability terms, I am willing to bet that Hillary has a 100% chance of winning California and Trump has a 100% chance of winning Mississippi. There are actually fewer than 10 (or maybe even fewer than 5) states, the so-called battleground states, where the polling and prediction are critical. Predicting 49 out of 50 states correctly may essentially amount to predicting 4 out of 5 states correctly.

For the 2016 election, the predictions came down to several battleground states such as Florida, North Carolina, Ohio, Michigan, Virginia,... Nate Silver got many of them wrong, especially Michigan, Wisconsin, and Pennsylvania. In the final forecast prior to the November 8 election, Nate Silver and fivethirtyeight.com predicted the following:
"giving Clinton a 71.4% chance of winning, and predicting the former Secretary of State would end up with 302 electoral votes (270 are required for victory) and a 3.6 percentage point margin–48.5% to 44.9%–in the popular vote."

Here is a comparison of the final predictions from fivethirtyeight.com and the actual results for all 50 states. Rows in bold are states with discordance (i.e., the predicted probability of Trump winning was less than 50% but Trump won, or the predicted probability was greater than 50% but Trump lost):

| State | Predicted probability of Trump winning | Trump won? |
|---|---|---|
| AL | greater than 99.9% | Yes |
| AK | 76.4% | Yes |
| AZ | 66.6% | Yes |
| AR | 99.6% | Yes |
| CA | less than 0.1% | No |
| CO | 22.4% | No |
| CT | 2.7% | No |
| DE | 8.5% | No |
| **FL** | **44.9%** | **Yes** |
| GA | 79.1% | Yes |
| HI | 1.1% | No |
| ID | 99% | Yes |
| IL | 1.7% | No |
| IN | 97.5% | Yes |
| IA | 69.8% | Yes |
| KS | 97.3% | Yes |
| KY | 99.6% | Yes |
| LA | 99.5% | Yes |
| ME | 17.3% | No |
| MD | less than 0.1% | No |
| MA | less than 0.1% | No |
| **MI** | **21.1%** | **Yes** |
| MN | 15.0% | No |
| MS | 97.8% | Yes |
| MO | 96.1% | Yes |
| MT | 95.9% | Yes |
| NE | 97.7% | Yes |
| NV | 41.7% | No |
| NH | 30.2% | No |
| NJ | 3.1% | No |
| NM | 17.2% | No |
| NY | 0.2% | No |
| **NC** | **44.5%** | **Yes** |
| ND | 97.7% | Yes |
| OH | 64.6% | Yes |
| OK | greater than 99.9% | Yes |
| OR | 6.3% | No |
| **PA** | **23.0%** | **Yes** |
| RI | 6.8% | No |
| SC | 89.7% | Yes |
| SD | 93.9% | Yes |
| TN | 97.3% | Yes |
| TX | 94.0% | Yes |
| UT | 83.2% | Yes |
| VT | 1.9% | No |
| VA | 14.5% | No |
| WA | 1.6% | No |
| WV | 99.7% | Yes |
| **WI** | **16.5%** | **Yes** |
| WY | 98.9% | Yes |

If we just look at the discordance: in 5 out of 50 states (10%), the predicted probability of Trump winning was less than 50% but Trump won; in 0 out of 50 states, the predicted probability of Trump winning was greater than 50% but Trump lost.

| Probability of Trump winning according to fivethirtyeight.com | Trump won: Yes | Trump won: No |
|---|---|---|
| Greater than 50% | 25 | 0 |
| Less than 50% | 5 | 20 |

Nate Silver is a Democrat, and he says that party affiliation has no impact on his predictions. However, there may be unconscious bias in the predictions. At least this seems to be true for this election: in every state with discordance (all five of them), the prediction underestimated the probability of Trump winning.
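One rough way to ask whether misses that all fall on the same side could be plain chance is a simple sign-test style calculation. The sketch below is only an illustration under strong assumptions: it treats each miss as independent and equally likely to favor either candidate. State-level polling errors tend to move together, so this calculation overstates the evidence for bias. The function name `same_direction_probability` is made up for this example.

```python
def same_direction_probability(k):
    """Chance that k independent, unbiased misses all favor the same candidate.

    Each miss is assumed to be a fair coin flip; the factor of 2 counts
    "all toward one candidate" and "all toward the other".
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return 2 * 0.5 ** k

# Probability for a few possible numbers of one-sided misses.
for k in range(3, 7):
    print(k, same_direction_probability(k))
```

If the misses are really driven by one shared polling error across similar states, they behave more like a single miss than like several independent ones, and a one-sided pattern becomes much less surprising.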

I am not sure what model Nate Silver and fivethirtyeight.com use for their predictions. The predictions by fivethirtyeight.com are still better than those of other polls (even though the predictions this time are not as good as in previous elections). I see similarities between his analysis and meta-analysis: the data analysis or modeling is based on polling data from various sources. The prediction is expressed as each candidate's probability of winning, based on the aggregate data from the meta-analysis.
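To make the meta-analysis analogy concrete, here is a minimal sketch of pooling several polls and turning the pooled share into a probability of winning. It is not the fivethirtyeight.com model, which I do not know. The polls are hypothetical, the function names are made up for this example, and the calculation assumes a two-candidate race, simple random samples, and a normal approximation.

```python
import math

def pooled_share(polls):
    """Pool poll results, weighting each poll by its sample size.

    `polls` is a list of (share_for_candidate, sample_size) pairs.
    Returns the pooled share and its standard error.
    """
    total_n = sum(n for _, n in polls)
    pooled = sum(share * n for share, n in polls) / total_n
    standard_error = math.sqrt(pooled * (1 - pooled) / total_n)
    return pooled, standard_error

def win_probability(pooled, standard_error):
    """Normal-approximation probability that the true share exceeds 50%."""
    z = (pooled - 0.5) / standard_error
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical polls for one candidate in one state: (share, sample size).
polls = [(0.52, 800), (0.49, 1200), (0.51, 1000)]
share, se = pooled_share(polls)
print(f"pooled share: {share:.3f}, win probability: {win_probability(share, se):.2f}")
```

A calculation like this accounts for sampling error only. If all of the polls share the same systematic error, pooling them makes the estimate look more precise without making it more accurate.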

Across three consecutive presidential elections (2008, 2012, and 2016), Nate Silver and fivethirtyeight.com got the first two right but the third one totally wrong. This reminds us that it takes replication and verification to tell whether a model is robustly correct. Remember what George Box said: "Essentially, all models are wrong, but some are useful."