A Measure of Polling Accuracy :: HowWillAmericaVote.com

The fundamental objective of polling is to infer, from a small subset of a given population, the opinion of that population. Polling is an application of statistics, but in most cases the outcome is not empirically verifiable. The true value of public opinion can only be known by querying each member of the public, an expensive task sometimes referred to as Democracy.

For election-based polls, pollsters seek to measure a potential outcome over time, with hypothetical questions like “If the election were held today.” As time passes, “today” eventually becomes Election Day. Iowa and New Hampshire have had their Election Days, but before they voted, a total of 192 polls from 28 unique pollsters were conducted. The vast majority of these 192 polls contain results which can never be verified empirically; a small minority, however, can be directly compared to the election result they sought to measure.

Using our aggregated polling data and the known outcome, we should be able to assess which pollsters did a better job. An election’s outcome serves as a single reference point against which a pollster’s result can be assessed empirically. It is the only time at which public opinion is definitively known. This assessment, perhaps called empirical accuracy, is our focus.

One definition of “better” is the accuracy with which a pollster measured or sampled a demographic sub-sample. This idea derives from the concept of expected demographic outcome. The procedure outlined below could be applied to any sample or combination of sub-samples. We’re going to focus on the gender sample and we’ll call it the gender expectation.

Using any given poll, these are our steps to determine the gender expectation. The first two steps derive information from the actual outcome, generally from exit or entrance polling; the same data must also be released by the pollster or the analysis cannot be done:

1. Determine the outcome within each sample group; in our case this would be the outcome from only males and the outcome from only females.
2. Determine the proportion of participation from each sample group; in our case that would be the number of males who voted and the number of females who voted.

The following steps then require the inclusion of the poll’s data:

3. Using the information from #1 and #2, re-weight the data provided by the poll to deduce a new topline result. This procedure does not punish a pollster for weighting incorrectly.
4. Calculate the deviation of each sample: determine the difference between each group’s outcome from #1 and the outcome reported by the poll for that group. Next, calculate the deviation of the re-weighted outcome in #3 from the actual outcome. Summing all deviations yields the overall score; the lower the better.
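The four steps can be sketched in Python. This is only a minimal sketch under assumptions the procedure above does not spell out: a two-candidate race, deviation measured as the absolute difference in Candidate A's share, and input given as raw counts per group. The names `share_a` and `expectation_score` are hypothetical.

```python
def share_a(votes_a, votes_b):
    """Candidate A's share of a two-candidate vote, as a fraction."""
    return votes_a / (votes_a + votes_b)


def expectation_score(actual, poll):
    """Score a poll against the actual outcome; lower is better.

    Both arguments map a group name to a (votes_for_A, votes_for_B) tuple.
    Assumption: two candidates, so A's share fully describes each result.
    """
    total_voters = sum(a + b for a, b in actual.values())

    # Step 1: actual outcome within each group.
    actual_share = {g: share_a(a, b) for g, (a, b) in actual.items()}
    # Step 2: actual proportion of participation for each group.
    weight = {g: (a + b) / total_voters for g, (a, b) in actual.items()}

    # Step 3: re-weight the poll's group results by actual participation.
    poll_share = {g: share_a(a, b) for g, (a, b) in poll.items()}
    reweighted_topline = sum(weight[g] * poll_share[g] for g in actual)
    actual_topline = sum(weight[g] * actual_share[g] for g in actual)

    # Step 4: per-group deviations plus the re-weighted topline deviation.
    group_deviation = sum(abs(poll_share[g] - actual_share[g]) for g in actual)
    topline_deviation = abs(reweighted_topline - actual_topline)
    return group_deviation + topline_deviation
```

Because step 3 uses the actual participation rather than the pollster's own sample sizes, a poll with the right answers inside each group scores well even if its sample composition was off.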

A carefully crafted example is below to illustrate the procedure. It assumes a fixed and consistent participation rate for #2: every poll in the example includes 4 males and 6 females. This simplifies the examples, but nullifies the significance of #3, as the re-weighting doesn’t alter the outcome.
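To see why the re-weighting has no effect when the sample matches the electorate, here is a small sketch comparing a poll's raw topline with its re-weighted topline. The function names and both sample polls are illustrative assumptions; the second poll, with an even gender split, is not one of the examples that follow.

```python
def raw_topline(poll):
    """Candidate A's share using the poll's own sample sizes."""
    votes_a = sum(a for a, _ in poll.values())
    respondents = sum(a + b for a, b in poll.values())
    return votes_a / respondents


def reweighted_topline(poll, weights):
    """Candidate A's share after weighting each group by actual participation."""
    return sum(weights[g] * a / (a + b) for g, (a, b) in poll.items())


# Assumed actual participation: 60% female, 40% male.
weights = {"female": 0.6, "male": 0.4}

# Illustrative sample of 6 females and 4 males, matching the electorate.
# Each entry is (for Candidate A, for Candidate B).
matched = {"female": (3, 3), "male": (1, 3)}

# Hypothetical sample of 5 females and 5 males, which does not match.
skewed = {"female": (1, 4), "male": (4, 1)}

matched_shift = reweighted_topline(matched, weights) - raw_topline(matched)
skewed_shift = reweighted_topline(skewed, weights) - raw_topline(skewed)
```

With the matched sample the shift is zero (up to floating-point rounding), while the skewed sample moves from a raw 50% for Candidate A to about 44% once re-weighted.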

Let’s suppose a municipality exists with 120 residents; let’s then assume that 100 residents are eligible to vote. Breaking News! An election was just held with two candidates, A and B; 40 people voted. The result along with an exit poll (for each gender) is provided below:

Actual	Candidate A	Candidate B	Total
All	16	24	40
Female	0	24	24
Male	16	0	16

The above data provides us with #1; all males voted for A and all females for B. We are also given #2, which is the number of participants in each group; 16 males and 24 females.
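For readers who want to check the arithmetic, #1 and #2 can be derived from the table in a few lines. This is a sketch only; the variable names are assumptions, and the counts are copied from the table above.

```python
# Actual votes by gender: (Candidate A, Candidate B).
actual = {"Female": (0, 24), "Male": (16, 0)}

total_votes = sum(a + b for a, b in actual.values())

# 1: outcome within each group, as each candidate's share of that group.
group_outcome = {
    group: {"A": a / (a + b), "B": b / (a + b)}
    for group, (a, b) in actual.items()
}

# 2: each group's proportion of all participants.
participation = {
    group: (a + b) / total_votes for group, (a, b) in actual.items()
}

# Overall result, for reference.
overall = {
    "A": sum(a for a, _ in actual.values()) / total_votes,
    "B": sum(b for _, b in actual.values()) / total_votes,
}
```

Females make up 60% of participants and males 40%, and the overall result is 40% for Candidate A and 60% for Candidate B.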

Four fictitious pollsters released polls the previous day. The raw topline margin of each mystery pollster, measured here as the deviation of the poll’s overall result from the actual result, is below:

Pollster	Topline Margin
1	0%
2	0%
3	10%
4	10%

To illustrate the lack of depth in the above ranking, we’re going to assess their accuracy by using the gender expectation method. We’ll assume each pollster sampled 10 people and included the correct ratio of males/females to reduce the number of possible variables.

The first was conducted by a Bad pollster:

Bad	Candidate A	Candidate B	Total	Deviation
All	4	6	10	0%
Female	3	3	6	50%
Male	1	3	4	75%

This Bad pollster matched the overall outcome, but on closer inspection of the gender crosstabs, they did not do so well. They were off by 50% in the female demographic, compared to the actual outcome, and off by 75% among males. They produced an accurate overall result, but for the wrong reasons. Their sample included 6 individuals (60%), 3 of each gender, whose responses did not align with their group’s actual outcome. The total deviation from the actual outcome was 125%: the sum of each demographic deviation and the overall deviation. To reiterate, this sum includes the calculation in #3, but because the sample group sizes remain static, the re-weighted deviation is still 0%, the same as the initial topline. This is a terrible poll which got lucky with its overall result.

Another pollster, Good, published these results:

Good	Candidate A	Candidate B	Total	Deviation
All	5	5	10	10%
Female	3	3	6	50%
Male	2	2	4	50%

This Good pollster missed the overall outcome, but was actually more accurate than the Bad pollster, by measure of deviation. Each gender deviated by 50% from its actual outcome, which caused the overall outcome to deviate by 10%. The total deviation is 110%; this is a simple example of a pollster getting the topline result wrong because the sub-samples were wrong.

Another pollster, Better, also published results:

Better	Candidate A	Candidate B	Total	Deviation
All	4	6	10	0%
Female	1	5	6	17%
Male	3	1	4	25%

This pollster was mostly correct with the gender samples and accurate with the overall result. Their deviation is a mere 42%.

The last, and Best, pollster’s result:

Best	Candidate A	Candidate B	Total	Deviation
All	5	5	10	10%
Female	1	5	6	17%
Male	4	0	4	0%

The Best poll erred in the opinion of just one female, but missed the overall result. Their deviation, however, was just 27%. This is the most accurate pollster.

Let’s now rank the 4 pollsters by their deviation, derived from their gender expectation:

Pollster	Deviation
Best	27%
Better	42%
Good	110%
Bad	125%
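The ranking can be reproduced with a short script. As before, this is a sketch that assumes a two-candidate race and measures deviation as the absolute difference in Candidate A's share; `total_deviation` is a hypothetical helper, and the counts are copied from the four poll tables and the actual result.

```python
# (Candidate A, Candidate B) counts by gender.
ACTUAL = {"Female": (0, 24), "Male": (16, 0)}
POLLS = {
    "Bad": {"Female": (3, 3), "Male": (1, 3)},
    "Good": {"Female": (3, 3), "Male": (2, 2)},
    "Better": {"Female": (1, 5), "Male": (3, 1)},
    "Best": {"Female": (1, 5), "Male": (4, 0)},
}


def total_deviation(actual, poll):
    """Sum of per-group deviations and the re-weighted topline deviation."""
    voters = sum(a + b for a, b in actual.values())
    deviation = 0.0
    actual_topline = 0.0
    reweighted_topline = 0.0
    for group, (a, b) in actual.items():
        poll_a, poll_b = poll[group]
        actual_share = a / (a + b)
        poll_share = poll_a / (poll_a + poll_b)
        weight = (a + b) / voters  # actual participation of this group
        deviation += abs(poll_share - actual_share)
        actual_topline += weight * actual_share
        reweighted_topline += weight * poll_share
    return deviation + abs(reweighted_topline - actual_topline)


scores = {name: total_deviation(ACTUAL, poll) for name, poll in POLLS.items()}
ranking = sorted(scores, key=scores.get)  # lowest total deviation first

for name in ranking:
    print(f"{name}\t{scores[name]:.0%}")
```

Sorting by the summed deviation puts Best first and Bad last, matching the table.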

Not surprisingly, the Best pollster had the lowest deviation. Here is the naïve ranking from above again, this time with each pollster’s true identity revealed:

Pollster	Topline Margin
Bad	0%
Better	0%
Good	10%
Best	10%

The naïve topline margin resulted in the Best poll being ranked the worst. By using gender expectation we’re able to more deeply analyze a poll’s sample and truly assess whether it was accurate for the correct reasons.

We'll use this technique going forward to assess the accuracy of each pollster in a given matchup; rankings from the 2016 New Hampshire Democratic Primary will be published later this week.
