I just completed the latest round of site improvements; we're on version 3 now! The two major points of focus were decreasing load times and enhancing the matchups page. The speed improvements should be self-evident, although portions of the poll pages remain slow. I'll spend most of this article discussing the new features.
When viewing a specific election on the matchups page (e.g., the 2016 Iowa Dem Caucus) you'll find a slew of new items in the left column. These new sections let you control which polls are included in the overall projection. Each filter corresponds to an empirical field present within a given poll. The interaction type filter, for example, lets you select polls conducted either by a live interviewer or in some automated fashion.
Additional time selection options have also been added; this filter inclusively selects a range of polls based on their end date. The zoom option, when checked, scales the graph to the entered time range while still including all eligible polls in the projection math. It's a convenient way to focus on the most recent polls while still including older polls in the weighting.
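To make the filtering concrete, here's a minimal sketch of how an eligibility check like this might work; the function name, field names, and `interaction` values are my own illustrative choices, not the site's actual code:

```python
from datetime import date

def eligible_polls(polls, interaction=None, start=None, end=None):
    """Select polls matching the sidebar filters.

    interaction: e.g. "live" or "automated"; None means any.
    start/end: inclusive bounds on the poll's end date; None means unbounded.
    """
    keep = []
    for poll in polls:
        if interaction and poll["interaction"] != interaction:
            continue
        if start and poll["end_date"] < start:
            continue
        if end and poll["end_date"] > end:
            continue
        keep.append(poll)
    return keep
```

Note that under this design the zoom checkbox would touch only the graph's axis range, not this eligibility check, so zoomed-out polls still feed the projection.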
The other non-obvious feature is the "Project Turnout" section, which is only displayed when viewing the All Respondents demographic. It permits the re-weighting of sub-sample demographic information using the concept of expected demographic outcome I have discussed previously. Gender is the only demographic currently supported. Let's look at an example poll to illustrate how the re-weighting is calculated; here is the unaltered, publicly released result for Quinnipiac's latest Iowa poll:
The gender slider allows you to alter the gender composition of the sample used by the pollster. In the sample above, males represented 41% of the sample and females 59%. Let's re-weight this poll to match the 2008 Iowa Caucus entrance polls, which were 43% male and 57% female; it's a simple multiplication. The re-weighting causes about a 1.5% swing in Bernie's favor; here's the table above re-weighted:
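The multiplication behind the slider can be sketched as follows. The function name and the per-gender sub-sample numbers are hypothetical stand-ins (not Quinnipiac's actual figures), but the 41/59 and 43/57 compositions match the example above:

```python
def reweight_topline(results_by_gender, composition):
    """Combine per-gender sub-sample results into a new topline.

    results_by_gender: {gender: {candidate: pct}}
    composition: {gender: share}, where the shares sum to 1.
    """
    topline = {}
    for gender, share in composition.items():
        for candidate, pct in results_by_gender[gender].items():
            topline[candidate] = topline.get(candidate, 0.0) + share * pct
    return topline

# Hypothetical sub-sample results, for illustration only:
by_gender = {
    "male":   {"Clinton": 40.0, "Sanders": 52.0},
    "female": {"Clinton": 52.0, "Sanders": 40.0},
}
# Pollster's sample: 41% male / 59% female.
original = reweight_topline(by_gender, {"male": 0.41, "female": 0.59})
# 2008 entrance-poll turnout: 43% male / 57% female.
reweighted = reweight_topline(by_gender, {"male": 0.43, "female": 0.57})
```

Since Bernie leads among men in these made-up numbers, shifting two points of the sample from women to men moves the topline margin in his favor, just as in the real poll above.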
The algorithm recomputes the overall result for each poll in the matchup and builds the graph from the re-weighted data. Some pollsters do not release sub-sample data for gender; when one of these polls is encountered during the re-weighting, the entire poll is excluded.
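A rough sketch of that recompute-and-exclude pass might look like this; the data layout and function name are assumptions on my part, not the site's actual implementation:

```python
def reweighted_series(polls, composition):
    """Re-weight each poll's topline from its gender sub-samples.

    Polls lacking a gender breakdown are dropped entirely, mirroring the
    exclusion rule described above. `polls` is a list of dicts like
    {"end_date": ..., "by_gender": {gender: {candidate: pct}}}.
    """
    series = []
    for poll in polls:
        by_gender = poll.get("by_gender")
        if not by_gender:
            continue  # no gender sub-sample released: exclude the poll
        topline = {}
        for gender, share in composition.items():
            for cand, pct in by_gender[gender].items():
                topline[cand] = topline.get(cand, 0.0) + share * pct
        series.append({"end_date": poll["end_date"], "topline": topline})
    return series
```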
Why is this weighting important? Pollsters don't know which demographics will vote in what percentages, so they use historical models and make educated guesses. The historical data shows pollsters do a pretty good job of predicting the overall outcome, but a much poorer job of predicting individual demographics. The interface normalizes the raw response data while discarding the weighted sample composition the pollster used. Re-weighting sub-samples is only useful when the response data is accurate but the weighting is wrong.
If we look at the 2012 Presidential Election in Missouri, the response data was wrong, so re-weighting is basically useless. The female margin derived from polls in Missouri was about 3.5%, but the exit polls had the margin at 8%. The pollsters in Missouri generally arrived at the correct overall margin of 9.65%, but did so despite being off by almost 5% among females. Because the response data was wrong, re-weighting can't correct the outcome. I played with the slider and determined that matching the aggregate margin of 9.65% in Missouri requires a composition of 55% males. That deviates from the exit poll gender ratio by about 10%. The previous presidential exit poll is generally predictive, so without knowing the actual turnout, we could have concluded that something was wrong with the response data in Missouri based on that 10% deviation.
Let's perform the same re-weighting exercise in Iowa, assuming the aggregate outcome is correct, which is currently a 4.7% advantage for Hillary. A ratio of 38% male to 62% female is required to match the 4.7% margin; this gender composition includes 5% more females than 2008, and 7% more than 2000 and 2004. The conclusion then follows, by the same pattern as Missouri, that something is wrong with the response data for one or both of the genders. This assessment will form the basis for our first pollster rating after Iowans caucus on Monday; a pollster that arrives at the correct topline result despite incorrect weighting and bad response data will be penalized.
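Finding the composition that matches a target margin doesn't actually require trial and error with the slider: with one demographic split into two groups, it reduces to solving a linear equation. A sketch, with hypothetical per-gender margins (the real sub-sample margins aren't quoted here):

```python
def male_share_for_margin(margin_male, margin_female, target_margin):
    """Solve m * margin_male + (1 - m) * margin_female = target_margin
    for the male share m of the electorate.

    All margins are candidate-A-minus-candidate-B, in points.
    """
    return (target_margin - margin_female) / (margin_male - margin_female)

# Hypothetical margins: Hillary -12 among men, +12 among women,
# with the target aggregate margin of Hillary +4.7 from above.
share = male_share_for_margin(-12.0, 12.0, 4.7)
```

With these made-up margins the required male share comes out near 30%; the actual 38% figure quoted above depends on the real sub-sample margins.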
In other news, I set up a Twitter account, @HWAV_TJHalva, where I plan to post site updates.