Screens or weights?
23 November, 2012
I probably shouldn't be spending so much of my time thinking about U.S.
election polls: I have no special expertise, and everyone else in the
country has lost interest by now. But I've just gotten some new information
about a question that was puzzling me throughout the recent election
campaign: What do pollsters mean when they refer to a likely voter screen?
At first I thought "likely voter screen" sounded like the pollsters asking a
bunch of questions designed to ferret out a respondent's likelihood of
voting (like, are you likely to vote?), and then "screening out" -- dismissing
from the sample -- respondents who are deemed unlikely to vote. But then I
realised that this wouldn't make any sense. A "screen" makes sense when
talking about a property that is easy to evaluate and determinative for the
goals of your survey. For instance, if you're trying to find out what
Hispanic residents of the US think about current immigration policy, you
might as well just ask them up front (Are you Hispanic? -- or, if you want
to make sure they're following the same definition that you are, perhaps
some more specific questions, like Are you an immigrant from a majority
Spanish-speaking country or a child of same? Is Spanish your primary
language? Did you grow up speaking Spanish at home?), and if they don't
answer the right way you might as well hang up right there. You'll then call
your sample "Hispanic" rather than "likely Hispanic", though of course you
could be wrong.
When it's a question of probabilities and proportions, what you want to do
is reweighting. This is what the pollsters do with demographic variables.
Suppose you're doing a survey about dietary habits, and your sample includes
5% African Americans, 30% Hispanics, and 65% whites; but you know the target
population is 10% African American, only 20% Hispanic, and 70% white. So in
making up an average for the population you count each African American
respondent double (10/5), each Hispanic respondent at 2/3 (20/30), and the
white respondents are slightly upweighted (70/65) as well. (Alternatively,
you stratify, meaning that you report the three groups separately.)
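To make the arithmetic concrete, here is a minimal sketch of that reweighting in Python, using the made-up shares from the dietary example (all the numbers are purely illustrative):

```python
# Post-stratification reweighting with the made-up dietary-survey shares above.
sample_share     = {"African American": 0.05, "Hispanic": 0.30, "White": 0.65}
population_share = {"African American": 0.10, "Hispanic": 0.20, "White": 0.70}

# Each respondent gets weight = (population share of their group) / (sample share).
weights = {g: population_share[g] / sample_share[g] for g in sample_share}
print(weights)  # {'African American': 2.0, 'Hispanic': 0.666..., 'White': 1.0769...}

# A reweighted estimate of any survey quantity y_i is then
#     sum_i w_i * y_i / sum_i w_i
# where w_i is the weight attached to respondent i's group.
```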
So, let's say you do your calling, and find that 60% of the respondents are
"likely voters" according to your screen, and 45% of them support Obama,
with 50% supporting Romney. Of the remainder, the "unlikely voters", 55%
support Obama and 40% support Romney. What number should you report? It's
not as though you have reason to think that all the likely voters are going to
come out and vote, and all the others are going to stay home. Even if your
questions are well chosen, the best you might be able to say would be
something like "80% of the likely voters will ultimately vote, and 30% of
the rest." Once they show up at the polls (or don't) it doesn't matter which
group they belonged to, so the only thing to do is to mix the populations to
obtain an estimate for the true electorate. That yields
(0.8 x 0.6 x 45% + 0.3 x 0.4 x 55%) / (0.8 x 0.6 + 0.3 x 0.4) = 47% support
for Obama, and 48% support for Romney.
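For what it's worth, here is the same mixing calculation as a small Python sketch (the turnout rates and support numbers are the hypothetical ones above, not anything a real pollster reported):

```python
# Mixing "likely" and "unlikely" voters by their assumed turnout rates.
p_likely, p_unlikely = 0.6, 0.4            # share of respondents in each group
turnout_likely, turnout_unlikely = 0.8, 0.3  # assumed probability of actually voting

def mixed_support(support_likely, support_unlikely):
    """Support among actual voters, weighting each group by its expected turnout."""
    num = turnout_likely * p_likely * support_likely \
        + turnout_unlikely * p_unlikely * support_unlikely
    den = turnout_likely * p_likely + turnout_unlikely * p_unlikely
    return num / den

print(round(mixed_support(0.45, 0.55), 2))  # 0.47  (Obama)
print(round(mixed_support(0.50, 0.40), 2))  # 0.48  (Romney)
```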
On further reflection I decided that "likely voter screen" was really just
imprecise journalese for reweighting. After all, professional pollsters
certainly know how to do reweighting.
First thought, best thought, as they say. The fact that it's nonsense
doesn't mean that that's not what they do. Maybe I'm missing something? Here's
an article about the Gallup likely voter screen.
Gallup uses a series of seven questions, including whether respondents voted
in previous elections, whether they plan to vote this year, and whether they
know where their polling place is. Those who score highest on these measures
are classified as likely voters.
On a somewhat related point, I posted some comments
about Nate Silver's role as an attractor for Republican ire during the
election campaign, one that pulled in powerful currents of right-wing
anti-intellectualism and anti-science sentiment.
Robert Waldmann has posted a test
on the blog Angry Bear, and the
result is extremely impressive. He tested Silver's state-by-state
predictions, each of which came with a central estimate and a standard
error. Assuming a normal model, each of these can easily be turned into a
p-value for the true result. If his predictions were not just on the right
side of zero, but were genuinely unbiased and even had correctly sized
confidence intervals, then these p-values should look like 50 samples from a
uniform distribution on (0,1) (not independent, though). So if you put the 50
p-values in order and plot them, they should line up approximately on a
straight diagonal on the unit square. And that's pretty much what you see.
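Here is a rough sketch of that check in Python; the forecast means, standard errors, and outcomes are simulated stand-ins, not Silver's actual numbers:

```python
import numpy as np
from scipy.stats import norm

# Simulated stand-ins for 50 state forecasts: mean margin, standard error,
# and an actual outcome drawn consistently with the forecast (i.e. calibrated).
rng = np.random.default_rng(0)
mean = rng.normal(0, 10, size=50)
se = np.full(50, 3.0)
actual = mean + rng.normal(0, 1, size=50) * se

# p-value for each state under the normal forecast model.
p = norm.cdf(actual, loc=mean, scale=se)

# Ordered p-values vs. uniform quantiles: for well-calibrated forecasts
# these should hug the diagonal of the unit square.
p_sorted = np.sort(p)
uniform_q = (np.arange(1, 51) - 0.5) / 50
print(np.max(np.abs(p_sorted - uniform_q)))  # small KS-like deviation if calibrated
```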
This is a fairly sensitive test, and proves pretty conclusively that the
predictions were genuine and carefully done. The most obvious temptation for
someone who was being aggressively accused of exaggerating his certainty, or
even biasing the predictions, would have been to find new sources of
uncertainty -- it's easy to come up with a convincing story -- to inflate
the standard errors. This would have given him a convenient argument after
the fact -- "Sure, Obama lost state X when I said he'd probably win, but I
only put a 65% probability on it", or whatever -- which is exactly the
argument he was accused of making. (This is bullshit! He's only predicting
probabilities! He can always say he wasn't actually "wrong", no matter which
way it turns out.) 50 samples are enough for the law of large numbers, so if
he'd done that, it would have shown up conspicuously, with p-values bunched
near the middle and an S-shaped plot instead of a straight diagonal.
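Under the same toy setup as the snippet above, here is a sketch of what padding the standard errors would do to that plot (again, entirely simulated numbers):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mean = rng.normal(0, 10, size=50)     # hypothetical forecast means
se = np.full(50, 3.0)                 # the "honest" forecast standard errors
actual = mean + rng.normal(0, 1, size=50) * se

# p-values computed with honest vs. inflated (doubled) standard errors.
p_honest   = np.sort(norm.cdf(actual, loc=mean, scale=se))
p_inflated = np.sort(norm.cdf(actual, loc=mean, scale=2.0 * se))

# The honest p-values spread out over (0, 1); the inflated ones bunch near 0.5,
# which is what bends the ordered-p-value plot into an S shape.
print(p_honest.min(), p_honest.max())
print(p_inflated.min(), p_inflated.max())
```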
In other words, not only did Silver get most of his predictions "right", he
also got the predicted number of predictions "wrong", to essentially the