(Updated November 30, 2016)
We've previously discussed how we use analyses of sample statistics to make inferences about population parameters, because surveying entire populations of hundreds of millions of people can only rarely be done. Ideally, a well-drawn sample will be a microcosm of the larger population (same percentage distribution by gender, race/ethnicity, age, education, etc.). We can use our obtained sample statistic to produce an estimate of the true population parameter, as illustrated in the following graphic.
However, even with the most rigorous sampling methods (e.g., random-digit-dial telephone sampling), there is likely to be some discrepancy between the characteristics of the sample and of the population. Just as a sequence of 100 coin-flips can deviate from exactly 50 heads and 50 tails (e.g., 51 heads and 49 tails) purely via random processes, a sample of respondents may end up with too many men (relative to the population ratio), too many women, too many young people, too many old people, etc. This is known as random sampling error. To allow for random sampling error, the statistic we obtain from a sample, such as the percentage of likely voters supporting Hillary Clinton for President in the 2016 election, must carry a margin of error (MoE).
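To see how much a fair process can wander on its own, here is a minimal simulation sketch of the coin-flip example. The function name, the number of runs, and the seed are illustrative choices of mine, not anything from the polls discussed here.

```python
import random

def simulate_heads(n_flips=100, n_runs=10, seed=2016):
    """Count heads in each of n_runs sequences of n_flips fair coin flips.

    The seed is arbitrary; it only makes the illustration repeatable.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < 0.5 for _ in range(n_flips))  # True counts as 1
        for _ in range(n_runs)
    ]

heads_counts = simulate_heads()
print(heads_counts)  # ten head-counts, each out of 100 flips
```

Most runs should land near 50 heads without hitting it exactly, which is the same kind of chance variation that affects the makeup of a survey sample.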
The range we end up with (the sample statistic plus or minus the MoE) is also known as a confidence interval (CI). A 95% CI is most commonly used; it allows us to say that, if the full population were surveyed, we would be 95% confident the true population parameter would be inside that range. (The definition is actually a little more technical, as shown in this document, but the previous statement is a good approximation.)
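As a rough sketch of where a poll's MoE comes from, the usual textbook formula for a percentage from a simple random sample is 1.96 times the square root of p(1 - p)/n. The 46% figure and the sample size of 1,000 below are hypothetical values chosen for illustration, not numbers from any actual poll, and real polls that weight their data may report a somewhat different MoE.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Approximate 95% CI for a proportion from a simple random sample.

    p_hat: sample proportion (e.g., 0.46 for 46%)
    n:     sample size
    z:     critical value (1.96 for 95% confidence)
    Returns (lower bound, upper bound, margin of error).
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    moe = z * standard_error
    return p_hat - moe, p_hat + moe, moe

# Hypothetical poll: 46% support among 1,000 respondents.
low, high, moe = proportion_ci(0.46, 1000)
print(f"MoE: +/- {moe * 100:.1f} points")
print(f"95% CI: {low * 100:.1f}% to {high * 100:.1f}%")
```

With these made-up inputs the MoE comes out to roughly 3 percentage points, which is in line with what national polls of about 1,000 people typically report.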
Election night a few weeks ago was a rough one for pollsters. In forecasting the national popular vote (which Clinton won), many leading polls did not have Clinton's true percentage within their confidence intervals (or Trump's, for that matter).
(The necessary data came from CNN for the national popular vote, Polling Report for most of the national polls, and the websites of the Rasmussen and Los Angeles Times/USC polls; the MoE for the latter was obtained here.)
Polls in the key states, which were crucial to Trump's victory, appear to have been off by even more than the national polls.
Confidence intervals can be put around any kind of sample-based statistic, such as a percentage, a mean, or a correlation, to produce a range for estimating the true value in the larger population (as long as the sample appears to be representative of the larger population).
Besides allowing one to see the likely range of possible values of a parameter in the population, CI's can also be used for significance-testing. A 95% CI is most commonly used, corresponding to significance at the p < .05 level (two-tailed).
The following chart presents visual depictions of 95% confidence intervals, based upon correlations reported in this article on children's physical activity. (The chart features a continuum of correlations from -.30 to .30 at the bottom.)
An important thing to notice from the chart is that there's a direct translation from a confidence interval around a sample correlation to its statistical significance. If a CI does not include zero (i.e., is entirely in positive "territory" or entirely in negative "territory"), then the result is significantly different from zero, and we can reject the null hypothesis of zero correlation in the population. On the other hand, if the CI does include zero (i.e., straddles positive and negative territory), then the result is not significantly different from zero, and the null hypothesis (H0) is retained. As Westfall and Henning (2013) put it:
...the confidence interval for [a parameter] provides the same information [as the p value from a significance test] as to whether the results are explainable by chance alone, but it gives you more than just that. It also gives the range of plausible values of the parameter, whether or not the results are explainable by chance alone (p. 435).
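To make the CI-to-significance translation concrete, here is a minimal sketch that builds a 95% CI around a correlation and checks whether it straddles zero. It assumes the common Fisher z-transformation approach, which may not be how the intervals in the chart were computed, and the correlation of .15 and the two sample sizes are hypothetical values, not figures from the physical-activity article.

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via Fisher's z.

    r: sample correlation, n: sample size (must be greater than 3).
    """
    z = math.atanh(r)             # transform r to the z scale
    se = 1 / math.sqrt(n - 3)     # standard error on the z scale
    low = math.tanh(z - z_crit * se)   # transform bounds back to r scale
    high = math.tanh(z + z_crit * se)
    return low, high

def excludes_zero(ci):
    """True if the interval lies entirely above or entirely below zero."""
    low, high = ci
    return low > 0 or high < 0

# Same hypothetical correlation, two hypothetical sample sizes.
for r, n in [(0.15, 200), (0.15, 100)]:
    ci = correlation_ci(r, n)
    verdict = "significant" if excludes_zero(ci) else "not significant"
    print(f"r = {r}, n = {n}: CI = ({ci[0]:.3f}, {ci[1]:.3f}) -> {verdict} at p < .05")
```

With the larger sample the interval sits entirely in positive territory, while with the smaller sample the same correlation produces an interval that crosses zero. The CI therefore gives the significance verdict and the plausible range at once.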
The dichotomous nature of null hypothesis significance testing (NHST) -- the idea that a result either is or is not significantly different from zero -- makes it less informative than the CI approach, where you get an estimated range within which the true population value is likely to fall. Therefore, many researchers have called for the abolition of NHST in favor of CI's (for an example, click here).
Of course, even if only CI's are presented, an interpretation in terms of statistical significance can still be made. Thus, it seems, we can have our cake and eat it too! Of that you can be confident.
This next posting shows how to calculate CI's and also includes a song...