Thursday, October 11, 2007

How Significance Cut-Offs for Correlations Vary by Sample Size

Here's a more elaborate diagram of what I started sketching on the board at yesterday's class. It shows that smaller samples need larger absolute values of r (i.e., values further from zero) to attain statistical significance than larger samples do. In other words, with smaller samples, it takes a stronger correlation (in a positive or negative direction) to reject the null hypothesis of no true correlation in the full population (rho = 0) and rule out (to the degree of certainty indicated by the p level) that the correlation in your sample (r) has arisen purely from chance.

As you can see, statisticians sometimes talk about sample sizes in terms of degrees of freedom (df). We'll discuss df more thoroughly later in the course in connection with other statistical techniques. For now, though, suffice it to say that for ordinary correlations, df and sample size (N) are very similar, with df = N - 2 (i.e., the sample size minus the number of variables in the correlation).
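
To make the pattern concrete, here is a minimal Python sketch of how the cut-off for r shrinks as df grows. It assumes the standard t test for a correlation, t = r * sqrt(df) / sqrt(1 - r^2), rearranged to give the smallest significant |r| as t / sqrt(t^2 + df). The critical t values are copied from a standard two-tailed t table at p = .05, and the names T_CRIT_05 and critical_r are just illustrative, not part of any library.

```python
import math

# Two-tailed critical t values at p = .05, copied from a standard t table.
# Keys are degrees of freedom (df); this small selection is illustrative.
T_CRIT_05 = {3: 3.182, 10: 2.228, 30: 2.042, 100: 1.984}


def critical_r(df, t_crit):
    """Smallest |r| that reaches significance for the given df and critical t."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


for df, t_crit in T_CRIT_05.items():
    n = df + 2  # for an ordinary correlation, df = N - 2
    print(f"N = {n:3d}  df = {df:3d}  critical |r| = {critical_r(df, t_crit):.3f}")
```

With these table values, the cut-off falls from about .88 with 5 cases to about .20 with 102 cases, which is the same downward trend the diagram shows.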

For a partial correlation that controls (holds constant) one variable beyond the two main variables being correlated (a first-order partial), df = N - 3; for one that controls for two variables beyond the two main ones (a second-order partial), df = N - 4, etc.
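
The df rule for ordinary and partial correlations can be written as one small function. This is only a sketch of the arithmetic described above; the function name correlation_df and its parameter names are made up for illustration.

```python
def correlation_df(n, n_controlled=0):
    """Degrees of freedom for a correlation based on n cases.

    n_controlled is the number of variables held constant beyond the two
    main variables: 0 for an ordinary correlation, 1 for a first-order
    partial, 2 for a second-order partial, and so on.
    """
    if n_controlled < 0:
        raise ValueError("n_controlled cannot be negative")
    df = n - 2 - n_controlled
    if df < 1:
        raise ValueError("sample is too small for this correlation")
    return df


print(correlation_df(50))     # ordinary correlation: 50 - 2
print(correlation_df(50, 1))  # first-order partial: 50 - 3
print(correlation_df(50, 2))  # second-order partial: 50 - 4
```

So with 50 cases, each additional control variable costs one degree of freedom: 48, then 47, then 46.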

This web document also has some useful information.