Sunday, October 15, 2006

Partial Correlations

NOTE: I have edited and reorganized some of my writings on correlation to present the information more coherently (10/11/2012).

Partial correlations "hold constant" or "control for" one or more "lurking" variables extraneous to your two primary variables. The idea is that one or both of the primary variables may also be correlated with a third (or fourth, etc.) variable, which could obscure our understanding of the original two-variable relationship.
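One way to see what "holding constant" means in practice is to regress each primary variable on the control variable, keep what is left over (the residuals), and correlate those leftovers. Below is a minimal Python sketch of that idea for a single control variable. The function names (residuals, pearson_r, partial_r_residual) are my own hypothetical helpers, not part of SPSS or any library, and the sketch assumes plain lists of numbers with no missing values.

```python
from math import sqrt
from statistics import mean


def residuals(y, x):
    """Residuals of y after a simple least-squares regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # What remains of y once the straight-line influence of x is removed
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson_r(x, y):
    """Ordinary (zero-order) Pearson correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def partial_r_residual(a, b, c):
    """r_AB.C: purge C from A and from B, then correlate the leftovers."""
    return pearson_r(residuals(a, c), residuals(b, c))
```

With A = BMI, B = IVF success, and C = age, a call such as partial_r_residual(bmi, ivf, age) would give the BMI-IVF correlation with age held constant (the variable names here are hypothetical placeholders for your own data).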

Here's a quote from the book Think of a Number (by Malcolm E. Lines, 1990) that puts the technique of partial correlation in some perspective:

It is probably the statistician's most difficult task of all to assure himself or herself that no unrecognized confounding factor lies hidden in the sampling groups which are being tested (p. 37).

In an example I've used for many years (available in the links section), we can examine whether the relationship between Body Mass Index (BMI) and success at In Vitro Fertilization (IVF) might be complicated by BMI being positively correlated with age. A bivariate (or zero-order) correlation between BMI and IVF success could be ambiguous because BMI might be carrying some of the influence of age.

(We also speculated about whether the age-IVF relationship might be complicated by the role of BMI, for which similar considerations as above would apply.)

The answer, of course, would be to obtain partial correlations (e.g., r BMI-IVF·age, read as "r BMI-IVF dot age"). A third variable that may be correlated with one or both of the two primary variables in an analysis, and thus could complicate matters (such as age in this example), is known as a "confound" or "confounder." "Con" means "with" or "together" in some languages, as in "arroz con pollo" (Spanish for "rice with chicken"). As a memory aid, confound can be read as "found with" or "found together," such as age being "found with" BMI (i.e., people tend to put on extra weight as they get older). (Strictly speaking, the word comes from the Latin confundere, "to pour together" or "to mix.")
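A first-order partial correlation can also be computed directly from the three zero-order correlations, using the standard formula r AB·C = (r AB − r AC × r BC) / √((1 − r AC²)(1 − r BC²)). Here is a minimal Python sketch; the function and argument names are hypothetical labels of mine, and the numbers in the usage note below are made up for illustration, not results from the BMI/IVF data.

```python
from math import sqrt


def partial_r(r_ab, r_ac, r_bc):
    """First-order partial correlation r_AB.C from three zero-order correlations.

    Here A and B are the two primary variables (e.g., BMI and IVF success)
    and C is the control variable (e.g., age).
    """
    denom = sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))
    if denom == 0:
        # C perfectly predicts A or B, so nothing is left to correlate
        raise ValueError("C is perfectly correlated with A or B; partial r is undefined")
    # Subtract the A-B association that C alone would produce, then rescale
    return (r_ab - r_ac * r_bc) / denom
```

For example, with made-up values, partial_r(0.40, 0.80, 0.50) returns 0.0: the A-B correlation of .40 is exactly what their shared link to C would produce, so nothing remains once C is controlled. By contrast, partial_r(0.30, 0.40, 0.50) comes out to about .13, smaller than the zero-order .30 but not gone.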

Another example we may examine (time permitting) is the following:

Ceci, S. J., & Williams, W. M. (1997). Schooling, intelligence, and income. American Psychologist, 52, 1051-1058.

Here's an example comparing "regular" bivariate correlations and partial correlations in SPSS:

(Note that, in the bivariate output, you get the correlation coefficient [r], the significance [probability] level, and the sample size [N] for each pair of variables. The partial correlation output likewise gives you the partial correlation and its significance level. However, instead of N, you get something closely related, called "degrees of freedom" [df]. For correlational analyses, df simply equals the sample size minus the number of variables in the correlation [N - 2 for a bivariate correlation, N - 3 for a first-order partial correlation, N - 4 for a second-order partial, etc.])
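The df rule above plugs into the usual t test for a correlation coefficient: t = r × √(df / (1 − r²)), with df = N − 2 − k, where k is the number of control variables (k = 0 for an ordinary bivariate correlation). Below is a minimal Python sketch; the function name is hypothetical, and I'm assuming this standard test is what lies behind the significance levels in the SPSS output. The p-value itself then comes from the t distribution with df degrees of freedom (e.g., from a statistics package or a t table).

```python
from math import sqrt


def partial_r_t_test(r, n, k):
    """Return (t, df) for testing a correlation with k control variables.

    df = n - 2 - k: the sample size minus the number of variables involved
    (the two primary variables plus k control variables).
    """
    df = n - 2 - k
    if df <= 0:
        raise ValueError("sample too small for this many control variables")
    if abs(r) >= 1:
        raise ValueError("|r| must be less than 1")
    t = r * sqrt(df / (1 - r ** 2))
    return t, df
```

For instance, a made-up partial r of .30 with N = 50 and one control variable gives df = 47 and t ≈ 2.16; the same r as a bivariate correlation (k = 0) would have df = 48.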

For anyone who is interested, this document by W.C. Burns probes the issues of confounds, third variables, and spurious relationships in greater depth.

Now, for a song...

Partial Correlation
Lyrics by Alan Reifman
(May be sung to the tune of “Crystal Blue Persuasion,” James/Vale/Gray)

Studying variables, with potential confounds,
Don’t want a paper, where confusion abounds,
There’s a technique, now, for consideration,
What you need to use, is partial correlation,

The connection of interest, is between A and B,
Each may be related, to the third factor, C,
There’s a formula out there, analysts would approve,
The influence of C, now, it will remove,

(Instrumental build-up)

Partial correlation,
For a strict-er determination,
Partial correlation,
A simple, calculation,

You’re purging C’s variance, from A and from B,
Thus from C’s linkage, the others are free,
The new A and B, now, you test for correlation,
The r, yes-sirree, AB-dot-C…

Partial correlation…


Partial correlation,
For a strict-er determination,
Partial correlation,
A simple, calculation,

Partial correlation…
(Fade out)