Tuesday, October 30, 2007

While grading the correlation assignments, I came across an interesting finding in one of the students' papers (we use the "GSS93 subset" practice data set in SPSS, and each student can select his or her own variables for analysis).

Among the variables selected by this one student were number of children and frequency of sex during the last year. At the bivariate, zero-order level, these two variables were correlated at r = -.102, p < .001 (n = 1,327).

The student then conducted a partial correlation, focusing on the same two variables, but this time controlling for age (the student used the four-category age variable, although a continuous age variable is also available). This partial correlation turned out to be r = .101, p < .001.

Having graded several dozen papers from this assignment over the years, I had the impression that, at least among the variables chosen by my students from this data set, partialling out variables generally had little impact on the magnitude of the correlation between the two focal variables. Granted, neither of the correlations in the present example is all that large, but the change in the correlation's sign from negative (zero-order) to positive (first-order partial), with each correlation significantly different from zero, was noteworthy in my mind.

Also of interest was that age had both a fairly substantial positive correlation with number of children (r = .437) and a comparably powerful negative correlation with frequency of sex (r = -.410).
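As a rough check on these numbers, the three zero-order correlations can be plugged into the standard formula for a first-order partial correlation, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)). The sketch below uses Python rather than SPSS, and the function name partial_corr is just an illustrative choice, not anything from the assignment. It should not be expected to reproduce the student's value exactly, since the correlations quoted here may rest on slightly different sets of cases, or a different version of the age variable, than the student's partial-correlation run.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    Hypothetical helper for illustration; takes the three zero-order
    (pairwise) Pearson correlations as inputs.
    """
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# x = number of children, y = frequency of sex, z = age
# (values taken from the correlations reported in this post)
r = partial_corr(r_xy=-0.102, r_xz=0.437, r_yz=-0.410)
print(round(r, 3))  # prints 0.094
```

The result, roughly .09, is positive and in the same neighborhood as the student's .101. Because age correlates positively with one focal variable and negatively with the other, the product r_xz * r_yz is negative, and subtracting it is enough to push the numerator above zero.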

To probe the difference between the zero-order and partial correlations between number of children and frequency of sex, I went back to the (PowerPoint) drawing board and created a scatter plot, color-coding the points by age. In scatter plots created in SPSS, a dot indicates only that at least one case falls at a given spot on the graph, not how many cases fall there, so I attempted to remedy that, too.

My plot is shown below (you can click to enlarge it). I added trend lines after studying the SPSS scatter plots to see where the lines would go. As can be seen, the full-sample trend is indeed negative, whereas all of the age-specific trends are positive. We'll discuss this further in class.
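For anyone who wants to see the same pattern in miniature, here is a small sketch in Python. The numbers are invented purely for illustration (they are not GSS values, and the group labels and scales are assumptions), but they are arranged the way the plot is: older groups have more children and less frequent sex on average, while within each group the two variables rise together.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: (number of children, frequency-of-sex score) by age group.
groups = {
    "young":  ([0, 0, 1, 2], [9, 10, 10, 12]),
    "middle": ([1, 2, 3, 3], [5, 6, 6, 8]),
    "older":  ([2, 3, 4, 5], [1, 2, 3, 3]),
}

# Within-group correlations (all positive for these invented numbers).
within = {name: pearson_r(xs, ys) for name, (xs, ys) in groups.items()}

# Pooled correlation, ignoring age (negative for these invented numbers).
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
pooled = pearson_r(all_x, all_y)

for name, r in within.items():
    print(f"{name}: r = {r:.2f}")
print(f"pooled: r = {pooled:.2f}")
```

The pooled correlation comes out negative because the differences between age groups swamp the positive trend inside each group, which is the same thing the color-coded trend lines show.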