Wednesday, October 22, 2008

t-Test Overview

(Updated December 24, 2024)

We will now be covering t-tests (for comparing the means of two groups) for the next week or so. As we'll discuss, there are two ways to design studies for a t-test:

INDEPENDENT SAMPLES, where a participant in one group (e.g., Trump voters in the 2024 election) cannot be in the other group (Harris voters). The technical term is that the groups are "mutually exclusive." The Trump and Harris voters could be compared, for example, on their average income.

PAIRED/CORRELATED GROUPS, where the same (or matched) person(s) can serve in both groups. For example, the same participant could be asked to complete math problems both during a period where loud hard-rock music is played and during a period where quiet, soothing music is played. Or, if you were comparing men and women on some attitude measure and your participants were heterosexual married couples, that would be considered a correlated design.

The Naked Statistics book briefly discusses the formula for an independent-samples t-test on pp. 164-165. Here's a simplified graphic I found on the web (original source):

Notice from the "Xbar1 - Xbar2" portion that the t statistic is gauging the amount of difference between the two means, in the context of the respective groups' standard deviations (s) and sample sizes (n). Your obtained t value will be compared to the t distribution (which is similar to the normal z distribution) to see if it is extreme enough to be unlikely to stem from chance. You will also need to take into account "degrees of freedom," which for an independent-samples t-test are closely based on total sample size (in the classic equal-variance version, df = n1 + n2 - 2).
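For those who like to see the arithmetic, here is a minimal sketch in Python of how a t statistic of this kind could be computed by hand. I'm assuming the common simplified form t = (Xbar1 - Xbar2) / sqrt(s1^2/n1 + s2^2/n2), which may not match the graphic exactly. The function name, the two sets of scores, and the group labels are all made up for illustration, and the critical value of 2.228 is the standard table entry for df = 10 at p < .05, two-tailed.

```python
import math
import statistics


def independent_t(group1, group2):
    """Return (t, df) for two independent groups.

    Assumed form: t = (Xbar1 - Xbar2) / sqrt(s1^2/n1 + s2^2/n2),
    with df = n1 + n2 - 2 (the classic equal-variance value).
    """
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = statistics.mean(group1), statistics.mean(group2)
    # statistics.variance uses n - 1 in the denominator (sample variance)
    var1, var2 = statistics.variance(group1), statistics.variance(group2)
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    t = (mean1 - mean2) / standard_error
    df = n1 + n2 - 2
    return t, df


# Hypothetical scores, invented purely for illustration
group_a = [52, 48, 61, 55, 58, 50]
group_b = [45, 42, 50, 47, 39, 44]

t, df = independent_t(group_a, group_b)

# Critical value for df = 10, p < .05 two-tailed, from a standard t table
CRITICAL_T_DF10 = 2.228
significant = abs(t) > CRITICAL_T_DF10

print(f"t({df}) = {t:.2f}, significant at .05 (two-tailed): {significant}")
```

With equal group sizes, as here, this simplified form gives the same t as the pooled-variance formula. With unequal group sizes the two versions can differ, so treat this as a sketch rather than a replacement for SPSS.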

There's an online graphic that visually illustrates the difference between z (normal) and t distributions (click on this link and then, when the page comes up, on "Click to View"). As noted on this page from Columbia University, "tails of the t-distribution are thicker and extend out further than those of the Z distribution. This indicates that for a given confidence level, t-scores [needed for significance] are larger than Z scores."

More technically, as Westfall and Henning (2013) point out, "Compared to the standard normal distribution, the t-distribution has the same median (0.0) but with variance df/(df-2), which is larger than the standard normal's variance of 1.0" (p. 423). Remember that the variance is just the standard deviation squared.
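To get a feel for the df/(df-2) formula in the quote above, here is a small Python sketch that tabulates the variance and standard deviation of the t distribution for several df values. The function name and the particular df values are my own choices for illustration. Note that the formula only applies when df is greater than 2.

```python
import math


def t_variance(df):
    """Variance of the t distribution, df / (df - 2); requires df > 2."""
    if df <= 2:
        raise ValueError("the formula df/(df-2) applies only when df > 2")
    return df / (df - 2)


# As df grows, the variance shrinks toward the standard normal's 1.0
for df in (3, 5, 10, 30, 100, 1000):
    variance = t_variance(df)
    sd = math.sqrt(variance)  # SD is the square root of the variance
    print(f"df = {df:>4}: variance = {variance:.3f}, SD = {sd:.3f}")
```

The variance is always above 1.0 but closes in on it as df increases, which is another way of saying the t distribution looks more and more like the z distribution with larger samples.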

This table shows the values your obtained t statistic needs to exceed (known as "critical values") for statistical significance, depending on your df and target significance level (typically p < .05, two-tailed).

This website provides a nice overview of one- and two-tailed tests. One-tailed tests are appropriate when there is a directional hypothesis (e.g., among students with no prior calculus instruction, those who receive calculus instruction during a summer workshop will score higher, on average, on a calculus post-test than will students who did not receive the workshop, with the opposite prediction making no sense). Even though one-tailed tests seem to be the best choice in some situations, two-tailed tests are nearly always used, presumably because they are more conservative (i.e., it is harder to obtain significance with them). This 2024 article argues for greater use of one-tailed tests.
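The "more conservative" point can be seen by comparing cut-offs. Here is a rough Python sketch that uses the normal (z) distribution as a large-sample stand-in for t, since Python's standard library has no t distribution. The function name is hypothetical, and actual critical t values will be somewhat larger than these when df is small.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, tails=2):
    """Cut-off a z statistic must exceed for significance at level alpha.

    A two-tailed test splits alpha across both tails, so each tail
    gets alpha / 2; a one-tailed test puts all of alpha in one tail.
    """
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return NormalDist().inv_cdf(1 - alpha / tails)


two_tailed = critical_z(0.05, tails=2)
one_tailed = critical_z(0.05, tails=1)

print(f"two-tailed cut-off: {two_tailed:.3f}")
print(f"one-tailed cut-off: {one_tailed:.3f}")
```

The two-tailed cut-off is the larger of the two, so a result in the predicted direction has a higher bar to clear under a two-tailed test.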

I have created a little tutorial on how to interpret SPSS output for independent-samples t-tests.

Finally, we take up the paired/correlated/dependent samples t-test at this link.

Thursday, October 02, 2008

Hypothesis Testing with Correlations

NOTE: I have edited and reorganized some of my writings on correlation to present the information more coherently (10/11/2012).

The correlation statistic presents the first instance in which we'll be examining statistical significance (here and here). The question is whether we can reject the null hypothesis (Ho) that the correlation between a given pair of variables in the full population is zero (rho = 0).

We, of course, obtain correlations (r) for our sample, and then see whether our sample correlation is far enough from zero (in either a positive or negative direction) that it would have been unlikely to arise from pure chance if the population rho were truly zero. That's what we mean by statistical significance. When we achieve statistical significance, we can reject the Ho of zero rho.

In order to have a statistically significant correlation, the correlation (r) itself should be appreciably different from zero, either above zero (a positive correlation) or below zero (a negative correlation).

Also, in order for the correlation to be significant, the significance (or probability) level displayed for a given correlation in your SPSS output must be very small (p < .05, or if the probability is even smaller, you can use one of the other conventional cut-off points, p < .01 or p < .001). Any time the probability p is larger than .05, the correlation is nonsignificant (in my opinion, if you get a correlation with a p level of .06 or .07, it's OK to note in your report that the correlation narrowly missed being significant under conventional standards).

Suppose you find that the correlation between two variables is r = .30, p < .01. This is telling us that, if the null hypothesis (Ho) is true -- that is, there truly is no correlation in the population from which the sample was drawn (rho = 0, where rho looks like a curvy lowercase p) -- then it would be extremely unlikely (p < .01) for a correlation of .30 or stronger to crop up in a sample purely by chance.

[Here's a figure I added in October 2007 to convey the idea of there truly being no correlation in a large population, but a correlation occurring in one's sample purely through random sampling error:


This web document is also helpful. The opposite problem, where the full population truly has a correlation, but you draw a sample that fails to show it, will be discussed later in the course.]
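The same idea can be acted out with a small simulation. The Python sketch below is my own illustration, not something from the linked document. It draws many samples of n = 30 from a make-believe population in which X and Y are truly unrelated, then counts how often the sample r is large enough to be called significant. The function names, sample size, number of samples, and random seed are all arbitrary choices. The critical t of 2.048 is the standard table value for df = 28 at p < .05, two-tailed.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_null(n_samples=2000, n=30, seed=1):
    """Draw samples where X and Y are independent, so rho = 0 by design."""
    rng = random.Random(seed)
    sample_rs = []
    for _ in range(n_samples):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        sample_rs.append(pearson_r(x, y))
    return sample_rs


N = 30
CRITICAL_T = 2.048  # df = 28, p < .05 two-tailed, from a standard t table
# Convert the critical t into the smallest |r| that reaches significance
critical_r = CRITICAL_T / math.sqrt(CRITICAL_T ** 2 + (N - 2))

sample_rs = simulate_null(n_samples=2000, n=N, seed=1)
false_alarm_rate = sum(abs(r) >= critical_r for r in sample_rs) / len(sample_rs)

print(f"critical |r| for n = {N}: {critical_r:.3f}")
print(f"largest |r| seen by chance: {max(abs(r) for r in sample_rs):.3f}")
print(f"proportion of samples 'significant' at .05: {false_alarm_rate:.3f}")
```

Even though rho is zero by construction, roughly 1 in 20 samples should come out "significant," which is just what a .05 cut-off implies.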

A significant correlation in our sample thus allows us to reject the null hypothesis and assert, based on an inference from our sample, that there is a correlation in the population. Again, note the inference from sample to population.

The essence of scientific hypothesis testing can thus be distilled to three steps:

1. State the null hypothesis (Ho) that there is no correlation between your two variables in the population (rho = 0). The investigator probably doesn't believe Ho (generally, we seek to uncover significant relationships), but Ho is part of the scientific protocol.

2. Obtain the sample correlation (r) between your two variables and the associated significance (p) level.

3. If the correlation is statistically significant (r is well above or well below zero, and p < .05), reject Ho. If the correlation is nonsignificant (r is close to zero and p is larger than .05), then the null hypothesis that there is no correlation between your two variables in the population must be maintained. We never accept the truth of the null hypothesis for certain; we just say it cannot be rejected.
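The three steps above can be sketched in Python. SPSS gives you the p level directly, so this is only meant to show what is going on behind the output. I'm using the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r^2), and, because Python's standard library has no t distribution, a normal approximation for the p level, which is reasonable only for fairly large samples (it understates p somewhat when n is small). The function name and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def correlation_significance(r, n, alpha=0.05):
    """Return (t, approximate two-tailed p, reject Ho?) for a sample r.

    Step 1: Ho says rho = 0 in the population.
    Step 2: convert the sample r to a t statistic with df = n - 2.
    Step 3: reject Ho if the (approximate) p level is below alpha.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    # Normal approximation to the t distribution (an assumption; fine for large n)
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    reject_null = p_approx < alpha
    return t, p_approx, reject_null


# r = .30 with a hypothetical sample of n = 100
t_big, p_big, reject_big = correlation_significance(0.30, 100)
print(f"r = .30, n = 100: t = {t_big:.2f}, p is about {p_big:.4f}, reject Ho: {reject_big}")

# r = .10 with a hypothetical sample of n = 30
t_small, p_small, reject_small = correlation_significance(0.10, 30)
print(f"r = .10, n = 30:  t = {t_small:.2f}, p is about {p_small:.4f}, reject Ho: {reject_small}")
```

In the second case Ho is not rejected, which, as noted above, is not the same as showing that rho is truly zero.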

I've stated above that, in order to be significant, a correlation (r) needs to be well above or well below zero. That's not always true, however. As we saw in some of our SPSS illustrations, with a very large sample size (n = 1,000 or more), a correlation does not need to be very far from zero to be significant. If a correlation appears to be small, yet is listed in the output as being significant (probably only because of the large sample size), you can say the correlation was "significant, though weak." A Wikipedia document on correlation (which I've just added to the links section on the right) displays guidelines developed by the late statistician Jacob Cohen for labeling correlations as "small, medium, and large."
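To see how much the sample size matters, here is a rough Python sketch of the smallest correlation that would reach significance (p < .05, two-tailed) at several sample sizes. It rearranges the usual r-to-t conversion and uses the normal cut-off in place of the critical t, an approximation I'm assuming is acceptable because all the sample sizes shown are 100 or more. The function name and the sample sizes are my own choices.

```python
import math
from statistics import NormalDist


def approx_critical_r(n, alpha=0.05):
    """Approximate smallest |r| that is significant (two-tailed) with n cases.

    Rearranges t = r * sqrt(n - 2) / sqrt(1 - r^2) to solve for r,
    using the normal cut-off in place of the critical t (large-n assumption).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / math.sqrt(z ** 2 + (n - 2))


for n in (100, 500, 1000, 10000):
    print(f"n = {n:>6}: |r| needs to be at least about {approx_critical_r(n):.3f}")
```

With n = 1,000, a correlation only a little above .06 in absolute value clears the bar, which is why "significant, though weak" is sometimes the right description.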

Two concepts we will be taking up later in the course, statistical power and confidence intervals, will elaborate upon the issue of small correlations sometimes being significant and, conversely, relatively large correlations not being significant.