Thursday, October 22, 2009

Further t-Test Info (SPSS Output, Independent Samples)

(Updated October 26, 2014)

I have just created a new graphic on how to interpret SPSS print-outs for the Independent-Samples t-test (where any given participant is in only one of two mutually exclusive groups).


This new chart supplements the t-test lecture notes I showed recently, reflecting a change in my thinking about what to take from the SPSS print-outs.

One of the traditional assumptions of an Independent-Samples t-test is that, before we can test whether the difference between the two groups' means on the dependent variable is significant (which is of primary interest), we must verify that the groups have similar variances (standard deviations squared) on the DV. This assumption, known as homoscedasticity, basically says that we want some comparability between the two groups' distributions, in terms of their being equally spread out, before we compare their means (you might think of this as "outlier protection insurance," although I don't know whether that is technically the correct characterization of the problem).

If the homoscedasticity (equal-spread) assumption is violated, all is not lost. SPSS provides a test for possible violation of this assumption (Levene's test) and, if it is violated, an alternative t-test solution. The alternative t-test (known as the Welch t-test or t') corrects for violations of the equal-spread assumption by "penalizing" the researcher with a reduction in degrees of freedom. Fewer degrees of freedom, of course, make it harder to achieve a statistically significant result, because the threshold t-value needed for significance is higher.
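For readers who like to see the pieces side by side, here is a minimal sketch in Python (SciPy) rather than SPSS, using made-up scores for two hypothetical groups; it runs Levene's test and then both t-test solutions:

    import numpy as np
    from scipy import stats

    # Hypothetical drinking-frequency scores for two independent groups (illustrative only).
    men   = np.array([4, 6, 5, 7, 9, 8, 6, 5])
    women = np.array([3, 4, 2, 5, 4, 3, 6, 2])

    # Levene's test: a significant result suggests the equal-spread assumption is violated.
    # (center='mean' mirrors the classic Levene test; SciPy's default uses the median.)
    lev_stat, lev_p = stats.levene(men, women, center='mean')

    # Equal-variances t-test vs. the Welch (t') solution, which does not assume equal spread.
    t_equal, p_equal = stats.ttest_ind(men, women, equal_var=True)
    t_welch, p_welch = stats.ttest_ind(men, women, equal_var=False)

    print(f"Levene p = {lev_p:.3f}")
    print(f"Equal-variances t: t = {t_equal:.2f}, p = {p_equal:.3f}")
    print(f"Welch t':          t = {t_welch:.2f}, p = {p_welch:.3f}")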

Years ago, I created a graphic for how to interpret Levene's test and implement the proper t-test solution (i.e., the one for equal variances or for unequal variances, as appropriate). Even with the graphic, however, students still found the output confusing. Based on these student difficulties and some literature of which I have become aware, I have changed my opinion.

I now subscribe to the opinion of Glass and Hopkins (1996) that, “We prefer the t’ in all situations” (p. 305, footnote 30). Always using the t-test solution that does not assume equal spread between the two groups (as depicted in the top graphic) is advantageous for a few reasons.

It is simpler to always use one solution than to go through what many students find to be a cumbersome process for selecting which solution to use. Also, even though the two solutions (assuming equal spread and not assuming equal spread) have partly different formulas, the bottom-line conclusion one draws (e.g., that men drink significantly more frequently than do women) often is the same under both. If anything, the preferred (not assuming equal spread) solution is a little more conservative; in other words, it makes it a little harder to obtain a significant difference between means than does the equal-spread solution. As a result, our findings will have to be a little stronger for us to claim significance, which is not a bad thing.
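The "penalty" mentioned above comes through the degrees of freedom. A small Python sketch of the Welch-Satterthwaite approximation (which can be applied to any two arrays of scores, such as the hypothetical groups above) shows how the df shrink as the two variances diverge:

    import numpy as np

    def welch_df(a, b):
        # Welch-Satterthwaite approximation to the degrees of freedom for t'.
        va, vb = np.var(a, ddof=1), np.var(b, ddof=1)   # sample variances
        na, nb = len(a), len(b)
        numerator = (va / na + vb / nb) ** 2
        denominator = (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
        return numerator / denominator

    # When the two variances are similar, this comes out close to n1 + n2 - 2;
    # when they differ, the df drop, raising the t-value needed for significance.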

Reference

Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.). Needham Heights, MA: Allyn & Bacon.

Friday, October 09, 2009

Significance Testing for Correlations

(Updated October 15, 2024)

This website nicely illustrates one- vs. two-tailed significance tests.

Here's a photo of the board from a long-ago class, covering correlation and significance testing (thanks to Kristina for the photo). I've annotated the information with some additional clarification.
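For anyone who wants to see the one- vs. two-tailed distinction in action, here is a minimal Python (SciPy) sketch with made-up x and y data; the 'alternative' argument requires a reasonably recent version of SciPy:

    import numpy as np
    from scipy import stats

    # Made-up paired scores (illustrative only).
    x = np.array([2, 4, 5, 7, 8, 10, 11, 13])
    y = np.array([1, 3, 4, 4, 6, 7, 9, 10])

    # Two-tailed test: is r different from zero in either direction?
    r, p_two = stats.pearsonr(x, y)

    # One-tailed test: is r greater than zero (a directional prediction)?
    _, p_one = stats.pearsonr(x, y, alternative='greater')

    print(f"r = {r:.2f}, two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")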

Saturday, September 26, 2009

z-Scores and Percentiles

(Updated July 17, 2013)

If a body of data is normally distributed (i.e., follows the bell-shaped curve), we can convert an individual's z-score on a given variable into a percentile. A percentile refers to the percentage of sample members who fall below a given individual on the variable.

If we look at page 26 of Naked Statistics, we can translate any particular z-score into a percentile (again, as long as the distribution is normal). For instance, we can see that a z-score of +1 (shown along the bottom of the diagram in the book as mu + 1 sigma) places someone at roughly the 84th percentile. Fifty percent of the sample lies below mu, and another 34.1% lies between mu and mu + 1 sigma, thus adding up to 84.1%.
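For anyone who prefers to check such conversions with a few lines of code rather than a table, here is a minimal Python (SciPy) sketch; the z-values are just illustrative:

    from scipy.stats import norm

    # The standard normal CDF gives the proportion of the distribution below a z-score,
    # which (times 100) is the percentile.
    for z in (-1.0, 0.0, 1.0, 1.96):
        print(f"z = {z:+.2f}  ->  percentile ~ {norm.cdf(z) * 100:.1f}")

    # z = +1.00 prints roughly 84.1: the 50% below the mean plus the 34.1% between
    # the mean and one standard deviation above it.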

On this webpage, you can see how z-scores and percentiles correspond to areas under the normal curve. There is also a website where you can simply type in a z-score and get the corresponding percentile.

This photo of the board from a recent class meeting (thanks to Kristina) summarizes some of the major properties of z-scores.