Thursday, October 22, 2009

I have just created a new graphic on how to interpret SPSS print-outs for the Independent-Samples t-test.


This new chart supplements the t-test lecture notes I showed yesterday, reflecting a change in my thinking about what to take from the SPSS print-outs.

One of the traditional assumptions of an Independent-Samples t-test is that, before we can test whether the difference between the two groups' means on the dependent variable (DV) is significant (which is of primary interest), we must verify that the groups have similar variances (standard deviations squared) on the DV. This assumption, known as homoscedasticity, basically says that the two groups' distributions should be comparable in how spread out they are before we compare their means (you might think of this as "outlier protection insurance," although I don't know whether this is technically the correct characterization of the problem).

If the homoscedasticity (equal-spread) assumption is violated, all is not lost. SPSS provides a test for possible violation of this assumption (Levene's test) and, if the assumption is violated, an alternative solution for the t-test. The alternative t-test (known as the Welch t-test, or t') corrects for violations of the equal-spread assumption by estimating each group's variance separately rather than pooling them, and by "penalizing" the researcher with a reduction in degrees of freedom. Fewer degrees of freedom, of course, make it harder to achieve a statistically significant result, because the critical t-value needed to attain significance is higher.
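To make the degrees-of-freedom "penalty" concrete, here is a minimal Python sketch of the textbook formulas for both solutions, using only the standard library. It is not SPSS's own code; the function names and the two groups of scores are hypothetical, and the Welch df is computed with the standard Welch-Satterthwaite approximation.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 (sample) denominator


def pooled_t(x, y):
    """Student t-test assuming equal variances: returns (t, df)."""
    n1, n2 = len(x), len(y)
    # Pooled variance: a df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, n1 + n2 - 2


def welch_t(x, y):
    """Welch t' (equal variances not assumed): returns (t, Welch-Satterthwaite df)."""
    n1, n2 = len(x), len(y)
    # Squared standard error of each group's mean, estimated separately (no pooling)
    v1, v2 = variance(x) / n1, variance(y) / n2
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    # Usually not a whole number; it lies between min(n1, n2) - 1 and n1 + n2 - 2
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical scores: a small, tightly clustered group vs. a widely spread one
    tight = [1, 2, 3, 4, 5]
    spread = [10, 20, 30]
    print("pooled:", pooled_t(tight, spread))
    print("Welch: ", welch_t(tight, spread))
```

With these hypothetical groups, the pooled solution has 6 df, while the Welch df drops to roughly 2, which pushes the critical t-value well upward. When the two groups have equal sizes and equal variances, the two solutions give the same t and the same df.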

Years ago, I created a graphic for how to interpret Levene's test and implement the proper t-test solution (i.e., the one for equal variances or for unequal variances, as appropriate). Even with the graphic, however, students still found the output confusing. Stemming from these student difficulties and some literature of which I have become aware, I have changed my opinion.

I now subscribe to the opinion of Glass and Hopkins (1996) that, “We prefer the t’ in all situations” (p. 305, footnote 30). Always using the t-test solution in which the two groups are not assumed to have equal spread (as depicted in the top graphic) is advantageous for a few reasons.

It is simpler to always use one solution than to go through what many students find to be a cumbersome process for selecting which solution to use. Also, although the two solutions (assuming equal spread and not assuming equal spread) differ, with partly different formulas, the bottom-line conclusion one draws (e.g., that men drink significantly more frequently than do women) often is the same under both. If anything, the preferred (not assuming equal spread) solution is a little more conservative; in other words, it makes it a little harder to obtain a significant difference between means than does the equal-spread solution. As a result, our findings will have to be a little stronger for us to claim significance, which is not a bad thing.

Lastly, some additional notes: In searching the web to prepare this write-up, I discovered a neat graphical website that lets you see how the t-distribution begins to converge with the z-distribution as the t-test is specified with increasing degrees of freedom. Here's another good all-around site regarding t-tests, from which I learned about the distribution graphic.
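For anyone who would rather see the convergence in numbers than in a picture, here is a small Python sketch (my own, not taken from either site) that evaluates the t and z density curves at the center (x = 0) and out in the tail (x = 3) for several degrees of freedom. The function names are hypothetical; the formulas are the standard densities of Student's t and the standard normal distribution.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma (log of the gamma function) avoids overflow when df is large
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def z_pdf(x):
    """Density of the standard normal (z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    # Height of each curve at the center (x = 0) and in the tail (x = 3)
    for df in (1, 5, 30, 1000):
        print(f"t, df={df:>4}: {t_pdf(0, df):.4f} {t_pdf(3, df):.4f}")
    print(f"z:            {z_pdf(0):.4f} {z_pdf(3):.4f}")
```

As df grows, the t curve's peak rises toward the z curve's and its tails thin out. With few df, those heavier tails are why the critical t-value for significance sits above 1.96.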

Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in psychology and education (3rd ed.). Needham Heights, MA: Allyn & Bacon.

Friday, October 09, 2009

Here's a photo of the board from today's class, covering correlation and significance testing (thanks to Kristina for the photo). I've annotated the information with some additional clarification.

Saturday, September 26, 2009

Here are a couple of photos of the board (thanks to Kristina) from our coverage of z-scores during recent class meetings. An older photo that's also important can be accessed by clicking here.




What we did in the photo immediately above is examine each year's Texas Tech starting quarterback in terms of where he stood in relation to his Big 12 quarterbacking peers from the same year in average passing yards per game. The bigger the z-score, the better the Tech QB was relative to his same-year peers.

The Big 12 football statistics are available here. If statistics for just Big 12 conference games were available, we used those, as doing so helps hold the quality of opposition constant. If not, we used statistics from all of a team's games (including non-conference). To take a concrete example, let's look at 2008 (statistics other than average passing yards per game have been edited out):

PASSING AVG/GAME

Rank  Player            Team   Class  Games  Avg/Game
----  ----------------  -----  -----  -----  --------
 1    Harrell, Graham   TTU    SR     8      396.8
 2    Bradford, Sam     OU     SO     8      348.4
 3    Daniel, Chase     MU     SR     8      308.5
 4    McCoy, Colt       UT     JR     8      303.4
 5    Ganz, Joe         NU     SR     8      291.9
 6    Reesing, Todd     KU     JR     8      271.2
 7    Arnaud, Austen    ISU    SO     8      268.6
 8    Johnson, Jerrod   TAMU   SO     8      247.9
 9    Robinson, Zac     OSU    JR     8      235.8
10    Freeman, Josh     KSU    JR     8      230.0
11    Griffin, Robert   BU     FR     8      166.9
12    Hawkins, Cody     CU     SO     8      135.5

The mean and standard deviation of the 12 numbers in the last column were obtained (the class was broken up into three-person groups, with each group getting a different year). We would take Harrell's value (396.8), subtract the mean (267.1), then divide the result by the sample SD (72.1). The result is Harrell's 2008 z-score, 1.79 (or perhaps 1.80, however one handles rounding issues). The mean and SD of a distribution can conveniently be obtained at an online calculating site called Statiscope.
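As an alternative to an online calculator, the same steps can be sketched in a few lines of Python with the standard library's statistics module. The dictionary and the z_score function are hypothetical names of my own; the numbers are the 2008 averages from the table.

```python
from statistics import mean, stdev  # stdev() is the sample SD (n - 1 denominator)

# 2008 Big 12 average passing yards per game, copied from the table
avg_yards_2008 = {
    "Harrell": 396.8, "Bradford": 348.4, "Daniel": 308.5, "McCoy": 303.4,
    "Ganz": 291.9, "Reesing": 271.2, "Arnaud": 268.6, "Johnson": 247.9,
    "Robinson": 235.8, "Freeman": 230.0, "Griffin": 166.9, "Hawkins": 135.5,
}


def z_score(value, values):
    """Number of sample SDs that `value` lies above (+) or below (-) the mean of `values`."""
    return (value - mean(values)) / stdev(values)


if __name__ == "__main__":
    yards = list(avg_yards_2008.values())
    print(f"mean = {mean(yards):.1f}, SD = {stdev(yards):.1f}")
    print(f"Harrell z = {z_score(avg_yards_2008['Harrell'], yards):.2f}")
```

Carrying full precision through the calculation (rather than rounding the mean and SD first) gives a z-score of about 1.80 for Harrell.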

As we discussed in class, Harrell's 2008 z-score was not as much above the league average as were Harrell's 2006 and 2007 z-scores and other Tech quarterbacks' z-scores for their respective years. In no way does this suggest that Harrell had a subpar year in '08. Rather, the Big 12 had a number of great quarterbacks, which made it harder for Harrell to distance himself from the others. Harrell's QB counterparts included Oklahoma's Sam Bradford and Texas's Colt McCoy, who finished first and second, respectively, in the balloting for the Heisman Trophy, the award given to the finest college football player in the nation.

Thursday, August 27, 2009

Welcome back, Fall 2009 students! Here is the calendar portion of our course syllabus...

Wednesday, March 04, 2009

If it's March, it must be time for our annual compendium of statistics and methodology workshops being offered this upcoming summer (focusing primarily on the U.S.). Although details vary by program, they are generally open to graduate students, post-docs, and faculty. Direct quotations in the text below come from messages sent by program organizers to listservs or from program websites.

APA Advanced Training Institutes (LINK):

- Non-Linear Methods for Psychological Science (June 8-12, Univ. of Cincinnati)
- Research Methods with Diverse Racial & Ethnic Groups (June 22-26, Michigan State Univ.)
- Structural Equation Modeling in Longitudinal Research (June 29-July 1, Univ. of Virginia)
- Exploratory Data Mining in Behavioral Research (July 20-24, Univ. of Southern California)

Claremont Graduate University. August 21-26. Several workshops, with a focus on program evaluation and applied/social issues (LINK). Updated May 5, 2009

Curran-Bauer Analytics (North Carolina). Multilevel Linear Models, June 1-5, Rizzo Conference Center, University of North Carolina at Chapel Hill (LINK).

Data Analysis Training Institute of Connecticut. HLM, SEM, and Dyadic Analysis will be offered in week-long workshops ranging from June 8-26, taught by David Kenny (about whom I've written a song), Betsy McCoach, and others (LINK).

Johns Hopkins University (Bloomberg School of Public Health). Summer Institute Courses in Mental Health Research, June 22-July 2. Features courses in epidemiology, prevention, policy, grant-writing, randomized trials, longitudinal latent-variable analysis, and missing data (LINK). Added March 9, 2009

Muthén Mplus Short Courses. Held at Johns Hopkins University and other locations (LINK).

National Centre for Research Methods (United Kingdom). See under "Training & Events" on the group's website (LINK).

Oregon State University. July 7-10, on Latent Growth Modeling and Mplus (LINK).

Portland State University (Oregon). A series of two-day workshops in mid-June, on topics such as factor analysis/SEM, longitudinal analysis (including survival analysis), secondary data, and complex survey design (LINK). Added March 25, 2009

Scientific Software International (SSI). Workshops on SEM/LISREL and HLM, September 1-3, Chicago (LINK). Added April 21, 2009

Statistical Horizons. Brief workshops, covering different topics, offered in Philadelphia and elsewhere (LINK).

Texas A&M (Educational Psychology). A wide variety of courses will be offered from June 7-19. "This year we are very excited to have two more new courses: Mediation and Moderation Analysis by Matt Fritz (Virginia Tech) [and] Meta Analysis by Victor Willson (Texas A&M University)." Existing courses include Mixed Methods Research, Item Response Theory, Exploratory and Confirmatory Factor Analysis, SEM, HLM, and Nonparametric Statistics. Discounted "Early Bird" registration by March 31 (LINK).

University at Buffalo (Sociology). Workshops of one week or shorter in May, covering SEM, HLM, meta-analysis, and evaluation (LINK).

University of Kansas. "... is pleased to announce an expanded set of 5-day workshops on quantitative methodology, to take place June 1-19, 2009 in Lawrence, Kansas. Workshop topics this year include: Structural Equation Modeling, Multilevel Modeling, Advanced Longitudinal Modeling, Meta-Analysis, Item Response Theory, and Social Network Dynamics" (LINK).

University of Maryland (Center for Integrated Latent Variable Research). Diagnostic Measurement: Theory, Methods, and Applications, May 28-29. "This short course is targeted to measurement professionals with at least basic training in statistics and who are interested in learning about the theory, methods, and applications of modern latent-variable models for classifying respondents. The models that are the focus of this course are known as diagnostic classification models (DCMs) or, alternatively, cognitive diagnosis models and restricted latent class models..." (LINK).

University of Massachusetts-Amherst (Center for Research on Families). Workshops entitled "Modeling Diary and Dyadic Data" will be held June 8-11 (LINK).

University of Michigan...

- ICPSR Quantitative Methods
- Survey Research Center
- School of Public Health -- Epidemiology

Virginia Commonwealth University (CARMA). Brief courses on a wide range of data-analytic topics, both quantitative and qualitative, offered from May 11-16 (LINK).

Thursday, November 20, 2008

(Updated after Friday, November 21's class)

Friday we'll be covering Confidence Intervals (CI), focusing on my notes from two years ago.

This year, I'd like to add some links to sites on the calculation of different types of CIs. The general form for calculating CIs is:

$$\text{95\% CI} = \text{Sample estimate} \pm 1.96 \times \text{Standard Error}$$

where the sample estimate is, e.g., r or a mean.

The specific forms of this calculation for CIs around a mean, a correlation, and a proportion, respectively, are shown here, here, and here. Note that increasing the sample size (N) shrinks the SE and, hence, the width of the CI. I also have written a new song:
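Here is a minimal Python sketch of those three forms, assuming the common large-sample versions: z = 1.96 for the mean (with small samples, a t critical value would be used instead), the simple Wald formula for a proportion, and Fisher's r-to-z transformation for a correlation. The linked pages may use slightly different variants; the function names and sample values are hypothetical.

```python
import math

Z95 = 1.96  # two-tailed critical value of z for 95% confidence


def ci_mean(m, sd, n):
    """95% CI around a sample mean (large-sample z version)."""
    se = sd / math.sqrt(n)
    return m - Z95 * se, m + Z95 * se


def ci_proportion(p, n):
    """95% CI around a sample proportion (simple Wald version)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - Z95 * se, p + Z95 * se


def ci_correlation(r, n):
    """95% CI around a sample r, built on Fisher's z scale and converted back to r."""
    z = math.atanh(r)            # Fisher r-to-z transformation
    se = 1 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - Z95 * se), math.tanh(z + Z95 * se)


if __name__ == "__main__":
    # Hypothetical sample values; every interval narrows as N grows
    # (for the mean and the proportion, quadrupling N halves the width)
    for n in (25, 100, 400):
        print(n, ci_mean(100, 15, n), ci_proportion(0.5, n), ci_correlation(0.30, n))
```

The correlation interval is built on Fisher's z scale because the sampling distribution of r itself becomes lopsided as r moves away from zero; converting back to r makes the interval asymmetric around a nonzero r.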

True Value
Lyrics by Alan Reifman
(May be sung to the tune of “Moon Shadow,” Cat Stevens)

Within your CI, you get the true value, true value, true value,
With 95%, you get the true value, true value, true value,

You get a sample statistic, a sample r, or sample M,
You then take plus-or-minus two (it’s really 1.96…), standard errors beyond your stat,
And within this new interval, we can be, so confident,
That the true value, mu or rho, will be somewhere… inside…, our confidence interval,

Within your CI, you get the true value, true value, true value,
With 95%, you get the true value, true value, true value,

Wednesday, November 05, 2008

Here are some quick links to the main chi-square notes (here and here). Also, the exit polls from last night's election are available in such a format that we can apply chi-square analysis to them.
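For a cross-tabulation like an exit-poll table, the Pearson chi-square statistic compares each cell's observed count with the count expected if the row and column variables were independent. Below is a minimal Python sketch; the function name and the 2 x 2 counts are made up for illustration (they are not the actual exit-poll numbers), and no Yates continuity correction is applied.

```python
def chi_square(table):
    """Pearson chi-square statistic and df for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


if __name__ == "__main__":
    # Made-up counts, NOT real exit-poll data: rows = two voter groups, columns = two candidates
    counts = [[30, 20],
              [20, 30]]
    chi2, df = chi_square(counts)
    print(chi2, df)  # compare chi2 with 3.84, the .05 critical value for df = 1
```

Here every expected count is 25, so each cell contributes 5 squared over 25, or 1, for a total of 4.0 on 1 df, which exceeds the .05 critical value of 3.84.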