Saturday, April 17, 2021

2021 (Virtual) Summer Stats & Methods Courses

Given the continuing uncertainties regarding COVID-19, such as the uneven distribution of successful vaccinations and the emergence of variants of the virus, most summer statistical and methodological workshops will remain virtual for another year. More specifically, instruction will be synchronous, with online lectures taking place live at a specified time, unless noted otherwise. Links to programs offering workshops appear below, roughly in chronological order of when they will take place. Please e-mail me (faculty webpage at right) with additions.

Enablytics, May

University of Texas, May

Data Orbit (added 4/29; Data Visualization), May

YoungStatS (added 5/3; Composite-based SEM/PLS), May

Andy Supple/Mplus, May-June

Center for Statistical Training/Curran-Bauer, May-June

University of Pittsburgh, May-July

University of Michigan (ICPSR), May-August 

APA Advanced Training Institutes, June

Advanced Methods Institute/Culturally Responsive Research, June

Global School Empirical Research Methods (GSERM), June

Stats Camp, June

University of Maryland/Center for Integrated Latent Variable Research (added 5/5; Bayesian Modeling), June

John Sakaluk (added 5/8), Data Management, Visualization, and Analysis in R, June

BYU Family Studies Center (added 5/20), Longitudinal SEM, June

Kirsten Morehouse (added 5/25), FREE "Advanced Crash Course in R," June

Johns Hopkins/Bloomberg School of Public Health (Epidemiology & Biostatistics; virtual with small number of in-person courses), June-July 

U. Michigan Survey Research Center (flipped format, combination of asynchronous and synchronous), June-July

Research Expertise Centre for Survey Methodology (RECSM; added 5/14), June-July

Stats Whisperer (added 5/17; mediation and moderation), June-July

U. Michigan School of Public Health, July

Ljubljana, July

Utrecht, July-September

"Figure It Out," September-October

Summer Institute on Innovative Methods (tentatively planned to be in person, but could change to virtual), October

Statistical Horizons (added 4/22), ongoing series all year, including "On Demand" (self-paced, asynchronous) courses

Mplus (added 4/24), ongoing series all year

QUANTFiSH/Christian Geiser (added 5/25), latent state-trait modeling, self-paced/asynchronous, videos go live on June 1

Friday, October 10, 2014

Writing Up Statistical Results in APA Style

Dr. Shera Jackson, our lab instructor, has compiled the following list of web resources for how to write up statistical results in APA style:

Reporting Statistics in APA Style (Matthew Hesson-McInnis, Illinois State University)

Reporting Results of Common Statistical Tests in APA Format (Psychology Writing Center, University of Washington)

Statistics in APA Style (Craig Wendorf, University of Wisconsin-Stevens Point); see summary table on page 5

Monday, August 25, 2014

Welcome Message

The message below is old, as I haven't taught Intro Stats in several years. However, it contains some potentially useful links, as well as my overview of the course.

Welcome to QM I for Fall 2014, and, for those of you who haven't been in school here in the past, welcome to Texas Tech! You'll be visiting this welcome page a lot, as it contains the links for our lecture notes.

I'll do my best to provide a lot of practical, real-world exercises in analyzing data, and I'll try to keep things fun. This passage from a book I read several years ago, Coincidences, Chaos, and All That Math Jazz, by Edward B. Burger and Michael Starbird, provides a concise overview of what statistics can offer:

Statistics can help us understand the world. It is a powerful and effective tool for placing economic, social welfare, sports, and health issues into perspective. It molds data into digestible morsels and shows us a measured way to look at situations that have either random or unknown features. But we must use common sense when applying statistics or other tools that draw on our experience of the world to shape data into meaningful conclusions (p. 60).

In addition, the following article sets forth some goals for what you should learn in this class (and other classes). We can access this article via the Texas Tech Library website or Google Scholar.

Utts, J. (2003). What educated citizens should know about statistics and probability. American Statistician, 57(2), 74-79.

LECTURE NOTES (asterisked [*] pages are from my undergraduate research-methods class).

Units of analysis*

Sampling*

Types of Measures*

Visual depictions of a data distribution (examples):
  • Histograms (overview; determining interval/bin widths; SPSS instructions here and here). UPDATE 9/10/14: King and Minium (2003) offer some advice on interval widths and the appearance of histograms, citing the "convention that the height of the figure should be about three-quarters of the width." Also, "When we have relatively few cases and wish to see if a pattern exists, we can often reduce irregularity due to chance fluctuation by using fewer class intervals than usual" (pp. 56-57). A short code sketch after this list illustrates how bin width changes a histogram's appearance.
  • Frequency tables (click here and then select output), which contain similar information to histograms; the cumulative percentages also are roughly similar to percentiles (for a given score, you can see what percent of the sample falls below it)
  • Shapes of distributions
  • As a class exercise, we will attempt to reproduce via SPSS this histogram of U.S. Presidents' ages upon assuming office (note that Grover Cleveland, who served two non-consecutive terms, is counted as being "two presidents," the 22nd and 24th)
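For anyone who wants to experiment outside of SPSS, here is a minimal Python sketch (using randomly generated, illustrative data, not the presidents' ages) of how bin width changes a histogram's appearance, along with cumulative percentages like those in an SPSS frequency table:

```python
# Illustrative data only: 45 simulated "ages" from a normal distribution.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
ages = rng.normal(loc=55, scale=7, size=45).round()

# The same data with two different bin widths can look quite different.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for ax, width in zip(axes, [2, 5]):
    bins = np.arange(ages.min(), ages.max() + width, width)
    ax.hist(ages, bins=bins, edgecolor="black")
    ax.set_title(f"Bin width = {width}")
    ax.set_xlabel("Age")
    ax.set_ylabel("Frequency")
plt.tight_layout()
plt.show()

# Cumulative percentages, analogous to an SPSS frequency table
# (for a given score, what percent of the sample falls at or below it):
values, counts = np.unique(ages, return_counts=True)
cum_pct = 100 * np.cumsum(counts) / counts.sum()
for v, pct in zip(values, cum_pct):
    print(f"{v:5.0f}  {pct:6.1f}%")
```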
Descriptive statistics:* Central tendency (mean, median, and mode) and spread (standard deviation); moments of a distribution; and z-scores (here, here, here, and here)
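As a quick illustration of the z-score formula, z = (x − M) / SD, here is a short Python sketch with hypothetical scores (SPSS's DESCRIPTIVES procedure with /SAVE produces the same standardized values):

```python
# Hypothetical scores; standardize by subtracting the mean and
# dividing by the sample standard deviation.
import numpy as np

scores = np.array([72, 85, 91, 64, 78, 88, 95, 70])
z = (scores - scores.mean()) / scores.std(ddof=1)  # ddof=1: sample SD

print(np.round(z, 2))
# By construction, the z-scores have a mean of 0 and an SD of 1:
print(round(z.mean(), 10), round(z.std(ddof=1), 10))
```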

Probability (here and here)

Correlation and significance-testing

t-tests

Chi-square

Non-parametric statistics

Statistical power

Confidence intervals

Tuesday, November 16, 2010

Statistical Power (Overview)

This week we'll be covering statistical power (also known as power analysis). Power is not a statistical technique like correlation, the t-test, or chi-square. Rather, power involves designing your study (particularly getting a large enough sample size) so that you can use correlations, t-tests, etc., more effectively. The core concept of power, like so much else, goes back to the distinction between the population and a sample. When there truly is a basis in the population for rejecting the null hypothesis (e.g., a non-zero correlation, a non-zero difference between means), we want to increase the likelihood that the analysis of our sample rejects the null. In other words, we want to be able to pronounce a result significant, when warranted. Here are links to my previous entries on statistical power, followed by a brief sample-size sketch.

Introductory lecture

Why a powerful design is needed: The population may truly have a non-zero correlation, for example, but due to random sampling error, your sample may not; plus, some songs on statistical power!

Remember that there's also the opposite kind of error: The population truly has absolutely no correlation, but again due to random sampling error, you draw a sample that gives the impression of a non-zero correlation.

How to plan a study using power considerations
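To make the planning step concrete, below is a hedged sketch using the Python statsmodels package (one of several tools for power analysis), solving for the per-group sample size needed to detect a medium-sized mean difference with the conventional alpha = .05 and power = .80:

```python
# A priori power analysis for an independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,          # Cohen's d; .50 = "medium"
    alpha=0.05,               # Type I error rate
    power=0.80,               # desired probability of rejecting a false null
    alternative="two-sided",
)
print(round(n_per_group))     # roughly 64 participants per group
```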

Wednesday, November 03, 2010

Chi-Square

My stat notes for the methods class contain some introductory information on chi-square.

Here are direct links to some old chi-square blog postings. This one discusses the reversibility error and how to read an SPSS printout of a chi-square analysis properly. The other illustrates the null hypothesis for chi-square analyses in terms of equal pie-charts.
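As a concrete illustration (with hypothetical counts), here is a short Python sketch of a chi-square test of independence on a 2 × 2 table; the degrees-of-freedom rule it prints is the one the song below celebrates:

```python
# Hypothetical 2x2 table: rows = two groups, columns = yes/no responses.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[30, 10],
                  [20, 25]])

# Note: scipy applies Yates' continuity correction to 2x2 tables by default.
chi2, p, df, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.4f}")  # df = (2-1)*(2-1) = 1
print("Expected counts:\n", np.round(expected, 1))
```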

The following photo of the board, containing chi-square tips, was added on November 15, 2011 (thanks to Selen).


Plus a song (added November 1, 2011):

One Degree is Free
Lyrics by Alan Reifman
(May be sung to the tune of “Rock and Roll is Free,” Ben Harper)

Look at your, chi-square table,
If it is, 2-by-2,
One cell can be filled freely,
While the others take their cue,

The formula that you can use,
Come on, from the columns, lose one,
And one, as well, from the rows,
Multiply the two, isn’t this fun?

One degree is free, in your table,
With con-tin-gen-cy, in your table,
One degree is free, in your table,
…free in your table,
…free in your table,

Say, your table is larger,
Maybe it’s 2-by-4,
Multiply one by three,
3 df are in store,

The df’s are essential,
To check significance,
Go to your chi-square table,
And find the right instance,

Three degrees are free, in your table,
With con-tin-gen-cy, in your table,
Three degrees are free, in your table,
…free in your table,
…free in your table,

(Guitar Solo)

Wednesday, October 06, 2010

Correlation

NOTE: I have edited and reorganized some of my writings on correlation to present the information more coherently (10/11/2012).

Our next topic is correlational analysis. There are four major areas to address:

1. A general introduction to correlation, which is available here amidst my Research Methods lecture notes.

2. Running correlations in SPSS. This graphic of SPSS output tries to make clear that a sample correlation and its significance/probability level are two different things (although related to each other).


Second, to graph the data points and best-fitting line, start in "Graphs," go to "Legacy Dialogs," and select "Scatter/Dot." Then select "Simple Scatter" and click "Define." Insert the variables you want to display on the X and Y axes, and click "OK." When the scatter plot first appears, you can click on it to do more editing. To add the best-fit line, under "Elements," choose "Fit Line at Total." (A Python analogue of these steps appears in the sketch after point 4 below.)

Initially the dots will all look the same throughout the scatter plot. To make each dot represent the number of cases at that point (either by thickness of the dot or through color-coding), click on the "Binning" icon (circled below in red). Thanks to Xiaohui for finding this!

3. Statistical significance and testing the null hypothesis, as applied to correlation. Subthemes within this topic include how sample size affects the ease of getting a statistically significant result (i.e., rejecting the null hypothesis of zero correlation in the full population), and one- vs. two-tailed significance.

4. Partial correlation (i.e., the correlation between two variables, holding constant one or more "lurking" variables).
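Tying points 2-4 together, here is a minimal Python sketch (simulated data) showing that a sample r and its p-value are distinct quantities, a scatter plot with a best-fit line, and a partial correlation computed by correlating residuals (the p-value for the residual method is approximate, because a true partial correlation uses slightly different degrees of freedom):

```python
# Simulated data: x and y are related only through a "lurking" variable z.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(seed=2)
z = rng.normal(size=100)
x = z + rng.normal(size=100)
y = z + rng.normal(size=100)

r, p = stats.pearsonr(x, y)        # the sample r and its p-value: two things
print(f"r = {r:.2f}, p = {p:.4f}")

# Scatter plot with best-fit line (the Python analogue of SPSS's
# "Fit Line at Total"):
plt.scatter(x, y)
slope, intercept = np.polyfit(x, y, 1)
xs = np.sort(x)
plt.plot(xs, intercept + slope * xs)
plt.show()

# Partial correlation of x and y holding z constant, via residuals:
res_x = x - np.polyval(np.polyfit(z, x, 1), z)
res_y = y - np.polyval(np.polyfit(z, y, 1), z)
pr, pp = stats.pearsonr(res_x, res_y)
print(f"partial r = {pr:.2f}, p = {pp:.4f}")
```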

Here are some additional tips:

5. In evaluating the meaning of a correlation that appears as positive or negative in the SPSS output, you must know how each of the variables is keyed (i.e., does a high score reflect more of the behavior or less of the behavior?).

6. Statistical significance is not necessarily indicative of social importance. With really large sample sizes (such as we have available in the GSS), even a correlation that seems only modestly different from zero may be statistically significant. To remedy this situation, the late statistician Jacob Cohen devised criteria for "small" (r ≈ .10), "medium" (r ≈ .30), and "large" (r ≈ .50) correlations.

7. Correlations should also be interpreted in the context of range restriction (see links section on the right). Here's a song to reinforce the ideas, followed by a sketch of a standard correction formula:

Restriction in the Range 
Lyrics by Alan Reifman
(May be sung to the tune of “Laughter in the Rain,” Sedaka/Cody)

Why do you get such a small correlation,
With variables you think should be related?
Seems you’re not studying the full human spectrum,
Just looking at part of bivariate space,
All kinds of thoughts start to race, through your mind…

Ooh, there’s restriction in the range,
Dampening the slope of the best-fit line,
Ooh, I can correct r for this,
Put a better rho estimate in its place...
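For the correction the song alludes to, one standard formula is Thorndike's Case II correction (which assumes direct selection on the restricted variable, plus linearity and equal spread); the sketch below, with a function name of my own, shows the idea:

```python
# Thorndike's Case II correction for direct range restriction.
import math

def correct_for_range_restriction(r, sd_restricted, sd_unrestricted):
    """Estimate the unrestricted correlation from a range-restricted r."""
    u = sd_unrestricted / sd_restricted   # ratio of SDs on the predictor
    return (r * u) / math.sqrt(1 - r**2 + (r**2) * (u**2))

# E.g., an observed r of .20 in a sample whose predictor SD is half
# the unrestricted SD corresponds to an estimated rho of about .38:
print(round(correct_for_range_restriction(0.20, 1.0, 2.0), 2))
```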

Thursday, October 22, 2009

Further t-Test Info (SPSS Output, Independent Samples)

(Updated October 26, 2014)

I have just created a new graphic on how to interpret SPSS print-outs for the Independent-Samples t-test (where any given participant is in only one of two mutually exclusive groups).


This new chart supplements the t-test lecture notes I showed recently, reflecting a change in my thinking about what to take from the SPSS print-outs.

One of the traditional assumptions of an Independent-Samples t-test is that, before we can test whether the difference between the two groups' means on the dependent variable is significant (which is of primary interest), we must verify that the groups have similar variances (standard deviations squared) on the DV. This assumption, known as homoscedasticity (or homogeneity of variance), basically says that we want the two groups' distributions to be comparably spread out before we compare their means (you might think of this as "outlier protection insurance," although I don't know if that is technically the correct characterization of the problem).

If the homoscedasticity (equal-spread) assumption is violated, all is not lost. SPSS provides a test for possible violation of this assumption (Levene's test) and, if the assumption is violated, an alternative t-test solution. The alternative t-test (known as the Welch t-test or t') corrects for violations of the equal-spread assumption by "penalizing" the researcher with a reduction of degrees of freedom. Fewer degrees of freedom, of course, make it harder to achieve a statistically significant result, because the threshold t-value needed to attain significance is higher.

Years ago, I created a graphic for how to interpret Levene's test and implement the proper t-test solution (i.e., the one for equal variances or for unequal variances, as appropriate). Even with the graphic, however, students still found the output confusing. Stemming from these student difficulties and some literature of which I have become aware, I have changed my opinion.

I now subscribe to the opinion of Glass and Hopkins (1996) that "We prefer the t' in all situations" (p. 305, footnote 30). Always using the t-test solution in which the two groups are not assumed to have equal spread (as depicted in the top graphic) is advantageous for a few reasons.

It is simpler to always use one solution than to go through what many students find to be a cumbersome process for selecting which solution to use. Also, although the two solutions (assuming equal spread and not assuming equal spread) have partly different formulas, the bottom-line conclusion one draws (e.g., that men drink significantly more frequently than do women) often is the same under both. If anything, the preferred (not assuming equal spread) solution is a little more conservative; in other words, it makes it a little harder to obtain a significant difference between means than does the equal-spread solution. As a result, our findings will have to be a little stronger for us to claim significance, which is not a bad thing.
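For readers who also work in Python, here is a brief sketch (simulated data) of the always-use-Welch recommendation; scipy's equal_var=False argument requests the unequal-variances (Welch t') solution, paralleling the second row of the SPSS output:

```python
# Simulated data in which the two groups have unequal spread.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=3)
men = rng.normal(loc=5.0, scale=2.5, size=40)
women = rng.normal(loc=4.0, scale=1.0, size=40)

# Levene's test of equal spread (scipy centers on the median by default;
# SPSS's version centers on the mean):
print(stats.levene(men, women, center="mean"))

t, p = stats.ttest_ind(men, women, equal_var=False)  # Welch t'
print(f"Welch t' = {t:.2f}, p = {p:.4f}")
```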

I've also created a graphic to interpret the SPSS output of a paired t-test.


I have also taken a screenshot from this University of Georgia webpage, which gives the formula for a paired t-test. The main focus is, of course, comparing the means of two variables, but as shown below, I have highlighted where the correlation r between the two variables enters the formula.
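The role of r in that formula can be checked numerically; here is a minimal Python sketch (simulated pre/post scores) comparing the standard paired-t hand computation, t = (M1 − M2) / sqrt[(s1² + s2² − 2·r·s1·s2) / n], with scipy's paired t-test:

```python
# Simulated pre/post scores for 25 participants (post correlated with pre).
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=4)
pre = rng.normal(loc=10, scale=3, size=25)
post = pre + rng.normal(loc=1, scale=2, size=25)
n = len(pre)

r = stats.pearsonr(pre, post)[0]
s1, s2 = pre.std(ddof=1), post.std(ddof=1)
t_by_hand = (post.mean() - pre.mean()) / np.sqrt(
    (s1**2 + s2**2 - 2 * r * s1 * s2) / n)

t_scipy, p = stats.ttest_rel(post, pre)
print(round(t_by_hand, 4), round(float(t_scipy), 4))  # the two should match
```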


Reference

Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.). Needham Heights, MA: Allyn & Bacon.