Wednesday, November 28, 2007

Practical Issues in Power Analysis

Below, I've added a new chart, based on things we discussed in class. William Trochim's Research Methods Knowledge Base, in discussing statistical power, sample size, effect size, and significance level, notes that "Given values for any three of these components, it is possible to compute the value of the fourth." The table I've created attempts to convey this fact in graphical form.


You'll notice the (*) notation by "S, M, L" in the chart. Those, of course, stand for small, medium, and large effect sizes. As we discussed in class, Jacob Cohen developed criteria for what magnitude of result constitutes small, medium, and large, both for correlational studies (r = .10, .30, and .50, respectively) and for studies comparing the means of two groups (d = 0.2, 0.5, and 0.8 for t-test-type studies; t itself is not an indicator of effect size).

When planning a new study, naturally you cannot know what your effect size will be ahead of time. However, based on your reading of the research literature in your area of study, you should be able to get an idea of whether findings have tended to be small, medium, or large, which you can convert to the relevant values for r or Cohen's d. These, in turn, can be submitted to power-analysis computer programs and online calculators.

I try to err on the side of expecting a small effect size. This requires me to obtain a sample large enough to detect a small effect, which seems like good practice anyway.
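To make this concrete, here is a minimal sketch of an a priori power analysis in Python (assuming the statsmodels package is installed), solving for the per-group sample size a two-group t-test would need to detect Cohen's small benchmark effect:

```python
# A priori power analysis for an independent-samples t-test.
# Given three of the four components (effect size, alpha, power),
# solve_power computes the fourth (here, sample size per group).
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.2,  # Cohen's "small" d
    alpha=0.05,       # two-tailed significance level
    power=0.80,       # conventional target power
)
print(f"n needed per group: {n_per_group:.1f}")  # about 393.4, so 394 per group
```

Swapping in d = 0.5 (medium) drops the requirement to roughly 64 per group, which shows just how costly it is to chase small effects.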

UPDATE 1: Westfall and Henning (2013) argue that post hoc power analysis, which is what the pink column depicts in the above table, is "useless and counterproductive" (p. 508).

UPDATE 2: Lakens (2022, full text) provides extensive practical advice on sample-size determination and power analysis, including the option of "sensitivity power analysis" when one's sample size is already fixed.
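For instance, a sensitivity power analysis can be sketched with the same statsmodels tool used above: fix the sample size and solve for the smallest effect detectable with adequate power (the n of 50 per group here is just a hypothetical fixed value):

```python
# Sensitivity power analysis: the sample size is already fixed,
# so we solve for the smallest effect size detectable at 80% power.
from statsmodels.stats.power import TTestIndPower

detectable_d = TTestIndPower().solve_power(
    nobs1=50,    # hypothetical fixed n per group
    alpha=0.05,
    power=0.80,
)
print(f"Smallest detectable d: {detectable_d:.2f}")  # about 0.57
```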

Tuesday, November 27, 2007

Illustration of a "Miss" in Hypothesis Testing (and Relevance for Power Analysis)

My previous blog notes on this topic are pretty extensive, so I'll just add a few more pieces of information (including a couple of songs).

As we've discussed, the conceptual framework underlying statistical power involves two different kinds of errors: rejecting the null (thus claiming a significant result) when the null hypothesis is really true in the population (a Type I error, known as a "false alarm"); and failing to reject the null when the true population correlation (rho) is actually different from zero (a Type II error, known as a "miss"). The latter is illustrated below:



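A small simulation can also make the "miss" concrete. In this sketch (assuming Python with numpy and scipy), the true population correlation is rho = .30, yet samples of n = 30 frequently fail to reach significance:

```python
# Simulating "misses" (Type II errors): rho really is .30 in the
# population, but with n = 30 the test often fails to reject H0.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(seed=1)
rho, n, trials = 0.30, 30, 10_000
cov = [[1.0, rho], [rho, 1.0]]  # population correlation matrix

misses = 0
for _ in range(trials):
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    _, p = pearsonr(x, y)
    if p >= 0.05:  # nonsignificant despite a real effect: a miss
        misses += 1

print(f"Miss rate: {misses / trials:.2f}")  # around .64; power is only ~.36
```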
And here are my two new power-related songs...

Everything’s Coming Up Asterisks
Lyrics by Alan Reifman (updated 11/18/2014)
(May be sung to the tune of “Everything’s Coming Up Roses,” from Gypsy, Styne/Sondheim)

We've got a scheme, to find p-values, baby.
Something you can use, baby.
But, is it a ruse? Maybe...

State the null! (SLOWLY),
Run the test!
See if H-oh should be, put to rest,

If p’s less, than oh-five,
Then H-oh cannot be, kept alive (SLOWLY),

With small n,
There’s a catch,
There could be findings, you will not snatch,

There’s a chance, you could miss,
Rejecting the null hypothesis… (SLOWLY),

(Bridge)
H-oh testing, how it’s always been done,
Some resisting, will anyone be desisting?

With large n,
You will find,
A problem, of the opposite kind,

Nearly all, you present,
Will be sig-nif-i-cant,

You must start to look more at effect size (SLOWLY),
’Cause, everything’s coming up asterisks, oh-one and oh-five! (SLOWLY)

Believe it or not, the above song has been cited in the social-scientific literature:




Find the Power
Lyrics by Alan Reifman
(May be sung or rapped to the tune of “Fight the Power,” Chuck D/Sadler/Shocklee/Shocklee, for Public Enemy)

People do their studies, without consideration,
If H-oh, can receive obliteration,
Is your sample large enough?
To find interesting stuff,

P-level and tails, for when H-oh fails,
Point-eight-oh’s the way to go,
That’s the kind of power,
That you’ve got to show,

Look into your mind,
For the effect, you think you’ll find,

Got to put this all together,
Got to build your study right,
Got to give yourself enough, statistical might,

You’ve got to get the sample you need,
Got to learn the way,
Find the power!

Find the power!
Get the sample you need!

Find the power!
Get the sample you need!

Monday, November 12, 2007

Non-Parametric/Assumption-Free Statistics

(Updated November 2, 2024)

This week, we'll be covering non-parametric (or assumption-free) statistical tests (brief overview). Parametric techniques, which include the correlation r and the t-test, refer to the use of sample statistics to estimate population parameters (e.g., rho, mu).* Thus far, we've come across a number of assumptions that technically are required to be met for doing parametric analyses, although in practice there's some leeway in meeting the assumptions.

Assumptions for parametric analyses are as follows (for further information, see here; a code sketch for checking two of these assumptions appears after the list):

o Data for a given variable are normally distributed in the population.

o Equal-interval measurement.

o Random sampling is used.

o Homogeneity of variance between groups (for t-test).
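As a sketch of how the normality and homogeneity-of-variance assumptions might be checked in software (assuming Python with scipy; the data values are made up for illustration):

```python
# Checking two parametric assumptions with scipy:
# Shapiro-Wilk tests normality; Levene tests homogeneity of variance.
from scipy.stats import shapiro, levene

# Hypothetical scores for two groups (illustrative values only).
group_a = [12, 15, 14, 10, 13, 18, 11, 16]
group_b = [22, 19, 25, 30, 21, 28, 24, 27]

for name, scores in [("A", group_a), ("B", group_b)]:
    stat, p = shapiro(scores)
    print(f"Group {name} Shapiro-Wilk p = {p:.3f}")  # p < .05 flags non-normality

stat, p = levene(group_a, group_b)
print(f"Levene's test p = {p:.3f}")  # p < .05 flags unequal variances
```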

One would generally opt for a non-parametric test when there's violation of one or more of the above assumptions and sample size is small. According to King, Rosopa, and Minium (2011), "...the problem of violation of assumptions is of great concern when sample size is small (< 25)" (p. 382). In other words, if assumptions are violated but sample size is large, you still may be able to use parametric techniques (example). The reason is something called the Central Limit Theorem (CLT). I've annotated the following screenshot from Sabina's Stats Corner to show what the CLT does.


It is important first to distinguish between a frequency plot of raw data (which appears in the top row of Sabina's diagram) and something else known as a sampling distribution. A sampling distribution is what you get when you draw repeated random samples from a full population of individual persons (sometimes known as a "parent" population) and plot the means of all the samples you have drawn. Under the CLT, a frequency plot of these means (the sampling distribution) tends toward normality regardless of the shape of the parent population, and it resembles a bell curve more and more closely as the size of each sample increases. In essence, the CLT gets us back to normal distributions even when the original parent population distribution is non-normal.

We'll be doing a neat demonstration with dice that conveys how, under the CLT, large samples can salvage data that violate the normal-distribution assumption.
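In the same spirit as the dice demonstration, here is a minimal simulation sketch (assuming Python with numpy and scipy): single die rolls are uniformly distributed, about as non-normal as it gets, yet the means of repeated samples look more and more bell-shaped as the number of dice per sample grows:

```python
# CLT demonstration with dice: the parent distribution (a single die,
# uniform on 1-6) is flat, but the sampling distribution of the mean
# approaches normality as the sample size per draw increases.
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(seed=42)

for n in (1, 2, 10, 30):  # number of dice averaged per sample
    rolls = rng.integers(1, 7, size=(10_000, n))  # 10,000 samples of n dice
    means = rolls.mean(axis=1)  # the sampling distribution of the mean
    # Excess kurtosis near 0 matches a normal curve; a flat parent sits near -1.2.
    print(f"n = {n:>2}: SD of means = {means.std():.3f}, "
          f"excess kurtosis = {kurtosis(means):+.2f}")
```

Notice also that the standard deviation of the means (the standard error) shrinks as n grows, another consequence of averaging.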

Now that we've established that non-parametric statistics typically are used when one or more assumptions of parametric statistics are violated and sample size is small, we can ask: What are some actual non-parametric statistical techniques?

To a large extent, the different non-parametric techniques represent analogues to parametric techniques. For example, the non-parametric Mann-Whitney U test (song below) is analogous to the parametric t-test, when comparing data from two independent groups, and the non-parametric Wilcoxon signed-ranks test is analogous to a repeated-measures t-test. This PowerPoint slideshow demonstrates these two non-parametric techniques corresponding to t-tests. As you'll see, non-parametric statistics operate on ranks (e.g., who has the highest score, the second highest, etc.) rather than original scores, which may have outliers or other problems.
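As a sketch of how these two tests look when run in software (assuming Python with scipy; the scores are hypothetical):

```python
# Non-parametric analogues of the two t-tests, via scipy.
# Both tests operate on ranks rather than on the raw scores.
from scipy.stats import mannwhitneyu, wilcoxon

# Hypothetical scores for two independent groups (Mann-Whitney U,
# the analogue of the independent-samples t-test).
group_1 = [3, 5, 8, 9, 12, 14]
group_2 = [7, 10, 15, 18, 21, 25]
u_stat, p = mannwhitneyu(group_1, group_2, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat}, p = {p:.3f}")

# Hypothetical before/after scores for the same people (Wilcoxon
# signed-ranks, the analogue of the repeated-measures t-test).
before = [10, 12, 9, 15, 11, 13]
after  = [14, 13, 12, 18, 12, 17]
w_stat, p = wilcoxon(before, after)
print(f"Wilcoxon W = {w_stat}, p = {p:.3f}")
```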

The parametric Pearson correlation has the non-parametric analogue of a Spearman rank-order correlation. Let's work out an example involving the 2024-25 Miami Heat, a National Basketball Association team suggested by one of the students. A single team's roster gives us a small sample of players, for whom we will correlate annual salary with career performance on a metric called Win Shares (i.e., how many wins are attributed to each player based on his points scored, rebounds, assists, etc.). Here are our data (as of November 1, 2024):


These data clearly have some outliers. Jimmy Butler is the highest-paid Heat player at $48.8 million per year, and he is credited statistically with personally contributing 115.5 wins to his teams in his 14-year career. Other, younger players have smaller salaries (although still huge in layperson terms) and have had many fewer wins attributed to them. The Pearson and Spearman rank-order correlations are shown at the bottom of this posting,** in case you'd like to try calculating them yourself first. Which do you think would be larger, and why?
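Here is a minimal sketch of the comparison (assuming Python with scipy; the salary and Win Shares values below are made-up stand-ins, not the actual Heat data):

```python
# Pearson vs. Spearman on skewed data with an outlier.
# Spearman works on ranks, so a single extreme case influences it
# differently than it influences Pearson's r.
from scipy.stats import pearsonr, spearmanr

# Hypothetical salaries (millions of dollars) and career Win Shares.
salary     = [48.8, 11.0, 7.5, 5.2, 4.1, 3.5, 2.9, 2.1, 1.8, 1.2]
win_shares = [115.5, 14.0, 30.0, 3.0, 12.0, 9.5, 1.5, 4.0, 8.0, 2.5]

r, p_r = pearsonr(salary, win_shares)
rs, p_rs = spearmanr(salary, win_shares)
print(f"Pearson r = {r:.2f} (p = {p_r:.3f})")
print(f"Spearman rs = {rs:.2f} (p = {p_rs:.3f})")
```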

Finally, we'll close with our song...

Mann-Whitney U
Lyrics by Alan Reifman
(May be sung to the tune of “Suzie Q.,” Hawkins/Lewis, covered by John Fogerty)

Mann-Whitney U,
When your groups are two,
If your scaling’s suspect, and your cases are few,
Mann-Whitney U,

The cases are laid out,
Converted to rank scores,
You then add these up, done within each group,
Mann-Whitney U,

(Instrumental)

There is a formula,
That uses the summed ranks,
A distribution’s what you, compare the answer to,
Mann-Whitney U

---
*According to King, Rosopa, and Minium (2011), "Many people call chi-square a nonparametric test, but it does in fact assume the central limit theorem..." (p. 382).

**The Pearson correlation is r = .57 (p = .03), whereas the Spearman correlation is rs = .40 (nonsignificant due to the small sample size).