Dr. Alan Reifman's Intro Stats Page: Intro to z-Scores

Wednesday, September 26, 2007


(Updated July 17, 2013)

We'll next be moving on to standardized (or z) scores and how they relate to the normal/bell curve and percentiles.

For any given body of data, each individual participant can be assigned a z-score on any variable. A z-score is calculated as:

z = (Individual's Raw Score on a Variable - Sample Mean on that Variable) / (Sample Standard Deviation on that Variable)
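For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the formula. The function name z_scores and the three raw scores are made up for illustration, and I'm assuming the "sample" standard deviation means the version with n - 1 in the denominator (which is what Python's statistics.stdev computes).

```python
from statistics import mean, stdev

def z_scores(values):
    """Return each value's z-score relative to the sample mean and SD."""
    m = mean(values)
    s = stdev(values)  # assumption: sample SD with an n - 1 denominator
    return [(x - m) / s for x in values]

# Made-up raw scores: the mean is 80 and the sample SD is 10.
scores = [70, 80, 90]
print(z_scores(scores))  # [-1.0, 0.0, 1.0]
```

A score exactly at the mean gets a z of 0, and a score one SD above the mean gets a z of 1.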

This website has a good overview of z-scores.

If we wanted to compare the performances of two or more individuals on some task, the ideal way, of course, would be to administer the same measures, under the same conditions, to everyone, and see who scores highest. Sometimes, however, it's clear that the people you're trying to compare have not been assessed under identical conditions.

As one example, a university may have a large, amphitheatre-type lecture class of 400 students, with each student also attending a TA-led discussion section of 25 students. There are eight TAs, each of whom leads two sections. Overall course grades may be based 80% on uniform in-class exams taken at the same time by all the students, and 20% on section performance (mini-paper assignments and spoken participation). The kicker is that, for the 20% of the grade that comes from the sections, different students have different TAs, who can differ in the toughness or easiness of their grading. To account for differences in TA difficulty on the 20% of the course grade that comes from discussion sections, we could compute z-scores for the section grades.

Another example, which comes from an actual published study, involves comparing the home-run prowess of sluggers from different eras. As those of you who are big baseball fans will know, Babe Ruth held the single-season home-run record for many years, with the 60 he hit in 1927. Roger Maris then came along with 61 in 1961, and that's where things stood for another few decades. Within the last decade, we then saw Mark McGwire hit 70 in 1998 and Barry Bonds belt 73 in 2001.

Given the many differences between the 1920s and now, is Bonds's 2001 season really the most impressive? The initial decades of the 1900s were known as the Dead-ball Era, due to the rarity of home runs. In contrast, the last several years have been dubbed "The Live-ball Era," "The Goofy-ball Era," "The Juiced-ball (or Juiced-player) Era," "The Steroid Era," etc. It's not just steroids that are suspected of inflating the home-run totals of contemporary batters; smaller stadiums, more emphasis on weightlifting, and league expansion (which necessitates greater use of inexperienced pitchers) have also been suggested as contributing factors.

A few years ago, a student named Kyle Bang (a great name for analyzing home runs) published an article in SABR's Baseball Research Journal, applying z-scores to the problem (to access the article, click here, then click on Volume 32 -- 2003). Wrote Bang:

The z-score is best considered a measure of domination, since it only determines how well a hitter performed with respect to his contemporaries within the same season (p. 58).

In other words, for any given season, the home-run leader's total could be compared to the mean throughout baseball for the same season, with the difference divided by the standard deviation for the same season. A player with a high season-specific z-score thus would have done well relative to all the other players that same season. (Bang actually used homers per at-bat as each player's input, so that, for example, a player's home-run performance would not be penalized if the player missed games due to injury.)
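Here is a rough sketch of what a season-specific calculation along these lines might look like, using homers per at-bat as the input. The players and numbers are entirely hypothetical (not real baseball data), the function name season_z_scores is my own, and this is not a reproduction of Bang's actual procedure (for instance, it ignores any rule about which players to include).

```python
from statistics import mean, stdev

# Hypothetical (home runs, at-bats) for one season; NOT real data.
season = {
    "Player A": (50, 500),
    "Player B": (20, 500),
    "Player C": (15, 500),
    "Player D": (10, 400),
    "Player E": (5, 250),
}

def season_z_scores(stats):
    """z-score each player's homers-per-at-bat rate within one season."""
    rates = {name: hr / ab for name, (hr, ab) in stats.items()}
    values = list(rates.values())
    m = mean(values)
    s = stdev(values)  # assumption: sample SD (n - 1 denominator)
    return {name: (rate - m) / s for name, rate in rates.items()}

z = season_z_scores(season)
leader = max(z, key=z.get)  # player with the highest season-specific z
print(leader, round(z[leader], 2))
```

Because every player's rate is compared only to that same season's mean and SD, the resulting z's can be lined up across seasons.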

Under this method, Ruth had the six best individual seasons in Major League Baseball history. The overall No. 1 season was Ruth's in 1920, when his z-score was 7.97 (i.e., Ruth's homer output in 1920 minus the major-league mean in 1920, with the difference divided by the 1920 SD, equaled 7.97). Ruth's 1927 season, when his absolute number of homers established the record of 60, produced a z-score of 5.57, good for fourth on the list.

Bonds's 2001 season of 73 homers produced a z-score of 5.14, good for seventh on the list, and McGwire's 1998 season of 70 homers produced a z of 4.69, for eighth on the list.

Again, the point is that Ruth exceeded his contemporaries to a greater degree than Bonds and McGwire did theirs. The aforementioned factors that may have been inflating home-run totals in the 1990s and 2000s (steroids, small ballparks, etc.) would also have helped Bonds's and McGwire's contemporaries hit lots of homers, thus raising the yearly means and weakening Bonds's and McGwire's season-specific z-scores (although their z's were still pretty far out on the normal distribution).
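To see why a higher league-wide mean weakens the leader's z-score, consider the same raw total in two hypothetical seasons. The means and SDs below are invented purely for illustration (and, to keep things simple, the SD is held the same in both seasons, which would not generally be true).

```python
def z_score(x, m, s):
    """z-score of raw value x given a mean m and standard deviation s."""
    return (x - m) / s

# Same raw total of 60 in two made-up seasons with the same SD of 8.
low_mean_season = z_score(60, m=10, s=8)   # everyone else hits few homers
high_mean_season = z_score(60, m=20, s=8)  # everyone else hits more homers

print(low_mean_season, high_mean_season)  # 6.25 5.0
```

The identical raw total looks less dominant when the rest of the league is hitting more homers.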

As with any method, the z-score approach has its limitations. Among them, notes Bang, is that just by chance, some eras may have a lot of great hitters coming up at the same time, which raises the mean and weakens the top players' z-scores.

How would this logic apply to the example of the discussion sections of a large lecture class? Each TA could convert his or her students' original grades for section performance to z-scores, relative to that TA's grading mean and SD. If a particular TA were an easy grader, that TA's mean would be high and thus the top students would get their z-adjusted grade knocked down a bit. Conversely, a hard-grading TA would have a lower mean for his or her students, thus allowing them to get their grades bumped up a bit in the z-conversion.
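A small sketch of the per-TA conversion might look like the following. The TA labels, student names, and grades are all invented, and the function name z_by_ta is just for illustration; I'm also assuming each TA's students are pooled across that TA's sections.

```python
from statistics import mean, stdev

# Invented section grades, grouped by TA.
section_grades = {
    "easy TA": {"Ann": 95, "Bob": 90, "Cal": 85},
    "hard TA": {"Dee": 80, "Eli": 70, "Fay": 60},
}

def z_by_ta(grades_by_ta):
    """Convert each student's grade to a z-score within his or her own TA."""
    result = {}
    for ta, grades in grades_by_ta.items():
        values = list(grades.values())
        m = mean(values)
        s = stdev(values)  # assumption: sample SD (n - 1 denominator)
        result[ta] = {name: (g - m) / s for name, g in grades.items()}
    return result

z = z_by_ta(section_grades)
print(z["easy TA"]["Ann"], z["hard TA"]["Dee"])  # 1.0 1.0
```

Ann's raw 95 and Dee's raw 80 look very different, but each is one SD above her own TA's mean, so the two students end up with the same z.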

Also, because z-scores have a mean of 0 and an SD of 1, and the section component would count for 20% of students' overall course grades, the z-converted scores would have to be renormed. To get the students' section grades to top out at (roughly) 20, perhaps they could be converted to a system with a mean of 16 and an SD of 2.
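The renorming itself is just a multiply-and-add: new score = new mean + (new SD x z). A minimal sketch, with the function name renorm being my own label:

```python
def renorm(z, new_mean=16.0, new_sd=2.0):
    """Map a z-score onto a scale with the given mean and SD."""
    return new_mean + new_sd * z

for z in (-2.0, 0.0, 1.0, 2.0):
    print(z, renorm(z))
# -2.0 12.0
# 0.0 16.0
# 1.0 18.0
# 2.0 20.0
```

A student two SDs above his or her TA's mean lands right at 20. Anyone with a z above 2 would exceed 20 under this straight conversion, which is why the "top out" is only rough.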

The example of the discussion section grades is based on a true story. While in graduate school at the University of Michigan, I was a TA for a huge lecture class, and I suggested a z-based renorming of students' section grades. The professor turned the idea down, citing the increased complexity and grading time that would be involved.

I really believe the z-score approach gives you a lot of "bang" for your buck, but not everyone may agree.