Wednesday, September 13, 2006

Normal and Non-Normal (J-Shaped) Distributions

(Updated July 10, 2015)

We're going to move beyond simple histograms and focus more intensively on types of distributions. These two web documents describe types of distributions (here and here).

Much of statistics is based on the idea of a normal "bell-shaped" curve, where most participants have scores in the middle of the distribution (on whatever variable is being studied) and progressively fewer have scores as you move to the extremes (high and low) of the distribution. Variables such as self-identified political ideology in the U.S. and people's height have at least a roughly normal distribution. However, data sets only rarely seem to follow the normal curve. The following article provides evidence that many human characteristics do not follow a normal bell-shaped curve:

O'Boyle, E., & Aguinis, H. (2012). The best and the rest: Revisiting the norm of normality of individual performance. Personnel Psychology, 65, 79-119. (Accessible on TTU computers at this link; see especially Figure 2)

In recent years, there has been a lot of interest in "J-shaped" curves. Such curves, which are also known as "power-law," "scale-free," and Pareto distributions, represent the situation where most people have relatively low scores, a few people have moderately high scores, and a very few have extremely high scores. The term "power law" comes from the fact that such distributions follow an equation such as y = 1/x^2, in which x is raised to a power (here, the second power); we can plot y when x equals 1, 2, 3, etc.
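
To see why such a curve is called "J-shaped," it may help to compute a few values. The short Python sketch below evaluates y = 1/x^2 for x = 1 through 10 and draws a rough text bar for each value. The function name power_law and the bar scaling are my own choices for illustration; they do not come from any of the linked documents.

```python
# Evaluate the example power-law equation y = 1 / x**2 at x = 1, 2, 3, ...
# The function name and the text-bar scaling are illustrative assumptions.

def power_law(x, exponent=2):
    """Return y = 1 / x**exponent (exponent=2 matches the example equation)."""
    return 1 / x ** exponent

xs = list(range(1, 11))
ys = [power_law(x) for x in xs]

for x, y in zip(xs, ys):
    bar = "#" * round(y * 50)  # crude text "plot": longer bar = larger y
    print(f"x={x:2d}  y={y:.4f}  {bar}")
```

The bars drop off steeply after x = 1 and then flatten into a long tail, which is the pattern described above: many low scores, a few moderately high ones, and a very few extreme ones.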

Distributions in surveys of respondents' number of sexual partners (see Figure 2 of linked document) illustrate what we mean by "scale free." The participants in the linked study illustrate a J-shaped curve by themselves, but if you add in Gaetan Dugas, labeled as "Patient Zero" in the AIDS epidemic due to his reported 2,500 sex partners, the curve becomes extremely elongated to the right-hand side. With most people appearing to have 10 or fewer lifetime sexual partners, thus warranting increments of 1 on the X-axis, 2,500 is clearly "off the scale."

Other examples of J-shaped curves include:
  • Wealth distributions. In the United States, "the top 1% of households (the upper class) owned 34.6% of all privately held wealth" (link). Thus, at the extreme high end of the X-axis, where the highest monetary values would be located, the frequency of occurrence (Y-axis) would be very small (only 1% of households).
  • Alcohol consumption. See here.
  • Number of lifetime romantic partners. See here.

An example of a bimodal distribution (one with two distinct peaks) is shown in Table 2 of:
  • James, J. D., Breezeel, G. S., & Ross, S. D. (2001). A two-stage study of the reasons to begin and continue tailgating. Sport Marketing Quarterly, 10, 212-222.

Most of the statistical techniques we will learn in this course assume normal distributions. As for dealing with non-normal data, O'Boyle and Aguinis (2012) suggest the following:

"Based on the problems of applying Gaussian [normal] techniques to [a] Paretian distribution, our first recommendation for researchers examining individual performance is to test for normality. Paretian distributions will often appear highly skewed and leptokurtic."

(We will learn about skew and kurtosis [where the term "leptokurtic" comes from] in an upcoming lecture.)
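
As a preview, here is a minimal Python sketch of what "highly skewed and leptokurtic" looks like numerically. It uses the simple moment-based formulas for skewness and excess kurtosis (statistical packages often apply small-sample corrections, so their output may differ slightly). Both data sets are made up for illustration and are not taken from O'Boyle and Aguinis (2012) or any other source cited here.

```python
from statistics import mean

def central_moment(data, k):
    """Average of (value - mean)**k over the data (the k-th central moment)."""
    m = mean(data)
    return sum((v - m) ** k for v in data) / len(data)

def skewness(data):
    """Moment-based skewness: 0 for a symmetric distribution, > 0 for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3: about 0 for a normal curve, > 0 means leptokurtic."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Hypothetical scores (invented for this example)
symmetric = [1] + [2] * 4 + [3] * 6 + [4] * 4 + [5]
j_shaped = [1] * 60 + [2] * 20 + [3] * 10 + [5] * 5 + [10] * 3 + [25, 60]

for name, data in [("symmetric", symmetric), ("J-shaped", j_shaped)]:
    print(f"{name:10s} skew={skewness(data):6.2f}  "
          f"excess kurtosis={excess_kurtosis(data):6.2f}")
```

The symmetric data set has zero skew, whereas the J-shaped one has strongly positive skew and strongly positive excess kurtosis, which is the kind of result that would signal a departure from normality.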