Saturday, September 30, 2006

Probability -- Intro Lecture

(Updated October 1, 2013)

This week, we'll be learning about probability. Probability is an important foundation for statistical analysis. When you obtain a finding in your research (e.g., a higher percentage of couples show improvement after receiving a novel, experimental marital therapy than do control-group couples who received an older form of therapy), you need to know how likely it is that this result came up purely by chance.

Although some topics can get fairly complex, the core idea to keep in mind regarding probability is to count how many possible ways something can turn out. A fair coin can come up heads or tails, so the probability of a head is 1/2 (same for a tail).

A website called "Math Goodies" has some excellent lecture notes on probability. In this class, we'll want to look at the units from this site on "Introduction to Probability," "Sample Spaces," "Addition Rules for Probability," and "Independent Events."

We're mainly interested in two principles known as the multiplication/and rule (which requires the events to be independent) and the addition/or rule (which requires the events to be mutually exclusive). Here is a description of non-mutually exclusive events and how to adjust the addition/or procedure accordingly.
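To make the two rules concrete, here is a minimal Python sketch. The coin, die, and playing-card examples are my own illustrative assumptions (they are not taken from the Math Goodies units), and the variable names are arbitrary.

```python
from fractions import Fraction

# Multiplication/and rule (independent events): two tosses of a fair coin.
p_head = Fraction(1, 2)
p_two_heads = p_head * p_head  # P(head AND head)

# Addition/or rule (mutually exclusive events): one roll of a fair die.
p_one = Fraction(1, 6)
p_two = Fraction(1, 6)
p_one_or_two = p_one + p_two  # P(1 OR 2)

# Non-mutually exclusive events: drawing a king or a heart from a 52-card deck.
# The king of hearts belongs to both events, so subtract the overlap once.
p_king = Fraction(4, 52)
p_heart = Fraction(13, 52)
p_king_and_heart = Fraction(1, 52)
p_king_or_heart = p_king + p_heart - p_king_and_heart

print(p_two_heads)      # 1/4
print(p_one_or_two)     # 1/3
print(p_king_or_heart)  # 4/13
```

Using Fraction rather than decimals keeps the answers in the same form you would write by hand.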


A tool that has many useful applications in statistics and probability is the "n choose k" formulation (see the links section on the right for an online calculator). I alluded to this approach in talking about how the 2005-06 St. Louis University men's basketball team had a perfect alternation of wins and losses through its first 19 games (WLWLWLWLWLWLWLWLWLW). As discussed on my Hot Hand blog, I framed the question in the form of: How many possible different ways are there to distribute 10 items (wins) in 19 boxes (games)?

To simplify things, let's imagine that a sorority needs two of its members to oversee an event, but four members volunteer (Cammy, Pammy, Sammy, and Tammy). How many possible duos can be chosen from four members? This would be stated as "4 choose 2" (there would be a 4 on top of a 2 in parentheses, but this is not a fraction). Keeping in mind that n = 4 and k = 2, the following description from Ken Ross (p. 43) may be helpful:

...the denominator is the product of k numbers from k down to 1...

...the numerator is the product of numbers starting with n and going down until you also have k numbers in the numerator...

Mathematically, we would take:

(4 X 3) / (2 X 1)

which equals 6. The possible duos are thus:

Cammy with Pammy
Cammy with Sammy
Cammy with Tammy
Pammy with Sammy
Pammy with Tammy
Sammy with Tammy
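For anyone who wants to check the count by computer, here is a short Python sketch. The function name n_choose_k is hypothetical; it simply follows the numerator/denominator recipe quoted above, while itertools.combinations and math.comb (Python 3.8 or later) come from the standard library.

```python
from itertools import combinations
from math import comb

members = ["Cammy", "Pammy", "Sammy", "Tammy"]

def n_choose_k(n, k):
    """Count the ways to choose k items from n, following the recipe above."""
    numerator = 1    # product of k numbers starting at n and going down
    denominator = 1  # product of k numbers from k down to 1
    for i in range(k):
        numerator *= n - i
        denominator *= k - i
    return numerator // denominator

# List every possible duo, then count them.
duos = list(combinations(members, 2))
for first, second in duos:
    print(f"{first} with {second}")

print(n_choose_k(4, 2))  # 6
print(comb(19, 10))      # 92378 ways to place 10 wins in 19 games
```

The last line applies the same idea to the basketball question: 10 wins distributed across 19 games.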


Finally, let's look at how binomial distributions (e.g., coin-tossing) lead up to the normal, bell-shaped curve (here and here).
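As a rough sketch of why the bell shape emerges, the exact probability of each number of heads in 10 tosses of a fair coin can be computed with "n choose k": there are 2^10 equally likely sequences, and (10 choose k) of them contain exactly k heads. The choice of 10 tosses and the text-based bars are assumptions made purely for illustration.

```python
from math import comb

n_tosses = 10  # assumed number of fair-coin tosses

# P(k heads) = (n choose k) / 2**n for a fair coin.
probs = [comb(n_tosses, k) / 2 ** n_tosses for k in range(n_tosses + 1)]

for k, p in enumerate(probs):
    bar = "#" * round(p * 100)  # one '#' per percentage point (rounded)
    print(f"{k:2d} heads  {p:.4f}  {bar}")
```

The bars peak at 5 heads and taper symmetrically toward 0 and 10, which is the beginning of the bell-shaped pattern.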

Coin-tossing is an example of a Bernoulli process. Westfall and Henning's book shows how to simulate coin-tossing in Excel. Here are the specifications if anyone wants to try this at home:

You need to enable Excel's "Analysis ToolPak" add-in (it comes with Excel but is not loaded by default). Then, access the Random Number Generation tool through the tabs "Data" and "Data Analysis." Set the number of variables to 10 and the number of random numbers to 1000. The distribution to select is "Bernoulli," with p value = 0.5. The output range should be: $A$1. After the data are generated, enter the following in column K of the first row of data: =SUM(A1:J1). How to drag down to get the sum for all rows is explained here. Finally, making histograms in Excel is explained here.
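If you don't have Excel handy, the same exercise can be approximated in Python. This is only a sketch that mirrors the specifications above (10 columns, 1000 rows, p = 0.5, row sums); it is not Westfall and Henning's procedure, and the seed value and the scale of the text histogram are arbitrary assumptions.

```python
import random
from collections import Counter

random.seed(2006)  # arbitrary seed so the run is repeatable

N_ROWS = 1000  # "number of random numbers" in the Excel dialog
N_COLS = 10    # "number of variables" (columns A through J)
P = 0.5        # Bernoulli probability of a 1 (a head)

# Each row holds 10 Bernoulli draws: 1 = head, 0 = tail.
rows = [[1 if random.random() < P else 0 for _ in range(N_COLS)]
        for _ in range(N_ROWS)]

# Equivalent of =SUM(A1:J1) dragged down all 1000 rows.
row_sums = [sum(row) for row in rows]

# Crude text histogram: one '#' for every 5 rows with that total.
counts = Counter(row_sums)
for total in range(N_COLS + 1):
    print(f"{total:2d}  {'#' * (counts[total] // 5)}")
```

The exact counts will differ from run to run if you change the seed, but the totals should pile up around 5 heads.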