Monday, November 25, 2024

Fall 2024 HDFS 5349 Quantitative Methods I

Welcome to QM I, the department's introductory statistics class. I'm a bit unusual in using a blog to organize class materials, but I think it has worked well over the years. Also, for those of you who haven't been in school here, welcome to Texas Tech! You'll be visiting this welcome page a lot, as it contains the links to our lecture notes.

I'll do my best to provide a lot of practical, real-world exercises in analyzing data, and I'll try to keep things fun. This passage from a book I read several years ago, Coincidences, Chaos, and All That Math Jazz, by Edward B. Burger and Michael Starbird, provides a concise overview of what statistics can offer:

Statistics can help us understand the world. It is a powerful and effective tool for placing economic, social welfare, sports, and health issues into perspective. It molds data into digestible morsels and shows us a measured way to look at situations that have either random or unknown features. But we must use common sense when applying statistics or other tools that draw on our experience of the world to shape data into meaningful conclusions (p. 60).

In addition, the following article sets forth some goals for what you should learn in this class (and other classes). We can access this article via the Texas Tech Library website or Google Scholar.

Utts, J. (2003). What educated citizens should know about statistics and probability. The American Statistician, 57(2), 74-79.

LECTURE NOTES (asterisked [*] pages are from my undergraduate research-methods class).

Units of analysis*

Sampling*

Types of Measures*

Visual depictions of a data distribution:

  • Histograms (overview). The authors of your textbook, King and colleagues (2018), offer some advice on interval widths and the appearance of histograms, noting that "it is customary to make the height of the distribution about three-quarters of the width" (pp. 32-33). To adjust the width of the columns, click on the histogram in your output, then go to "Options" and select "Un-bin Element" followed by "Bin Element." In the small panel at the upper right of the screen, under "X Axis," select "Custom" and enter the desired interval width.
  • Frequency tables contain information similar to that in histograms. The cumulative percentages are also roughly similar to percentiles (for a given score, you can see what percent of the sample falls at or below it).
  • Shapes of distributions
  • As a class exercise, we will attempt to reproduce via SPSS this histogram of U.S. Presidents' ages upon assuming office (note that Grover Cleveland, who served two non-consecutive terms, is counted as "two presidents," the 22nd and 24th).
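To make the arithmetic behind histograms and frequency tables concrete, here is a minimal Python sketch (SPSS does all of this for you). It groups scores into intervals of a chosen width and reports the frequency, percent, and cumulative percent for each interval. The function name frequency_table and the ages list are hypothetical, made up for illustration; they are not the presidents' data. Treating each interval as including its lower bound and excluding its upper bound is an assumption, and SPSS's own binning conventions may differ.

```python
from collections import Counter

def frequency_table(scores, width, start):
    """Hypothetical helper: group scores into intervals [low, low + width),
    beginning at `start`, and return (low, high, freq, percent, cum_percent) rows."""
    n = len(scores)
    # Interval index k covers start + k*width up to (not including) start + (k+1)*width
    counts = Counter((s - start) // width for s in scores)
    rows = []
    cumulative = 0
    for k in range(min(counts), max(counts) + 1):
        freq = counts[k]  # a Counter returns 0 for intervals with no scores
        cumulative += freq
        low = start + k * width
        rows.append((low, low + width, freq, 100 * freq / n, 100 * cumulative / n))
    return rows

# Hypothetical ages, for illustration only
ages = [42, 46, 47, 51, 51, 54, 55, 57, 61, 64]

print(f"{'Interval':<10}{'Freq':>6}{'Pct':>8}{'Cum Pct':>10}")
for low, high, freq, pct, cum_pct in frequency_table(ages, width=5, start=40):
    # Label intervals by whole-number endpoints, e.g., 40-44
    print(f"{low}-{high - 1:<7}{freq:>6}{pct:>8.1f}{cum_pct:>10.1f}")
```

Changing width (say, to 10) yields fewer, wider intervals, which is the same trade-off the SPSS interval-width setting controls. The last column shows the percent of the sample at or below each interval's upper end, which is why it behaves roughly like a percentile.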

Descriptive statistics*: Central tendency (mean, median, and mode) and spread (standard deviation); moments of a distribution; and z-scores (here, here, here, and here)
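As a quick companion to these notes, the sketch below computes the main descriptive statistics for a small, made-up set of scores using Python's built-in statistics module: mean, median, mode, sample standard deviation, z-scores, and moment-based skewness and excess kurtosis. The helper central_moment is a hypothetical name introduced for illustration. Statistical packages such as SPSS typically apply small-sample corrections to skewness and kurtosis, so their printed values will differ somewhat from these simple moment-based versions.

```python
import statistics

# Hypothetical scores for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)      # 5
median = statistics.median(scores)  # 4.5
mode = statistics.mode(scores)      # 4
sd = statistics.stdev(scores)       # sample SD (divides by n - 1)

# z-score: how many standard deviations each score lies above or below the mean
z_scores = [(x - mean) / sd for x in scores]

def central_moment(data, k):
    """Hypothetical helper: k-th central moment, the average of (x - mean)**k."""
    m = statistics.mean(data)
    return sum((x - m) ** k for x in data) / len(data)

m2 = central_moment(scores, 2)
skewness = central_moment(scores, 3) / m2 ** 1.5           # third standardized moment
excess_kurtosis = central_moment(scores, 4) / m2 ** 2 - 3  # fourth standardized moment, minus 3

print(f"mean={mean}, median={median}, mode={mode}, SD={sd:.3f}")
print("z-scores:", [round(z, 2) for z in z_scores])
print(f"skewness={skewness:.3f}, excess kurtosis={excess_kurtosis:.3f}")
```

Here the mean (5) exceeds the median (4.5), and the skewness is positive, both consistent with the long right tail created by the scores of 7 and 9. The z-scores always sum to zero, because they are deviations from the mean rescaled by the SD.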

Probability (here and here)

Correlation and significance-testing

t-tests

Chi-square

Non-parametric statistics

Statistical power

Confidence intervals

Big data

Writing Up Statistical Results in APA Style

Sunday, November 24, 2024

Big Data

(Cross-posted and modified from a post several years ago at the Reifman Multivariate Blog.)

The amount of capturable data people generate in commerce, health care, law enforcement, sports, and other domains is truly mind-boggling. For example, according to one article, "Walmart controls more than 1 million customer transactions every hour..." The study of Big Data (also known as "Data Mining") applies statistical techniques such as correlation to discern patterns in the data and make predictions. Let's start with a brief video listing numerous examples. Overview articles are available in the Harvard Business Review and on Wikipedia (plus many other places I'm sure you can find via a Google search). A critique of the approach is available here.

Perhaps the most common use of Big Data is in the business world. A good entry point in this area is the book Super Crunchers by Ian Ayres. Probably the best-known story about Big Data is "How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did." What Target's statisticians did, essentially, was look for associations between whether customers' names were on the baby registry and whether those customers bought certain products in greater amounts than the average shopper. Another potential business use could come from this data visualization of taxi and Uber trips in New York City (compare it to the layout of the city).
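The kind of association described in the Target story can be illustrated with a correlation between a yes/no variable (on the baby registry or not, coded 1/0) and a purchase amount. A Pearson correlation computed with a 0/1 variable is known as a point-biserial correlation. The sketch below uses entirely made-up numbers and a hypothetical helper, pearson_r; it only illustrates the basic idea and is not a description of Target's actual models, which the article does not spell out.

```python
import math
import statistics

def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation. With a 0/1 variable for x,
    this is the point-biserial correlation."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: 1 = on the baby registry, 0 = not on it
on_registry = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical purchases of one product category per customer
purchases = [3, 4, 5, 1, 0, 2, 1, 0]

# Compare the average purchases of the two groups
registry_mean = statistics.mean(p for p, r in zip(purchases, on_registry) if r == 1)
other_mean = statistics.mean(p for p, r in zip(purchases, on_registry) if r == 0)
r = pearson_r(on_registry, purchases)

print(f"registry mean={registry_mean}, others mean={other_mean}, r={r:.2f}")
```

A large positive r here (about 0.89 with these invented numbers) simply restates that registry customers bought more of the product on average (4 vs. 0.8). In practice, a retailer would look at many products at once to build a prediction.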

In the sports domain, I recommend three books pertaining to baseball: Moneyball (by Michael Lewis) on the Oakland Athletics; The Extra 2% (by Jonah Keri) on the Tampa Bay Rays; and Big Data Baseball (by Travis Sawchik) on the Pittsburgh Pirates. These three teams play in relatively small cities by Major League Baseball standards and thus bring in less local-television revenue than teams in bigger cities such as New York and Los Angeles. Therefore, teams such as Oakland, Tampa Bay, and Pittsburgh must use statistical techniques to (a) discover "deceptively good" players, whose skills are not well-known to other teams and who can thus be paid non-exorbitant salaries, and (b) identify effective strategies that other teams aren't using (yet). The Pirates' signature strategy, now copied by other teams, is the defensive shift, using statistical "spray charts" unique to each batter. (Major League Baseball later passed rules to limit shifting.) 

One final book I would recommend is The Victory Lab by Sasha Issenberg on political campaigns, namely how candidates try to convince people to vote for them and get their supporters to actually vote. 

To end the course, here's one final song...

Big Data 
Lyrics by Alan Reifman 
May be sung to the tune of “Green Earrings” (Fagen/Becker for Steely Dan) 

Sales, patterns, 
Companies try, 
Running equations, 
To predict, what we’ll buy, 

Big data, 
Lots of numbers, 
Floating in, the cloud, 
For computers, 
To analyze, now, 
We know, how, 

Sports, owners, 
Wanting to win, 
Seeking, advantage, 
In the numbers, it’s no sin, 

Big data, 
Lots of numbers, 
Floating in, the cloud, 
For computers, 
To analyze, now, 
We know, how 

Instrumentals/solos 

Big data, 
Lots of numbers, 
Floating in, the cloud, 
For computers, 
To analyze, now, 
We know, how 

Instrumentals/solos