# Math 225

## Introduction to Biostatistics

### Notes from Lecture #15

#### Confidence Intervals for a Mean

1. A Motivating Problem. Each year since 1985, there have been 64 teams in the men's NCAA basketball tournament. (In some years, there have been play-in games where a few teams played an extra game to make the field of 64 teams.) These 64 teams are split into four regionals of 16 teams each. The sixteen teams are seeded from 1 (best) to 16 (worst) and paired up for first round games with a #1 seed playing a #16 seed, a #2 seed playing a #15 seed, and so on. There are a total of 32 first round games, and each year there are upsets, in which the lower seeded team (the one with the higher seed number) wins. It is a national pastime to predict the entire tournament before it begins. A question is, when filling out your NCAA tournament brackets, how many upsets should you predict in the first round?

2. A Model. We want to base our inference on past data and estimate a mean with confidence. In our standard model, we represent a population as a bucket filled with numbered balls. We could say that the 16 tournaments played so far are the entire population, in which case there would be no need for inference because we have observed every member of it. Instead, we will think of a tournament as being some sort of random process with the outcomes each year being determined at random from the same process. With this model, we can use the inference techniques of the course.

3. Data. Here is the number of first round upsets in each tournament since 1985.

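| Year | Upsets | Year | Upsets |
|------|--------|------|--------|
| 1985 | 7      | 1993 | 6      |
| 1986 | 6      | 1994 | 9      |
| 1987 | 9      | 1995 | 8      |
| 1988 | 5      | 1996 | 9      |
| 1989 | 12     | 1997 | 7      |
| 1990 | 7      | 1998 | 9      |
| 1991 | 9      | 1999 | 12     |
| 1992 | 8      | 2000 | 2      |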

The data show no strong skewness and no extreme outliers, although the two upsets in 2000 were unusually few. The mean and standard deviation of this sample are $\bar{x} = 7.8$ and $s = 2.5$.
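As a quick check of these summary statistics, here is a minimal Python sketch using only the standard library. The list name `upsets` is just an illustrative choice; the values are the sixteen counts given above, in order from 1985 through 2000.

```python
import statistics

# First round upsets per tournament, 1985 through 2000 (from the data above).
upsets = [7, 6, 9, 5, 12, 7, 9, 8, 6, 9, 8, 9, 7, 9, 12, 2]

n = len(upsets)                  # sample size
x_bar = statistics.mean(upsets)  # sample mean
s = statistics.stdev(upsets)     # sample standard deviation (divides by n - 1)

print(n, round(x_bar, 1), round(s, 1))  # prints: 16 7.8 2.5
```

Note that `statistics.stdev` uses the n - 1 divisor, which is the sample standard deviation these notes call s.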

4. Confidence Interval. In this problem, the population mean $\mu$ represents the mean number of first round upsets of the hypothetical random process which produces NCAA tournaments. The general form for a confidence interval is

$$(\text{estimate}) \pm (\text{multiplier})(\text{standard error})$$

A confidence interval for a population mean $\mu$ takes the form

$$\bar{x} \pm t^* \frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean, $n$ is the sample size, $t^*$ is the value such that the area between $-t^*$ and $t^*$ under a t distribution with $n-1$ degrees of freedom is the confidence level, and $s$ is the sample standard deviation.
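The multiplier is usually read from a t table or obtained from statistical software. To make its definition concrete, the following sketch finds it directly from the definition: it integrates the t density numerically and searches for the value whose central area equals the confidence level. The function names (`t_pdf`, `central_area`, `t_multiplier`) are hypothetical helpers written for this illustration, not part of any library, and the approach (Simpson's rule plus bisection) is only one simple way to do it.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def central_area(t, df, steps=2000):
    """Area under the t density between -t and t, using Simpson's rule."""
    # The density is symmetric about 0, so integrate over [0, t] and double.
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_multiplier(confidence, df):
    """Bisection search for the t with central_area(t, df) == confidence."""
    lo, hi = 0.0, 100.0  # assumes the multiplier is below 100
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 95% confidence with n - 1 = 15 degrees of freedom (n = 16 tournaments).
t_star = t_multiplier(0.95, df=15)
print(round(t_star, 3))  # prints: 2.131
```

In practice a table lookup or a statistics package is the normal route; the point here is only that the multiplier is determined entirely by the confidence level and the degrees of freedom.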

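Here $n = 16$, so the multiplier for 95% confidence comes from a t distribution with 15 degrees of freedom, which gives $t^* = 2.131$. Plugging in numbers, we are 95% confident that $\mu$ is in the interval

$$7.8 \pm (2.131)\frac{2.5}{\sqrt{16}}$$

or

$$7.8 \pm 1.3$$

In the context of the problem, we are 95% confident that the process that produces NCAA tournaments has a mean number of first round upsets between 6.5 and 9.1.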
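The whole calculation can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the helper `mean_confidence_interval` is a hypothetical name introduced here, and the multiplier 2.131 is passed in by hand (the value for 95% confidence with 15 degrees of freedom) rather than computed.

```python
import math
import statistics

def mean_confidence_interval(data, t_star):
    """Return (low, high) for x-bar +/- t* s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)  # s / sqrt(n)
    margin = t_star * standard_error
    return x_bar - margin, x_bar + margin

# First round upsets per tournament, 1985 through 2000.
upsets = [7, 6, 9, 5, 12, 7, 9, 8, 6, 9, 8, 9, 7, 9, 12, 2]

# t* = 2.131 is the 95% multiplier for 15 degrees of freedom.
low, high = mean_confidence_interval(upsets, t_star=2.131)
print(round(low, 1), round(high, 1))  # prints: 6.5 9.1
```

The code works from the unrounded mean and standard deviation, so intermediate values differ slightly from the hand calculation, but the endpoints agree after rounding to one decimal place.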