Section 6.3: The t Distribution

Key Concepts

When the population standard deviation is unknown, it is reasonable to use the sample standard deviation s in place of the unknown in the confidence interval formula. However, the standard error is now estimated instead of known. To compensate for this additional uncertainty, we need a larger multiplier.

When samples are small, using a larger multiplier is crucial, since confidence intervals built with the normal multiplier can have probabilities of actually containing the mean that are far below the reported confidence level.
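The shortfall in coverage can be checked with a small simulation. The sketch below is illustrative only: it assumes a normal population with mean 0 and standard deviation 1, a sample size of 5, and a fixed random seed, none of which come from the text. It builds many intervals of the form mean +/- multiplier × s / sqrt( n ) and counts how often they contain the true mean, once with the normal multiplier 1.96 and once with 2.776, the t multiplier for 4 degrees of freedom.

```python
import math
import random

def coverage(n, multiplier, trials=20000, seed=1):
    """Fraction of simulated intervals that contain the true population mean."""
    rng = random.Random(seed)
    true_mean, true_sd = 0.0, 1.0  # assumed population, chosen only for illustration
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        xbar = sum(sample) / n
        # sample standard deviation, dividing by n - 1
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        se = s / math.sqrt(n)  # estimated standard error
        if abs(xbar - true_mean) <= multiplier * se:
            hits += 1
    return hits / trials

z_cover = coverage(5, 1.96)   # normal multiplier
t_cover = coverage(5, 2.776)  # t multiplier for 4 degrees of freedom
print(f"n=5, multiplier 1.96:  coverage ~ {z_cover:.3f}")
print(f"n=5, multiplier 2.776: coverage ~ {t_cover:.3f}")
```

Under these assumptions the first figure should land well below 95% (theory suggests about 88%), while the second should sit close to 95%.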

When samples are large, there is very little difference between the multiplier from a t distribution and the multiplier from a normal distribution.

An example demonstrates the use of the t distribution.

Properties of t Distributions

There are many t distributions, and unlike the normal distribution, there is no single standard distribution with which all can be compared. The different t distributions are labeled by their degrees of freedom. In these problems, the number of degrees of freedom is one less than the sample size.

All of the t distributions are centered at 0 and bell-shaped, much like the standard normal curve. However, they are more spread out. When the number of degrees of freedom is small, this extra spread is noticeable. When the number of degrees of freedom is relatively large (more than 30 or so), there is very little practical difference between the t distribution and the standard normal distribution.
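To see how quickly the t multiplier approaches the normal one, the following sketch computes 95% multipliers for several degrees of freedom. It is one possible approach using only the Python standard library, not the method behind the textbook table: the helper names (t_pdf, t_cdf, t_multiplier) are hypothetical, the area under the t density is approximated with Simpson's rule, and the multiplier is found by bisection.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """Area to the left of x: Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_multiplier(confidence, df):
    """Value t* with the given area between -t* and t*, found by bisection."""
    target = (1 + confidence) / 2  # area to the left of t*
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)  # normal multiplier, about 1.96
for df in (2, 5, 13, 30, 100, 1000):
    print(f"df={df:>4}: t* = {t_multiplier(0.95, df):.3f}   (normal: {z:.3f})")
```

The printed multipliers shrink toward the normal value as the degrees of freedom grow, which is the pattern described above.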

Confidence Intervals

Constructing a confidence interval when the population standard deviation is unknown involves simply plugging into a slightly different equation. Use the sample standard deviation s instead of the population standard deviation, and choose a multiplier from the t distribution with n-1 degrees of freedom. The multipliers are found on page 690 of your textbook. Note that the table gives areas all the way to the left, while you need to find a value with the appropriate area in the middle. For example, the multipliers for 95% confidence intervals will be in the column headed by t.975, since the area between -t.975 and t.975 is .95.
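The conversion from a middle area to a left-hand area is simple arithmetic, sketched below. The function name table_column is hypothetical, and the confidence levels other than 95% are shown only for illustration; whether the textbook table includes those columns is an assumption.

```python
def table_column(confidence):
    """Left-hand area to look up for an interval with the given middle area."""
    tail = (1 - confidence) / 2  # area cut off in each tail
    return 1 - tail

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> left-hand area {table_column(level):.3f}")
```

For a 95% interval, each tail holds .025, so the left-hand area is .975, matching the t.975 column.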

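The formula to use is x̄ +/- t* × s / sqrt( n ), where x̄ is the sample mean, s is the sample standard deviation, n is the sample size, and t* is the multiplier from the t distribution with n-1 degrees of freedom.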

This formula is appropriate to use whenever the standard error is estimated and the underlying population is roughly symmetric and mound shaped. It need not be perfectly normal. The t distribution should never be used for proportions, since the sample data in proportions are all 0's and 1's, which look nothing like a symmetric mound. For larger samples, it matters less whether the population looks approximately normal. The central limit theorem takes over, and the shape of the sampling distribution will be approximately normal, which is all that the theory relies on.

Example

Fourteen infants were randomly sampled from recent births in Boston, and the mean and standard deviation of their birth weights were 114.0 and 18.4 ounces, respectively. Examination of the data shows that it is roughly symmetrical and has no obvious outliers or marked skewness. Give a 95% confidence interval for the population mean.

The shape of the sample indicates that the sampling distribution of the sample mean is close enough to normal that we need not worry about it.

We do not know the standard error of the sample mean exactly, so we must estimate it by 18.4 / sqrt( 14 ) = 4.92.

Since we are estimating the SE, we should use the t distribution instead of the normal. There are 14 - 1 = 13 degrees of freedom. The appropriate t* is 2.1604, slightly larger than the 1.96 we've become accustomed to using. Plugging into the formula gives 114.0 +/- 2.1604 × 4.92 = 114.0 +/- 10.6. This can be interpreted as "we are 95% confident that the unknown mean weight of all newborn infants in Boston is between 103.4 and 124.6 ounces".
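The arithmetic in this example can be reproduced in a few lines. This is a minimal sketch that takes the multiplier 2.1604 as given from the text rather than computing it; the variable names are chosen here for illustration.

```python
import math

n = 14           # sample size
xbar = 114.0     # sample mean, in ounces
s = 18.4         # sample standard deviation, in ounces
t_star = 2.1604  # multiplier quoted in the text for 13 degrees of freedom

se = s / math.sqrt(n)  # estimated standard error of the sample mean
margin = t_star * se   # half-width of the interval
lower, upper = xbar - margin, xbar + margin

print(f"SE = {se:.2f}, margin = {margin:.1f}")
print(f"95% CI: {lower:.1f} to {upper:.1f} ounces")
```

Using 1.96 in place of 2.1604 would give a narrower interval, which is the overconfidence that the t multiplier is meant to correct.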