### Math 225 Course Notes

### Section 6.3: The t Distribution

When the population standard deviation is unknown,
it is reasonable to use the sample standard deviation s
in place of the unknown σ
in the
confidence interval formula.
However,
the standard error is now *estimated*
instead of known.
To compensate for this additional uncertainty,
we need a larger multiplier.
When samples are small,
using a larger multiplier is crucial,
since otherwise confidence intervals can have probabilities
of actually containing the mean
that are far below the reported confidence levels.
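
The loss of coverage can be checked with a small simulation.
The sketch below is only an illustration, not part of the original notes:
it assumes normally distributed data, and the function name `z_interval_coverage`,
the number of trials, and the seed are all arbitrary choices.
It builds intervals with the normal multiplier 1.96 and the estimated standard error,
then counts how often they contain the true mean.

```python
import math
import random

def z_interval_coverage(n, trials=5000, mu=0.0, sigma=1.0, z=1.96, seed=0):
    """Fraction of intervals xbar +/- z * s / sqrt(n) that contain mu.

    Assumes normal data; s is the sample standard deviation (divisor n - 1).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        se = s / math.sqrt(n)          # estimated standard error
        if abs(xbar - mu) <= z * se:   # does the interval contain mu?
            hits += 1
    return hits / trials

# Nominal confidence is 95% in every case; only the sample size changes.
for n in (5, 14, 100):
    print(n, round(z_interval_coverage(n), 3))
```

Under these assumptions the simulated coverage for n = 5 should come out noticeably below 0.95,
while for n = 100 it should be close to 0.95.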

When samples are large,
there is very little difference between the multiplier
from a
t distribution
and the multiplier from a normal distribution.

An example demonstrates the use of the t distribution.

There are many t distributions,
and unlike the normal distribution, there is no single
standard with which all can be compared.
The different t distributions are labeled by their
*degrees of freedom*.
In these problems,
the number of degrees of freedom is one less than the sample size.
All of the t distributions are centered at 0
and bell-shaped, much like the standard normal curve.
However, they are spread out farther.
When the number of degrees of freedom is small,
this spread is noticeable.
When the number of degrees of freedom is relatively large
(more than 30 or so),
there is very little practical difference between the t distribution
and the standard normal distribution.
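
To see how the multipliers shrink toward 1.96 as the degrees of freedom grow,
the 95% multipliers can be computed numerically.
The following is a minimal sketch using only the Python standard library;
the helper names `t_pdf`, `t_cdf`, and `t_multiplier` are made up for this illustration,
and the approach (Simpson's rule plus bisection) is just one simple way to do it,
not the method used to build the textbook table.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=1000):
    """Area to the left of x (x >= 0): 0.5 plus Simpson's rule on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_multiplier(df, confidence=0.95):
    """Value t* with area `confidence` between -t* and t*, by bisection."""
    target = 1 - (1 - confidence) / 2  # area to the left, e.g. 0.975
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:      # widen until the target is bracketed
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)        # normal multiplier, about 1.96
for df in (2, 5, 13, 30, 100):
    print(df, round(t_multiplier(df), 3), "vs normal", round(z, 3))
```

The printed values can be compared against a t table;
they should decrease steadily toward the normal multiplier as the degrees of freedom increase.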

Constructing a confidence interval when the population standard deviation
is unknown
involves simply plugging into a slightly different equation.
Use the sample standard deviation s instead of
the population standard deviation,
and choose a multiplier from the t distribution
with n-1 degrees of freedom.
The multipliers are found on page 690 of your textbook.
Note that the table gives areas all the way to the left,
while you need to find a value with the appropriate area in the middle.
For example, the multipliers for 95% confidence intervals
will be in the column headed by t_{.975},
since the area between
-t_{.975} and t_{.975} is .95.
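The formula to use is
$$\bar{x} \pm t^* \frac{s}{\sqrt{n}},$$
where x̄ is the sample mean, n is the sample size,
and t* is the multiplier from the t distribution.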

This formula is appropriate to use whenever the standard error is estimated
and the underlying population is roughly symmetric and mound shaped.
It need not be perfectly normal.
*The t distribution should never be used for proportions*,
since the sample data in proportions are all 0's and 1's,
which look nothing like a symmetric mound.
For larger samples,
the need for the population to look approximately normal is not great.
The central limit theorem is taking over,
and the shape of the sampling distribution
will be approximately normal,
which is all that the theory relies on.

Fourteen infants were randomly sampled from recent births in Boston,
and the mean and standard deviation
of their birth weights were 114.0 and 18.4 ounces respectively.
Examination of the data shows that it is roughly symmetrical
and has no obvious outliers or marked skewness.
Give a 95% confidence interval for the population mean.
The shape of the sample indicates
that the shape of the sampling distribution of x̄
is sufficiently close to normal
that we need not worry about it.

We do not know the standard error of x̄
exactly,
so we must estimate it by 18.4 / sqrt( 14 ) = 4.92.

Since we are estimating the SE, we should use the t distribution
instead of the normal.
There are 14 - 1 = 13 degrees of freedom.
The appropriate t* is 2.1604,
slightly larger than the 1.96 we've become accustomed to using.
Plugging into the formula
gives an answer 114.0 +/- 10.6.
This can be interpreted as
"we are 95% confident that the unknown mean weight of all newborn infants
in Boston is between 103.4 and 124.6 ounces".
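
The arithmetic of this example can be reproduced in a few lines.
This is only a sketch: the function name `t_interval` is invented here,
and the multiplier 2.1604 is passed in by hand, as if read from the table,
rather than computed.

```python
import math

def t_interval(xbar, s, n, t_star):
    """Return (low, high, se, margin) for the interval xbar +/- t_star * s / sqrt(n)."""
    se = s / math.sqrt(n)      # estimated standard error of the sample mean
    margin = t_star * se       # half-width of the interval
    return xbar - margin, xbar + margin, se, margin

# Birth weight example: n = 14, mean 114.0 oz, standard deviation 18.4 oz,
# multiplier for 13 degrees of freedom taken from the table.
low, high, se, margin = t_interval(114.0, 18.4, 14, 2.1604)
print(f"SE = {se:.2f}, margin = {margin:.1f}, interval = ({low:.1f}, {high:.1f})")
# prints: SE = 4.92, margin = 10.6, interval = (103.4, 124.6)
```

Using 1.96 in place of 2.1604 in the same call would give a narrower interval,
which is exactly the overconfidence the t multiplier is meant to prevent.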

Last modified: Feb 26, 1996

Bret Larget,
larget@mathcs.duq.edu