### Math 225 Course Notes

### Section 8.2: ANOVA

ANOVA can be used to test the hypothesis
that all population means are the same
for more than two populations.
The data from the samples are summarized in an
ANOVA table.
The F statistic computed in the table has an F distribution
if the null hypothesis is true.
Comparing this value to an F distribution
(with the appropriate numbers of degrees of freedom)
is called an F test.
The examples below work through the procedure.

An ANOVA table organizes the computation that combines the data
from all samples into a single test statistic.
The expected value of this test statistic is a little more than 1
if the null hypothesis is true.
(If d is the df in the denominator and d > 2,
the expected value is d/(d-2).)
If this test statistic is substantially larger than 1,
there is strong evidence that the null hypothesis is incorrect.

Source           SS         DF    MS          F         p-value
==================================================================
Among Samples    SSA        k-1   SSA/(k-1)   MSA/MSW
Within Samples   SSW        N-k   SSW/(N-k)
==================================================================
Total            SST        N-1

k is the number of populations.
N is the total number of observations in all samples.
MSA and MSW are the two mean squares in the MS column,
SSA/(k-1) and SSW/(N-k).

The formulas for SSA and SSW are in the textbook.
A simpler formula for SSW, depending on the sample standard deviations, is

SSW = (n_{1}-1)s_{1}^{2} + ... + (n_{k}-1)s_{k}^{2}

If s is the sample standard deviation of all N observations,
then
SST = (N-1)s^{2}

It is also true that SST = SSA + SSW.
Finding these sums of squares
is the tedious part of the computation.
The remaining computations are straightforward.
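
The relationships above can be checked numerically. The sketch below assumes the standard one-way ANOVA definitions of the sums of squares (the textbook may write them differently) and compares them with the standard-deviation shortcuts. The data and the function name `sums_of_squares` are hypothetical, chosen only to keep the arithmetic small.

```python
from statistics import mean, variance


def sums_of_squares(samples):
    """Return (SSA, SSW, SST) computed directly from the usual definitions."""
    all_obs = [x for s in samples for x in s]
    grand_mean = mean(all_obs)
    # Among samples: squared distance of each sample mean from the grand
    # mean, weighted by the sample size.
    ssa = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within samples: squared distance of each observation from its own
    # sample mean.
    ssw = sum((x - mean(s)) ** 2 for s in samples for x in s)
    # Total: squared distance of each observation from the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    return ssa, ssw, sst


# Hypothetical data (not from the notes): k = 3 samples, N = 9 observations.
samples = [[1, 2, 3], [2, 4, 6], [5, 5, 8]]
ssa, ssw, sst = sums_of_squares(samples)

# Shortcuts from the notes; variance() returns s^2 with the n-1 divisor.
all_obs = [x for s in samples for x in s]
ssw_shortcut = sum((len(s) - 1) * variance(s) for s in samples)
sst_shortcut = (len(all_obs) - 1) * variance(all_obs)

print("SSA, SSW, SST from definitions:", ssa, ssw, sst)
print("SSW, SST from shortcuts:       ", ssw_shortcut, sst_shortcut)
print("SSA + SSW:                     ", ssa + ssw)
```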

There are many different F distributions.
They are labeled by two numbers:
the numerator degrees of freedom and the denominator degrees of freedom.
These two degrees of freedom come from the DF column of the ANOVA table.
The numerator degrees of freedom is k-1, one less than the number of samples.
The denominator degrees of freedom is the sum of the degrees of freedom
from each sample, or equivalently, N-k, the total number of observations
minus the number of samples.
The F statistic from the ANOVA table is compared to an F distribution
with these degrees of freedom.
The p-value is the area to the right of this F statistic.
This p-value is interpreted like any other p-value.
It is the probability of observing a result at least as extreme as
the actual test statistic, assuming the null hypothesis is true.
Low p-values are indications of strong evidence
against the null hypothesis.
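
One way to see what "assuming the null hypothesis is true" means is to simulate it. The sketch below draws many data sets in which every sample comes from the same normal population, computes the F statistic for each, and estimates the area to the right of a cutoff. The sample sizes, the cutoff, the seed, and the function name `f_statistic` are all illustrative assumptions. The closed-form tail area used for comparison holds only when the numerator degrees of freedom equal 2; it is not a general formula.

```python
import random


def f_statistic(samples):
    """One-way ANOVA F statistic, MSA / MSW, for a list of samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    ssa = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ssw = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    return (ssa / (k - 1)) / (ssw / (n_total - k))


random.seed(225)      # arbitrary seed so the run is repeatable
sizes = [7, 6, 7]     # illustrative sample sizes: k = 3, N = 20
reps = 20000          # number of simulated data sets
cutoff = 3.0          # hypothetical observed F statistic

# Every sample is drawn from the same population, so the null hypothesis holds.
f_values = [
    f_statistic([[random.gauss(0, 1) for _ in range(n)] for n in sizes])
    for _ in range(reps)
]

d = sum(sizes) - len(sizes)   # denominator df, N - k
sim_mean = sum(f_values) / reps
sim_tail = sum(f >= cutoff for f in f_values) / reps
# Right-tail area of the F distribution, valid only for numerator df = 2.
exact_tail = (1 + 2 * cutoff / d) ** (-d / 2)

print(f"mean of simulated F values: {sim_mean:.3f}, d/(d-2) = {d / (d - 2):.3f}")
print(f"simulated area right of {cutoff}: {sim_tail:.4f}, closed form: {exact_tail:.4f}")
```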

**Example 1:**

A researcher would like to confirm his belief that the mean
number of home runs major league players hit in a season
differs by position.
He samples seven first basemen, six shortstops, and seven outfielders
who played full time in 1995.
Their home run totals were
First basemen | 31, 25, 21, 23, 9, 18, 16
Shortstops | 13, 1, 2, 14, 5, 2
Outfielders | 14, 5, 11, 23, 24, 36, 18

Test the hypothesis that the mean home run totals are the same
for all three positions.
The ANOVA table is:

Source           SS         DF    MS          F         p-value
==================================================================
Among Samples     764.974    2    382.4869    6.009     0.0106
Within Samples   1081.976   17     63.6457
==================================================================
Total            1846.950   19
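
The table above can be reproduced from the raw totals. This is a minimal sketch using only the Python standard library: SSW and SST come from the standard-deviation shortcuts, and SSA is found by subtraction. The variable names are my own, and the p-value line relies on a closed form that holds only because the numerator degrees of freedom equal 2 here. The results should agree with the table up to rounding.

```python
from statistics import variance

first_basemen = [31, 25, 21, 23, 9, 18, 16]
shortstops = [13, 1, 2, 14, 5, 2]
outfielders = [14, 5, 11, 23, 24, 36, 18]
samples = [first_basemen, shortstops, outfielders]

k = len(samples)                         # number of samples
all_obs = [x for s in samples for x in s]
n_total = len(all_obs)                   # N, total number of observations

# Sums of squares; variance() returns s^2 with the n-1 divisor.
ssw = sum((len(s) - 1) * variance(s) for s in samples)
sst = (n_total - 1) * variance(all_obs)
ssa = sst - ssw                          # uses SST = SSA + SSW

df_among, df_within = k - 1, n_total - k
msa, msw = ssa / df_among, ssw / df_within
f_stat = msa / msw

# Area to the right of f_stat; this closed form is valid only when
# df_among == 2, which is the case for three samples.
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

print(f"Among Samples   SS={ssa:.3f}  DF={df_among}  MS={msa:.4f}")
print(f"Within Samples  SS={ssw:.3f}  DF={df_within}  MS={msw:.4f}")
print(f"Total           SS={sst:.3f}  DF={n_total - 1}")
print(f"F={f_stat:.4f}  p-value={p_value:.4f}")
```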

From the limited tables in the textbook, we may conclude that
the p-value is between .01 and .025 since the F statistic
is between the critical values 4.62 and 6.11
for 2 and 17 degrees of freedom.

**Example 2:**

You are given a partial ANOVA table
for a problem in which there are three samples
of sizes 5, 3, and 3.

Source           SS         DF    MS          F         p-value
==================================================================
Among Samples    20
Within Samples
==================================================================
Total            50

The table is completed as follows:
N = 5 + 3 + 3 = 11
SSW = 50 - 20 = 30
dfA = 3 - 1 = 2
dfW = 11 - 3 = 8
MSA = 20 / 2 = 10
MSW = 30 / 8 = 3.75
F = 10 / 3.75 = 2.667
p = area to the right of 2.667 under the F distribution with 2 and 8 df
= 0.1296

The tables in the book would allow us to conclude that the p-value
is more than .10.
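
The same steps can be written as a short function. This is a sketch under stated assumptions: the function name `complete_anova_table` and its dictionary keys are hypothetical, and the p-value uses a closed form that is valid only when the numerator degrees of freedom equal 2, so the function refuses other cases rather than guess.

```python
def complete_anova_table(ssa, sst, sizes):
    """Fill in a one-way ANOVA table from SSA, SST, and the sample sizes."""
    k = len(sizes)                 # number of samples
    n_total = sum(sizes)           # N, total number of observations
    ssw = sst - ssa                # from SST = SSA + SSW
    df_among = k - 1
    df_within = n_total - k
    msa = ssa / df_among
    msw = ssw / df_within
    f_stat = msa / msw
    if df_among != 2:
        # The closed form below does not apply; use F tables or software.
        raise ValueError("closed-form p-value requires numerator df = 2")
    # Area to the right of f_stat under the F distribution with 2 and
    # df_within degrees of freedom.
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return {
        "SSW": ssw, "dfA": df_among, "dfW": df_within,
        "MSA": msa, "MSW": msw, "F": f_stat, "p": p_value,
    }


# Values from Example 2: SSA = 20, SST = 50, samples of sizes 5, 3, and 3.
table = complete_anova_table(ssa=20, sst=50, sizes=[5, 3, 3])
for name, value in table.items():
    print(f"{name} = {value:.4g}")
```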

Last modified: April 16, 1996

Bret Larget,
larget@mathcs.duq.edu