### Section 8.2: ANOVA

#### Key Concepts

ANOVA (analysis of variance) tests the hypothesis that the population means are all the same when there are more than two populations. The data from the samples are summarized in an ANOVA table. The final value in the table, the F statistic, has an F distribution if the null hypothesis is true. Comparing this value to an F distribution (with the appropriate numbers of degrees of freedom) is called an F test. Two examples illustrate the procedure.

#### ANOVA Table

An ANOVA table organizes the computation that reduces the data from all samples to a single test statistic. If the null hypothesis is true, the expected value of this test statistic is a little more than 1. (If d is the df in the denominator and d > 2, the expected value is d/(d-2).) If the test statistic is substantially larger than 1, there is strong evidence that the null hypothesis is incorrect.

| Source | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
| Among Samples | SSA | k-1 | MSA = SSA/(k-1) | MSA/MSW | |
| Within Samples | SSW | N-k | MSW = SSW/(N-k) | | |
| Total | SST | N-1 | | | |

k is the number of populations.

N is the total number of observations in all samples.

The formulas for SSA and SSW are in the textbook. A simpler formula for SSW, depending on the sample standard deviations, is

$$SSW = (n_1-1)s_1^2 + \cdots + (n_k-1)s_k^2$$

If s is the sample standard deviation of all N observations, then

$$SST = (N-1)s^2$$

It is also true that SST = SSA + SSW.
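
For reference, the usual definitions of the three sums of squares can be written as below. The notation is an assumption and may differ from the textbook's: $x_{ij}$ is observation $j$ in sample $i$, $n_i$ and $\bar{x}_i$ are the size and mean of sample $i$, and $\bar{x}$ is the mean of all N observations.

$$
\begin{aligned}
SSA &= \sum_{i=1}^{k} n_i(\bar{x}_i - \bar{x})^2 \\
SSW &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 \\
SST &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2
\end{aligned}
$$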

Finding these sums of squares is the tedious part of the work. The remaining computations are straightforward.
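
The shortcut formulas above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `sums_of_squares` is illustrative, and the data are the home run totals from the Example later in this section.

```python
from statistics import mean, variance

# Home run totals from the Example later in this section.
samples = {
    "first basemen": [31, 25, 21, 23, 9, 18, 16],
    "shortstops": [13, 1, 2, 14, 5, 2],
    "outfielders": [14, 5, 11, 23, 24, 36, 18],
}

def sums_of_squares(groups):
    """Return (SSA, SSW, SST) for a list of samples."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    # SSW from the shortcut: sum of (n_i - 1) * s_i^2
    ssw = sum((len(g) - 1) * variance(g) for g in groups)
    # SST from the variance of all N observations: (N - 1) * s^2
    sst = (n_total - 1) * variance(pooled)
    # SSA computed directly from the sample means, so that
    # SST = SSA + SSW can be checked rather than assumed
    grand_mean = mean(pooled)
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    return ssa, ssw, sst

ssa, ssw, sst = sums_of_squares(list(samples.values()))
print(round(ssa, 3), round(ssw, 3), round(sst, 3))
```

The three values should agree with the SS column of the Example's ANOVA table, and `ssa + ssw` should equal `sst` up to floating-point rounding.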

#### The F Test

There are many different F distributions. They are labeled by two numbers: the numerator degrees of freedom and the denominator degrees of freedom. These two degrees of freedom come from the DF column of the ANOVA table. The numerator degrees of freedom is k-1, one less than the number of samples. The denominator degrees of freedom is the sum of the degrees of freedom from each sample, or equivalently, N-k, the total number of observations minus the number of samples.

The F statistic from the ANOVA table is compared to an F distribution with these degrees of freedom. The p-value is the area to the right of this F statistic. This p-value is interpreted like any other p-value. It is the probability of observing a result at least as extreme as the actual test statistic, assuming the null hypothesis is true. Low p-values are indications of strong evidence against the null hypothesis.
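
As a rough illustration of "the area to the right", the sketch below computes the p-value numerically instead of reading it from a table. It relies on the standard identity that the upper tail of an F distribution equals a regularized incomplete beta function, and approximates that integral with Simpson's rule. The function name `f_upper_tail` and the `steps` parameter are illustrative choices, not from the textbook, and the sketch assumes both degrees of freedom are at least 2 so that the integrand stays bounded.

```python
import math

def f_upper_tail(f_stat, df_num, df_den, steps=2000):
    """Approximate the area to the right of f_stat under F(df_num, df_den).

    Assumes df_num >= 2 and df_den >= 2 (illustrative sketch only).
    """
    if f_stat <= 0:
        return 1.0
    a, b = df_den / 2.0, df_num / 2.0
    # P(F > f) = I_x(a, b) with x = df_den / (df_den + df_num * f)
    x = df_den / (df_den + df_num * f_stat)
    # Beta function B(a, b), computed through log-gamma for stability
    beta = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

    def density(t):
        return t ** (a - 1) * (1 - t) ** (b - 1) / beta

    steps += steps % 2            # Simpson's rule needs an even step count
    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return total * h / 3

# F statistic and degrees of freedom from the Example in this section
p_example = f_upper_tail(6.009, 2, 17)
print(round(p_example, 4))
```

In practice a statistics package would normally be used for this step; the point here is only that the p-value is an integral of the F density from the observed statistic to infinity.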

#### Example

A researcher would like to confirm his belief that the mean number of home runs major league players hit in a season differs by position. He samples seven first basemen, six shortstops, and seven outfielders who played full time in 1995. Their home run totals were:

| Position | Home run totals |
|---|---|
| First basemen | 31, 25, 21, 23, 9, 18, 16 |
| Shortstops | 13, 1, 2, 14, 5, 2 |
| Outfielders | 14, 5, 11, 23, 24, 36, 18 |

Test the hypothesis that the mean home run totals are the same for all three positions.

The ANOVA table is:

| Source | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
| Among Samples | 764.974 | 2 | 382.4869 | 6.009 | 0.0106 |
| Within Samples | 1081.976 | 17 | 63.6457 | | |
| Total | 1846.950 | 19 | | | |

From the limited tables in the textbook, we may conclude that the p-value is between .01 and .025 since the F statistic is between 4.62 and 6.11.
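
The table can be reproduced from the raw data with the sketch below. Variable names are illustrative. The p-value line uses a closed form that holds only when the numerator degrees of freedom is 2 (three samples), which is an assumption specific to this example and not a general method; the critical values 4.62 and 6.11 are the ones quoted from the textbook tables above.

```python
from statistics import mean

groups = [
    [31, 25, 21, 23, 9, 18, 16],   # first basemen
    [13, 1, 2, 14, 5, 2],          # shortstops
    [14, 5, 11, 23, 24, 36, 18],   # outfielders
]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = mean(x for g in groups for x in g)
group_means = [mean(g) for g in groups]

# Sums of squares among and within samples
ss_among = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(groups, group_means))
ss_within = sum((x - m) ** 2
                for g, m in zip(groups, group_means) for x in g)

df_among, df_within = k - 1, n_total - k
ms_among = ss_among / df_among
ms_within = ss_within / df_within
f_stat = ms_among / ms_within

# Closed form for the upper tail, valid ONLY when df_among == 2
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

# Bracketing the p-value with the tabled critical values for 2 and 17 df
between_tabled_values = 4.62 < f_stat < 6.11

print(f"F = {f_stat:.3f}, p = {p_value:.4f}, "
      f"between .01 and .025: {between_tabled_values}")
```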

#### Example 2

You are given a partial ANOVA table for a problem in which there are three samples of size 5, 3, and 3.

| Source | SS | DF | MS | F | p-value |
|---|---|---|---|---|---|
| Among Samples | 20 | | | | |
| Within Samples | | | | | |
| Total | 50 | | | | |

This is completed as follows:

- N = 5 + 3 + 3 = 11 and k = 3
- SSW = SST - SSA = 50 - 20 = 30
- dfA = k - 1 = 3 - 1 = 2
- dfW = N - k = 11 - 3 = 8
- MSA = 20 / 2 = 10
- MSW = 30 / 8 = 3.75
- F = 10 / 3.75 = 2.667
- p = area to the right of 2.667 under the F distribution with 2 and 8 df = 0.1296

The tables in the book would allow us to conclude that the p-value is more than .10.
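
The same completion steps can be written as a small function. This is a sketch under stated assumptions: the name `complete_anova` and the dictionary keys are illustrative, and the p-value is filled in only when the numerator degrees of freedom is 2, where a simple closed form for the upper tail is available.

```python
def complete_anova(ss_among, ss_total, sample_sizes):
    """Fill in an ANOVA table from SSA, SST, and the sample sizes."""
    k = len(sample_sizes)
    n_total = sum(sample_sizes)
    ss_within = ss_total - ss_among            # from SST = SSA + SSW
    df_among, df_within = k - 1, n_total - k
    ms_among = ss_among / df_among
    ms_within = ss_within / df_within
    f_stat = ms_among / ms_within
    # Closed-form upper tail, valid ONLY when the numerator df is 2;
    # otherwise the p-value is left unfilled in this sketch.
    p_value = None
    if df_among == 2:
        p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return {
        "SSW": ss_within, "dfA": df_among, "dfW": df_within,
        "MSA": ms_among, "MSW": ms_within, "F": f_stat, "p": p_value,
    }

# Partial table from Example 2: SSA = 20, SST = 50, samples of size 5, 3, 3
table = complete_anova(20, 50, [5, 3, 3])
print(table)
```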