## Math 225 - Introduction to Biostatistics

### Interpreting Analysis of Variance

At the end of this assignment, you should be able to:
1. interpret a one-way ANOVA table
2. interpret a multi-way ANOVA table
3. state the assumptions of ANOVA

This document contains background information and interpretation problems. You need not turn in any of these problems. Problems similar to these will likely be on the final examination.

#### More background on one-way ANOVA Tables

An ANOVA table organizes the calculation that combines the data from all of the samples into a single test statistic, F. If the null hypothesis is true, the expected value of this test statistic is a little more than 1. (If d is the df in the denominator and d > 2, the expected value is d/(d-2).) If the test statistic is substantially larger than 1, there is strong evidence that the null hypothesis is incorrect.
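The expected value quoted above can be checked by simulation. The sketch below is not part of the course materials: it assumes a hypothetical design of three samples of 10 observations, all drawn from the same normal population so that the null hypothesis is true. The function name `f_statistic`, the seed, and the number of repetitions are illustrative choices.

```python
import random

def f_statistic(samples):
    """One-way ANOVA F statistic, MSb / MSw, for a list of samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    ss_between = 0.0
    ss_within = 0.0
    for s in samples:
        mean = sum(s) / len(s)
        ss_between += len(s) * (mean - grand_mean) ** 2
        ss_within += sum((x - mean) ** 2 for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

rng = random.Random(225)   # fixed seed so the run is repeatable
sizes = [10, 10, 10]       # hypothetical design: k = 3, N = 30
d = sum(sizes) - len(sizes)  # denominator degrees of freedom, N - k

# Every sample comes from the same normal population (null hypothesis true).
f_values = [
    f_statistic([[rng.gauss(0.0, 1.0) for _ in range(n)] for n in sizes])
    for _ in range(20000)
]

# The simulated average should be close to d / (d - 2) = 27 / 25.
print(sum(f_values) / len(f_values), d / (d - 2))
```

The two printed numbers should agree to within simulation error; more repetitions tighten the agreement.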

```
Source          SS        DF        MS          F        p-value
===========================================================================
Between         SSb       k-1     SSb/(k-1)     MSb/MSw  area to right of F
Within          SSw       N-k     SSw/(N-k)
===========================================================================
Total           SSt       N-1
```
k is the number of populations.

N is the total number of observations in all samples.
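Once the two sums of squares are known, the rest of the table follows from k and N. A minimal sketch, assuming hypothetical values (the numbers and the function name `anova_table` are invented for illustration and do not come from any problem in this document):

```python
def anova_table(ss_between, ss_within, k, n_total):
    """Fill in the DF, MS, and F columns from the two sums of squares."""
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "Between": {"SS": ss_between, "DF": df_between, "MS": ms_between,
                    "F": ms_between / ms_within},
        "Within": {"SS": ss_within, "DF": df_within, "MS": ms_within},
        "Total": {"SS": ss_between + ss_within, "DF": n_total - 1},
    }

# Hypothetical values: k = 4 samples, N = 28 observations in all.
table = anova_table(ss_between=40.0, ss_within=120.0, k=4, n_total=28)
for source, row in table.items():
    print(source, row)
```

The p-value column is left out here because it requires the F distribution, not just arithmetic on the table entries.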

The formulas for SSb and SSw are in the textbook. A simpler formula for SSw, in terms of the sample sizes n_1, ..., n_k and the sample standard deviations s_1, ..., s_k, is

```
SSw = (n_1 - 1) s_1^2 + ... + (n_k - 1) s_k^2
```
If s is the sample standard deviation of all N observations, then
```
SSt = (N - 1) s^2
```
It is also true that SSt = SSb + SSw.
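These shortcuts can be verified on a small data set. The sketch below uses three hypothetical samples invented for illustration. It computes SSw and SSt from standard deviations as described above, then computes SSb directly as the sum of n_i times the squared distance of each sample mean from the grand mean (the standard definition, which is not quoted from this document) to confirm that the three quantities add up.

```python
import math
import statistics

# Hypothetical data: k = 3 samples, N = 10 observations in all.
samples = [[2, 4, 6], [5, 7, 9], [8, 10, 12, 14]]
pooled = [x for s in samples for x in s]
n_total = len(pooled)

# SSw from the sample standard deviations.
ss_within = sum((len(s) - 1) * statistics.stdev(s) ** 2 for s in samples)

# SSt from the standard deviation of all N observations.
ss_total = (n_total - 1) * statistics.stdev(pooled) ** 2

# SSb computed directly from the sample means and the grand mean.
grand_mean = statistics.fmean(pooled)
ss_between = sum(
    len(s) * (statistics.fmean(s) - grand_mean) ** 2 for s in samples
)

print(ss_within, ss_total, ss_between)
print(math.isclose(ss_total, ss_between + ss_within))
```

In practice SSb is often obtained by subtraction, SSb = SSt - SSw, once the two standard-deviation formulas have been evaluated.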

Finding these sums of squares is the tedious part of the calculation. The remaining computations are straightforward.

There are many different F distributions. Each is labeled by two numbers: the numerator degrees of freedom and the denominator degrees of freedom. Both come from the DF column of the ANOVA table. The numerator degrees of freedom is k-1, one less than the number of samples. The denominator degrees of freedom is the sum of the degrees of freedom from each sample, or equivalently, N-k, the total number of observations minus the number of samples.

The F statistic from the ANOVA table is compared to an F distribution with these degrees of freedom. The p-value is the area to the right of this F statistic, and it is interpreted like any other p-value: it is the probability of observing a result at least as extreme as the actual test statistic, assuming the null hypothesis is true. A low p-value indicates strong evidence against the null hypothesis.
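Statistical software normally reports this area directly. For readers who want to see where the number comes from, here is a rough sketch that uses only the Python standard library. It relies on the identity that the right-tail area of an F distribution equals a regularized incomplete beta function, and it evaluates that integral with Simpson's rule. The function name `f_p_value` and the step count are illustrative assumptions, and this is not how S-PLUS computes the value.

```python
import math

def f_p_value(f_stat, df_num, df_den, steps=2000):
    """Area to the right of f_stat under an F(df_num, df_den) curve.

    Uses p = I_u(df_den/2, df_num/2) with u = df_den / (df_den + df_num*f_stat),
    where I is the regularized incomplete beta function, integrated here
    with Simpson's rule. Assumes f_stat > 0, df_den >= 2, and an even
    number of steps, so that the integrand stays bounded on [0, u].
    """
    a, b = df_den / 2.0, df_num / 2.0
    u = df_den / (df_den + df_num * f_stat)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    def integrand(t):
        return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = u / steps
    total = integrand(0.0) + integrand(u)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0 / math.exp(log_beta)

# F value and degrees of freedom taken from the S-PLUS output in Problem 2;
# the result should be close to the Pr(F) entry printed there (about 0.3643).
print(f_p_value(1.048481, 2, 27))
```

Simpson's rule was chosen for brevity; a library routine for the F distribution would be the usual choice in real work.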

### Assumptions of ANOVA

The p-value computed with an F test is calculated assuming that the null hypothesis is true, like all p-values. The F test also makes two additional assumptions.
1. The individual populations are all normally distributed.
2. The individual population standard deviations are all equal.

In practice these two assumptions are rarely completely true. However, the F test is still reasonable in many situations.

For the first assumption, what matters in practice is that each sample is large enough for the sampling distribution of its sample mean to be approximately normal, as the central limit theorem implies. For small samples (say n < 10), outliers or extensive skewness in the samples may invalidate the F test. A side-by-side boxplot of the data, grouped by the categories of the explanatory variable, should show that each sample is fairly symmetric. For larger samples, some skewness in the population is not a problem.

For the second assumption, as long as the population standard deviations are within a factor of 10 or so from one another, the lack of exact equality can be safely ignored.
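Both checks can be summarized numerically before running the F test. The sketch below is one informal way to do so, using hypothetical samples invented for illustration. It reports the ratio of the largest to the smallest sample standard deviation (the sample values stand in for the unknown population values) and lists the samples that fall under the small-sample cutoff of 10 mentioned above. The function name `check_assumptions` is an assumption, and the code does not replace looking at a boxplot.

```python
import statistics

def check_assumptions(samples, small_n=10):
    """Summaries for an informal check of the two ANOVA assumptions.

    Returns the ratio of the largest to the smallest sample standard
    deviation and the indices of samples with fewer than small_n values.
    Assumes every sample has at least two values and a nonzero spread.
    """
    sds = [statistics.stdev(s) for s in samples]
    small = [i for i, s in enumerate(samples) if len(s) < small_n]
    return max(sds) / min(sds), small

# Hypothetical samples, invented for illustration.
samples = [[12, 15, 11, 14, 13], [22, 25, 19, 24], [30, 31, 29, 33, 32, 28]]
ratio, small = check_assumptions(samples)
print(ratio, small)
```

Here all three samples are flagged as small, so symmetry and outliers in each one deserve a closer look.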

### Problems

Problem 1:

You are given a partial ANOVA table for a problem in which there are three samples of sizes 5, 3, and 3. Complete the table.

```
Source          SS        DF        MS          F        p-value
==================================================================
Between         20
Within
==================================================================
Total           50
```

Problem 2:

Below is S-PLUS output from a launcher experiment in which the ball type was the only variable factor.

```
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
ball       2   183.267 91.63333 1.048481 0.3643071
Residuals 27  2359.700 87.39630
```

1. How many different types of balls were used in the experiment?
2. How many total measurements were made?
3. Which statement is the most appropriate interpretation of the data?
   1. There is strong evidence that the mean distance each ball travels is not the same for all balls.
   2. The data is consistent with the hypothesis that the mean distance each ball travels is the same. However, we cannot claim with high confidence that the population means are exactly equal.
   3. There is strong evidence that the population means are all equal.

Problem 3:

In a larger experiment with two factors (explanatory variables), the distance a ball travels is modeled to depend on ball type and angle of the launcher. The S-PLUS output is below.

```
          Df Sum of Sq  Mean Sq  F Value     Pr(F)
ball       2     1.056    0.528    0.418 0.6618089
angle      2  3012.389 1506.194 1193.830 0.0000000
Residuals 31    39.111    1.262
```

1. How many different types of balls were used in the experiment?
2. How many different angles were used in the experiment?
3. How many total measurements were made?
4. Which statement(s) are the most appropriate interpretation of the data?
   1. There is strong evidence that the different balls travel different distances on average.
   2. The data is consistent with there being no difference in the balls regarding the average distance they travel.
   3. There is overwhelming evidence that changing the angle changes the average distance the balls travel.
   4. It seems likely that changing the angle changes the average distance the balls travel, but there is still room for reasonable doubt.
   5. The data is consistent with the angle having no effect on the average distance the ball travels.