Chapter 8

The Big Picture

Analysis of Variance (ANOVA) is a general technique for analyzing data in which the variable of interest (the response variable) is quantitative, and the explanatory variables are categorical.

We will concentrate only on one specific question, namely, testing whether the population means for a variable measured in several different populations are all the same or not. Each individual is measured (the response variable) and categorized into one of the populations (the explanatory variable).

The basic idea is simple. Find the mean from each sample. If there is a great deal of variation among the sample means, this is evidence that the population means are not all the same, since we expect the sample means to be "close" to the population means. "Close" is determined by the amount of variation within each sample and the sample sizes. If there is a great deal of variation within the samples, this indicates that there is uncertainty about the location of the population mean, and hence, it is more plausible that the population means are all the same. In contrast, if there is very little variation within the samples, we have a good idea of where the population means are, and potential evidence that the population means are not all the same.

To conduct a formal test, we compare these two variations.

``` F = variation among sample means  /  variation within samples
```
If this F statistic is large, this is evidence that the population means are not all equal (since variation among sample means is large compared to variation within samples). If F is not large, this is consistent with the hypothesis that all population means are equal. An ANOVA table is a device to facilitate this computation. To decide whether F is large or not, the value of the test statistic should be compared to the the F distribution with the correct numbers of degrees of freedom.