### Chapter 7

#### The Big Picture

Hypothesis testing is a formal way of using data and statistical reasoning to assess whether statements about populations are plausible. The basic procedure is to:
1. Assume something specific about a population. As an example, assume that the mean effectiveness of a new drug is the same as the mean effectiveness of an old drug.
2. Find a test statistic and the corresponding sampling distribution. In our example, we would compare the effectiveness of the two drugs on two different samples by comparing sample means. Under the hypothesis of no difference in effectiveness, we expect the difference in the sample means to be nearly zero, although there will almost certainly be some chance variation. Knowledge of the sampling distribution for the difference in sample means will allow us to determine whether an observed deviation from zero is consistent with chance variation or is a truly unusual and rare occurrence.
3. Find a p-value. A p-value is the chance that, were we to repeat the experiment on different randomly chosen samples, we would again get a result at least as extreme as the one actually observed. A small p-value indicates that something rare happened, if we maintain our belief in our assumed hypothesis. This is evidence against the assumed hypothesis, since rare events happen rarely. Alternatively, a large p-value indicates that what was observed could reasonably be explained by chance variation. This does not confirm that the assumed hypothesis is true, but merely indicates that it is one (of several) decent explanations for the observed data.
Deciding whether a p-value is "large" or "small" is certainly subjective, and should be determined by the context of the problem and the consequences of any decisions based on this determination.
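The three steps above can be sketched with a permutation test. The effectiveness scores below are purely illustrative (not from any real study); the idea is that under the null hypothesis of no difference, the group labels are arbitrary, so shuffling them approximates the sampling distribution of the difference in sample means.

```python
import random
import statistics

random.seed(0)

# Hypothetical effectiveness scores for two drugs (illustrative numbers only).
old_drug = [4.1, 5.0, 4.6, 4.9, 5.3, 4.4, 4.8, 5.1]
new_drug = [5.2, 5.6, 4.9, 5.8, 5.4, 5.0, 5.7, 5.3]

# Steps 1-2: assume no difference; the test statistic is the difference in
# sample means, and shuffling the pooled data approximates its sampling
# distribution under that assumption.
observed = statistics.mean(new_drug) - statistics.mean(old_drug)

pooled = old_drug + new_drug
n_new = len(new_drug)
reps = 10_000
count = 0
for _ in range(reps):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:n_new]) - statistics.mean(pooled[n_new:])
    if abs(diff) >= abs(observed):  # two-sided: at least as extreme
        count += 1

# Step 3: the p-value is the fraction of shuffled datasets whose difference
# in means is at least as extreme as the one actually observed.
p_value = count / reps
print(f"observed difference: {observed:.3f}, p-value: {p_value:.4f}")
```

A small p-value here would mean that a difference this large rarely arises from label-shuffling alone, which is evidence against the no-difference hypothesis.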

It is common (but questionable) practice in the health sciences to compare p-values to arbitrary fixed significance levels such as .05 or .01. When p-values fall below these levels, results are called statistically significant or highly statistically significant.

In addition to considering the statistical significance of a result, one should also look at its practical importance. A large study might well yield a low p-value indicating that a drug is more effective. However, the increase in effectiveness, while real, may be of inconsequential practical importance, or not worth an increase in cost or side effects.
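A small simulation illustrates this point. The scenario and all of its numbers are hypothetical: the new drug really is better, but only by 0.1 points on a 100-point scale. With very large samples, even this negligible difference produces a tiny p-value.

```python
import math
import random

random.seed(1)

# Hypothetical scenario: a true improvement of 0.1 points on a 100-point
# scale -- real, but of little practical importance.
n = 200_000
old = [random.gauss(70.0, 5.0) for _ in range(n)]
new = [random.gauss(70.1, 5.0) for _ in range(n)]

mean_old = sum(old) / n
mean_new = sum(new) / n
var_old = sum((x - mean_old) ** 2 for x in old) / (n - 1)
var_new = sum((x - mean_new) ** 2 for x in new) / (n - 1)

# Two-sample z-test for the difference in means (reasonable here because
# the samples are very large).
z = (mean_new - mean_old) / math.sqrt(var_old / n + var_new / n)
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"difference: {mean_new - mean_old:.3f}, p-value: {p_value:.2e}")
```

The result is statistically significant, yet the estimated difference is far too small to matter in practice; the p-value alone cannot tell us that.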

In contrast, an observed difference might be of great practical importance, but result from a small study not sufficiently powerful to convincingly demonstrate that the result is not merely coincidental chance variation.