### Section 6.4: Confidence Interval for the Difference Between Two Population Means

#### Key Concepts

This confidence interval is appropriate whenever the situation involves two independent samples from different populations and the sample means are compared. As in all examples presented in this chapter, the central limit theorem enables us to conclude that the sampling distribution for the difference in sample means is approximately normal.

In this situation, the formulas for the mean and standard error are different, but the logic and procedure for solving problems remain the same.

When (as is usually the case) the population standard deviations are unknown, the standard error must be estimated and a multiplier from a t-distribution should be used. The textbook presents two ways of doing this, neither of which is currently believed to be the best way to handle the problem. These notes will demonstrate an alternative method.

Examples will demonstrate the methodology.

#### Formula

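The sampling distribution of $\bar{x}_1 - \bar{x}_2$, the difference in sample means, is summarized by:

$$\text{mean}(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2$$

and

$$\text{SE}(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Here $\mu_i$ and $\sigma_i$ are the mean and standard deviation of population $i$, and $n_i$ is the size of the sample drawn from it.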

The shape will be approximately normal for sufficiently large samples. For most practical applications, this will hold if each sample has at least 25 or 30 observations.
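
The claims about the mean and standard error of the difference in sample means can be checked by simulation. The sketch below is only an illustration: the two skewed (exponential) populations, their scales, the sample sizes, and the helper name `simulate_differences` are all assumptions made up for this check, not values from the textbook.

```python
import random
import statistics
from math import sqrt


def simulate_differences(scale1, n1, scale2, n2, reps=5000, seed=1):
    """Return `reps` simulated values of (sample mean 1 - sample mean 2).

    Each population is exponential with mean `scale`; for an exponential
    population the standard deviation equals the mean.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.expovariate(1 / scale1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.expovariate(1 / scale2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs


# Hypothetical populations, chosen only for illustration.
scale1, n1 = 10.0, 30   # population 1: mean 10, SD 10
scale2, n2 = 8.0, 40    # population 2: mean 8, SD 8

diffs = simulate_differences(scale1, n1, scale2, n2)
theoretical_se = sqrt(scale1**2 / n1 + scale2**2 / n2)

print("simulated mean of differences:", round(statistics.fmean(diffs), 3))
print("theoretical mean:             ", scale1 - scale2)
print("simulated SE of differences:  ", round(statistics.stdev(diffs), 3))
print("theoretical SE:               ", round(theoretical_se, 3))
```

Even though the populations here are far from normal, the simulated mean and standard deviation of the differences should land close to the theoretical values, and a histogram of `diffs` would look roughly bell-shaped.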

When the population standard deviations are known and the samples are sufficiently large, plug the values into the general confidence interval expression

```
(estimate) +/- (reliability coefficient)(standard error)
```

#### Unknown Population Standard Deviations

The method described in this subsection is not in the textbook. The textbook presents two approaches to finding confidence intervals when the population standard deviations must be estimated. These are:
1. assume the unknown standard deviations are the same, and pool the information from both samples to estimate this common standard deviation; and
2. take a weighted average of the multipliers you would use for each sample alone.
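
The preferred technique, given modern statistical software that can accurately compute areas under t distributions with noninteger degrees of freedom, is to estimate the degrees of freedom with the formula

$$\text{df} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$

where $s_1$ and $s_2$ are the sample standard deviations and $n_1$ and $n_2$ are the sample sizes.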
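
This degrees-of-freedom estimate is usually called the Welch-Satterthwaite approximation. A minimal Python sketch follows; the function name `welch_df` is a hypothetical helper, and `s1`, `s2`, `n1`, `n2` stand for the two sample standard deviations and sample sizes. The values passed in at the bottom are the ones used in the second example later in this section.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom for the difference of two sample means."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


df = welch_df(8.6, 20, 7.8, 25)
print(f"estimated degrees of freedom: {df:.1f}")
```

The result always falls between the smaller of `n1 - 1` and `n2 - 1` and the pooled value `n1 + n2 - 2`, which makes a quick sanity check on a hand calculation.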

#### Example 1 (known population SDs)

A method to assess the effectiveness of a drug is to measure its concentration in the urine after a period of time. Twenty people are given the first brand of the drug and twenty-five are given the second brand. Suppose that it is known that the population standard deviations for concentration are 8.6 and 7.8 (mg%) respectively. For the two samples, the mean concentrations are 19.2 and 15.6 (mg%) one hour after ingestion. Give a 95% confidence interval for ($\mu_1 - \mu_2$).

Solution:

The estimate for the difference in population means is the difference in sample means, 19.2 - 15.6 = 3.6. The exact standard error is

```
   SE = sqrt( (8.6^2 / 20) + (7.8^2 / 25) ) = 2.476
```

Since the population standard deviations are known, the standard error is exact and does not need to be estimated from sample data, so we can use the reliability coefficient 1.96 from the normal table.

```
  3.6 +/- (1.96)(2.476)
```

We are 95% confident that the difference in means is in the interval

```
  3.6 +/- 4.9
```
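
The arithmetic in this example can be reproduced with a few lines of Python. This is a sketch using only the standard library; `z_interval_diff_means` is a hypothetical helper name, and the inputs are the sample means, population standard deviations, and sample sizes as given in the problem statement (19.2 and 15.6; 8.6 and 7.8; 20 and 25).

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_means(xbar1, xbar2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Confidence interval for mu1 - mu2 when both population SDs are known."""
    estimate = xbar1 - xbar2
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    # Two-sided multiplier from the standard normal distribution
    # (about 1.96 for 95% confidence).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * se
    return estimate - margin, estimate + margin, se, margin


lower, upper, se, margin = z_interval_diff_means(19.2, 15.6, 8.6, 7.8, 20, 25)
print(f"SE = {se:.3f}, margin = {margin:.2f}")
print(f"interval: ({lower:.2f}, {upper:.2f})")
```

Because the interval is computed from unrounded intermediate values, its endpoints may differ in the last digit from a hand calculation that rounds at each step.
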

#### Example 2 (unknown population SDs)

A method to assess the effectiveness of a drug is to measure its concentration in the urine after a period of time. Twenty people are given the first brand of the drug and twenty-five are given the second brand. For the two samples, the mean concentrations are 19.2 and 15.6 (mg%) with standard deviations of 8.6 and 7.8 (mg%) respectively, one hour after ingestion. Give a 95% confidence interval for ($\mu_1 - \mu_2$).

Solution:

The estimate for the difference in population means is the difference in sample means, 19.2 - 15.6 = 3.6. The estimated standard error, found by replacing the population standard deviations with the sample standard deviations, is

```
   SE = sqrt( (8.6^2 / 20) + (7.8^2 / 25) ) = 2.476
```

The estimated number of degrees of freedom is

```
   df = 2.476^4 / ( (1/19)(8.6^2 / 20)^2 + (1/24)(7.8^2 / 25)^2 ) = 38.9
```

With a table, we'll round down to 35 degrees of freedom.

The multiplier for 35 degrees of freedom is 2.0301. (With software, we could have used 38.9 degrees of freedom directly and found the slightly smaller multiplier 2.0229.)

```
  3.6 +/- (2.0301)(2.476)
```

We are 95% confident that the difference in means is in the interval

```
  3.6 +/- 5.0
```
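
To see what "doing better with software" might look like, the sketch below finds the t multiplier for noninteger degrees of freedom using only the Python standard library. It integrates the t density numerically (Simpson's rule) and inverts the result by bisection. This is an illustrative approach, not how statistical packages actually compute it, and the helper names `t_pdf`, `t_cdf`, and `t_multiplier` are hypothetical. In practice a library routine such as SciPy's `scipy.stats.t.ppf` would normally be used instead.

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution; df may be noninteger."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus symmetry about 0."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_multiplier(df, confidence=0.95):
    """Two-sided t multiplier, found by bisection on the CDF."""
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Sample values from this example.
s1, n1, s2, n2 = 8.6, 20, 7.8, 25
v1, v2 = s1**2 / n1, s2**2 / n2
se = sqrt(v1 + v2)
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

table_mult = t_multiplier(35)   # df rounded down, as with a printed table
exact_mult = t_multiplier(df)   # noninteger df used directly
margin = exact_mult * se

print(f"df = {df:.1f}")
print(f"multiplier at 35 df = {table_mult:.4f}, at {df:.1f} df = {exact_mult:.4f}")
print(f"margin of error with noninteger df = {margin:.2f}")
```

The multiplier for the noninteger degrees of freedom is only slightly smaller than the table value, so after rounding to one decimal place the margin of error is essentially unchanged.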