Section 5.5: Distribution of the Sample Proportion

Key Concepts

This sampling distribution should be used whenever a single sample is drawn and the statistic of interest is the proportion of the sample that falls into a given category. As in all examples presented in this chapter, the central limit theorem allows the use of the normal distribution to find probabilities.

In this situation, there are different formulas for the mean and standard error, but the logic and procedure for solving problems remain the same.

Since sample proportions can only take on a finite number of possible values for a given sample size, approximating this discrete distribution with a continuous one introduces some error. This error can be reduced by using the correction for continuity.

Another example shows how to use the sampling distribution.

Formula

The sampling distribution of the sample proportion, p-hat, is summarized by:

mean(p-hat) = p

and

SE(p-hat) = sqrt(p(1 - p)/n)

The shape will be approximately normal for sufficiently large samples. A general rule of thumb is that if np > 5 and n(1-p) > 5, then the distribution will be approximately normal. When n is small, the approximation can be improved greatly by using the correction for continuity.
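The formulas and the rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names proportion_sampling_distribution and normal_approx_ok are made up for this sketch, and the values p = .60 and n = 100 are borrowed from the next example.

```python
import math

def proportion_sampling_distribution(p, n):
    """Return (mean, standard error) of the sample proportion."""
    mean = p
    se = math.sqrt(p * (1 - p) / n)
    return mean, se

def normal_approx_ok(p, n, threshold=5):
    """Rule of thumb from the text: np > 5 and n(1 - p) > 5."""
    return n * p > threshold and n * (1 - p) > threshold

# Illustrative values: p = .60 and a sample of size 100.
mean, se = proportion_sampling_distribution(p=0.60, n=100)
print("mean:", mean)
print("SE:", round(se, 4))
print("normal approximation reasonable:", normal_approx_ok(0.60, 100))
```

With these inputs the SE comes out at about .0490, and both np and n(1-p) are well above 5.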

Correction for Continuity

Suppose that p = .60 and a sample of size 100 is randomly chosen. Find the probability that the sample proportion is between 0.56 and 0.65.

Draw a sketch!

The sampling distribution of p-hat has a mean of .6 and an SE of sqrt(.6(.4)/100) = .0490. The z-scores are respectively

z = (.56 - .60) / .0490 = -0.82 and z = (.65 - .60) / .0490 = 1.02

The area between -0.82 and 1.02 under the normal curve is .8461 - .2061 = .6400.
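The same uncorrected calculation can be checked without a normal table. The sketch below assumes Python 3.8 or later, where statistics.NormalDist is available in the standard library; the variable names are illustrative. Because it does not round the z-scores to two decimals, its result differs slightly from the table-based value.

```python
import math
from statistics import NormalDist

p, n = 0.60, 100
se = math.sqrt(p * (1 - p) / n)

# Normal model for the sample proportion: mean p, standard deviation SE.
sampling_dist = NormalDist(mu=p, sigma=se)

# Area between .56 and .65, with no correction for continuity.
prob_uncorrected = sampling_dist.cdf(0.65) - sampling_dist.cdf(0.56)
print(round(prob_uncorrected, 4))
```

The printed value is close to .64, in line with the table lookup.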

An alternative way to approach the problem is to say that

P(.56 <= p-hat <= .65) = P(56 <= X <= 65)

where p-hat = X/100, and X is the count of successes in the sample. X is a binomial random variable with a mean of 100(.6) = 60 and a standard deviation of sqrt(100(.6)(.4)) = 4.90. The probability that X is exactly equal to a number x is well approximated by the area between x - 1/2 and x + 1/2 under a normal curve with mean = 60 and standard deviation = 4.90. Thus P(56 <= X <= 65) should be well approximated by the area between 55.5 and 65.5. The z-scores are

z = (55.5 - 60)/4.90 = -0.92 and z = (65.5 - 60)/4.90 = 1.12

The area under the standard normal curve between these values is .8686 - .1788 = .6898.

The exact probability from the binomial distribution is .6908. Thus the calculation using the correction for continuity (.6898) is much more accurate than the straightforward calculation (.6400), even with a sample of size 100.
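As a rough check on this comparison, the continuity-corrected normal area and the exact binomial probability can both be computed directly. This is a minimal sketch using only the Python standard library (math.comb and statistics.NormalDist, both available from Python 3.8); the helper binomial_pmf and the variable names are illustrative.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

p, n = 0.60, 100
lo, hi = 56, 65  # counts corresponding to proportions .56 and .65

# Normal model for the count X: mean np, standard deviation sqrt(np(1-p)).
count_dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))

# Correction for continuity: widen the interval by 1/2 on each side.
prob_corrected = count_dist.cdf(hi + 0.5) - count_dist.cdf(lo - 0.5)

# Exact answer: add up the binomial probabilities for 56, 57, ..., 65.
prob_exact = sum(binomial_pmf(k, n, p) for k in range(lo, hi + 1))

print("corrected normal:", round(prob_corrected, 4))
print("exact binomial:  ", round(prob_exact, 4))
```

The two printed values should agree to about two decimal places, while the uncorrected normal area is noticeably further from the exact answer.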

Example

2% of all American women aged 50-54 have breast cancer. Suppose that 1000 women in this age group are randomly selected. What is the probability that the proportion of women in the sample who have breast cancer, p-hat, exceeds 4%?

The sampling distribution for the sample proportion will be approximately normal since np = (1000)(.02) = 20 and n(1-p) = (1000)(.98) = 980 are both much larger than 5. Since the sample size is so large, the correction for continuity should not make much difference.

The distribution is centered at .02 and has an SE of sqrt(.02(.98)/1000) = .00443. The z-score is

z = (.04 - .02) / .00443 = 4.51

Since the area to the right of 4.51 under the standard normal curve is essentially 0, it is quite unlikely to observe such a result by chance. If the sample proportion actually is .04, it is likely that either p is not .02, or the method for sampling was biased.
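The size of this tail area can be made concrete with a short calculation. The sketch below uses only the Python standard library (statistics.NormalDist, Python 3.8 or later) and works from the unrounded SE, so the z-score it prints may differ in the second decimal place from a hand calculation; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p, n = 0.02, 1000
observed = 0.04  # sample proportion of interest

se = math.sqrt(p * (1 - p) / n)
z = (observed - p) / se

# Area to the right of z under the standard normal curve.
tail_area = 1 - NormalDist().cdf(z)

print("SE:", round(se, 5))
print("z:", round(z, 2))
print("P(p-hat > .04):", tail_area)
```

The tail area comes out at a few parts per million, which is what "essentially 0" means in practice here.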