# Math 225

## Introduction to Biostatistics

### Highlights from Lecture #6

The binomial distribution is described in Chapter 5 in section 5-1. You will also be responsible for the Poisson distribution, the normal distribution, and the central limit theorem. You are not responsible for the negative binomial distribution.

1. The Binomial Distributions describe random variables that arise in this setting:
   1. A fixed number of trials, designated as the parameter n.
   2. Each trial has two possible outcomes. The one we count is called a "success", the other a "failure".
   3. The trials are independent.
   4. Each trial has the same probability of success. The success probability on an individual trial is designated as the parameter p.

   The random variable is the number of successes in the trials.

2. A binomial distribution is determined completely by the parameters n and p.

3. The possible values of a binomial random variable are 0, 1, 2, ..., n.

4. To find a general formula for the probability of exactly x successes, consider the possible sequences of trials with exactly x successes and n-x failures.

Each of these sequences has probability $p^x(1-p)^{n-x}$.

There are ${}_nC_x$ ways to choose which x of the n trials have the successes.

Therefore,

$$P(\text{exactly } x \text{ successes}) = {}_nC_x \, p^x (1-p)^{n-x}.$$

5. The sum of the binomial probabilities for all possible outcomes is 1.
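The probability formula and the sum-to-one property can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: the function name `binomial_pmf` and the values n = 4, p = 0.3 are arbitrary choices made for illustration.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n independent trials, success probability p)."""
    # comb(n, x) counts the sequences with x successes;
    # each such sequence has probability p**x * (1 - p)**(n - x).
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Illustrative parameters (assumed, not taken from the lecture).
n, p = 4, 0.3

# Probabilities for every possible value 0, 1, ..., n.
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
total = sum(probs)

for x, prob in enumerate(probs):
    print(f"P(X = {x}) = {prob:.4f}")
print(f"sum of all probabilities = {total:.6f}")
```

Any other valid choice of n and p should also give a total of 1, up to floating-point rounding.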

We can now use this to solve several problems.

6. ### Problem 1

The proportion of male births in the United States is 0.513. If a couple has three children (without identical twins), what is the probability that they have exactly one boy?

Solution: First check that the problem fits the binomial setting.
1. There is a fixed number of trials (the couple decides to have three children before knowing the sex of the children, so n=3).
2. Each trial has two possible outcomes, boy and girl.
3. The trials are independent because the sex of one child does not affect the sex of other children.
4. We assume that the couple's chance of having a male child is equal to the national average of 0.513, so p=0.513.

$$P(\text{exactly one boy}) = {}_3C_1 \,(0.513)^1 (0.487)^2 = 0.3650.$$
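As a quick check of the arithmetic in Problem 1, the same calculation can be sketched in Python. The variable names are illustrative only. The sketch also lists the three birth orders with exactly one boy, which is where the factor of 3 comes from.

```python
from math import comb

n, p = 3, 0.513  # three children, P(boy) = 0.513 as assumed in the problem

# Binomial formula with x = 1 success (boy).
p_one_boy = comb(n, 1) * p**1 * (1 - p) ** (n - 1)

# The same answer by listing the three orderings with exactly one boy.
orderings = ["BGG", "GBG", "GGB"]
p_by_listing = sum(p * (1 - p) * (1 - p) for _ in orderings)

print(f"formula: {p_one_boy:.4f}")
print(f"listing: {p_by_listing:.4f}")
```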

7. ### Problem 2

In a common laboratory blood test, the number of white blood cells and their types are of critical interest. One of the categories of white blood cells is the neutrophils. Suppose that in an individual, sixty percent of all white blood cells are neutrophils. Five white blood cells are sampled at random.

(a) What is the probability that only the second and fifth white blood cells are neutrophils?
(b) What is the probability that only the first and third white blood cells are neutrophils?
(c) What is the probability that exactly two of the five white blood cells are neutrophils?

Solution: Notice that the binomial setting applies with n=5 and p=0.6, where the total number of neutrophils is a binomial random variable. However, the first two questions ask about a specific sequence of trials.

(a) $(0.4)(0.6)(0.4)(0.4)(0.6) = (0.6)^2(0.4)^3 = 0.02304$.

(b) $(0.6)(0.4)(0.6)(0.4)(0.4) = (0.6)^2(0.4)^3 = 0.02304$.

(c) $P(X=2) = {}_5C_2 \,(0.6)^2 (0.4)^3 = 10\,(0.6)^2 (0.4)^3 = 0.2304$.
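The difference between one specific sequence, as in parts (a) and (b), and a count of successes, as in part (c), can be made concrete by brute-force enumeration. This is a sketch for illustration only: the helper `sequence_probability` is a hypothetical name, and `True` stands for "this cell is a neutrophil".

```python
from itertools import product

n = 5    # cells sampled
p = 0.6  # probability that a sampled cell is a neutrophil

def sequence_probability(seq):
    """Probability of one specific ordered sequence (True = neutrophil)."""
    prob = 1.0
    for is_neutrophil in seq:
        prob *= p if is_neutrophil else (1 - p)
    return prob

# (a) only the second and fifth cells are neutrophils
part_a = sequence_probability((False, True, False, False, True))
# (b) only the first and third cells are neutrophils
part_b = sequence_probability((True, False, True, False, False))

# (c) every ordered sequence with exactly two neutrophils, summed
two_neutrophils = [s for s in product([True, False], repeat=n) if sum(s) == 2]
part_c = sum(sequence_probability(s) for s in two_neutrophils)

print(f"(a) {part_a:.5f}")
print(f"(b) {part_b:.5f}")
print(f"(c) {len(two_neutrophils)} sequences, total {part_c:.4f}")
```

Each sequence with two neutrophils has the same probability, so part (c) is the number of such sequences times the answer to part (a).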

8. ### Problem 3

The national incidence of chronic bronchitis in children in the first year of life is 5%. In a random sample of twenty households with one-year-old children where both parents have chronic bronchitis, three of the children also have chronic bronchitis. If we assume that these children have the same risk of chronic bronchitis as the national average, what is the probability that three or more households with one-year-old children would have a child with this condition?

Solution: Check that the binomial setting applies with n=20 and p=0.05.

The probability of three or more could be found by plugging into the binomial probability formula for the outcomes from 3 to 20 and summing them. However, it is less work to plug in for the outcomes from 0 to 2, sum them and subtract this sum from 1. This uses the fact that all of the binomial probabilities sum to 1.

Let X be the number of sampled households with a child with chronic bronchitis.

$$
\begin{aligned}
P(X \ge 3) &= 1 - P(X < 3) \\
&= 1 - P(X=0) - P(X=1) - P(X=2) \\
&= 1 - {}_{20}C_0 \,(0.05)^0 (0.95)^{20} - {}_{20}C_1 \,(0.05)^1 (0.95)^{19} - {}_{20}C_2 \,(0.05)^2 (0.95)^{18} \\
&= 0.0755.
\end{aligned}
$$

Thus, if it is really true that one-year-olds in households where both parents have chronic bronchitis have the same risk as children in all households nationally, there is about a 7.5% chance that there would be three or more cases out of twenty sampled. This probability is small enough to suggest that there may be higher risk in kids whose parents have the condition, but it is not so small as to be very convincing. The study is not large enough to separate the effects of random chance from genetics with much certainty.
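The complement shortcut used in Problem 3 can be compared against the longer direct sum. Here is a minimal Python sketch, assuming the same n = 20 and p = 0.05; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 20, 0.05

# Short way: subtract P(X = 0) + P(X = 1) + P(X = 2) from 1.
p_at_most_2 = sum(binomial_pmf(x, n, p) for x in range(3))
p_at_least_3 = 1 - p_at_most_2

# Long way: add up P(X = 3) through P(X = 20) directly.
p_direct = sum(binomial_pmf(x, n, p) for x in range(3, n + 1))

print(f"complement rule: {p_at_least_3:.4f}")
print(f"direct sum:      {p_direct:.4f}")
```

Both routes give the same probability. The complement rule simply needs three terms instead of eighteen.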

9. ### Problem 4

In the previous problem, what is the balancing point (mean or expected value) of the distribution? What is the size of a typical deviation from this mean (standard deviation)?

Solution: The mean or balancing point of the binomial distribution is np. In the previous problem, if 5% of all kids have chronic bronchitis, you might expect the proportion of sampled kids with chronic bronchitis to be close to 5% as well, and 5% of 20 is 20*0.05 = 1.

The size of a typical deviation from the mean has a more complicated formula. It is $SD = \sqrt{np(1-p)}$. For the previous problem, $\sqrt{20 \times 0.05 \times 0.95} = 0.975$. In other words, the sampled number of households with a child with chronic bronchitis is expected to be close to one, but if the actual count differs from this mean by about one, that is fairly typical.
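The shortcut formulas np and the square root of np(1-p) can be checked against the general definitions of mean and standard deviation for a discrete distribution. Those definitions (a probability-weighted average of the values, and the square root of the probability-weighted average squared deviation) are standard but are not stated in these notes, so treat the sketch below as a supplementary illustration.

```python
from math import comb, sqrt

n, p = 20, 0.05

# Binomial probabilities for x = 0, 1, ..., n.
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Mean: each value weighted by its probability (the balancing point).
mean = sum(x * prob for x, prob in enumerate(pmf))

# Variance: probability-weighted squared deviation from the mean.
variance = sum((x - mean) ** 2 * prob for x, prob in enumerate(pmf))
sd = sqrt(variance)

print(f"mean from definition = {mean:.3f}, np = {n * p:.3f}")
print(f"SD from definition   = {sd:.3f}, sqrt(np(1-p)) = {sqrt(n * p * (1 - p)):.3f}")
```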