### Section 4.3: The Binomial Distribution

#### Key Concepts

The binomial distribution arises from counting the number of heads in a prespecified number of tosses, or trials. This is a model for the way that data is produced for a vast number of examples in statistics. In particular, we will use this model when examining the proportion of a random sample that belongs to a particular category.

Combinations count the number of ways that x things can be chosen from n when the order of selection does not matter. These are needed to determine the probability that a binomial random variable takes on a particular value.

Probabilities for binomial random variables can be determined by formula or by using a table in the back of the book. You should know both.

Every binomial random variable is described by two numbers, or parameters. n is the number of trials, and p is the probability of a success.

#### A Motivating Example

Suppose that 60% of students at Duquesne University are female and that 40% are male. If three students are randomly chosen, what is the probability that exactly two are female?

Since the number of Duquesne students is very large compared to our sample size, we can model this random sampling by taking a box with 6 black balls (for the females) and 4 white balls (for the males) and taking a sample of size three, one at a time, replacing the drawn ball each time.

Here is a list of all the possible sequences of outcomes.

```
WWW  which has probability   (.4)(.4)(.4) = (.4)^3
WWB  which has probability   (.4)(.4)(.6) = (.4)^2 (.6)
WBW  which has probability   (.4)(.6)(.4) = (.4)^2 (.6)
BWW  which has probability   (.6)(.4)(.4) = (.4)^2 (.6)
WBB  which has probability   (.4)(.6)(.6) = (.4) (.6)^2
BWB  which has probability   (.6)(.4)(.6) = (.4) (.6)^2
BBW  which has probability   (.6)(.6)(.4) = (.4) (.6)^2
BBB  which has probability   (.6)(.6)(.6) = (.6)^3
```
There are three sequences with exactly two black balls, each with probability (.4) (.6)^2, so the probability of exactly two females is 3 (.4) (.6)^2 = .432.
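The listing above can also be checked by brute force. The following is a minimal sketch, not part of the original notes; the names `p_draw`, `matching`, and `total` are made up for illustration. It enumerates every sequence of three draws and adds up the probabilities of those with exactly two black balls.

```python
from itertools import product

# Chance of each color on a single draw (sampling with replacement).
p_draw = {"B": 0.6, "W": 0.4}

matching = []   # sequences with exactly two black balls
total = 0.0     # running sum of their probabilities

for seq in product("WB", repeat=3):
    if seq.count("B") == 2:
        prob = 1.0
        for ball in seq:
            prob *= p_draw[ball]   # independent draws multiply
        matching.append("".join(seq))
        total += prob

print(matching)
print(round(total, 3))  # 0.432
```

The same loop works for any number of draws by changing `repeat`, although the number of sequences doubles with each extra draw, which is why a counting formula is useful.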

#### The Binomial Setting

A Bernoulli trial has one of two possible values. One is called a "success" and the other is called a "failure". We want to count the number of successes.

The binomial distribution is appropriate when we have this setting:

1. there is a fixed number of Bernoulli trials;
2. there are two possible outcomes for each trial;
3. the trials are independent of one another;
4. there is the same chance of success for each trial.

The random variable is the total number of successes in the trials.
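To see the binomial setting in action, here is a small simulation sketch. It is an illustration only: the helper `count_successes`, the seed, and the number of repetitions are all arbitrary choices, and n = 3, p = .6 are borrowed from the motivating example. Each repetition runs a fixed number of independent trials with the same chance of success and records the total number of successes.

```python
import random

def count_successes(n, p, rng):
    """Run n independent Bernoulli trials; return the number of successes."""
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)   # fixed seed so the run is repeatable
n, p = 3, 0.6            # values from the motivating example
repetitions = 100_000

counts = [0] * (n + 1)   # counts[x] = how often exactly x successes occurred
for _ in range(repetitions):
    counts[count_successes(n, p, rng)] += 1

relative_freq = [c / repetitions for c in counts]
print(relative_freq)
```

With this many repetitions, the relative frequency of exactly two successes should land close to the .432 found by listing the sequences, though a simulation only approximates the true probability.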

#### Combinations

In the example, there were three sequences with exactly two successes. If there are n trials, there will be exactly nCx sequences with exactly x successes, where
```
    nCx = n! / (x! (n-x)! )
```
The notation ! is called "factorial". For example,
```
5! = 5*4*3*2*1 = 120
4! =   4*3*2*1 =  24
3! =     3*2*1 =   6
2! =       2*1 =   2
1! =         1 =   1
0! =               1
```
0! is defined as 1 so that the formula for nCx makes sense when x=0 or x=n.

Here are some example calculations:

```
3C2 = 3! / (2! 1!) = (6 / 2) = 3
5C2 = 5! / (2! 3!) = (5*4) / (2*1) = 10
```
When computing these by hand, it is generally best to cancel as many common factors as you can first. Many calculators can compute nCx directly.
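For readers who would rather check these counts on a computer, the sketch below applies the factorial formula directly and compares it with Python's built-in `math.comb` (available in Python 3.8 and later). The function name `n_choose_x` is made up for this illustration.

```python
from math import comb, factorial

def n_choose_x(n, x):
    """nCx = n! / (x! (n-x)!), computed with exact integer arithmetic."""
    return factorial(n) // (factorial(x) * factorial(n - x))

print(n_choose_x(3, 2))  # 3
print(n_choose_x(5, 2))  # 10

# The standard library gives the same counts.
print(comb(3, 2), comb(5, 2))
```

Because 0! is 1, the same function also handles the edge cases x = 0 and x = n, each of which gives exactly one sequence.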

#### Binomial Random Variables

If X is a random variable that comes from a binomial setting, where there are n trials and p is the probability of success for a single trial, the probability that X = x is

```
   P( X = x ) = nCx p^x (1-p)^(n-x)   for x = 0, 1, 2, ..., n
```
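To see that this formula holds for the motivating example, take n = 3, p = .6, and x = 2. Then P(X = 2) = 3C2 (.6)^2 (.4)^1 = 3 (.36) (.4) = .432, which matches the answer found by listing the sequences.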
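The formula translates almost directly into code. This is a minimal sketch under the assumption that a plain function is enough; the name `binomial_pmf` is chosen here for illustration and does not come from the text. It is checked against the motivating example with n = 3 and p = .6.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx p^x (1-p)^(n-x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Motivating example: three students, each female with probability .6.
print(round(binomial_pmf(2, 3, 0.6), 3))  # 0.432

# The probabilities over x = 0, 1, ..., n should add up to 1.
print(sum(binomial_pmf(x, 3, 0.6) for x in range(4)))
```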

#### Binomial Table

If n is small and p is given to two decimal places, binomial probabilities can be found from the extensive table on pages 653 to 681. The solutions below use the table's cumulative probabilities, P(X <= x).

Here are a few examples:

Example: n = 15, p = .4.

1. Find P(X = 10).
2. Find P(X < 10).
3. Find P(3 < X < 10).
4. Find P(X > 10).

Solutions:

1. P(X = 10) = P(X <= 10) - P(X <= 9) = .9907 - .9662 = .0245
2. P(X < 10) = P(X <= 9) = .9662
3. P(3 < X < 10) = P(X <= 9) - P(X <= 3) = .9662 - .0905 = .8757
4. P(X > 10) = 1 - P(X <= 10) = 1 - .9907 = .0093
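When a table is not at hand, the same four probabilities can be computed from the formula. The sketch below is illustrative only; `binomial_pmf`, `binomial_cdf`, and the variable names are made up here. It builds the cumulative probability P(X <= x) by summing individual probabilities, then combines cumulative values the same way the table lookups do.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """P(X <= x), the cumulative probability a table would list."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 15, 0.4

p_eq_10 = binomial_cdf(10, n, p) - binomial_cdf(9, n, p)    # P(X = 10)
p_lt_10 = binomial_cdf(9, n, p)                             # P(X < 10)
p_between = binomial_cdf(9, n, p) - binomial_cdf(3, n, p)   # P(3 < X < 10)
p_gt_10 = 1 - binomial_cdf(10, n, p)                        # P(X > 10)

for value in (p_eq_10, p_lt_10, p_between, p_gt_10):
    print(round(value, 4))
```

Computed values can differ from table-based answers in the last decimal place, because subtracting two rounded table entries carries their rounding error along.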

#### Binomial Parameters

n and p are the two binomial parameters. n is the number of trials, and p is the probability of success on an individual trial.

The mean of the distribution is mu = np.

The standard deviation of the distribution is sigma = sqrt(n*p*(1-p)).
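As a quick check on these two formulas, the sketch below computes the mean and standard deviation directly from the individual probabilities and compares them with np and sqrt(n*p*(1-p)). The values n = 15 and p = .4 are reused from the table example, and the variable names are made up for illustration.

```python
from math import comb, sqrt

n, p = 15, 0.4

# Probability of each possible value x = 0, 1, ..., n.
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Mean and variance computed from the definition of a discrete distribution.
mean_from_pmf = sum(x * px for x, px in enumerate(pmf))
var_from_pmf = sum((x - mean_from_pmf) ** 2 * px for x, px in enumerate(pmf))

# The shortcut formulas for a binomial random variable.
mu = n * p
sigma = sqrt(n * p * (1 - p))

print(mean_from_pmf, mu)
print(sqrt(var_from_pmf), sigma)
```

Both lines should print a matching pair of numbers up to floating-point rounding, with a mean of 6 and a standard deviation of about 1.9.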