Math 225

Introduction to Biostatistics


The Binomial Distribution

Prerequisites

This lab assumes that you already know how to:
  1. Log in, find the course Web page, and run S-PLUS
  2. Use the Commands Window to execute commands

Technical Objectives

This lab will teach you to:
  1. Use S-PLUS to make binomial distribution calculations.
  2. Load in and run a program.
  3. Use S-PLUS to visualize the binomial distribution.

Conceptual Objectives

In this lab you should learn to:
  1. Understand when the binomial distribution is an appropriate model.
  2. Begin to understand how probability underlies statistical inference.

The Binomial Distribution

The binomial distribution is the discrete probability distribution of the number of "successes" in a fixed number of independent "trials". You may think of a trial as a random experiment that has two possible outcomes. The classic example is counting the number of heads in a fixed number of coin tosses. Many situations in the health sciences may be modeled with this distribution. One example is random sampling from a very large population where the variable of interest is categorical with two possible values, such as smoking status, gender, survival status, or whether an individual is hypertensive. Genetics is another area where the binomial distribution is often applicable.

The Binomial Setting

The binomial distribution is appropriate when these four conditions hold:
  1. There is a fixed number of trials. That is, the number of trials is determined before the trials occur.
  2. Each trial has two possible outcomes. The outcome that is being counted is often called a "success" and the other outcome is called a "failure".
  3. Each trial has the same probability of success.
  4. The trials are independent, meaning that the outcome of one trial does not affect the outcome of any other trial.

The binomial distribution is completely described by two parameters: n, the number of trials, and p, the success probability on any individual trial.
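The four conditions can be illustrated with a small simulation. The following is only a sketch, assuming n = 6 and p = 0.4; the variable names, the seed, and the number of repetitions are illustrative choices and not part of the course materials. It uses only standard functions (runif, sum, mean, dbinom) and should also run unchanged in R.

```r
# Sketch: simulate the binomial setting directly (assumed values n = 6, p = 0.4).
set.seed(225)        # arbitrary seed, used only to make the run repeatable
n <- 6               # condition 1: fixed number of trials
p <- 0.4             # condition 3: same success probability on every trial
reps <- 10000        # number of simulated experiments (an arbitrary choice)

counts <- numeric(reps)
for(i in 1:reps) {
  # conditions 2 and 4: each trial is a success (TRUE) or a failure (FALSE),
  # generated independently of the other trials
  trials <- runif(n) < p
  counts[i] <- sum(trials)     # number of successes in this experiment
}

prop2 <- mean(counts == 2)     # proportion of experiments with exactly 2 successes
prop2
dbinom(2, n, p)                # exact binomial probability, for comparison
```

With this many repetitions the simulated proportion should land close to the exact probability, though it will not match it exactly.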

The probability that there are exactly x successes in n trials with success probability p is

Prob(exactly x successes) = [n!/(x!(n-x)!)] * p^x * (1-p)^(n-x)

Just as you can compute the mean and standard deviation of data to measure "center" and "spread", you can do the same for the binomial distribution.

mean = n*p
sd   = sqrt(n*p*(1-p))

By the way, sqrt is computer shorthand for "square root".
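As a check on the formulas, the sketch below evaluates them step by step for the assumed values n = 6, x = 2, and p = 0.4, and compares the result with the built-in function dbinom. The variable names are illustrative only. The factorials are computed with gamma, using the fact that gamma(k+1) equals k!.

```r
# Sketch: evaluate the binomial formulas by hand (assumed n = 6, x = 2, p = 0.4).
n <- 6
x <- 2
p <- 0.4

# Binomial coefficient n!/(x!(n-x)!); gamma(k+1) equals k!
n.choose.x <- gamma(n + 1) / (gamma(x + 1) * gamma(n - x + 1))   # 720/(2*24) = 15

by.formula <- n.choose.x * p^x * (1 - p)^(n - x)   # 15 * 0.16 * 0.1296 = 0.31104
by.formula
dbinom(x, n, p)                                    # built-in function, same value

mu <- n * p                       # mean: 6 * 0.4 = 2.4
sigma <- sqrt(n * p * (1 - p))    # sd: sqrt(1.44) = 1.2
c(mu, sigma)
```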

In S-PLUS, the two most important functions for calculating binomial probabilities are dbinom and pbinom. The function dbinom calculates the probability that exactly x successes occur in n trials with success probability p. The function pbinom calculates the probability that x or fewer successes occur in n trials with success probability p. The "d" in dbinom refers to "density" and the "p" in pbinom refers to "probability". This nomenclature is better suited to continuous random variables, but S-PLUS uses the same naming scheme for every distribution. You will also load in a local function gbinom to graph binomial distributions for different parameter values.
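The relationship between the two functions can be seen by adding up the dbinom probabilities. This is a minimal sketch, assuming n = 6 and p = 0.4, with illustrative variable names; the expression 0:n stands for the integers from 0 to n.

```r
# Sketch: pbinom is the running total of dbinom (assumed n = 6, p = 0.4).
n <- 6
p <- 0.4

exact <- dbinom(0:n, n, p)     # P(exactly x successes) for x = 0, 1, ..., 6
running <- cumsum(exact)       # running totals of those probabilities
at.most <- pbinom(0:n, n, p)   # P(x or fewer successes) for x = 0, 1, ..., 6

cbind(x = 0:n, exact, running, at.most)
max(abs(running - at.most))    # essentially zero: the two columns agree
```

The last entry of the running total is one, because some number of successes between 0 and n must occur.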

S-PLUS help is available in this on-line guide.


Note that you can use the mouse to highlight a command from Netscape, switch over to S-PLUS, and paste the command into the Commands Window. This can save on typing. Also, you may use the arrow keys to retrieve and edit previous commands.

In-class Activities

  1. Open a Commands Window.
  2. Calculate a single binomial probability using dbinom. To find the probability that there are exactly 2 successes in 6 trials when the success probability is 0.4, type
    > dbinom(2,6,0.4)
    
  3. You could also calculate all such binomial probabilities in one command.
    > dbinom(0:6,6,0.4)
    
    S-PLUS interprets 0:6 as the array of integers from 0 to 6.
  4. Find the probability of two or fewer successes using pbinom.
    > pbinom(2,6,0.4)
    
  5. Find the probability of two or more successes. The sum of all binomial probabilities is one, so the desired probability is one minus the probability of one or fewer successes.
    > 1 - pbinom(1,6,0.4)
    
    Alternatively, we could sum up the individual binomial probabilities.
    > sum(dbinom(2:6,6,0.4))
    
  6. Load in the function gbinom by following these steps.
    1. Click on the gbinom link above.
    2. Save the file onto the Desktop.
    3. Switch over to S-PLUS.
    4. Under the file menu, select Open.
    5. Open the file gbinom.ssc. You may need to change the box ``Look in'' to Desktop and the box ``File type'' to either all files or *.ssc files. This opens up a Script Window.
    6. Under the Script menu, choose Run. This will load the function gbinom into S-PLUS.
    7. Close the Script Window by clicking the x-button in the upper right corner.
  7. Graph the binomial distribution with n=6 and p=0.4.
    > gbinom(6,0.4)
    
  8. Graph the binomial distribution with n=100 and p=0.4 over the entire range.
    > gbinom(100,0.4)
    
    You may also scale the graph so that the x-axis covers only the values with high probability,
    > gbinom(100,0.4,scale=T)
    
    or graph a specified range.
    > gbinom(100,0.4,low=20,high=30)
    
  9. Graph the binomial distribution with n=30 for p ranging from 0.1 to 0.9 (by 0.1).
    > for(p in seq(0.1,0.9,0.1)){gbinom(30,p)}
    
    Click on the Page tabs to see each of the nine graphs.
  10. Graph the binomial distribution with p=0.5 for n ranging from 10 to 100 (by 10).
    > for(n in seq(10,100,10)){gbinom(n,0.5)}
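The calculations in steps 2 through 5 can be tied together in one short check, and the formulas for the mean and standard deviation help explain what the graphs in step 10 show. The following is only a sketch with illustrative variable names; it uses the same n = 6 and p = 0.4 as the steps above.

```r
# Sketch: consistency checks for the activities above (n = 6, p = 0.4).
n <- 6
p <- 0.4

total <- sum(dbinom(0:n, n, p))            # all binomial probabilities add to one
two.or.more.a <- 1 - pbinom(1, n, p)       # one minus P(one or fewer)
two.or.more.b <- sum(dbinom(2:n, n, p))    # direct sum over 2, 3, ..., 6
c(total, two.or.more.a, two.or.more.b)

# Center and spread for the distributions graphed in step 10 (p = 0.5)
ns <- seq(10, 100, 10)
cbind(n = ns, mean = ns * 0.5, sd = sqrt(ns * 0.5 * 0.5))
```

In the table, the mean grows in proportion to n while the standard deviation grows only like the square root of n.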
    

Homework Assignment

Load the function gbinom into S-PLUS (if it has not already been done) and answer the questions below. You should write your answers on this form and turn it in to your lab instructor by the due date.

Further S-PLUS help is available in this on-line guide.

  1. A couple who are both carriers of a genetic disease have a 0.25 probability of passing the disease on to each child. If they have five children, the number of children with the disease is random. Use S-PLUS to find the binomial probability of each possible outcome. Then use the binomial probability formula to verify by hand the probability that exactly two children inherit the disease.

  2. Ten percent of African-Americans are carriers for the genetic disease sickle-cell anemia. In a random sample of seventy-five African-Americans, what is the probability that four or fewer are carriers for the disease?

  3. Many athletes wear the Breathe-Right nasal strip in the hope that it will improve their athletic performance by allowing them to breathe more easily. A scientist tests the claim that these strips improve the body's ability to process oxygen by conducting an experiment that measures the oxygen processing of an athlete while the athlete rides an exercycle. Each athlete is measured both with and without the nasal strip on separate days. If there is no effect, one would expect that the better performance would be equally likely to occur with the strip or without. In an experiment with twenty athletes, thirteen have a better performance while wearing the nasal strip. What is the probability that thirteen or more athletes would exhibit an improvement with the nasal strip, assuming that there is no benefit?

  4. The mean of the binomial distribution is n*p and the standard deviation is sqrt(n*p*(1-p)). For a distribution with n=500 and p=0.5, what are the mean and standard deviation? What is the probability that a binomial random variable with these parameters is within one standard deviation of the mean?

  5. Plot the binomial distribution with n = 8 successively for p ranging from 0.1 to 0.9 by 0.1.
    > for(p in seq(0.1,0.9,0.1)){gbinom(8,p)}
    
    Examine the skewness in each graph.

    When p is less than __________, the distribution is skewed to the __________.

    When p is greater than __________, the distribution is skewed to the __________.

    When p equals __________, the distribution is perfectly symmetric.

  6. Plot the binomial distribution with p = 0.12 successively for n ranging from 5 to 95 by 10 (with scale=T).
    > for(n in seq(5,95,10)){gbinom(n,0.12,scale=T)}
    
    Examine the shape in each graph.

    As the sample size increases, the skewness (increases/decreases).

    Say that a probability is nonnegligible if it is visible in a plot of the distribution. As the sample size increases, the absolute range of values for which the probability is nonnegligible (increases/decreases).

    As the sample size increases, the proportion of possible values for which the probability is nonnegligible (increases/decreases).

    As the sample size increases, the general shape resembles a __________ curve.


Last modified: February 1, 2001

Bret Larget, larget@mathcs.duq.edu