Math 225

Introduction to Biostatistics


The Normal Distribution

Prerequisites

This lab assumes that you already know how to:
  1. Log in, find the course Web page, and run S-PLUS
  2. Use the Commands Window to execute commands
  3. Load and run an S-PLUS program

Technical Objectives

This lab will teach you to:
  1. calculate normal probabilities
  2. calculate normal quantiles
  3. graph the normal distribution
  4. use the normal distribution to approximate the binomial distribution

Conceptual Objectives

In this lab you should learn to:
  1. make interpretations in problems which use the normal distribution
  2. understand when the normal approximation to the binomial distribution is appropriate

Normal Distribution

The normal distribution is the continuous probability distribution popularly known as the "bell-shaped curve". For continuous random variables, probability is represented by the area under a curve. The total area under the curve is 1. The probability that a random variable falls between numbers a and b is the area under the curve between a and b.

Parameters of the Normal Distribution

The normal distribution is described by two parameters. The mean "mu" is the location of the balancing point of the distribution, which, by symmetry, is also the median. The standard deviation "sigma" is the distance from the mean to either of the two points where the curve is steepest, one below the mean and one above.
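
The description of sigma as the distance from the mean to the steepest points of the curve can be checked numerically. The following is a rough sketch, not part of the lab: the values mu=100 and sigma=15 and the variable names are chosen only for illustration, and the commands are written in plain S so that they should also run in R.

```r
# Hypothetical parameter values chosen only for illustration.
mu <- 100
sigma <- 15

# Fine grid of x values covering four standard deviations on each side.
x <- seq(mu - 4 * sigma, mu + 4 * sigma, by = 0.01)

# Approximate slope of the normal density on each small interval.
slope <- diff(dnorm(x, mu, sigma)) / diff(x)

# Left endpoint of each interval, so that it lines up with slope.
left <- x[-length(x)]

# Where the curve rises fastest and where it falls fastest.
steepest.up <- left[slope == max(slope)]
steepest.down <- left[slope == min(slope)]

# These should be close to mu - sigma and mu + sigma.
print(c(steepest.up, steepest.down))
```

The grid spacing of 0.01 limits how closely the printed values can match mu - sigma and mu + sigma.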

The Standard Normal Curve

The standard normal curve is the normal distribution with mean mu=0 and standard deviation sigma=1. Probabilities for any normal distribution may be determined from the standard normal curve by the standardization formula

z = (x-mu)/sigma.

In particular, the probability that a normal random variable x with mean mu and standard deviation sigma is between numbers a and b is equal to the probability that a standard normal random variable z is between (a-mu)/sigma and (b-mu)/sigma.
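
As a small illustration of the standardization formula, the sketch below computes the same area two ways. The numbers (mu=500, sigma=100, a=400, b=650) and the variable names are assumptions made up for this example. It uses only pnorm and should run in S-PLUS or R.

```r
# Hypothetical values chosen only for illustration.
mu <- 500
sigma <- 100
a <- 400
b <- 650

# Area between a and b under the normal curve with mean mu and sd sigma.
direct <- pnorm(b, mu, sigma) - pnorm(a, mu, sigma)

# Standardize the endpoints with z = (x-mu)/sigma.
za <- (a - mu) / sigma   # -1
zb <- (b - mu) / sigma   # 1.5

# Area between za and zb under the standard normal curve.
standardized <- pnorm(zb) - pnorm(za)

# The two areas agree up to rounding error.
print(c(direct, standardized))
```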

The 68-95-99.7 Rule

It is useful to keep a few benchmark figures in mind.
  1. For any normal curve, the area within one standard deviation of the mean is about 68%.
  2. For any normal curve, the area within two standard deviations of the mean is about 95%.
  3. For any normal curve, the area within three standard deviations of the mean is about 99.7%.

The Normal Distribution on S-PLUS

In S-PLUS, the two most useful functions for calculating normal probabilities and quantiles are pnorm, which calculates the probability that a normal random variable is x or lower, and qnorm, which finds the pth quantile of the distribution. The "p" in pnorm refers to "probability" and the "q" in qnorm refers to "quantile". You will also load a local function, gnorm, to graph normal distributions for different parameter values.

> pnorm(x,mu,sigma)
is the area to the left of x under a normal curve with mean mu and standard deviation sigma.
> pnorm(x)
is the area to the left of x under the standard normal curve.
> qnorm(p,mu,sigma)
is the number x for which the area to the left of x under a normal curve with mean mu and standard deviation sigma is p.
> qnorm(p)
is the number x for which the area to the left of x under the standard normal curve is p.
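
Because pnorm turns a value x into an area and qnorm turns an area back into a value, the two functions undo each other. The sketch below illustrates this with made-up numbers (mean 100, standard deviation 10, x=90); the variable names are hypothetical.

```r
# Hypothetical values chosen only for illustration.
mu <- 100
sigma <- 10

# Area to the left of 90.
p <- pnorm(90, mu, sigma)

# The quantile for that area recovers 90 (up to rounding error).
x <- qnorm(p, mu, sigma)
print(c(p, x))

# By symmetry, the 0.5 quantile (the median) is the mean.
med <- qnorm(0.5, mu, sigma)
print(med)
```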

S-PLUS help is available in this on-line guide.


Note that you can use the mouse to highlight a command from Netscape, switch over to S-PLUS, and paste the command into the Commands Window. This can save on typing. Also, you may use the arrow keys to retrieve and edit previous commands.

In-class Activities

  1. Open a Commands Window.
  2. Find the area to the left of 90 under a normal curve with mean 100 and standard deviation 10.
    > pnorm(90,100,10)
    
  3. Find the area to the right of 120 under a normal curve with mean 100 and standard deviation 10.
    > 1 - pnorm(120,100,10)
    
  4. Find the area between 90 and 120 under a normal curve with mean 100 and standard deviation 10.
    > pnorm(120,100,10) - pnorm(90,100,10)
    
  5. Notice that standardizing 90 and 120 gives -1 and 2, respectively. Verify that the same probabilities could have been determined with areas under the standard normal curve.
    > pnorm(-1)
    > 1 - pnorm(2)
    > pnorm(2) - pnorm(-1)
    
  6. Find the lower and upper quartiles of a normal distribution with mean 500 and standard deviation 100. SAT scores are scaled to be close to this distribution.
    > qnorm(0.25,500,100)
    > qnorm(0.75,500,100)
    
  7. Verify the 68-95-99.7 rule mentioned above.
    > pnorm(1) - pnorm(-1)
    > pnorm(2) - pnorm(-2)
    > pnorm(3) - pnorm(-3)
    
    Now check it again with mu=100 and sigma=15.
    > pnorm(115,100,15) - pnorm(85,100,15)
    > pnorm(130,100,15) - pnorm(70,100,15)
    > pnorm(145,100,15) - pnorm(55,100,15)
    
  8. Load in the function gnorm by following these steps.
    1. Click on the gnorm link above.
    2. Save the file onto the Desktop.
    3. Switch over to S-PLUS.
    4. Under the file menu, select Open.
    5. Open the file gnorm.ssc. You may need to change the box "Look in" to Desktop and the box "File type" to either all files or *.ssc files. This opens up a Script Window.
    6. Under the Script menu, choose Run. This will load the function gnorm into S-PLUS.
    7. Close the Script Window by clicking the x-button in the upper right corner.
  9. Graph the normal distribution with mu=500 and sigma=100.
    > gnorm(500,100)
    
  10. Graph the normal distribution with mu=500 and sigma=100 and display the area between 400 and 600.
    > gnorm(500,100,prob=T,a=400,b=600)
    
  11. Graph the normal distribution with mu=500 and sigma=100 and find the 0.01 quantile.
    > gnorm(500,100,quantile=T,p=0.01)
    

Homework Assignment

Load the function gnorm into S-PLUS (if it has not already been done) and answer the questions below. You should write your answers on this form and turn it in to your lab instructor by the due date.

Further S-PLUS help is available in this on-line guide.

  1. Weights of six-year-old boys are normally distributed with a mean of 48 pounds and a standard deviation of 6 pounds.
    1. Find the proportion of six-year-old boys with weights between 45 and 55 pounds.
    2. Find the proportion of six-year-old boys who weigh more than 65 pounds.
    3. Between what two values are the middle 80% of the weights?
    4. What is the z-score of a 35-pound six-year-old boy? What proportion of six-year-old boys weigh less? Is a 35-pound six-year-old boy unusually light?

  2. Diastolic blood pressure in hypertensive women is approximately normally distributed with a mean of 100 mm Hg and a standard deviation of 14 mm Hg.
    1. What proportion of blood pressures are between 94 and 114 mm Hg?
    2. What proportion of blood pressures are greater than 120 mm Hg?
    3. What is the 90th percentile of this distribution?

  3. If sigma=sqrt(n*p*(1-p)) is at least three or so, exact binomial probabilities may be approximated by areas under a normal curve with mean mu=n*p and standard deviation sigma=sqrt(n*p*(1-p)). The probability of exactly x successes is approximated by the area from x-0.5 to x+0.5 under this normal curve. For integers a and b, the exact probability Prob(a < x < b) is then approximated by the area between a+0.5 and b-0.5. Note that with a computer, it is preferable to simply make the exact binomial calculation. Without a computer, the exact binomial calculation requires computing many individual binomial probabilities, whereas the normal approximation calculation is relatively short.

    1. Ten percent of African Americans are carriers for sickle cell anemia. In a random sample of 1200 people from this population, what is the exact binomial probability that 140 or more individuals are carriers for the disease?
    2. Use the normal approximation to the binomial to approximate the probability in the previous problem.

      (Hint: The mean of the binomial distribution is 1200*0.1. The standard deviation of the binomial distribution is sqrt(1200*0.1*0.9). The probability of 140 or more successes is approximated as the area to the right of 139.5 under the normal curve with mean and standard deviation in agreement with the binomial distribution.)

  4. A physician reads that thyroid stimulating hormone (TSH) levels in people with healthy thyroid glands are normally distributed with a mean of 3.2 and a standard deviation of 0.9. A blood test shows that a patient has a TSH level of 45. How many standard deviations is this measurement above the mean? Does this patient have a healthy thyroid gland? Explain.
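
The normal approximation to the binomial described in problem 3 can be sketched as follows. The numbers here (n=400, p=0.25, and the cutoff of 110 successes) are made up for illustration and are deliberately not those of the homework problem. The sketch assumes the standard S function pbinom for the exact binomial probability, and the variable names are hypothetical.

```r
# Hypothetical values chosen only for illustration.
n <- 400
p <- 0.25

# Mean and standard deviation of the binomial distribution.
mu <- n * p                      # 100
sigma <- sqrt(n * p * (1 - p))   # about 8.66, comfortably above 3

# Exact probability of 110 or more successes.
exact <- 1 - pbinom(109, n, p)

# Normal approximation: area to the right of 109.5 (the 0.5 shift
# is the same adjustment described in problem 3).
approx <- 1 - pnorm(109.5, mu, sigma)

print(c(exact, approx))
```

The two printed values should be close but not identical, since the normal curve only approximates the binomial distribution.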

Last modified: February 1, 2001

Bret Larget, larget@mathcs.duq.edu