Math 225

Introduction to Biostatistics

Notes from Lecture #9

Chapter 4
A population is a group of individuals about which we wish to make an inference. We usually do not gather information from the entire population.
A sample is a subset of the population. We usually have data on the sampled individuals.
A statistical inference is a conclusion about a population based on sampled observations.
A parameter is a numerical characteristic of a population, such as a mean or standard deviation.
A statistic is a numerical characteristic of a sample. Statistics may be calculated from data in a sample.
A simple random sample is a sample drawn at random from a population in such a way that every possible sample of the same size has the same chance of being selected.
You are not responsible for using a random number table to take a random sample.
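In software, a simple random sample can be drawn without a table. The Python sketch below is one way to do it, using the standard library's random.sample, which draws without replacement so that every subset of the requested size is equally likely. The population values are hypothetical, chosen only to contrast a parameter (computed from the whole population) with a statistic (computed from the sample). The helper name simple_random_sample is also just for illustration.

```python
import random
import statistics

def simple_random_sample(population, n, rng):
    """Draw n individuals without replacement; every size-n subset is equally likely."""
    return rng.sample(population, n)

# Hypothetical population of ten measurements (illustrative values only).
population = [12, 15, 9, 20, 18, 11, 14, 16, 13, 22]

mu = statistics.mean(population)   # parameter: describes the whole population

rng = random.Random(225)           # seeded so the draw can be reproduced
sample = simple_random_sample(population, 4, rng)
x_bar = statistics.mean(sample)    # statistic: computed from the sampled values only

print("parameter mu =", mu)
print("sample =", sample, " statistic x-bar =", x_bar)
```

Running it with a different seed gives a different sample and usually a different x-bar, while mu stays fixed. This sample-to-sample variability of a statistic is what a sampling distribution describes.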
A random variable is a numerical quantity that can take various values with various probabilities.
The possible values of a discrete random variable can be listed.
The possible values of a continuous random variable form a continuous range, such as an interval of numbers.
The distribution of a random variable is a description of the probability that the random variable takes a value in (almost) any set.
A common way to describe the distribution of a discrete random variable is with a table that lists the possible values and their probabilities, which must sum to one.
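As a small illustration, the sketch below stores such a table as a Python dictionary. It uses the distribution of a single value X drawn at random from the population 2, 4, 4, 10 in the example later in these notes, with each of the four individuals equally likely. Exact fractions avoid rounding; the helper name prob is hypothetical.

```python
from fractions import Fraction

# X = one value drawn at random from the population 2, 4, 4, 10.
distribution = {
    2: Fraction(1, 4),
    4: Fraction(2, 4),   # two of the four individuals have the value 4
    10: Fraction(1, 4),
}

# The probabilities in the table must sum to one.
assert sum(distribution.values()) == 1

def prob(dist, event):
    """P(X in event): add the probabilities of the listed values that lie in the event."""
    return sum((p for x, p in dist.items() if x in event), Fraction(0))

# Mean of X: each possible value weighted by its probability.
mean_x = sum(x * p for x, p in distribution.items())

print("P(X >= 4) =", prob(distribution, {4, 10}))
print("mean of X =", mean_x)
```

The mean of X works out to 5, the same as the population mean mu computed in the example.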
A common way to describe the distribution of a continuous random variable is with a density curve where the probability that the random variable falls in an interval is the area under the curve above the interval. The total area under a density curve is one.
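The area interpretation can be checked numerically. The sketch below assumes a hypothetical density f(x) = 2x on the interval [0, 1] (zero elsewhere) and approximates areas with the midpoint rule. For this density the exact answers are a total area of 1 and P(0 <= X <= 0.5) = 0.25.

```python
def density(x):
    """Hypothetical density curve: f(x) = 2x for 0 <= x <= 1, and 0 otherwise."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(f, a, b, steps=10_000):
    """Approximate the area under f above the interval [a, b] using the midpoint rule."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))

total = area(density, 0, 1)   # total area under the density curve
p = area(density, 0, 0.5)     # probability that X falls between 0 and 0.5

print("total area:", total)
print("P(0 <= X <= 0.5):", p)
```

Any interval works the same way; for example, area(density, 0.5, 1) approximates the remaining probability of 0.75.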
The sampling distribution of a statistic is the distribution of the possible values of the statistic over all random samples of a given size.
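A sampling distribution can be approximated by simulation: draw many random samples, compute the statistic for each, and tabulate the results. The sketch below does this for x-bar, assuming the population 2, 4, 4, 10 and samples of size 2 drawn with replacement, as in the example later in these notes. The number of repetitions and the seed are arbitrary choices.

```python
import random
from collections import Counter

population = [2, 4, 4, 10]   # small population from the example
n = 2                        # sample size
reps = 20_000                # number of simulated samples (arbitrary)

rng = random.Random(9)       # seeded for reproducibility

# For each repetition, draw n values with replacement and record the sample mean.
means = [sum(rng.choices(population, k=n)) / n for _ in range(reps)]

# Relative frequencies approximate the sampling distribution of x-bar.
freq = Counter(means)
for value in sorted(freq):
    print(f"x-bar = {value:4.1f}   relative frequency = {freq[value] / reps:.4f}")
```

With enough repetitions the relative frequencies settle near the exact probabilities obtained by listing all sixteen samples (for instance 1/16 for x-bar = 2 and 4/16 for x-bar = 3).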
Facts about the sampling distribution of x-bar.
1. The mean of the sampling distribution of x-bar is mu, the population mean.
2. The standard deviation of the sampling distribution of x-bar is sigma/sqrt(n), the population standard deviation divided by the square root of the sample size. This quantity is called the standard error of the sample mean. Notice that the standard error gets smaller as the sample size increases. This reflects the fact that the mean from a large sample is more likely to be close to mu than the mean from a small sample. The standard error may be interpreted as a typical distance between the sample mean and the population mean.
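The formula in fact 2 is easy to tabulate. The short sketch below (the function name standard_error is just for illustration) shows how the standard error shrinks with n when sigma = 3, the population standard deviation in the example that follows. Because of the square root, the sample size must be quadrupled to halve the standard error.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of x-bar: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 3  # population standard deviation from the example
for n in (1, 4, 16, 64, 100):
    print(f"n = {n:3d}   SE = {standard_error(sigma, n):.3f}")
```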

Example:

Consider a very small population with four values, 2, 4, 4, and 10. The mean of this population is

mu = (2+4+4+10)/4 = 5

and the standard deviation is

sigma = sqrt(((2-5)^2 + (4-5)^2 + (4-5)^2 + (10-5)^2) / 4) = sqrt(9) = 3
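The same two numbers can be computed with Python's standard statistics module as a quick check of the arithmetic. Note that pstdev divides by the number of values N, matching the population formula above; statistics.stdev would divide by N - 1 (the sample standard deviation) and give a different answer here.

```python
import statistics

population = [2, 4, 4, 10]

mu = statistics.mean(population)       # (2 + 4 + 4 + 10) / 4
sigma = statistics.pstdev(population)  # sqrt of the average squared deviation (divide by N)

print("mu =", mu)
print("sigma =", sigma)
```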

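Consider all possible ordered samples of size 2 drawn with replacement, treating the two 4s as different individuals. There are 4 x 4 = 16 of these. The table gives the sample mean for each one.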

          Second
First |   2   4   4  10
------+----------------
    2 |   2   3   3   6
    4 |   3   4   4   7
    4 |   3   4   4   7
   10 |   6   7   7  10
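The table can be generated by listing every ordered pair of draws. The sketch below uses itertools.product, which indexes individuals by position, so the two 4s are counted as different individuals, giving 4 x 4 = 16 samples.

```python
from itertools import product

population = [2, 4, 4, 10]
size = len(population)

# All ordered samples of size 2 drawn with replacement, in row order:
# (2, 2), (2, 4), (2, 4), (2, 10), (4, 2), ...
samples = list(product(population, repeat=2))
means = [(first + second) / 2 for first, second in samples]

# Print one row per first draw; columns follow the second draw.
print("First |" + "".join(f"{v:4d}" for v in population))
for i, first in enumerate(population):
    row = means[i * size:(i + 1) * size]
    print(f"{first:5d} |" + "".join(f"{m:4g}" for m in row))
```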

Each of the sixteen samples is equally likely, so each of the sixteen entries in the table has probability 1/16 of being the observed sample mean.

The mean of this sampling distribution is

mean = (2 + 3 + 3 + ... + 10) / 16 = 5

and the standard deviation is

SE = sqrt(((2-5)^2 + (3-5)^2 + ... + (10-5)^2)/16)
   = sqrt(4.5)
   = sqrt(9/2)
   = 3 / sqrt(2)
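   = sigma / sqrt(2)

Both results agree with the facts above: the mean of the sampling distribution of x-bar equals mu = 5, and its standard deviation equals sigma/sqrt(n) with sample size n = 2.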
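The calculation can be checked by enumeration. The sketch below lists all samples of size n drawn with replacement and computes the mean and the divide-by-count standard deviation of the resulting sample means. Because every sample is equally likely, these are exactly the mean and standard deviation of the sampling distribution.

```python
import math
import statistics
from itertools import product

population = [2, 4, 4, 10]
n = 2  # sample size; try 3 to enumerate all 4**3 = 64 samples

# Sample mean of every ordered sample of size n drawn with replacement.
means = [sum(s) / n for s in product(population, repeat=n)]

mean_of_means = statistics.fmean(means)   # should equal mu = 5
se = statistics.pstdev(means)             # standard deviation of the sampling distribution
sigma = statistics.pstdev(population)     # population standard deviation, 3

print("mean of x-bar:", mean_of_means)
print("SE by enumeration:", se)
print("sigma / sqrt(n):  ", sigma / math.sqrt(n))
```

Changing n to 3 enumerates 64 samples; the enumerated standard deviation should then match sigma/sqrt(3) in the same way.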

Last modified: February 7, 2001

Bret Larget, larget@mathcs.duq.edu
