#### Chapter 4

- A **population** is a group of individuals about which we wish to make an inference. We usually do not gather information from the entire population.
- A **sample** is a subset of the population. We usually have data on the sampled individuals.
- A **statistical inference** is a conclusion about a population based on sampled observations.
- A **parameter** is a numerical characteristic of a population, such as a mean or standard deviation.
- A **statistic** is a numerical characteristic of a sample. Statistics may be calculated from data in a sample.
- A **simple random sample** is a sample drawn at random from a population in such a way that every possible sample of the same size has the same chance of being selected.
  - You are not responsible for using a random number table to take a random sample.
- A **random variable** is a numerical quantity that can take various values with various probabilities.
  - The possible values of **discrete random variables** may be listed.
  - The possible values of **continuous random variables** form a continuous range.
- The **distribution** of a random variable is a description of the probability that the random variable takes a value in (almost) any set.
  - A common way to describe the distribution of a discrete random variable is with a table that lists the possible values and their probabilities, which must sum to one.
  - A common way to describe the distribution of a continuous random variable is with a density curve, where the probability that the random variable falls in an interval is the area under the curve above the interval. The total area under a density curve is one.
- The **sampling distribution** of a statistic is the distribution of possible values of the statistic over all random samples of a given size.

#### Facts about the sampling distribution of x-bar.

- The mean of the sampling distribution of x-bar is mu, the population mean.
- The standard deviation of the sampling distribution of x-bar is sigma/sqrt(n),
the population standard deviation divided by the square root of the sample size.
This quantity is called the **standard error** of the sample mean. Notice that the standard error gets smaller as the sample size increases. This reflects the fact that the mean from a large sample is more likely to be close to mu than the mean from a small sample. The standard error may be interpreted as a typical distance between the sample mean and the population mean.
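
The two facts above can be checked informally by simulation. The sketch below is only an illustration under assumed conditions: it draws repeated samples from a hypothetical normal population with mu = 5 and sigma = 3 (values chosen for convenience, not part of these notes), and the function name `simulate_standard_error` is made up for this example. It compares the standard deviation of the simulated sample means with sigma/sqrt(n).

```python
import math
import random
import statistics


def simulate_standard_error(mu, sigma, n, reps=5000, seed=0):
    """Return (mean, SD) of `reps` simulated sample means for samples of size n.

    Assumption: the population is normal with mean mu and SD sigma.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    # pstdev treats the simulated means as the whole collection (divides by reps)
    return statistics.fmean(means), statistics.pstdev(means)


if __name__ == "__main__":
    mu, sigma = 5.0, 3.0  # hypothetical population values
    for n in (4, 16, 64):
        mean_of_means, sd_of_means = simulate_standard_error(mu, sigma, n)
        print(
            f"n={n:3d}  mean of x-bar={mean_of_means:.2f}  "
            f"SD of x-bar={sd_of_means:.2f}  sigma/sqrt(n)={sigma / math.sqrt(n):.2f}"
        )
```

The simulated standard deviations should land close to sigma/sqrt(n) and shrink as n grows, but they will not match exactly because the simulation uses a finite number of samples.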

#### Example:

Consider a very small population with four values, 2, 4, 4, and 10. The mean of this population is

mu = (2+4+4+10)/4 = 5

and the standard deviation is

sigma = sqrt(((2-5)^2 + (4-5)^2 + (4-5)^2 + (10-5)^2) / 4) = sqrt(9) = 3
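
As a quick check of this arithmetic, the following sketch recomputes both values with Python's standard `statistics` module. The variable names are chosen for this illustration. Note that `pstdev` divides by the population size (4 here), which matches the formula above, while `stdev` divides by n - 1 and would give a different answer.

```python
import statistics

population = [2, 4, 4, 10]

mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population SD, divides by 4

print(f"mu = {mu:.3f}")        # mu = 5.000
print(f"sigma = {sigma:.3f}")  # sigma = 3.000

# For contrast only: the sample SD divides by n - 1 = 3 instead of 4.
print(f"sample SD = {statistics.stdev(population):.3f}")  # sample SD = 3.464
```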

Consider all possible samples (with replacement) of size 2. There are sixteen of these.

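The table lists the sample mean for each of them, with the first value drawn in the rows and the second value drawn in the columns.

| First \ Second | 2 | 4 | 4 | 10 |
|---|---|---|---|---|
| 2 | 2 | 3 | 3 | 6 |
| 4 | 3 | 4 | 4 | 7 |
| 4 | 3 | 4 | 4 | 7 |
| 10 | 6 | 7 | 7 | 10 |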

Each of these sixteen samples is equally likely, so each of the sixteen table entries has a 1/16 chance of being the sample mean.

The mean of this sampling distribution is

mean = (2 + 3 + 3 + ... + 10) / 16 = 5

and the standard deviation is

SE = sqrt(((2-5)^2 + (3-5)^2 + ... + (10-5)^2)/16) = sqrt(4.5) = sqrt(9/2) = 3 / sqrt(2) = sigma / sqrt(2)
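
The enumeration in this example can be reproduced with a short script. This is a minimal sketch using only the standard library. `itertools.product` generates every ordered pair of draws with replacement, and the variable names are chosen for this illustration.

```python
import itertools
import math
import statistics

population = [2, 4, 4, 10]
n = 2  # sample size

# All ordered samples of size n drawn with replacement: 4 * 4 = 16 of them.
samples = list(itertools.product(population, repeat=n))
sample_means = [statistics.fmean(s) for s in samples]

mean_of_means = statistics.fmean(sample_means)
# The sixteen means are the entire sampling distribution, so divide by 16.
standard_error = statistics.pstdev(sample_means)
sigma = statistics.pstdev(population)

print(len(samples))                      # 16
print(mean_of_means)                     # 5.0
print(f"{standard_error:.4f}")           # 2.1213
print(f"{sigma / math.sqrt(n):.4f}")     # 2.1213
```

Changing `n` to 3 enumerates all 64 samples of size three, and the same comparison can be made against sigma/sqrt(3).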

Last modified: February 7, 2001

Bret Larget, larget@mathcs.duq.edu