Math 225 Course Notes


Section 2.4: Descriptive Statistics - Measures of Central Tendency


Key Concepts

Data can be regarded as forming either a sample or a population. A statistic is a number calculated from the data in a sample. A parameter is a number calculated from the data in a population. The mean is the familiar arithmetic average of a set of numbers. The median is the middle number: roughly half of the observations are smaller and roughly half are larger. While both the mean and the median measure the center, they do so in different ways.

Statistics and Parameters

A descriptive measure computed from the data in a sample is called a statistic.

A descriptive measure computed from the data in a population is called a parameter.

In practice, the values of parameters are usually not known. We usually calculate statistics from data that we have sampled and then use those statistics to make claims about the parameters that describe the population from which the sample was drawn.

The remainder of this section gives formulas for calculating statistics and parameters. The notation differs between samples and populations, and the formulas for measures of spread also differ slightly.

The Mean

For both samples and populations, the mean is simply
    The sum of all the observations
    -------------------------------
    The number of observations
The notation for the sample mean is an x with a bar over it.

The notation for the population mean is the Greek letter mu.
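
Written out in symbols, the two means might look as follows. The subscripted x's for the individual observations, n for the number of observations in a sample, and N for the number in a population are conventional choices assumed here; the notes themselves only name the bar and mu symbols.

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
```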

The mean is the "balancing point" of a group of numbers.

Example: The mean of the numbers

  4  6  2  9  2
is 23/5 = 4.6.

It is usually appropriate to report the mean with one more decimal place than the original data.
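
The calculation above can be sketched in a few lines of Python. The function name `mean_of` is a hypothetical helper chosen for this illustration; the data are the five numbers from the example.

```python
def mean_of(observations):
    """Return the sum of the observations divided by how many there are."""
    if len(observations) == 0:
        raise ValueError("the mean of no observations is undefined")
    return sum(observations) / len(observations)


data = [4, 6, 2, 9, 2]
x_bar = mean_of(data)

# The data are whole numbers, so report one extra decimal place.
print(round(x_bar, 1))  # 4.6
```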


The Median

The median is the middle number after the observations have been put into order. If there is an odd number of observations, there is exactly one middle number. If there is an even number of observations, there are two middle numbers and the median is their mean. The median divides the ordered observations into two halves of equal size.

The median is at position (n+1)/2 in the ordered list, where n is the number of observations. When n is even, (n+1)/2 falls halfway between two whole numbers, and the median is the mean of the observations at those two positions.

Example: The median of the numbers

  4  6  2  9  2
is 4, since the ordered list is 2 2 4 6 9 and 4 is its middle number.

Also, (5+1)/2 = 3, and 4 is the third number in the ordered list.
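
As a minimal sketch, the rule for odd and even numbers of observations might be coded like this. The function name `median_of` and the four-number list are hypothetical, added only to show the even case; the five-number list is the example above.

```python
def median_of(observations):
    """Return the middle value of the ordered observations."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of no observations is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n+1)/2 counting from 1 is index n // 2 counting from 0.
        return ordered[mid]
    # Even n: take the mean of the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([4, 6, 2, 9, 2]))  # 4
print(median_of([4, 6, 2, 9]))     # 5.0
```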


Comparisons between the Mean and the Median

The mean is greatly affected by outliers. If there are outliers present, the mean might not be a good representation of a "typical" value.

Example:

  4  5  6
has a mean of 5, a typical value in this sample, while
  4  5  600
has a mean of 203, which is not very typical of its sample.

The median is robust to outliers, and its value can almost always be thought of as being typical. The median in both examples above is 5.
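
The two samples above can be checked with Python's standard `statistics` module. This is only an illustration of the comparison; the variable names are chosen for this sketch.

```python
from statistics import mean, median

sample = [4, 5, 6]
sample_with_outlier = [4, 5, 600]

mean_a, median_a = mean(sample), median(sample)
mean_b, median_b = mean(sample_with_outlier), median(sample_with_outlier)

print("without outlier:", mean_a, median_a)
print("with outlier:   ", mean_b, median_b)

# Replacing 6 by 600 moves the mean from 5 to 203 but leaves the median at 5.
shift_in_mean = mean_b - mean_a
shift_in_median = median_b - median_a
```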

The mean and the median are different measures of the center of a distribution. If the distribution is symmetric, then they will be in the same place. If the distribution is skewed to the right, then the mean will typically be larger than the median. If the distribution is skewed to the left, then the mean will typically be smaller than the median.
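
A small numerical illustration of these three cases, using made-up data sets that are not from the notes (one symmetric, one with a long right tail, one with a long left tail):

```python
from statistics import mean, median

# Hypothetical data chosen only to show each shape.
symmetric = [1, 2, 3, 4, 5]
skewed_right = [1, 2, 2, 3, 3, 4, 15]       # long tail toward large values
skewed_left = [1, 12, 13, 13, 14, 14, 15]   # long tail toward small values

for label, data in [("symmetric", symmetric),
                    ("skewed right", skewed_right),
                    ("skewed left", skewed_left)]:
    print(label, "mean:", round(mean(data), 2), "median:", median(data))
```

In these examples the single extreme value pulls the mean toward the tail while the median stays near the bulk of the data.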

An advantage of the median over the mean is that it is less susceptible to the effects of outliers, and is thus more likely to be close to a "typical" value for skewed distributions.

An advantage of the mean over the median is that it is easier to compute: it depends only on the sum of the data and the number of observations, while the median requires the observations to be put in order. With large sets of data, it is generally faster to compute the mean than the median on a computer.
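
To illustrate why the mean is cheap to compute, here is a sketch that reads the observations one at a time and keeps only a running sum and a running count. The function name `streaming_mean` is hypothetical. The median has no equally simple one-pass version, because finding the middle number requires knowing the order of the observations.

```python
def streaming_mean(stream):
    """Compute the mean in one pass without storing the observations."""
    total = 0
    count = 0
    for x in stream:
        total += x
        count += 1
    if count == 0:
        raise ValueError("the mean of no observations is undefined")
    return total / count


# The observations could come from a file or a sensor; an iterator stands in here.
result = streaming_mean(iter([4, 6, 2, 9, 2]))
print(round(result, 1))  # 4.6
```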

Also, the mean allows one to find the total.

Example: If the mean of ten numbers is 15.7, then the total of the numbers is 157.

If the median of ten numbers is 15.7, then we cannot determine the total, since the median says nothing about how far the other numbers lie from the middle.
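
Both statements can be checked numerically. The two ten-number lists below are hypothetical, constructed so that they share the median 15.7 but have different totals.

```python
from statistics import median

# Total recovered from the mean: number of observations times the mean.
n = 10
mean_value = 15.7
total = n * mean_value
print(round(total, 1))  # 157.0

# Two made-up lists with the same median but different totals.
list_a = [1, 2, 3, 4, 15.7, 15.7, 20, 21, 22, 23]
list_b = [1, 2, 3, 4, 15.7, 15.7, 200, 210, 220, 230]

print(median(list_a), median(list_b))
print(sum(list_a), sum(list_b))
```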



Last modified: Jan 15, 1996

Bret Larget, larget@mathcs.duq.edu