# Math 225

## Introduction to Biostatistics

### Highlights from Lecture #1

1. Course information is at this web page.

2. Further course information is on the Course Info page.

To access this:

1. Use your browser to go to the page `berne.cc.duq.edu`.
You may change this by going to My Blackboard, then Personal Tools, and finally Personal Information Editor. You should also set your regular e-mail address here if it is different.

### Chapter 1

3. The field of statistics includes methods for gathering data (sampling and experimental design), summarizing data (exploratory data analysis with both graphs and numbers), and statistical inference (making generalizations to a larger population based on observations from a sample).

To understand the methods of statistical inference, we need to know some probability.

4. You are not yet responsible for the definitions on pages 6-8. We will see these all later in the semester.

5. A typical data set is often represented with a matrix of information.
Each row represents an individual or unit, while each column represents a variable.
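As a rough illustration of this layout, the sketch below stores a small data set as a list of rows. The variable names and all of the values are hypothetical, made up only to show the row and column structure.

```python
# Hypothetical data set: each row is one individual, each column is one variable.
columns = ["sex", "age_years", "weight_kg"]
rows = [
    ["F", 34, 61.2],
    ["M", 45, 83.5],
    ["F", 29, 58.0],
]

def column(name):
    """Return every individual's value for one variable (one column)."""
    j = columns.index(name)
    return [row[j] for row in rows]

n_individuals = len(rows)   # number of rows
n_variables = len(columns)  # number of columns
ages = column("age_years")  # the values of a single variable

print(n_individuals, n_variables, ages)
```

Reading across a row gives everything recorded about one individual; reading down a column gives one variable for all individuals.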

6. Variables may be categorical, where each individual is categorized into a discrete set, or quantitative, where each individual is measured on a numerical scale.

Quantitative variables are either discrete (only take values from some discrete set of possible values) or continuous (take values from a continuous range of possible values, although the recorded measurements are rounded).

### Chapter 2 (sections 2-1 through 2-2)

7. You should understand summation notation, which uses the Greek capital letter sigma (Σ).
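A short reminder of how the notation reads may help. The symbols $x_i$ (the $i$-th value) and $n$ (the number of values) are assumed here as conventional choices, and the four numbers in the worked example are made up.

```latex
\[
\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n
\]
% Worked example with n = 4 and values x_1 = 2, x_2 = 5, x_3 = 1, x_4 = 8:
\[
\sum_{i=1}^{4} x_i = 2 + 5 + 1 + 8 = 16
\]
```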

8. The mean and the median are two common measures of center.

The mean is calculated by summing the values and dividing by the number of values. It is the balancing point of the distribution of numbers.

The median is the middle value after the values have been sorted. If there is an odd number of values, the median is the value at the unique middle position. If there is an even number of values, the median is the average of the values at the two middle positions.
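The two definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended implementation, and the sample values are invented. The last line checks the "balancing point" idea: the deviations from the mean sum to zero.

```python
def mean(values):
    """Sum the values and divide by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted list (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # unique middle position
    return (ordered[mid - 1] + ordered[mid]) / 2  # average of the two middle positions

odd = [7, 3, 9, 1, 5]   # sorted: 1, 3, 5, 7, 9
even = [7, 3, 9, 1]     # sorted: 1, 3, 7, 9

print(mean(odd), median(odd))    # 5.0 5
print(mean(even), median(even))  # 5.0 5.0
print(sum(x - mean(odd) for x in odd))  # deviations from the mean sum to 0.0
```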

The median and the mean will be about the same for nearly symmetric distributions.

If a distribution is skewed to the right (the right half is more spread out than the left half), the mean will usually be larger than the median.

If a distribution is skewed to the left (the left half is more spread out than the right half), the mean will usually be smaller than the median.

The mean is more affected by extreme values. It may not be "typical" when there are extreme values present.

The median is a more "robust" measure of center and is largely unaffected by extreme values. The median may often be "typical" even when there are extreme values present.
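A small numerical experiment can make the contrast concrete. The sketch below uses Python's standard `statistics` module on made-up values, adding a single extreme value to an otherwise tight data set.

```python
from statistics import mean, median

typical = [4, 5, 5, 6, 7]       # hypothetical values, roughly symmetric
with_outlier = typical + [60]   # the same values plus one extreme value

print(mean(typical), median(typical))            # 5.4 5
print(mean(with_outlier), median(with_outlier))  # 14.5 5.5
```

The single extreme value pulls the mean from 5.4 up to 14.5, a number larger than all but one of the observations, while the median moves only from 5 to 5.5.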

9. You are not responsible for the mode (other than as a descriptive term regarding histograms), the geometric mean, the harmonic mean, or the weighted mean.

10. Histograms are bar graphs of quantitative variables. A numerical interval that spans the values of the variable is divided into a number of smaller, equally sized intervals. A bar is drawn over each smaller interval, with height proportional to the count of observations that fall in that interval.

Histograms with too few intervals over-summarize the shape of the distribution of numbers.

Histograms with too many intervals look like broken combs and emphasize too many minor features of a distribution.

A good histogram often has between 5 and 20 intervals, more when there are more values.
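The construction described above can be sketched as follows. The function name `histogram_counts`, the rule that the largest value is placed in the last interval, and the data are all assumptions made for illustration; plotting software may use slightly different conventions at the interval boundaries.

```python
def histogram_counts(values, n_bins):
    """Count observations in n_bins equal-width intervals spanning min..max.

    Assumes the values are not all identical (so the width is positive).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        k = int((v - lo) / width)  # index of the interval containing v
        k = min(k, n_bins - 1)     # the maximum value goes in the last interval
        counts[k] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # hypothetical measurements
counts = histogram_counts(data, 4)     # intervals [1,3), [3,5), [5,7), [7,9]

# Crude text histogram: one '#' per observation in each interval.
for k, c in enumerate(counts):
    print(f"interval {k}: {'#' * c}")
```

Rerunning with a different `n_bins` shows the trade-off: a single interval hides all of the shape, while many intervals leave most bars with a count of 0 or 1.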

The median of a distribution may be estimated by finding a vertical line that divides the shaded area into two regions of equal area.

The mean of a distribution may be estimated by finding the place where the shaded part would balance if it were made from a uniform solid material.

Histograms do a good job of displaying the shape, center, and spread of a distribution.