Statistics 50
Lecture 5


LECTURE 5: THE NORMAL APPROXIMATION FOR DATA

A. Density Curve

A Density Curve is a curve that

B. The Standard Normal Distribution: A special density curve

  1. Background
    A mathematician named Quetelet measured all sorts of things about large samples (e.g. 5000) of people -- height, weight, eyesight -- and found a regular pattern. Most people were close to average, and smaller numbers were well below or well above average. The distribution of traits had a "bell shape." This curve is an ideal to which other distributions can be compared.
  2. Properties
    a. Symmetric, bell-shaped

    b. Mean 0, SD 1

    c. The median is the point with 50% (half) of the observations on either side. In this distribution, the mean is equal to the median.

    d. The total area under the curve is equal to 1 (or 100% when expressed as a percentage). Areas under the curve represent proportions of the observations.

    e. 68%-95%-almost 100% rule (see p. 64)
    About 68% fall within plus or minus 1 SD of the mean
    About 95% fall within plus or minus 2 SD of the mean
    Nearly 100% (99.7%) fall within plus or minus 3 SD
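
The percentages in this rule can be checked numerically instead of with the printed table. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the helper name `area_within` is made up for illustration and is not part of the course materials.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, SD 1
std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve between -k and +k SDs (hypothetical helper)."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {area_within(k):.1%}")
```

The computed areas are slightly more precise than the rounded 68%, 95%, and 99.7% figures, which is why the rule says "about."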

  3. Using a standard normal table (Inside front cover of your book or part of the yellow foldout in the back)
    a. First step: Draw a picture...

    ( i) Draw the x-axis. These are the STANDARDIZED SCORES or Z-SCORES.

    ( ii) Draw a normal curve above it. The area under the curve gives PERCENTAGES.

    (iii) Shade in the area that you are looking for (or know).

    b. Solve for the area you want in terms of "left hand areas"

  4. Examples
    a. What percent of a standard normal curve is less than 1.2?
    (first step ... draw a picture)
    Table gives .8849, or 88.5%

    b. What percent of a standard normal curve is greater than 1.2?
    (first step ... draw a picture)
    100% - 88.49%, or 11.51%.

    c. What percent of a standard normal curve lies between -0.83 and 1.25?
    (first step ... draw a picture)
    89.44% - 20.33%, or 69.11%

    d. What z-score has 3.84% area to the left of that score?
    (first step ... draw a picture)
    z = -1.77

    e. What z-score has 29.12% area to the right of that score?
    (first step ... draw a picture)
    29.12% to the right means 100%-29.12%=70.88% to the left.
    Thus z = 0.55 <--- note: this is NOT an area but a SCORE
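
The five table lookups above can be reproduced in code. This is a sketch only, assuming Python's standard-library `statistics.NormalDist` in place of the printed table: `cdf` plays the role of a "left hand area" lookup, and `inv_cdf` goes from an area back to a z-score. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal curve (mean 0, SD 1)

# cdf(z) is the "left hand area": the proportion of the curve below z.
less_than_1_2 = Z.cdf(1.2)              # a. area to the left of 1.2
greater_than_1_2 = 1 - Z.cdf(1.2)       # b. area to the right of 1.2
between = Z.cdf(1.25) - Z.cdf(-0.83)    # c. area between -0.83 and 1.25

# inv_cdf(area) runs the table backwards: from a left hand area to a z-score.
z_left = Z.inv_cdf(0.0384)              # d. z with 3.84% to its left
z_right = Z.inv_cdf(1 - 0.2912)         # e. z with 29.12% to its right

print(f"a. {less_than_1_2:.4f}  b. {greater_than_1_2:.4f}  c. {between:.4f}")
print(f"d. z = {z_left:.2f}  e. z = {z_right:.2f}")
```

Note that every answer is built from left hand areas, just as in step b of the table procedure.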

C. Standard Units

  1. Definition

    A score z is in STANDARD UNITS if it tells how many SD's the original score is above or below the average. For example, if z = 1.3, then the original score was 1.3 SD's ABOVE average; if z = -0.57, then the original score was 0.57 SD's BELOW average.

  2. Examples of standard units

    Women's heights in the United States are normally distributed with a mean of 65.5 inches (about 5 feet 5 inches) and a standard deviation of 2.5 inches.

    We select a woman and measure her height. She is 5'9" tall (69 inches). Her height, in standard units, is:

    z = (69 - 65.5) / 2.5 = 1.40

    She is 1.40 standard units ABOVE the average height. This makes her taller than 92% of all women in the US.

    At the other extreme, a different woman is 5 feet tall (60 inches). Her height in standard units is:

    z = (60 - 65.5) / 2.5 = -2.20

    She is 2.20 standard units BELOW the average height. In other words, only 1.4% of all women are shorter than she is.

    Suppose I tell you that a woman's standardized height is -1.56. How tall is she?

    actual height = (-1.56*2.5) + 65.5 = 61.6

    She is 61.6 inches tall or about 5'2".
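
The conversions in these height examples follow two formulas: z = (score - mean) / SD, and score = (z x SD) + mean. A minimal sketch in Python follows; the function names are hypothetical, and the standard-library `statistics.NormalDist` stands in for the printed table when looking up percentages.

```python
from statistics import NormalDist

MEAN_HEIGHT = 65.5  # inches, women's heights as given in the notes
SD_HEIGHT = 2.5     # inches

def to_standard_units(score, mean, sd):
    """How many SDs the score lies above (+) or below (-) the mean."""
    return (score - mean) / sd

def from_standard_units(z, mean, sd):
    """Convert a z-score back to the original units."""
    return z * sd + mean

z_tall = to_standard_units(69, MEAN_HEIGHT, SD_HEIGHT)
z_short = to_standard_units(60, MEAN_HEIGHT, SD_HEIGHT)
height = from_standard_units(-1.56, MEAN_HEIGHT, SD_HEIGHT)

# Left hand areas: the share of women shorter than each height.
share_below_tall = NormalDist().cdf(z_tall)
share_below_short = NormalDist().cdf(z_short)

print(f"z = {z_tall:.2f}, taller than {share_below_tall:.0%} of women")
print(f"z = {z_short:.2f}, taller than {share_below_short:.1%} of women")
print(f"z = -1.56 corresponds to {height:.1f} inches")
```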

D. Recapping the Properties of the Normal

1. The highest point of the curve is the mean.
2. The median is the same as the mean.
3. The normal curve is symmetric about its middle.
4. As you move away from the middle in either direction, the height decreases in such a way that the curve has a bell-shaped appearance
5. The total area under the curve is 100%
6. About 68% of the area will be within +1 and -1 SD from the mean. About 95% will fall within +2 and -2 SD from the mean. About 99.7% will fall within +3 and -3 SD from the mean.
7. The curve never actually crosses the horizontal axis. It gets close (and the area under it gets very small) but it never crosses.

E. Assessing Normality (optional)

1. Common sense: if the normal curve implies nonsense results (for example, that people have negative incomes, or that some women have a negative number of children), the normal curve doesn't apply.

Example: in 1980, the average number of children born per woman was 1.95, with an SD of 1.91. Does the normal curve apply? Try calculating how many children a woman would have if she is 2 standard deviations BELOW the mean.

(No; the data have a long right hand tail, so this distribution is skewed to the right. A woman who is 2 SD below the mean would have -1.87 children.)
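
The arithmetic behind this common-sense check is short enough to script. The sketch below uses the mean and SD quoted in the example; the extra step, asking what share of women a normal curve would place below zero children, is an added illustration (not in the original notes) and relies on Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

mean_children = 1.95  # 1980 average, from the example
sd_children = 1.91

# Number of children for a woman 2 SDs below the mean.
two_sd_below = mean_children - 2 * sd_children
print(f"2 SD below the mean: {two_sd_below:.2f} children")

# Added illustration: if a normal curve with this mean and SD applied,
# this is the share of women it would put below zero children.
implied_negative = NormalDist(mu=mean_children, sigma=sd_children).cdf(0)
print(f"share implied to have fewer than 0 children: {implied_negative:.0%}")
```

A normal curve with these values would put a substantial share of women below zero, which is the kind of nonsense result that rules it out.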

2. Do a histogram: if the data look like a normal curve, the normal curve applies; otherwise, it does not.
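
A quick text histogram is one way to "look" at the data without graphing software. The sketch below is illustrative only: the data are simulated draws from a normal curve (hypothetical values, not real measurements), and the helper `bin_counts` is a made-up name. It assumes Python 3.8 or later.

```python
from statistics import NormalDist

def bin_counts(data, n_bins=8):
    """Count observations in n_bins equal-width bins spanning the data range."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        if width == 0:
            i = 0  # all values identical: everything goes in the first bin
        else:
            # the largest value is placed in the last bin
            i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

# Hypothetical data: 1000 simulated heights, NOT real measurements.
data = NormalDist(mu=65.5, sigma=2.5).samples(1000, seed=1)
counts = bin_counts(data)

for i, c in enumerate(counts):
    print(f"bin {i}: " + "#" * (c // 10))  # one '#' per 10 observations
```

For real data, replace the simulated list with the actual observations. A roughly symmetric, bell-shaped pile of bars suggests the normal curve may apply; a long tail on one side suggests it does not.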

F. More on Using the Normal Curve

1. IF the data are normally distributed, then raw scores can be converted into standard units to find percentages; also, percentages can be converted into standard units and then converted into raw scores.

2. If the data are NOT normally distributed, then using the normal curve will give the wrong answer!

3. Examples

a. SAT math scores are normally distributed with a mean of 500 and an SD of 100. What percentile rank is a math score of 650?

(first step: draw a picture ... want an area)
(next step: convert to standard units; z = 1.50)
(final step: look up the area; the score is at about the 93rd percentile, 93.3%)

b. What SAT math score defines the top 10%?
(first step: draw a picture ... want an original score)
(next step: using the area, find the standardized score: z=1.28)
(final step: convert back from standard units; score is 628)

c. If the mean number of children born to American women is 1.95
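
Examples a and b above run the same conversion in opposite directions: score to area, then area to score. A minimal sketch, assuming Python's standard-library `statistics.NormalDist` in place of the table (variable names are illustrative):

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)  # SAT math scores, as given in example a

# a. Raw score -> standard units -> area to the left (percentile rank).
z_650 = (650 - 500) / 100
percentile_650 = NormalDist().cdf(z_650)

# b. Area -> standard units -> raw score.
# The top 10% starts where 90% of the area lies to the left.
z_top_10 = NormalDist().inv_cdf(0.90)
cutoff_top_10 = z_top_10 * 100 + 500

print(f"a. z = {z_650:.2f}, percentile rank = {percentile_650:.1%}")
print(f"b. z = {z_top_10:.2f}, cutoff score = {cutoff_top_10:.0f}")
```

The same numbers come from `sat.cdf(650)` and `sat.inv_cdf(0.90)` directly; the longer form is shown because it mirrors the three steps in the notes.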

4. Other examples

United Parcel Express, a package delivery service, claims that its average delivery time is 34 hours. Its standard deviation is 1.5 hours. What is your chance of getting a package delivered within 36 hours?

To answer this, you might draw a curve then fill in the information, then calculate a Z score.

z = (36 - 34) / 1.5 = 1.33

Your chance of getting a package delivered in less than 36 hours is .9082 from the table or about 91%

Suppose I tell you that a standardized delivery time is Z = -1.56. How do you translate that into actual hours?

(-1.56 x 1.5) + 34 = 31.66 hours
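
The delivery-time example can be checked the same way. This sketch assumes that delivery times are normally distributed (the example uses the normal table, so it relies on that too) and uses Python's standard-library `statistics.NormalDist`; variable names are illustrative.

```python
from statistics import NormalDist

MEAN_HOURS = 34.0  # claimed average delivery time
SD_HOURS = 1.5

# ASSUMPTION: delivery times follow a normal curve.
delivery = NormalDist(mu=MEAN_HOURS, sigma=SD_HOURS)

# Chance of delivery within 36 hours.
z_36 = (36 - MEAN_HOURS) / SD_HOURS
chance_within_36 = delivery.cdf(36)

# Translate a standardized time of z = -1.56 back into hours.
hours_at_z = -1.56 * SD_HOURS + MEAN_HOURS

print(f"z = {z_36:.2f}, chance within 36 hours = {chance_within_36:.0%}")
print(f"z = -1.56 corresponds to {hours_at_z:.2f} hours")
```

The computed chance differs from the table value of .9082 in the fourth decimal place because the table lookup rounds z to 1.33 first; both round to about 91%.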



Last Update: 2 October 1997 by VXL