## STAT 13

(Sec. 1a-1c)

Introduction to Statistical Methods for the Life and Health Sciences

## Instructor: Ivo Dinov, Asst. Prof.

Departments of Statistics & Neurology
 http://www.stat.ucla.edu/~dinov/

# Lab 7

Also see this for an additional tutorial.

# Thursday, Nov. 15, 2001

Commands

set obs n

gen x=m+s*invnorm(uniform())

edit

graph varname, histogram normal

summarize x y, detail

drop x

Objective:  This lab will give you practice with the normal distribution.

Activity 1.

We discussed in class that the data that you collect in a study arise from some kind of underlying process.  For example, the underlying process that gives rise to an individual’s height involves a summing of the effects of multiple genes and environmental factors.  In fact, as you will shortly learn, a variable that arises from a “summing of things” (many small, roughly independent effects) will tend to look approximately normally distributed when plotted.

You can collect measurements on the heights of a sample of people and plot a histogram of your data.  Will your histogram of raw data look like a normal curve?  In this activity, let’s assess how sample size affects the shape of the histogram of raw data.

Instead of actually going out and measuring the heights of a sample of people, you will generate a sample of height measurements assuming that height follows a normal curve with mean μ = 64.5 and standard deviation σ = 2.5.
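
The generating command listed under Commands, gen x=m+s*invnorm(uniform()), draws a uniform random number, converts it to a standard normal value through the inverse normal function, and then shifts and scales it.  For readers who want to see that idea outside Stata, here is a minimal Python sketch using only the standard library.  The function name draw_normal and the seed are illustrative choices, not part of the lab.

```python
import random
from statistics import NormalDist

def draw_normal(mu, sigma, rng):
    """One draw via the inverse-CDF idea: mu + sigma * invnorm(uniform())."""
    u = rng.random()                 # uniform on [0, 1), playing the role of uniform()
    while u == 0.0:                  # the inverse CDF is undefined at exactly 0
        u = rng.random()
    z = NormalDist().inv_cdf(u)      # standard normal quantile, playing the role of invnorm()
    return mu + sigma * z            # shift by the mean, scale by the sd

rng = random.Random(13)              # arbitrary seed, used only for reproducibility
heights = [draw_normal(64.5, 2.5, rng) for _ in range(15)]
print([round(h, 1) for h in heights])
```

A uniform value of 0.5 maps to the median of the normal curve, which is why the generated heights center on 64.5.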

1.  Type

set obs 15     [set the number of observations to 15]

gen height=64.5+2.5*invnorm(uniform())     [generate numbers from a normal distribution with mean = 64.5 and standard deviation = 2.5 and put them in a variable named height]

To look at the values of the 15 observations:

Type

edit     [opens the data editor, which displays your data]

a)  Make a histogram of the 15 observations.  Do the data roughly follow a normal curve?

b)  Now superimpose the idealized normal curve on the histogram of your data.

Type

graph height, histogram normal

OR try the chist command

c)  Do the data appear to follow a normal curve?

d)  Compute the sample average and sample standard deviation of the 15 values you obtained and record them.  Comment on how similar or dissimilar these values are to the population mean and population standard deviation.

2.  Repeat the process for:

n = 50

n = 100

n = 10,000

To repeat the process, you have a choice.  Keep the variable named “height” and replace the first set of 15 observations with a new set of n observations (you will delete the first set of observations with this option)  OR  create an entirely new variable with a new name to hold the new set of n observations.

To just keep using the variable named “height”, type:

drop height

set obs n

gen height=64.5+2.5*invnorm(uniform())

To create a new variable, type:

set obs n

gen newvarname=64.5+2.5*invnorm(uniform())     [you must specify the varname that you want]

3.  Draw a conclusion about the effect of sample size on the shape of the histogram of data.  Did you obtain your samples through a random sampling procedure?  Why didn’t your histograms match the normal curve perfectly?
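
As a cross-check on the pattern you should see in Activity 1, the following Python sketch draws samples of each size from the same normal distribution and reports the sample mean, the sample sd, and the fraction of values within one sd of 64.5 (about 0.68 for an exact normal curve).  It is only a sketch: it uses Python's random.gauss rather than the lab's invnorm(uniform()) expression, and the seed and the name sample_summary are arbitrary.

```python
import random
from statistics import mean, stdev

MU, SIGMA = 64.5, 2.5                # population mean and sd used in this lab
rng = random.Random(2001)            # arbitrary seed, used only for reproducibility

def sample_summary(n):
    """Draw n heights and return (sample mean, sample sd, fraction within 1 sd of MU)."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    within_1sd = sum(abs(h - MU) <= SIGMA for h in sample) / n
    return mean(sample), stdev(sample), within_1sd

results = {n: sample_summary(n) for n in (15, 50, 100, 10_000)}
for n, (xbar, s, frac) in results.items():
    print(f"n={n:>6}: mean={xbar:.2f}  sd={s:.2f}  within 1 sd={frac:.3f}")
```

With small n the three summaries can wander noticeably from 64.5, 2.5, and 0.68; with n = 10,000 they should sit very close to those values.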

Activity 2.

Histograms of data that look somewhat normally distributed can be idealized by a normal curve with mean μ and standard deviation σ.  Often, however, you will not know the true values of μ and σ.  Because the normal curve is an idealization of your data, the mean (μ) and sd (σ) of the normal curve can be estimated by your sample mean x̄ and sample sd s.  Of course, you hope that the sample mean x̄ and sd s are good estimates of the mean (μ) and sd (σ) of the population from which your sample data came.  Let’s find out.

Look at the sample mean and sd of the sample with n = 15.  Are x̄ and s close in value to the μ and σ of the distribution from which the observations were drawn?  Maybe your first set of results was a fluke.  Let’s see what happens if we repeat the process of generating 15 observations from that normal distribution.

To accommodate the repetition, you have a choice.  Keep the variable named “height” and replace the first set of 15 observations with a new set of 15 observations (you will delete the first set of observations with this option)  OR  create an entirely new variable with a new name to hold the new set of 15 observations.

To just keep using the variable named “height”, type:

drop height

set obs 15

gen height=64.5+2.5*invnorm(uniform())

To create a new variable, type:

set obs 15

gen newvarname=64.5+2.5*invnorm(uniform())     [you must specify the varname that you want]

Repeat the process of generating 15 observations from the N(64.5, 2.5) distribution 30 times, recording the sample mean x̄ each time.

4.  Make a histogram of the 30 sample means.  Briefly describe the shape of this distribution.  Is it roughly normal?  What is the sample average of this new distribution?

5.  Is there evidence that the sample average is a good estimate of the idealized population mean?  Explain why or why not.
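
The repetition in Activity 2 can also be sketched in Python as a check on your Stata results.  The code below generates 30 samples of 15 heights, keeps each sample mean, and summarizes the 30 means.  The seed and the variable names are arbitrary, random.gauss stands in for the lab's invnorm(uniform()) expression, and the last line prints σ/√n, the standard result for the sd of a sample mean, for comparison.

```python
import random
from math import sqrt
from statistics import mean, stdev

MU, SIGMA = 64.5, 2.5                # population mean and sd used in this lab
N, REPS = 15, 30                     # sample size and number of repetitions
rng = random.Random(7)               # arbitrary seed, used only for reproducibility

# One sample mean per repetition
xbars = [mean([rng.gauss(MU, SIGMA) for _ in range(N)]) for _ in range(REPS)]

print(f"average of the {REPS} sample means: {mean(xbars):.2f}")
print(f"sd of the {REPS} sample means:      {stdev(xbars):.2f}")
print(f"sigma / sqrt(n):                {SIGMA / sqrt(N):.2f}")
```

The sample means should cluster around 64.5 and vary much less than the individual heights do.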

Activity 3.

6.  Draw a sketch of X ~ N(64.5, 2.5), where 64.5 is the mean and 2.5 is the standard deviation.  Identify the area under the curve that corresponds to the proportion of individuals in the population with heights less than 61.9 inches.  Now determine the probability that a randomly selected individual will have a height less than 61.9 inches.

7.  What is the probability that a randomly selected individual will have a height greater than 65 inches?  Be sure to draw a sketch and to identify the area that corresponds to this probability.

8.  What is the probability that a randomly selected individual will have a height that is between 63 and 65 inches?  Be sure to draw a sketch and to identify the area that corresponds to this probability.
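
After you have drawn your sketches and worked the probabilities by hand, you may want to check your arithmetic.  One possible way, sketched here in Python with the standard library's NormalDist class, is to compute each area from the cumulative distribution function.  The variable names are illustrative, and the sketch assumes that 2.5 is the standard deviation, as in the rest of this lab.

```python
from statistics import NormalDist

height = NormalDist(mu=64.5, sigma=2.5)             # X ~ N(64.5, 2.5), sd = 2.5

p_below_61_9 = height.cdf(61.9)                     # area to the left of 61.9
p_above_65 = 1 - height.cdf(65)                     # area to the right of 65
p_63_to_65 = height.cdf(65) - height.cdf(63)        # area between 63 and 65

for label, p in [("P(X < 61.9)", p_below_61_9),
                 ("P(X > 65)", p_above_65),
                 ("P(63 < X < 65)", p_63_to_65)]:
    print(f"{label} = {p:.4f}")
```

Each line mirrors a shaded region in the sketch: a left tail, a right tail (one minus the left area), and a difference of two left areas.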

Ivo D. Dinov, Ph.D., Departments of Statistics and Neurology, UCLA School of Medicine