Introduction to Statistical Methods for the Life and Health Sciences
Commands
set obs n
gen x=m+s*invnorm(uniform())
edit
graph varname, histogram normal
summarize x y, detail
drop x
Objective:
This lab will give you practice with the normal distribution.
Activity 1.
We discussed in class that the data you collect in a study arise from
some kind of underlying process. For example, the underlying process that
gives rise to an individual’s height involves a summing of the effects of
multiple genes and environmental factors. In fact, as you will shortly
learn, a variable that arises from a “summing of things” will tend to look
approximately normally distributed when plotted.
You can collect measurements on the heights of a sample of
people and plot a histogram of your data.
Will your histogram of raw data look like a normal curve? In this activity, let’s assess how sample
size affects the shape of the histogram of raw data.
Instead of actually going out and measuring the heights of a
sample of people, you will generate a sample of height measurements assuming
that height follows a normal curve with μ = 64.5 and σ = 2.5.
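The expression m+s*invnorm(uniform()) in the command list draws a uniform random number between 0 and 1, converts it to a standard normal value with the inverse normal function, and then rescales it to have mean m and standard deviation s. As a rough illustration outside Stata, the same idea can be sketched in Python using only the standard library. The function name draw_normal and the seed 2024 are made up for this example and are not part of the lab.

```python
import random
from statistics import NormalDist, mean, stdev

def draw_normal(mu, sigma, rng):
    """Draw one value from a normal curve with mean mu and sd sigma."""
    u = rng.random()              # uniform on [0, 1), like uniform()
    while u == 0.0:               # the inverse normal is undefined at 0, so redraw
        u = rng.random()
    z = NormalDist().inv_cdf(u)   # standard normal quantile, like invnorm(u)
    return mu + sigma * z         # shift and scale to mean mu, sd sigma

rng = random.Random(2024)         # arbitrary fixed seed so the run is repeatable
heights = [draw_normal(64.5, 2.5, rng) for _ in range(15)]

print("sample mean:", round(mean(heights), 2))
print("sample sd:  ", round(stdev(heights), 2))
```

With only 15 values, the printed mean and sd will usually differ somewhat from 64.5 and 2.5, which is the point of the questions in this activity.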
1. Type
set obs 15     [set the number of observations to 15]
gen height=64.5+2.5*invnorm(uniform())     [generate numbers from a normal distribution with mean = 64.5 and standard deviation = 2.5 and put them in a variable named height]
To look at the values of the 15 observations:
Type
edit     [opens the file that contains your data]
a) Make a histogram
of the 15 observations. Do the data
roughly follow a normal curve?
b) Now superimpose the idealized normal curve on the histogram of your data.
Type
graph height, histogram normal
OR try the chist command
c) Do the data
appear to follow a normal curve?
d) Compute the
sample average and sample standard deviation of the 15 values you obtained and
record them. Comment on how similar or
dissimilar these values are to the population mean and population standard
deviation.
2. Repeat the process for:
n = 50
n = 100
n = 10,000
To repeat the process, you have a choice. Keep the variable named “height” and replace
the first set of 15 observations with a new set of n observations (you will
delete the first set of observations with this option) OR
create an entirely new variable with a new name to hold the new set of n
observations.
To just keep using the variable named “height”, type:
drop height
set obs n
gen height=64.5+2.5*invnorm(uniform())
To create a new variable, type:
set obs n
gen newvarname=64.5+2.5*invnorm(uniform())     [you must specify the varname that you want]
3. Draw a conclusion
about the effect of sample size on the shape of the histogram of data. Did you obtain your samples through a random
sampling procedure? Why didn’t your
histograms match the normal curve perfectly?
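For readers who want to check the pattern outside Stata, here is a minimal Python sketch (standard library only) that repeats the experiment for each sample size. As a crude stand-in for eyeballing a histogram, it reports the share of generated heights that fall within one standard deviation of the mean and compares it with the share the normal curve predicts. The helper name summarize_sample and the seed are assumptions made for this illustration.

```python
import random
from statistics import NormalDist, mean, stdev

MU, SIGMA = 64.5, 2.5
population = NormalDist(MU, SIGMA)
rng = random.Random(7)  # arbitrary fixed seed so the run is repeatable

def summarize_sample(n):
    """Generate n heights; return (sample mean, sample sd, share within 1 sd of MU)."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    inside = sum(MU - SIGMA <= x <= MU + SIGMA for x in sample) / n
    return mean(sample), stdev(sample), inside

# Share of the idealized normal curve within one sd of the mean (about 0.683).
expected = population.cdf(MU + SIGMA) - population.cdf(MU - SIGMA)

results = {n: summarize_sample(n) for n in (15, 50, 100, 10000)}
for n, (m, s, inside) in results.items():
    print(f"n={n:>6}  mean={m:6.2f}  sd={s:5.2f}  "
          f"within 1 sd={inside:.3f}  (normal curve: {expected:.3f})")
```

The exact numbers depend on the seed, so treat any single run as one example rather than a result to report.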
Activity 2.
Histograms of data that look somewhat normally distributed
can be idealized by a normal curve with mean μ and standard deviation σ. Often, however,
you will not know the true values of μ and σ. Because the normal curve is an idealization
of your data, the mean (μ) and sd (σ) of the
normal curve can be estimated by your sample mean x̄ and sd s. Of course, you hope that the sample mean x̄
and sd s are good estimates of the mean (μ) and sd (σ) of the
population from which your sample data came.
Let’s find out.
Look at the sample mean and sd of the sample with n = 15. Are x̄ and s
close in value to the μ and σ of the distribution from which the
observations were drawn? Maybe your first set of results was a fluke.
Let’s see what happens if we repeat the process of generating 15
observations from that normal distribution.
To accommodate the repetition, you have a choice. Keep the variable named “height” and replace
the first set of 15 observations with a new set of 15 observations (you will
delete the first set of observations with this option) OR
create an entirely new variable with a new name to hold the new set of
15 observations.
To just keep using the variable named “height”, type:
drop height
set obs 15
gen height=64.5+2.5*invnorm(uniform())
To create a new variable, type:
set obs 15
gen newvarname=64.5+2.5*invnorm(uniform())     [you must specify the varname that you want]
Repeat 30 times the process of generating 15 observations
from the N(64.5, 2.5) distribution (mean 64.5, standard deviation 2.5) and
recording the sample mean x̄ each time.
4. Make a histogram of the 30 values of x̄. Briefly describe
the shape of this distribution. Is it
roughly normal? What is the sample
average of this new distribution?
5. Is there evidence
that the sample average is a good estimate of the idealized population
mean? Explain why or why not.
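The repetition in this activity (generate 15 heights, record the sample mean, do it 30 times) can also be sketched outside Stata. The Python example here uses only the standard library; the function name one_sample_mean and the seed are assumptions made for illustration, and it uses Python's built-in normal generator rather than the invnorm(uniform()) construction.

```python
import random
from statistics import mean, stdev

MU, SIGMA = 64.5, 2.5
rng = random.Random(30)  # arbitrary fixed seed so the run is repeatable

def one_sample_mean(n=15):
    """Generate n heights from the normal curve and return their sample mean."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    return mean(sample)

# Repeat the 15-observation experiment 30 times, keeping each sample mean.
sample_means = [one_sample_mean() for _ in range(30)]

print("average of the 30 sample means:", round(mean(sample_means), 2))
print("sd of the 30 sample means:     ", round(stdev(sample_means), 2))
print("smallest and largest mean:     ",
      round(min(sample_means), 2), round(max(sample_means), 2))
```

The list sample_means plays the role of the 30 recorded values of x̄, so a histogram of it is the plot asked for in question 4.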
Activity 3.
Answer the following questions.
6. Draw a sketch of X ~ N(64.5, 2.5), the normal curve with mean 64.5
and standard deviation 2.5. Identify the area
under the curve that corresponds to the proportion of individuals in the
population with heights less than 61.9 inches.
Now determine the probability that a randomly selected individual will
have a height less than 61.9 inches.
7. What is the
probability that a randomly selected individual will have a height greater than
65 inches? Be
sure to draw a sketch and to identify the area that corresponds
to this probability.
8. What is the
probability that a randomly selected individual will have a height that is
between 63 and 65 inches? Be sure to
draw a sketch and to identify the area that corresponds to this probability.
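Each of these probabilities is an area under the normal curve, found by converting a height x to a z-score, z = (x - μ)/σ, and reading off the area to the left of z. The Python sketch here (standard library only) shows the three kinds of calculation: less than, greater than, and between. The cutoffs of 62 and 67 inches are made-up values sitting one standard deviation either side of the mean; they are deliberately not the cutoffs used in questions 6 to 8, and the helper name z_score is an assumption for this example.

```python
from statistics import NormalDist

height_dist = NormalDist(mu=64.5, sigma=2.5)

def z_score(x, dist):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - dist.mean) / dist.stdev

low, high = 62.0, 67.0  # hypothetical cutoffs, one sd below and above the mean

p_below = height_dist.cdf(low)                            # area left of 62
p_above = 1 - height_dist.cdf(high)                       # area right of 67
p_between = height_dist.cdf(high) - height_dist.cdf(low)  # area from 62 to 67

print("z for 62 inches:", z_score(low, height_dist))   # -1.0
print("z for 67 inches:", z_score(high, height_dist))  # 1.0
print("P(X < 62)      =", round(p_below, 4))            # 0.1587
print("P(X > 67)      =", round(p_above, 4))            # 0.1587
print("P(62 < X < 67) =", round(p_between, 4))          # 0.6827
```

Swapping in other cutoffs gives the corresponding areas, which can be checked against a sketch of the curve and a standard normal table.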