Introduction to Statistical Methods for the Life and Health Sciences
This lab involves random number generation. It is meant to give you a feel for sampling variability and for assessing normality.
Some of the Stata commands you have already used are:
summarize
summarize, detail
graph varname, bin(10)
and some new ones:
generate (or gen)
qnorm varname, ylabel xlabel
First, set the seed to your 9-digit SID#:
set seed your_id
Then generate 1000 standard normal observations as follows. (This may seem a bit mysterious, but remember that we can generate a sample from any distribution whose CDF has an analytical description and is invertible: simply apply the inverse CDF to a random sample from the uniform distribution on (0, 1).)
set obs 1000
gen x = invnorm(uniform())
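The inverse-CDF idea behind the two commands above can be sketched outside Stata. This is a minimal Python illustration using only the standard library, not the lab's required method; the helper `inverse_cdf_sample` and the seed 123456789 (a stand-in for your SID) are hypothetical. It draws uniforms, applies the standard normal inverse CDF, and then reuses the same helper for an exponential distribution to show that any invertible CDF works.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def inverse_cdf_sample(n, inv_cdf, seed):
    """Draw n values by applying an inverse CDF to Uniform(0, 1) draws."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()      # uniform on [0, 1)
        if u == 0.0:
            continue          # the normal inverse CDF is undefined at 0, so redraw
        draws.append(inv_cdf(u))
    return draws

seed = 123456789  # hypothetical stand-in for your 9-digit SID

# Standard normal: the analogue of  gen x = invnorm(uniform())
x = inverse_cdf_sample(1000, NormalDist(0, 1).inv_cdf, seed)

# Exponential with rate 2: F(t) = 1 - exp(-2t), so F^{-1}(u) = -ln(1 - u) / 2
e = inverse_cdf_sample(1000, lambda u: -math.log(1.0 - u) / 2.0, seed)

print(f"normal:      mean {fmean(x):.3f}, SD {stdev(x):.3f}")
print(f"exponential: mean {fmean(e):.3f}, SD {stdev(e):.3f}")
```

An exponential distribution with rate 2 has mean and SD both equal to 0.5, so the second line gives a quick check that the inverse was derived correctly.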
(a) Get summary statistics for the observations (using summarize or summarize, detail). What values do you expect for the mean and SD? For the median, Q1, Q3, and the IQR? (Remember, these observations were generated from a standard normal distribution.) Are the summary statistics close to the expected values?
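One way to check your expectations in (a) is to compute the theoretical values directly from the standard normal distribution and set them beside a simulated sample. The sketch below assumes Python's `statistics` module; the seed is a hypothetical stand-in for your SID. Python's default quartile rule (`quantiles`, method "exclusive") may differ slightly from the percentile definition Stata's summarize, detail uses, so small differences in Q1 and Q3 between the two programs are expected.

```python
from statistics import NormalDist, fmean, stdev, median, quantiles

z = NormalDist(0, 1)

# Theoretical values: the quartiles come from the inverse CDF
expected = {
    "mean": z.mean,
    "sd": z.stdev,
    "median": z.median,
    "Q1": z.inv_cdf(0.25),
    "Q3": z.inv_cdf(0.75),
}
expected["IQR"] = expected["Q3"] - expected["Q1"]

# A simulated sample of 1000 (hypothetical seed standing in for your SID)
x = z.samples(1000, seed=123456789)
q1, _, q3 = quantiles(x, n=4)  # three cut points: Q1, median, Q3
observed = {
    "mean": fmean(x),
    "sd": stdev(x),
    "median": median(x),
    "Q1": q1,
    "Q3": q3,
    "IQR": q3 - q1,
}

for name in expected:
    print(f"{name:>6}: expected {expected[name]: .3f}, observed {observed[name]: .3f}")
```

Rerunning with different seeds shows how much each statistic moves from sample to sample, which is the sampling variability this lab is about.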
(b) Make a histogram of the observations (use at least 10 bins) and superimpose a normal curve on the graph. (To superimpose a normal curve, use the norm option for graph: graph varname, bin(10) norm. In Stata 8 and later, the equivalent command is histogram varname, bin(10) normal.) How well does the curve seem to approximate the histogram?
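The visual comparison in (b) can also be made numerically: for each bin, compare the observed count with the count a normal curve predicts, n times the probability the normal assigns to that bin. This is a hedged Python sketch, not what Stata computes internally; it assumes equal-width bins spanning the data and a reference normal with the sample's mean and SD, and the function name `bin_comparison` and the seed are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def bin_comparison(x, bins=10):
    """Observed vs. normal-expected counts for equal-width bins spanning the data."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins
    fitted = NormalDist(fmean(x), stdev(x))  # normal with the sample's mean and SD
    observed = [0] * bins
    for v in x:
        k = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        observed[k] += 1
    expected = []
    for k in range(bins):
        a, b = lo + k * width, lo + (k + 1) * width
        expected.append(len(x) * (fitted.cdf(b) - fitted.cdf(a)))
    return observed, expected

x = NormalDist(0, 1).samples(1000, seed=123456789)  # hypothetical seed
obs, exp = bin_comparison(x, bins=10)
for k, (o, e) in enumerate(zip(obs, exp)):
    print(f"bin {k + 1:2d}: observed {o:4d}, expected {e:7.1f}")
```

The expected counts sum to slightly less than 1000 because the normal curve puts a little probability beyond the smallest and largest observations.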
(c) Another way to compare observations visually with a theoretical distribution is a quantile plot. A quantile plot compares the quantiles of a variable's distribution (values below which a given fraction of the data lie, e.g., percentiles) with those of a theoretical distribution. If the observed quantiles match the theoretical quantiles, the points should fall on or near the line of equality. This type of plot is described in more detail in your text.
Make a normal quantile plot using the command
qnorm x, ylabel xlabel
How well does the normal appear to fit the observations?
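The points on a normal quantile plot are pairs (theoretical quantile, observed quantile). The Python sketch below builds those pairs so you can see what qnorm is plotting. It assumes plotting positions p_i = i / (n + 1) and a reference normal with the sample's mean and SD; Stata's exact conventions may differ, and the function name `normal_quantile_pairs` and the seed are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def normal_quantile_pairs(x):
    """(theoretical, observed) quantile pairs for a normal quantile plot.

    Assumes plotting positions p_i = i / (n + 1) and a reference normal
    with the sample's mean and SD.
    """
    xs = sorted(x)  # the i-th smallest value is the observed quantile at p_i
    n = len(xs)
    ref = NormalDist(fmean(xs), stdev(xs))
    return [(ref.inv_cdf(i / (n + 1)), xs[i - 1]) for i in range(1, n + 1)]

x = NormalDist(0, 1).samples(1000, seed=123456789)  # hypothetical seed
pairs = normal_quantile_pairs(x)

# Largest vertical gap between a point and the line of equality y = x
worst = max(abs(obs - theo) for theo, obs in pairs)
print(f"largest departure from the line of equality: {worst:.3f}")
```

Even for truly normal data, the largest departures usually occur in the tails, where the most extreme order statistics vary the most from sample to sample.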
(d) Which of the two plots do you prefer as a visual method for assessing how well the normal distribution fits the data?
(e) Now assess the normality of a few of the variables in the dataset loaded by the command
use http://www.ats.ucla.edu/stat/stata/wc/reg/elemapi.dta
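Besides the plots, summarize, detail also reports skewness and kurtosis, which give a numeric complement for (e): a normal distribution has skewness 0 and kurtosis 3. The Python sketch below assumes the moment-based definitions m3/m2^1.5 and m4/m2^2 (with m_r the average of the r-th powers of deviations from the mean); it contrasts a normal sample with a right-skewed exponential sample rather than reading the elemapi data. The function name `moment_skew_kurt` and both seeds are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def moment_skew_kurt(x):
    """Moment-based skewness m3 / m2**1.5 and kurtosis m4 / m2**2."""
    n = len(x)
    m = fmean(x)
    m2 = sum((v - m) ** 2 for v in x) / n
    m3 = sum((v - m) ** 3 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2

normal_x = NormalDist(0, 1).samples(1000, seed=123456789)  # hypothetical seed
rng = random.Random(987654321)                              # hypothetical seed
# Exponential(1) via the inverse CDF: F^{-1}(u) = -ln(1 - u)
skewed_x = [-math.log(1.0 - rng.random()) for _ in range(1000)]

for label, data in [("normal", normal_x), ("exponential", skewed_x)]:
    s, k = moment_skew_kurt(data)
    print(f"{label:>11}: skewness {s: .2f}, kurtosis {k: .2f}")
```

A clearly positive skewness, as for the exponential sample, shows up on a quantile plot as points bending away from the line of equality in the upper tail.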