STAT 13
Introduction to Statistical Methods for the Life and Health Sciences
Laboratory 3 - Data Visualization using STATA
Please remember to use your Lab ID# as your name and your nine-digit UCLA
ID# as your password when you log-in. In this lab we will plot histograms
with various bin widths, add appropriate scales and titles, and look at other
plots to help us better understand the demographic data collected in a recent
census of Los Angeles County.
. use http://www.stat.ucla.edu/labs/datasets/smcensus.dta
Visual Display of Data
Suppose we are curious about the ages of those in our sample.
. summarize age
This tells us that the youngest respondent was 16 while the oldest was 87,
with a mean of 43 years and a standard deviation of about 18 years. The command
. summarize age, detail
gives us additional information about the distribution of ages.
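To see what kinds of quantities summarize, detail reports, here is a minimal Python sketch (Python, not Stata) that computes several of them for a small made-up list of ages. The ages are hypothetical and are not from the census file, and the function name describe is our own. Stata's percentile rule may differ slightly from Python's statistics.quantiles with method="inclusive", so quartiles computed this way on real data may not match Stata's output exactly.

```python
import statistics

# Hypothetical ages for illustration only; these are NOT the census data.
ages = [18, 23, 27, 34, 39, 45, 52, 60, 66, 81]

def describe(values):
    """Return a few of the quantities that 'summarize, detail' reports."""
    ordered = sorted(values)
    # Three cut points split the data into quarters: 25th, 50th, 75th percentiles.
    q1, med, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "n": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered),  # sample standard deviation
        "p25": q1,
        "median": med,
        "p75": q3,
        "iqr": q3 - q1,  # span of the middle fifty percent of the data
    }

stats = describe(ages)
print(stats)
```

The iqr entry is the width of the middle fifty percent of the values, the same idea that Question 1 asks about.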
Question 1: What is the median age of the respondents? What
is the age span for the middle fifty percent of those in our sample?
If we plotted the data by hand, we might use a horizontal scale from 0 to 100.
In Stata, we type:
. graph age, histogram xscale(0, 100)
Now try the command:
. graph age, histogram xscale(0, 100) norm
Question 2: What does Stata do when we add the norm option to the command?
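Adding norm overlays a normal curve on the histogram. As we understand it, that curve uses the sample mean and standard deviation; the Python sketch below (hypothetical ages, our own function name normal_overlay) shows one way to turn such a curve into expected counts per bin, which can be compared with the heights of the histogram bars. It is an illustration of the idea, not a description of Stata's internal computation.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical ages for illustration only; these are NOT the census data.
ages = [18, 23, 27, 34, 39, 45, 52, 60, 66, 81]

def normal_overlay(values, edges):
    """Expected count in each bin [lo, hi) if the values followed a normal
    curve with the sample mean and sample standard deviation."""
    curve = NormalDist(mean(values), stdev(values))
    n = len(values)
    return [n * (curve.cdf(hi) - curve.cdf(lo)) for lo, hi in zip(edges[:-1], edges[1:])]

edges = [0, 20, 40, 60, 80, 100]  # five bins of width 20 on the 0-100 scale
expected = normal_overlay(ages, edges)
print([round(c, 2) for c in expected])
```

The expected counts sum to slightly less than 10 because a small part of the fitted normal curve falls outside the 0 to 100 range.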
Adding Labels and Titles
We can get a better sense of the age plot if we add more bins, but not too
many, or the plot will look like a city skyline.
. graph age, histogram xscale(0, 100) bin(5) norm
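Changing bin() changes the width of the bins, not the data. The Python sketch below, on made-up ages, shows how values are counted into equal-width bins. The function name bin_counts is our own, and it assumes the bins span a fixed 0 to 100 range; Stata may instead base its bins on the minimum and maximum of the data, so counts on the real data can differ.

```python
# Hypothetical ages for illustration only; these are NOT the census data.
ages = [18, 23, 27, 34, 39, 45, 52, 60, 66, 81]

def bin_counts(values, lo, hi, k):
    """Split [lo, hi] into k equal-width bins and count the values in each.
    A value equal to hi is placed in the last bin."""
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        if lo <= v <= hi:
            i = min(int((v - lo) // width), k - 1)
            counts[i] += 1
    return width, counts

for k in (5, 10):
    width, counts = bin_counts(ages, 0, 100, k)
    print(k, width, counts)
```

With only ten values, ten bins already leave some bins empty, which is the "city skyline" effect in miniature.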
Question 3: Now change the number of bins from 5 to 10 and then to
20. Which graph do you find most pleasing and informative?
Labeling the horizontal axis may help us locate the center of the distribution.
We can also add titles in four locations (top, bottom, left, or right), with
up to two titles at each location.
. graph age, histogram xscale(0, 100) bin(10) xlabel(0, 10, 20, 30, 40, 50, 60, 70, 80, 90) t1title(Ages of the Respondents in our Sample)
If we are interested in comparing the ages of men and women, we want to look
at two histograms. We need to sort the data by gender before we plot.
. sort gender
. graph age, histogram by(gender)
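The by(gender) option draws one histogram per group, which amounts to splitting the ages by gender before plotting. The Python sketch below does that split on made-up records; the function name ages_by_group and the records are hypothetical. As the lab notes, Stata expects the data to be sorted on the by() variable, which is why we sort first; the Python dictionary used here does not need sorting.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (gender, age) records for illustration only; NOT the census data.
records = [("female", 18), ("male", 23), ("female", 27), ("male", 34),
           ("female", 39), ("male", 45), ("female", 52), ("male", 60),
           ("female", 66), ("male", 81)]

def ages_by_group(rows):
    """Collect ages into one list per group, like the panels of by(gender)."""
    groups = defaultdict(list)
    for group, age in rows:
        groups[group].append(age)
    return dict(groups)

for group, group_ages in sorted(ages_by_group(records).items()):
    print(group, len(group_ages), median(group_ages))
```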
Question 4: Reproduce the graph comparing
ages of men and women with appropriate labels, number of bins, and a title.
Describe what you see.
Decoding Data
Now we will consider the rent paid by those in our sample. Type:
. summarize rent, detail
Before we plot the data, we can remove the zeros for those who paid no rent by recoding the zeros as missing values. Type:
. mvdecode rent, mv(0)
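mvdecode replaces the listed value (here 0) with Stata's missing value, and summarize then leaves those observations out. A rough Python analogue on made-up rents is sketched below; using None as a stand-in for Stata's missing value, and the function name zeros_to_missing, are our own choices.

```python
# Hypothetical monthly rents for illustration only; NOT the census data.
rents = [0, 450, 0, 520, 610, 0, 380, 900]

def zeros_to_missing(values):
    """Replace 0 with None (our stand-in for Stata's missing value)
    and count how many values were changed."""
    recoded = [None if v == 0 else v for v in values]
    n_changed = sum(1 for v in values if v == 0)
    return recoded, n_changed

recoded, n_zero = zeros_to_missing(rents)
nonmissing = [r for r in recoded if r is not None]
print(n_zero, len(nonmissing))  # number recoded, number still used in summaries
```

Comparing the number of observations that summarize reports before and after the recode gives the count of recoded values; Stata's mvdecode should also print that count when it runs.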
Question 5: How many people paid no rent?
. summarize rent, detail
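A single very large value pulls the mean toward it but barely moves the median, so comparing the two is a quick check on the shape of a distribution. The Python sketch below uses made-up rents with one unusually high value to show the effect; the numbers are hypothetical and say nothing about the census data.

```python
from statistics import mean, median

# Hypothetical rents with one very high value; NOT the census data.
rents = [380, 420, 450, 470, 500, 520, 560, 1500]

m, med = mean(rents), median(rents)
print(m, med)
if m > med:
    print("mean above median: consistent with a long right tail")
elif m < med:
    print("mean below median: consistent with a long left tail")
else:
    print("mean equals median")
```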
Question 6: How do the median and mean rents compare? What
does this tell us about our data?
Next we plot our rent data. In this command we use a shorthand for the list
of x-axis labels.
. graph rent, histogram xscale(0, 700) xlabel(0 100 to 800) t1title(Monthly Rents)
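The shorthand xlabel(0 100 to 800) relies on Stata's numlist notation. As we understand the Stata documentation, "#1 #2 to #3" means "from #1 to #3 in steps of #2 minus #1", so this list should expand to 0, 100, ..., 800 (the form 0(100)800 should be equivalent). The Python function below is our own illustration of that expansion rule, written for increasing lists only; it is not part of Stata.

```python
def expand_numlist(first, second, last):
    """Expand a numlist of the form 'first second to last' (increasing only):
    start at first and step by (second - first) until last is passed."""
    step = second - first
    if step <= 0:
        raise ValueError("this sketch only handles increasing lists")
    values = []
    x = first
    while x <= last:
        values.append(x)
        x += step
    return values

print(expand_numlist(0, 100, 800))
```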
Question 7: Reproduce this graph, changing the number
of bins to ten. Does this change the apparent shape of the distribution?
Now we will consider several other questions about the rent data.
Question 8: Do women and men in our sample pay comparable rents?
What command is needed to find out? What are the findings?
We may also wonder whether there is a relationship between age and rent. To
display paired data as a scatterplot (rent on the vertical axis, age on the
horizontal axis), we type:
. graph rent age
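A scatterplot can only use observations where both variables are present, so after the mvdecode step, people who paid no rent drop out of this plot. The Python sketch below (made-up values, our own function names complete_pairs and pearson_r) keeps the complete pairs and, as an optional numeric companion to reading the plot, computes the Pearson correlation coefficient. The lab itself only asks you to judge the scatterplot by eye.

```python
# Hypothetical ages and rents for illustration only; NOT the census data.
# None marks a missing value (for example, a rent of 0 recoded to missing).
ages = [20, 30, 40, 50, 60]
rents = [400, None, 600, 700, 800]

def complete_pairs(xs, ys):
    """Keep only observations where both values are present."""
    return [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]

def pearson_r(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

pairs = complete_pairs(ages, rents)
print(len(pairs), round(pearson_r(pairs), 3))
```

Because these made-up points lie exactly on a straight line, r is 1 here; real rent data will scatter much more. The same pairing idea applies when plotting rent against income.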
Question 9: Do people pay more rent as they get older?
Question 10: Is there a relationship between the income someone
earns and the rent that they pay? What command is needed to find out? What
are the findings?
Assignment
In this lab, we looked at age, rent, and gender. Select different variables,
such as marital status, race, or hours worked, and repeat the process
of summarizing and plotting the data. Select a carefully prepared plot with
appropriate titles and scaling that tells you something of interest about
the people in our sample. Include a brief written interpretation of your findings.
Ivo D. Dinov, Ph.D., Departments of Statistics and Neurology,
UCLA School of Medicine