STAT 13

(Sec. 1a-1c)

Introduction to Statistical Methods for the Life and Health Sciences

Instructor: Ivo Dinov, Asst. Prof.

Departments of Statistics & Neurology
    http://www.stat.ucla.edu/~dinov/


Lab 5

Also see this for an additional tutorial.

Objectives:

  1. explore the relationship between two variables
  2. practice creating and interpreting 2-way tables
  3. learn some new STATA commands

 

 

For today’s lab, we will be using the dataset at the following location:

http://www.stat.ucla.edu/~rgould/datasets/m12s00.dta
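As a minimal sketch, assuming your computer has internet access and the file is still hosted at the address above, the data can be read straight into STATA and inspected before you start:

```stata
* Load the lab data from the course web site; "clear" discards any data already in memory
use "http://www.stat.ucla.edu/~rgould/datasets/m12s00.dta", clear

* List the variable names and storage types (string vs. numeric)
describe

* Look at the first five observations
list in 1/5
```

Checking the output of describe first is useful because the commands later in the lab depend on the exact variable names (for example gender, smoke, presiden, weight).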

 

This data set was collected from Prof. Gould’s Stats M12 course in Spring 2000.  Students were asked seven questions:

1)      gender (m/f)

2)      height (inches)

3)      weight (pounds)

4)      Do you smoke? (0 = no; 1 = yes)

5)      Who do you want for President?  (Bush, Gore, other)

6)      Rate your math ability:  (1,2,3,4,5) 1 is much below average, 3 is average, 5 is much above average

7)      Rate your math anxiety: (1,2,3,4,5) 1 is much below average, 3 is average, 5 is much above average

 

This data set provokes some interesting questions:

 

Are males more likely to smoke than females?

Does Gore have stronger support among men than women?

Do people who smoke weigh less than those who do not?

Are smokers less anxious about math?

Is there a relationship between gender and math anxiety?

Is there a relationship between gender and math ability?

And I’m sure there are other questions that you could pose from this data set.

 

Of course, the study design (which was very haphazard) does not really lend itself to answering these questions with much confidence.  Still, the questions should motivate you to explore this data set and to learn the STATA commands you would use to answer them with a better data set.

 

Commands for today and the future:

tabulate varname1 varname2, cell column row            [gives you a 2-way table with cell percentages, column percentages, and row percentages; STATA commands are case-sensitive, so type them in lowercase]

 

graph varname1, box by(varname2)            [draws a boxplot of the values of one variable for each category of another variable]

 

sort varname1            [sorts the observations by the values of varname1]

 

replace varname1="" if varname1=="."         [recodes the text value "." as an empty string, which STATA treats as missing for a string variable; type plain straight quotes, not curly ones]

 

 


Let’s try to answer the first question.  Are males more likely to smoke than females?  What we are really asking is:  is there a relationship between gender and smoking?  Because these are 2 qualitative (categorical) variables, we will construct a 2-way table to look at their relationship. 

 

Type

tabulate gender smoke, cell column row
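If the full table is hard to read, one option is to request the percentages one kind at a time. This is only a sketch using the standard options of tabulate; it shows the same numbers as the command above, split across separate tables:

```stata
* Counts plus row percentages only (each row adds up to 100%)
tabulate gender smoke, row

* Counts plus column percentages only (each column adds up to 100%)
tabulate gender smoke, column

* Counts plus cell percentages only (the whole table adds up to 100%)
tabulate gender smoke, cell
```

Comparing the three tables with the combined one is a quick way to check which number in each cell is the cell, row, or column percentage.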

 

1.  Print out the 2-way table.  Label the cell proportions, the column proportions, and row proportions on the table.

 

2.  Look at the table and answer the following:

a.  The percentage of females in the sample is _____________.

b.  The percentage of male smokers in the sample is ___________.

c.  Of females, the percentage who smoke is _____________.

 

3.  Are males more likely than females to smoke?  Explain how you came up with your answer, referring specifically to the percentages (cell, row, or column) that you are basing your answer on.

 

 

4.  Now answer the following questions and explain how you arrived at your answer in each case, referring specifically to the percentages (cell, row, or column) that you are basing your answer on. 

 

a.  Does Gore have stronger support among men than women?

Note that you will discover some missing data in this example.  How does that affect the frequencies in your two-way table?  To suppress the missing data do the following:

Type

replace presiden="" if presiden=="."

 

            If you run into missing data in the future, you should think about suppressing it in the analysis.
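The effect of the recoding can be checked with a before-and-after comparison. The sketch below assumes, as the replace command above implies, that presiden is stored as a string variable in which missing answers were entered as ".":

```stata
* Before recoding: "." is counted as if it were a candidate
tabulate gender presiden, cell column row

* Recode "." to the empty string, which STATA treats as missing for strings
replace presiden = "" if presiden == "."

* After recoding: missing answers are left out of the table and the percentages
tabulate gender presiden, cell column row

* The "missing" option shows how many observations were left out
tabulate presiden, missing
```

Note that replace changes the data in memory only; the original file is unchanged unless you save over it.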

 

b.  Are smokers less anxious about math?

 

c.  Is there a relationship between math anxiety and gender?

 

d.  Is there a relationship between perceived math ability and gender?

 

 

Now let’s assess whether people who smoke weigh less than people who don’t smoke.  In this case, we have a quantitative variable (weight) and a qualitative variable (smoking status).  Is a two-way table the appropriate technique for assessing their relationship?  Try it and see what happens.  What do you think?

 

A better technique would be side-by-side boxplots.  Which variable will be on the x-axis, and which will be on the y-axis?  In order to create these side-by-side boxplots, you will first have to sort by the variable on the x-axis.

 

Type

sort smoke

graph weight, box by(smoke)
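The graph command above uses the syntax of the STATA release this lab was written for. If your copy of STATA does not accept it, the following sketch, which assumes STATA version 8 or later, should produce comparable side-by-side boxplots, along with numerical summaries to read alongside them:

```stata
* Side-by-side boxplots of weight, one box per smoking category
* (newer graph syntax; over() does not require the data to be sorted)
graph box weight, over(smoke)

* Numerical companion to the plot: quartiles and extremes within each group.
* The "by" prefix does require sorted data, hence the sort first.
sort smoke
by smoke: summarize weight, detail
```

The medians and quartiles reported by summarize correspond to the center line and box edges in the plot.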

 

5.  Interpret the boxplots in order to answer the question.

 


Ivo D. Dinov, Ph.D., Departments of Statistics and Neurology, UCLA School of Medicine