Comparing Two Groups
Part I: Before Class
Objective: This lab will give you practice examining differences between two groups of data. We will also analyze the significance of the test results in the context of the studies presented in this assignment.
Before starting this lab, you should
Part II: The Data for the Take Home Section
Risk Perception
Psychologists are interested in finding ways of measuring perception of risk since it is an important component in any decision-making process. The data provided here come from 611 participants who were asked to provide a judgment of risk on several activities using a scale of 0-100 (100 being high risk). In this lab assignment, we compare how two different groups, namely, men and women, perceive risks based on five different activities.
Source: Lisa K. Carlstrom, J. Arthur Woodward, and Christina G.S. Palmer. "Evaluating the Simplified Conjoint Expected Risk Model: Comparing the Use of Objective and Subjective Information," Risk Analysis 20, no. 3 (2000): 385-92.
http://www.stat.ucla.edu/~rgould/m12s01/risksmall.dta
Once you have downloaded it, you should be able to see the seven variables in that dataset.
SUBID
- ID of subject (or participant)
GENDER
- 0 = Female, 1 = Male
APPL
- How risky it is to use household appliances, on a scale of 0-100 (0 = low, 100 = high)
NUC
- How risky it is to live near a nuclear power station, on a scale of 0-100 (0 = low, 100 = high)
POOL
- How risky it is to swim in an indoor public pool each weekend, on a scale of 0-100 (0 = low, 100 = high)
PLANE
- How risky it is to fly on a commercial airplane every month, on a scale of 0-100 (0 = low, 100 = high)
XRAY
- How risky it is to receive diagnostic x-rays every six months, on a scale of 0-100 (0 = low, 100 = high)
http://www.stat.ucla.edu/projects/datasets/risk_perception.html
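As a minimal sketch, the dataset can be loaded and inspected directly from the address given above. The variable names are listed here in capitals, but they may be stored in lowercase in the file; this is an assumption to check against the output of describe before typing any names.

```stata
* Load the risk-perception data straight from the course URL given above
use http://www.stat.ucla.edu/~rgould/m12s01/risksmall.dta, clear

* List variable names, types, and labels (expect seven variables)
describe

* Check the ranges: each risk rating should lie between 0 and 100
summarize
```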
Take-home questions:
Part III: Lab Exercises
These are the Stata commands you may find useful to do this assignment.
insheet using filename - reads unformatted ASCII (text) data. It is intended for reading files created by a spreadsheet. This command is especially useful when you don't know the variables (their number and names) in a dataset.
sort x - arranges the observations in ascending order of the values of x. There is no limit to the number of variables you can sort by. Missing values are interpreted as being larger than any other number and are thus placed last.
display tprob(df,t) - you need to specify the degrees of freedom and the t-statistic. This function returns the two-tailed p-value of the test from the t-distribution.
display tprob(df,t)/2 - returns the upper-tail p-value of the test from the t-distribution (when t is positive).
display 1 - tprob(df,t)/2 - returns the lower-tail p-value of the test from the t-distribution (when t is positive).
ttest - performs one- and two-sample tests on the equality of means. There are several variations of this command; the most common ones are:
ttest x = # - tests the hypothesis that x has a mean of #.
ttest x, by(z) - performs a two-sample test comparing the mean of x across the two groups defined by z.
ttest x, by(z) unequal - same as above, except that the word "unequal" indicates that the two samples are not assumed to have equal variances.
use filename - loads a Stata-format dataset, either from disk or from the Web. If filename is specified without an extension, ".dta" is assumed.
Activity 1: Hypothesis testing
We will be using the dataset collected by Prof. Gould in his Statistics 12 class, in Spring 2000. First, you should download the dataset from the following address:
http://www.stat.ucla.edu/~rgould/datasets/m12s00.dta
ttest weight = 150
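To see where the t-statistic reported by ttest comes from, it can be recomputed by hand from the summary statistics. This is a sketch, assuming the variable is named weight as in the command above; it relies on the results that summarize stores in r(mean), r(sd), and r(N).

```stata
* Summary statistics for weight; summarize leaves its results in r()
summarize weight

* One-sample t statistic for Ho: mean = 150
display (r(mean) - 150) / (r(sd) / sqrt(r(N)))

* Degrees of freedom for the one-sample test
display r(N) - 1
```

The two numbers displayed should agree with the t and degrees of freedom printed by ttest weight = 150.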
Questions:
2) Next, we want to examine whether women have lower levels of math anxiety than men. If we assume the data represent independent random samples of men and women, we can apply a two-sample t-test. Also, notice that we need to perform a one-tailed t-test. First, we have to sort the observations by gender, as follows:
sort gender
Now, run the t-test.
ttest mathanxi, by(gender)
Questions:
Now, let's run another two-sample t-test. This time, let's assume that the two population variances are unequal. Thus, type:
ttest mathanxi, by(gender) unequal
Note that the degrees of freedom are different from the ones in the previous test.
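The degrees of freedom differ because the two tests use different formulas. As a sketch, with n_1 and n_2 the two sample sizes and s_1 and s_2 the two sample standard deviations (notation assumed here, not taken from the Stata output), the equal-variance test uses the pooled degrees of freedom, while the unequal option uses Satterthwaite's approximation:

```latex
\[
\mathrm{df}_{\text{equal}} = n_1 + n_2 - 2,
\qquad
\mathrm{df}_{\text{Satterthwaite}} =
\frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^{2}}
     {\dfrac{\left(s_1^2/n_1\right)^{2}}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^{2}}{n_2 - 1}}
\]
```

The Satterthwaite value is usually not a whole number and never exceeds n_1 + n_2 - 2.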
Activity 2: Choosing "degrees of freedom" for t-tests in Stata
Stata uses a specific method (Satterthwaite's approximation) to compute the degrees of freedom when the variances are not assumed to be equal. This might not be the same method you learned in this class. Notice that the way you choose to calculate the degrees of freedom of a test affects its p-value. The good news is that you can use Stata to find the p-value for a t-statistic with any degrees of freedom. The command you will need depends on the hypothesis you are testing. For example:
i) For Ho: diff. = 0 and Ha: diff. ≠ 0. Type:
display tprob(df,t)
ii) For Ho: diff. ≥ 0 and Ha: diff. < 0. Type:
display 1 - tprob(df,t) / 2
iii) For Ho: diff. ≤ 0 and Ha: diff. > 0. Type:
display tprob(df,t) / 2
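Where "diff." is the difference between the two sample means, "df" is the number of degrees of freedom, and "t" is the value of the t-statistic. Because tprob works with the absolute value of t, formulas ii) and iii) as written apply when t is positive; if t is negative, swap them.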
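The three cases can be tried out as follows. This is a sketch only: the values of df and t are hypothetical and are not taken from the class data. The last two lines assume a newer Stata release in which the documented function is ttail(df,t), the upper-tail probability P(T > t); if tprob is not recognized in your version, those lines are the ones to use.

```stata
* Hypothetical values for illustration only (not from the class data)
local df = 40
local t = 2.1

* i) two-tailed p-value, Ha: diff. != 0
display tprob(`df', `t')

* ii) lower-tail p-value, Ha: diff. < 0
display 1 - tprob(`df', `t') / 2

* iii) upper-tail p-value, Ha: diff. > 0
display tprob(`df', `t') / 2

* Assumption: newer releases document ttail(df,t) = P(T > t).
* For a positive t these should match iii) and i) respectively.
display ttail(`df', `t')
display 2 * ttail(`df', abs(`t'))
```

To use a different method for the degrees of freedom, change only the value stored in df and rerun the display lines.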
Questions: