Data Analysis Worksheet

This week we'll examine the data set called "twins.dat". This data set was collected to determine the effect of education on earnings.
You can download the data and read about it on the course web page. You can also download this worksheet from

http://www.stat.ucla.edu/~rgould/x401w00/worksheet1.txt

You can then edit it directly on the computer if you like.

Some of these questions are relatively short and precise. Others are quite long and involved. Do as much as you can. Interrupt the class with questions! The purpose is to spark discussion and learn new things.

1. Read the "about twins" file. What "research" questions interest you about this data set? Which variables do you think will be applicable to these questions?

2. Data Structure

    a) How many variables are there? Are some of the variables related to each other in an obvious way?

    b) What type is each variable? (Continuous, discrete, ordinal, categorical?)

    c) Which variables are responses, and which are explanatory?

3. Data Quality

    a) Do you think the way the data were collected will allow you to answer the questions you're interested in?

    b) Are there errors or outliers in the variables?

    c) Are there missing observations? Might these interfere with your analysis and conclusions?
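
As a starting point for 3b and 3c, here is a minimal Python sketch that counts missing entries and flags possible outliers in one variable. The "." missing-value marker and the wage values are made up for illustration; check the "about twins" file for how twins.dat actually codes missing data. The 1.5 x IQR rule is only one common screening convention, and a flagged value is not necessarily an error.

```python
import statistics

def count_missing(values, missing_marker="."):
    """Count entries equal to the missing-value marker (an assumed convention)."""
    return sum(1 for v in values if v == missing_marker)

def iqr_outliers(numbers):
    """Return values lying more than 1.5 IQRs below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in numbers if x < low or x > high]

# Made-up hourly wages as raw text; these are NOT values from twins.dat.
raw = ["11.25", "9.50", ".", "14.00", "12.75", "95.00", "10.00", ".", "13.50"]

n_missing = count_missing(raw)
wages = [float(v) for v in raw if v != "."]
flagged = iqr_outliers(wages)

print("missing:", n_missing)
print("possible outliers:", flagged)
```

Look at any flagged value before deciding what to do with it: it may be a typing error, or it may be a real and interesting observation.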

4. Summary Statistics

    a) Summarize the data both numerically and graphically in as many ways as you see fit. You might want to look at more graphs than you are willing to share. There are a great many possible ways to go about this with so many variables. I recommend trying to understand variables one-by-one first.

    b) What do you suspect the answers to your questions are? Can you tell?
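
For the one-variable-at-a-time approach suggested in 4a, the sketch below computes a five-number summary and a rough text histogram using only the Python standard library. The education values and the bin width are made up for illustration and are not taken from twins.dat. Any statistics package will produce the same kinds of summaries with better graphics.

```python
import statistics

def five_number_summary(numbers):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(numbers, n=4)
    return min(numbers), q1, median, q3, max(numbers)

def histogram_counts(numbers, bin_width):
    """Return {bin_start: count}, where each bin covers [start, start + bin_width)."""
    counts = {}
    for x in numbers:
        start = (x // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Made-up years of education; these are NOT values from twins.dat.
educ = [12, 12, 13, 14, 16, 16, 16, 18, 12, 14, 20]

print("min, Q1, median, Q3, max:", five_number_summary(educ))

# Print one row of stars per bin as a crude histogram.
for start, count in histogram_counts(educ, bin_width=2).items():
    print(f"{start:>3} | {'*' * count}")
```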

5. Modifying the Data

    a) Are there any variables that should be transformed?

    b) Are there any "new" variables you wish to create?
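
To illustrate both parts of question 5, the sketch below applies a log transformation to wages and creates two "new" variables: the within-pair difference in log wage and in years of education. The record layout (one row per twin pair, with wage and education for each twin) and all of the numbers are assumptions made for this example. Whether these are the right transformations for twins.dat is for you to decide.

```python
import math

# Made-up records, one per twin pair: (wage1, wage2, educ1, educ2).
# This layout is an assumption, not the actual structure of twins.dat.
pairs = [
    (10.0, 12.5, 12, 14),
    (20.0, 10.0, 16, 12),
    (15.0, 15.0, 14, 14),
]

def log_wage_difference(wage1, wage2):
    """Transform wages to the log scale, then take twin 1 minus twin 2."""
    return math.log(wage1) - math.log(wage2)

def education_difference(educ1, educ2):
    """Years of education, twin 1 minus twin 2."""
    return educ1 - educ2

# Each derived row holds (difference in log wage, difference in education).
derived = [
    (log_wage_difference(w1, w2), education_difference(e1, e2))
    for w1, w2, e1, e2 in pairs
]

for d_logwage, d_educ in derived:
    print(f"log-wage difference {d_logwage:+.3f}, education difference {d_educ:+d}")
```

Differencing within pairs is one way to compare twins who share a family background. A log scale is often considered for wages because earnings tend to be right-skewed.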

6. Model

    a) Propose a model in words to describe the data. (You might actually have several different models for different groupings of variables.) Your model should specify the nature of the relationship between the response and explanatory variable(s). You should also say what the "errors" are like: Normal? Independent?
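
As one example of turning such a description into symbols, a simple linear model with a single explanatory variable could be written as below. The symbols are assumptions chosen for illustration: y_i is the response for observation i (for example, a measure of earnings), x_i is the explanatory variable (for example, a measure of education), and epsilon_i is the error. Your own model may look quite different.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2),
\qquad i = 1, \dots, n
```

In words: the response changes linearly with the explanatory variable, and the errors are independent and Normal with a common variance.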

7. Fit the Model

(This might just mean producing a confidence interval or performing a t-test.)
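
For the simple case mentioned above, here is a minimal sketch of a paired t-statistic and a 95% confidence interval for the mean of within-pair differences. The ten differences are made up, not taken from twins.dat. The critical value 2.262 comes from a t table for 9 degrees of freedom and applies only when there are exactly 10 differences.

```python
import math
import statistics

def paired_t(differences):
    """Return (mean, standard error, t statistic) for testing mean difference = 0."""
    n = len(differences)
    mean = statistics.mean(differences)
    se = statistics.stdev(differences) / math.sqrt(n)
    return mean, se, mean / se

# Made-up within-pair differences; these are NOT values from twins.dat.
diffs = [0.10, -0.05, 0.20, 0.15, 0.00, 0.30, -0.10, 0.25, 0.05, 0.10]

# Two-sided 95% critical value from a t table, valid for df = 9 only.
T_CRIT_DF9 = 2.262

mean, se, t_stat = paired_t(diffs)
ci = (mean - T_CRIT_DF9 * se, mean + T_CRIT_DF9 * se)

print(f"mean difference = {mean:.3f}, t = {t_stat:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The interval and test are only trustworthy if the differences are roughly Normal and independent across pairs, which ties back to the error assumptions in your model.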

8. Evaluate the Fit

9. Interpret the Model