Stats 120
Winter 2004
Instructor: Robert Gould
email: rgould@stat.ucla.edu
Phone: 310-206-3381 (x3381 from on-campus)
Office: MS 8945
Office Hours: Monday 3-4 and by appointment
This course will focus on using linear regression. It is an
applications-oriented course, and we will not be concerned with
exploring the theory of the Linear Model. However, we are interested in
examining how well the linear model fits real-world processes, and
examining the differences between reality and the mathematical model
will lead to discussions about the mathematical theory. For the most
part, though, our concern will be with fitting models to real data and
learning techniques and strategies for building linear models.
This is not a course in mathematical statistics, and so there will be
few proofs, and you will not be expected to prove theorems
mathematically. However, you will need to have enough mathematics
under your belt to be comfortable working with equations and abstract
notation. In particular, we will do a fair amount of work with
matrix notation.
Required Text: Data Analysis and Graphics Using R, John
Maindonald and John Braun.
Recommended Text: Applied Regression Including Computing and Graphics,
Dennis Cook and Sanford Weisberg.
Required Software: The software we will use is called R. You can download it here.
I chose the textbook primarily because it provides a good introduction
to R, including instructions on where and how to get your own
copy. R runs in the Stats Department labs and, best of all, is
free, so you can install it on your home computer. Read
Chapter 1 of the Maindonald text for details. Note: R is a free
implementation of the S language, which is also the basis of the rather
expensive Splus. Most things you read about Splus will also be true
for R.
FINAL EXAM
DIRECTORY OF STUDENT DATA SETS
ALCOHOL DATA
Midterm Solutions
Homework
Data Sets
Outline
Handouts
Datasets Project
Along with your first midterm, you are being asked to turn in three
datasets. The "rules" are posted here. Many of you have asked for
further clarification, so here it is (I hope).
For your final, you will be asked to analyze someone else's data,
which will be assigned to you. So the data you submit should
a) be analyzable: this means enough points that we can make sense of
them.
b) have a "story": what can we learn? What patterns should we look
for?
c) have a context: how did you collect these observations? How can we
know they were independent? Were they random samples? Think of the
assumptions needed to do a statistical analysis, and try to tell us
enough about the data so that we can answer these questions.
After I get your datasets, I will return them to you with comments
on suggested changes, and we will need final versions two weeks later.