Syllabus. Winter 2009.
Statistics 201b: Regression Analysis.
Prof. Rick Paik Schoenberg.
Lectures: MW 3:00-4:20pm, MS 5128.
Office hours: Mondays, 1:30-2:30pm, MS 8965.
email: frederic@stat.ucla.edu
Course webpage:
http://www.stat.ucla.edu/~frederic/201b/W09
Required Texts:
"Linear Models with R" by J. Faraway.
"Extending the Linear Model with R" by J. Faraway.
Optional readings:
"Generalized Linear Models" by McCullagh and Nelder.
"Elements of Statistical Learning" by T. Hastie, R. Tibshirani, and
J. Friedman (Springer, 2001).
Description: Applied regression analysis, with emphasis on the general
linear model (e.g. multiple regression) and the generalized linear model
(e.g. logistic regression). Special attention to modern extensions
of regression, including regression diagnostics,
graphical procedures, and point process regression.
Grading:
Homework (20%).
Midterm (25%).
Final project (25%).
Final exam (30%).
The midterm is Monday, Feb 23, 3:00-4:20pm (in class).
The final exam is Tuesday, March 17, 8-11am (in class: MS 5128).
Homeworks will be announced in class.
Homeworks must be handed in at the beginning of class, or may be slipped under
my office door (MS 8965) any time before class. Each homework assignment
is graded out of 10 points.
Homeworks handed in between 5 and 10 minutes after class has begun
will be given a one-point deduction.
Those handed in between 10 and 20 minutes late will be given a two-point deduction.
Homeworks handed in between 20 minutes late and the end of class
will be given a three-point deduction.
Homeworks submitted after lecture is over will not be accepted. Homeworks
must be submitted in hard copy, rather than by email or fax.
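The late-homework rules above can be sketched as a small function. This is only an illustration, not an official grading tool: the function name late_deduction is hypothetical, the 80-minute class length is taken from the 3:00-4:20pm lecture time, and the treatment of homework handed in less than 5 minutes late (no deduction) and of the exact 10- and 20-minute boundaries is an assumption, since the policy does not spell these out.

```python
def late_deduction(minutes_late, class_length=80):
    """Points deducted from a 10-point homework, or None if it is not accepted.

    minutes_late is measured from the start of class; class_length is the
    lecture length in minutes (3:00-4:20pm is 80 minutes).
    """
    if minutes_late > class_length:
        return None  # handed in after lecture is over: not accepted
    if minutes_late < 5:
        return 0     # assumption: the policy states no deduction before 5 minutes
    if minutes_late < 10:
        return 1     # between 5 and 10 minutes late
    if minutes_late < 20:
        return 2     # between 10 and 20 minutes late (boundary choice is an assumption)
    return 3         # between 20 minutes late and the end of class


# Example: a homework handed in 12 minutes after class has begun.
deduction = late_deduction(12)
print("deduction:", deduction)
```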
Rough Outline:
Week 1: Introductory material, regression overview, function approximation, and
bias/variance tradeoffs.
Week 2: Linear regression, Gauss-Markov theorem, multiple regression.
Week 3: Subset selection, shrinkage.
Week 4: Regression of an indicator matrix, discriminant analysis.
Week 5: Logistic regression.
Week 6: Poisson regression, Kernel regression.
Week 7: Review and midterm.
Week 8: Generalized Additive Models.
Week 9: Point process regression.
Week 10: Projects, review.
hw1: Faraway "Linear Models with R" p23 #1,4,5, p50 #1,2, due Wed 1/14.
hw2: Faraway "Linear Models with R" p74 #1, p87 #4, due Mon 1/26.
hw3: Faraway "Linear Models with R" p107 #5, p146 #4, due Mon 2/9.
hw4: Faraway "Extending the Linear Model with R" p52 #3(b-f), p66 #5, due Wed 2/18.
Midterm Monday 2/23.
For the final projects, the written portion is due in class on March 11.
The oral presentations will be on March 9 and March 11. For these
projects, find a dataset and analyze it using the methods we have
discussed in class. Your response variable, Y, should be non-negative-integer-valued.
You should also have at least 2 explanatory variables, and at least 20
observations (n). Your topic may be non-academic, and should be based on
something you are genuinely interested in, such as a hobby or
extra-curricular activity of yours. Examples of response variables might
be:
-- the number of points scored by LeBron James, per game.
-- the number of votes, per political candidate.
-- the number of Americans who saw a movie, per movie.
-- the number of Facebook friends, per person.
You may find your dataset on the web or may collect it yourself, but if
you are collecting it yourself, make sure that there is absolutely no risk
of any injury (either physical or emotional) to you or anyone
else. Analyze your data using OLS, Poisson regression, kernel regression,
and at least one other method (binomial regression, least
trimmed squares, GLS, WLS, M-estimation, ridge regression, PC
regression, or partial least squares).
For each of these methods, show your residuals and analyze goodness-of-fit
as appropriate.
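As a rough illustration of the Poisson regression step, here is a minimal sketch that fits a log-link Poisson model by Newton-Raphson and then computes Pearson residuals and the deviance. In practice you would likely use R's glm with family=poisson, as in the Faraway texts; Python is used here only so the example is self-contained. The data below are made up for illustration (and have fewer than the required 20 observations), and the helper names solve, fit_poisson, and deviance are hypothetical.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_poisson(X, y, iterations=25):
    """Newton-Raphson for Poisson regression with log link.

    Each row of X must start with 1.0 (the intercept column).
    """
    p = len(X[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))  # start at the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score vector X'(y - mu) and information matrix X' diag(mu) X.
        grad = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X))
                for j in range(p)]
        info = [[sum(mi * row[j] * row[k] for mi, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def deviance(y, mu):
    """Poisson deviance; the y*log(y/mu) term is taken as 0 when y is 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += term - (yi - mi)
    return 2.0 * total


# Made-up toy data: a count response and two explanatory variables.
x1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
x2 = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
y = [1, 2, 2, 3, 5, 4, 7, 9, 8, 13]
X = [[1.0, float(a), float(b)] for a, b in zip(x1, x2)]

beta = fit_poisson(X, y)
mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
pearson = [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]
dev = deviance(y, mu)

print("coefficients:", beta)
print("Pearson residuals:", pearson)
print("deviance:", dev)
```

Plotting the Pearson residuals against the fitted values, and comparing the deviance to its residual degrees of freedom (n minus the number of fitted coefficients), are two common starting points for assessing goodness-of-fit.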
Your written report should be 3-5 pages of written text, followed by as
many figures and tables as you'd like at the end, in an appendix. Do not
include the figures in your text -- instead, just have all the figures at
the end. There is no need to explain in your text what the different
methods you use are. Instead, focus on interpreting your results. Your
report should contain an Introduction (1/2 to 1 page) explaining why your
data are interesting or important, a Results section (2-3 pages), in which you comment
on each of your figures, tables, and results, and a Conclusion (1/2 to 1
page), in which you summarize your main findings and describe problems
with your dataset and analysis. In the Results section, be sure to explain
the main interesting features of your figures, and in your Conclusion, you
are encouraged to speculate on how any problems with your data collection
or analysis may have influenced your results.
In your oral report, you will have 5 minutes each to present your main
results. Just show your 4 or 5 best figures, 1 per page. Do not show all
of your figures from your written report -- just the few best ones.
Do not look at me when giving your presentation. Have another student warn
you when you have only 1 minute left. I will cut you off after 5 minutes,
but then you must spend 1 minute answering questions.
Note: email your figures to me, in pdf format, by Sunday, March 8, 10pm,
if you would like to present them on the projector for your talk.
The order of the talks will be:
Monday, March 9:
Morrow, Bray, Leung, Kim, Chen, Guo, Zhao, Wong, Nie, Blatnik,
Mayernik.
Wednesday, March 11:
Levinson, Shargel, Goff, Crutcher, Gao, Gharibans, Darren Liu, Jaynes,
Langholz, Fei Liu, Yajima.
You may switch with your classmates if you'd like, but let me know.