Syllabus. Winter 2009. Statistics 201b: Regression Analysis.
Prof. Rick Paik Schoenberg.
Lectures: MW 3:00-4:20pm, MS 5128.

Office hours: Mondays, 1:30-2:30pm, MS 8965.

email: frederic@stat.ucla.edu

Course webpage: http://www.stat.ucla.edu/~frederic/201b/W09

Required Texts:
"Linear Models with R" by J. Faraway.
"Extending the Linear Model with R" by J. Faraway.

Optional readings:
"Generalized Linear Models" by McCullagh and Nelder.
"The Elements of Statistical Learning" by T. Hastie, R. Tibshirani, and J. Friedman (Springer, 2001).

Description: Applied regression analysis, with emphasis on the general linear model (e.g. multiple regression) and the generalized linear model (e.g. logistic regression). Special attention to modern extensions of regression, including regression diagnostics, graphical procedures, and point process regression.

Grading:
Homework (20%).
Midterm (25%).
Final project (25%).
Final exam (30%).
The midterm is Monday, Feb 23, 3:00-4:20pm (in class).
The final exam is Tuesday, March 17, 8-11am (in class: MS 5128).
Homeworks will be announced in class.
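
As an illustration of how the four weights above combine, here is a minimal Python sketch. The weights come from the list above; the function name `course_score` and the example percentages are made up for illustration, and the sketch assumes each component is first expressed as a percentage out of 100.

```python
# Weights from the Grading list above; everything else is illustrative.
WEIGHTS = {"homework": 0.20, "midterm": 0.25, "project": 0.25, "final": 0.30}


def course_score(percentages):
    """Return the weighted course score (0-100) from component percentages."""
    missing = set(WEIGHTS) - set(percentages)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * percentages[name] for name in WEIGHTS)


# Hypothetical student: these percentages are invented, not real data.
example = {"homework": 90.0, "midterm": 80.0, "project": 85.0, "final": 70.0}
print(round(course_score(example), 2))
```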

Homeworks must be handed in at the beginning of class, or may be slipped under my office door (MS 8965) any time before class. Each homework assignment is graded out of 10 points. Homeworks handed in between 5 and 10 minutes after class has begun will be given a one-point deduction. Those handed in between 10 and 20 minutes late will be given a two-point deduction. Homeworks handed in between 20 minutes late and the end of class will be given a three-point deduction. Homeworks submitted after lecture is over will not be accepted. Homeworks must be submitted in hard copy, rather than by email or fax.
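
The late-homework rule above can be written as a small lookup. This is only a sketch in Python, with several assumptions that the policy itself does not state: homework handed in less than 5 minutes after class begins gets no deduction, the boundary cases (exactly 10 or 20 minutes) fall into the larger deduction, and the lecture lasts 80 minutes (3:00-4:20pm). The function name `late_deduction` is hypothetical.

```python
def late_deduction(minutes_late, class_length=80):
    """Points deducted from a 10-point homework, or None if it is not accepted.

    minutes_late is measured from the start of class; values <= 0 mean the
    homework was handed in on time (or slipped under the door beforehand).
    """
    if minutes_late > class_length:
        return None  # submitted after lecture is over: not accepted
    if minutes_late < 5:
        return 0     # assumption: the policy does not mention the first 5 minutes
    if minutes_late < 10:
        return 1
    if minutes_late < 20:
        return 2     # assumption: exactly 10 minutes counts as the 2-point tier
    return 3         # from 20 minutes late until the end of class


for m in (0, 7, 15, 45, 90):
    print(m, late_deduction(m))
```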

Rough Outline:
Week 1: Introductory material, regression overview, function approximation, and bias/variance tradeoffs.
Week 2: Linear regression, Gauss-Markov theorem, multiple regression.
Week 3: Subset selection, shrinkage.
Week 4: Regression of an indicator matrix, discriminant analysis.
Week 5: Logistic regression.
Week 6: Poisson regression, kernel regression.
Week 7: Review and midterm.
Week 8: Generalized Additive Models.
Week 9: Point process regression.
Week 10: Projects, review.

hw1: Faraway "Linear Models with R" p23 #1,4,5, p50 #1,2, due Wed 1/14.

hw2: Faraway "Linear Models with R" p74 #1, p87 #4, due Mon 1/26.

hw3: Faraway "Linear Models with R" p107 #5, p146 #4, due Mon Feb 9.

hw4: Faraway "Extending the Linear Model with R" p52 #3(b-f), p66 #5, due Wed Feb 18.

Midterm Monday 2/23.

For the final projects, the written portion is due in class on March 11. The oral presentations will be on March 9 and March 11. For these projects, find a dataset and analyze it using the methods we have discussed in class. Your response variable, Y, should be non-negative-integer-valued. You should also have at least 2 explanatory variables, and at least 20 observations (n). Your topic may be non-academic, and should be based on something you are genuinely interested in, such as a hobby or extra-curricular activity of yours. Examples of response variables might be:
-- the number of points scored by LeBron James, per game.
-- the number of votes, per political candidate.
-- the number of Americans who saw a movie, per movie.
-- the number of Facebook friends, per person.
You may find your dataset on the web or may collect it yourself, but if you are collecting it yourself, make absolutely sure that there is absolutely no risk at all of any injury (either physical or emotional) to you or anyone else. Analyze your data using OLS, Poisson regression, kernel regression, and at least one other method (binomial regression, least trimmed squares, GLS, WLS, M-estimation, ridge regression, PC regression, or partial least squares). For each of these methods, show your residuals and analyze goodness-of-fit as appropriate. Your written report should be 3-5 pages of written text, followed by as many figures and tables as you'd like at the end, in an appendix. Do not include the figures in your text -- instead, just have all the figures at the end. There is no need to explain in your text what the different methods you use are. Instead, focus on interpreting your results. Your report should contain an Introduction (1/2 to 1 page) explaining why your data are interesting or important, a Results section (2-3 pages), in which you comment on each of your figures, tables, and results, and a Conclusion (1/2 to 1 page), in which you summarize your main findings and describe problems with your dataset and analysis. In the Results section, be sure to explain the main interesting features of your figures, and in your Conclusion, you are encouraged to speculate on how any problems with your data collection or analysis may have influenced your results.
In your oral report, you will have 5 minutes each to present your main results. Just show your 4 or 5 best figures, 1 per page. Do not show all of your figures from your written report -- just the few best ones. Do not look at me when giving your presentation. Have another student warn you when you have only 1 minute left. I will cut you off after 5 minutes, but then you must spend 1 minute answering questions.
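
To make the Poisson regression part of the project requirements above concrete, here is a rough, self-contained sketch. The course texts use R; this sketch uses plain Python instead, and it is not a prescribed method for the project. It fits a log-link Poisson regression by Newton-Raphson and prints Pearson residuals. The function names (`solve`, `fitted_means`, `fit_poisson`) and the dataset are hypothetical: the counts are synthetic, generated from an arbitrary formula purely for illustration, and the first column of `X` is assumed to be the intercept.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fitted_means(X, beta):
    """Fitted Poisson means exp(x_i . beta) under the log link."""
    return [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]


def fit_poisson(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson for Poisson regression; column 0 of X is the intercept."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(max_iter):
        mu = fitted_means(X, beta)
        # Score vector X'(y - mu) and information matrix X' diag(mu) X.
        score = [sum(r[j] * (yi - mi) for r, yi, mi in zip(X, y, mu))
                 for j in range(p)]
        info = [[sum(r[j] * r[k] * mi for r, mi in zip(X, mu))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Synthetic data (n = 20, two explanatory variables); not real observations.
X = [[1.0, i / 10, float(i % 4)] for i in range(20)]
y = [round(math.exp(0.5 + 0.8 * x1 - 0.3 * x2)) for _, x1, x2 in X]

beta = fit_poisson(X, y)
mu = fitted_means(X, beta)
pearson = [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]
print("coefficients:", [round(b, 3) for b in beta])
print("Pearson chi-square:", round(sum(r * r for r in pearson), 3))
```

In practice a standard routine (for example `glm` with `family=poisson` in R) would be the usual choice; the hand-written loop is only meant to show what such a fit computes.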

Note: email your figures to me, in pdf format, by Sunday, March 8, 10pm, if you would like to present them on the projector for your talk.

The order of the talks will be:
Monday, March 9: Morrow, Bray, Leung, Kim, Chen, Guo, Zhao, Wong, Nie, Blatnik, Mayernik.
Wednesday, March 11: Levinson, Shargel, Goff, Crutcher, Gao, Gharibans, Darren Liu, Jaynes, Langholz, Fei Liu, Yajima.

You may switch with your classmates if you'd like, but let me know.