Statistics 202a. Statistical Programming. Prof. Rick Paik Schoenberg.

Fall 2020 (F20)

Lectures: Tue Thu 2:00-3:15pm via Zoom. Enter through CCLE: log in, click on "week 1", and then click on "ALL 202A LECTURES".
The lectures will be recorded, and links to the recordings will be posted on the main course website, http://www.stat.ucla.edu/~frederic/202a/F20 .



Texts:
1. R Programming for Data Science (2020) by Roger Peng. https://bookdown.org/rdpeng/rprogdatascience .
2. Introduction to Data Science (2020) by Rafael Irizarry. https://rafalab.github.io/dsbook .
3. The C Programming Language, 2nd edition, by BW Kernighan and DM Ritchie (1988).

The first two books are free. The third book is mainly for reference and is completely optional.

Office hours: Thu 1:30-2:00pm via Zoom. Enter through CCLE as above.

email: frederic@stat.ucla.edu

Course Website: http://www.stat.ucla.edu/~frederic/202a/F20 .

Statistics 202A will explore computational statistics and will focus especially on computing in R and C.

The course is designed for graduate students with solid mathematical and statistical backgrounds.

A preliminary outline of the class is given below, though the order may change.


1. Managing input and output in R, tidyverse, programming basics.
Peng ch4-6, 8; Irizarry ch3-5.
2. Subsetting R objects, managing dataframes, dplyr, join, bind, data visualization.
Peng ch9, 12; Irizarry ch6-10, 22.
3. Functions, regular expressions, debugging, profiling, web scraping, stringr, text mining.
Peng ch14, 17-19; Irizarry ch23, 24, 26.
4. Simulation, parallel computation.
Peng ch20, 21.
5. Machine learning, smoothing.
Irizarry ch27, 28.
6. Cross validation, caret, classification, regression trees, random forests.
Irizarry ch29-31.
7. Large datasets.
Irizarry ch33.
8. Compiling C, functions on matrices and dataframes, kernel density estimation in R, C basics.
9. Functions and loops in C, using C in R.
10. Nonparametric regression in R, generalized additive models in R.
11. Variables, vectors, matrices, arrays, structures, strings, and pointers in C.
12. Managing input and output in C, calling C functions from C, running C from terminal.
13. Optimization in R.
14. Calling R from C.
15. MLE in general, and for Hawkes point processes. Newton-Raphson optimization for the MLE using optim().
16. Building R packages.

Grading:
Homeworks (80%), written project (15%), oral presentation/participation (5%).

Homeworks will be posted on the main course website, http://www.stat.ucla.edu/~frederic/202a/F20 . Attendance in class is generally not mandatory and does not count toward the grade. However, if you cannot attend, please contact another student to find out what you missed rather than asking me to fill you in. Late homeworks will not be accepted. There will be no extensions for the project or presentation. Students who are unable to make these dates or otherwise fulfill the course requirements must consult with the instructor in advance, if possible. Students with learning disabilities who require special arrangements must consult with the instructor by the second week of class.

Written Project: due Mon Dec 14, 11:59pm, by email to frederic@stat.ucla.edu.
Oral presentations: Dec 3, 8, 10.

No final exam.

Description of Written Project:
(to come)

Oral presentations of project results will take place during lecture on Dec 3, 8, and 10. Each presentation should be a clear, concise, and very brief summary of an analysis of some data using some of the methods we discuss in class. At least one method should be implemented in C. More details will be given in later lectures.