Statistics 202a. Statistics programming. Prof. Rick Schoenberg. 


F25 


Lectures: Mon and Wed 1230-145pm in Royce 160. 


I am not maintaining the CCLE or Canvas site. The main course website is http://www.stat.ucla.edu/~frederic/202a/F25 , and course materials, including the syllabus and lecture notes, will be there. 




Texts:

1. Introduction to Data Science (2020) by Rafael Irizarry. https://rafalab.github.io/dsbook . 

2. Automate the Boring Stuff with Python, 2nd edition (2019) by Al Sweigart. https://automatetheboringstuff.com . 

3. R Programming for Data Science (2020) by Roger Peng. https://bookdown.org/rdpeng/rprogdatascience . 

4. The C Programming Language, 2nd edition (1988) by BW Kernighan and DM Ritchie. 


We will mostly be using the first two books. The first three books are free. The 4th book is more for reference and is completely optional. 


Office hours: Wed 1140am-1220pm, 3873 Slichter. 


email: frederic@stat.ucla.edu 


Course Website: http://www.stat.ucla.edu/~frederic/202a/F25 . 




Statistics 202a will explore computational statistics and will focus especially on computing in Python, R, and C.


The course is designed for graduate students with solid mathematical and statistical backgrounds. 


A preliminary outline of the class is given below, though the order may change.


(1) R graphics, UNIX/terminal, C, .C(), R packages, DSLabs, R programming, tidyverse, dplyr.   

(2) readr, dataviz, ggplot2.   

(3) joins, webscraping, rvest, stringr, regex, ML fundamentals, measures of accuracy for ML.   

(4) simulation, lm(), kde, splancs, kernel2d, rejection sampling, maps, kernel regression, loess, IDLE, Python programming.  

(5) crossvalidation, knn3, caret, logistic regression, generative models, naive Bayes, GLM, QDA, LDA, Python methods, numpy.  

(6) CART, random forests, sample, loops in C, simulation in C, double loops in C.  

(7) Kernel regression, vectors and matrices in C, calling C functions from C, running C from terminal, reading in from a file in C, integrating in C, C and R examples, GAM, webscraping in Python.  

(8) Biglm, ff, MLE in C and R, C++, Rcpp, Reticulate, regex in Python, building R packages.  

(9) Oral reports.    


Grading: 

Homeworks (80%), written project (15%), oral presentation/participation (5%).


Homeworks will be assigned on the main course website. Your TA will send you a Google Form for homework submissions. If the Google Form does not work for some reason, email your homework to statgrader@stat.ucla.edu . Each homework is graded out of 10 points. 


Attendance on Dec1 and Dec3 is mandatory for all students, for the oral reports. Attendance at lectures on all other days is generally not mandatory and is not counted as part of the grade. However, if you cannot attend, please contact another student to find out what you missed rather than asking me to fill you in. Late homeworks will not be accepted at all. There will be no extensions for the project or presentation. Students who are unable to make these dates or otherwise fulfill the course requirements must consult with the instructor in advance, if possible. Students with learning disabilities must consult with the instructor by the 2nd week of class if special arrangements are required.


Written Project: due Dec10 11:59pm, by email to frederic@stat.ucla.edu.

Oral presentations: Dec1 and Dec3. 


No final exam.


Description of Written Project: 

(to come) 


Oral presentations of project results will take place during lecture. These will involve simply presenting a clear, concise, and very brief summary of some data using some of the methods we discuss in class. At least one method should be performed in C. More description will be given in later lectures.