
STATISTICS 216

High-Dimensional Data Analysis, Spring 2000

Instructor: Ker-Chau Li. MWF 12:00-12:50 PM, MS 5127. Office hours: Wed 1-3 PM.

CLASSROOM CHANGE:

Starting April 5, the class will meet in the Instructional Computing Lab, Boelter 9413.

Lectures by week:

Week 1, Week 2, Week 3, Week 4, Week 5

Week 6, Week 7, Week 8, Week 9, Week 10

Programs used in this course.

Data Sets used in this course.

(1) Course Description:
Dimensionality is an issue that can arise in every scientific field. Generally speaking, the difficulty lies in how to visualize a high-dimensional function or data set. People often ask: "What does it look like?", "What structures are there?", "What model should be used?" Despite the differences among the various scientific contexts, such questions share a common root in statistics. This is the driving force behind the study of high-dimensional data analysis.

This course will discuss several statistical methodologies useful for exploring voluminous data. They include principal component analysis, clustering and classification, tree-structured analysis, neural networks, hidden Markov models, sliced inverse regression (SIR) and principal Hessian directions (PHD).

SIR and PHD are two novel dimension reduction methods, useful for extracting the geometric information underlying noisy data in several dimensions. The theory of SIR/PHD will be discussed in depth and serves as the backbone of the entire course.
Examples from various application areas will be given. They include social/economic problems such as unemployment rates; biostatistics problems such as clinical trials with censoring; machine learning problems such as handwritten digit recognition; quality control problems such as performance measurement of digital-to-analog converters; biomedical problems such as functional magnetic resonance imaging (fMRI); and bioinformatics problems such as microarray gene expression.
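To make the idea behind SIR concrete, here is a minimal sketch, assuming NumPy, of the basic SIR estimator: standardize the predictors, slice the data by the sorted response, form the weighted covariance of the slice means, and map its leading eigenvectors back to the original scale. The function name `sir_directions`, its parameters, and the choice of equal-count slices are illustrative assumptions; this is not one of the course's own programs.

```python
import numpy as np


def sir_directions(X, y, n_slices=10, n_directions=1):
    """Sketch of sliced inverse regression (SIR).

    X: (n, p) predictor matrix; y: (n,) response.
    Returns a (p, n_directions) matrix of estimated directions on the
    original X scale and the corresponding eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize the predictors: z = Sigma^{-1/2} (x - mean).
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt

    # Slice the data by the sorted response into groups of (nearly) equal size.
    slices = np.array_split(np.argsort(y), n_slices)

    # Weighted covariance of the slice means of z.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of M, mapped back to the original X scale.
    w, v = np.linalg.eigh(M)
    top = np.argsort(w)[::-1][:n_directions]
    return inv_sqrt @ v[:, top], w[top]


# Usage: a single-index model y = f(b'x) + noise with a monotone f.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
b = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
u = X @ b
y = u + 0.5 * u ** 3 + 0.1 * rng.standard_normal(2000)
beta, lam = sir_directions(X, y, n_slices=10, n_directions=1)
d = beta[:, 0] / np.linalg.norm(beta[:, 0])
print(abs(d @ b))  # close to 1 when the direction is recovered
```

Eigenvectors are only determined up to sign and scale, so the sketch compares the normalized direction with `b` through the absolute cosine.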

(2) Grading basis:
There are no exams. The grade will be based on a term paper, which can be either a thorough analysis of a large-scale data set or a software development project using techniques discussed in the course.

(3) Prerequisites:
Stat 100abc or equivalent.

(4) A Tentative Course Outline:
Week 1: Dimension reduction in regression; principal component analysis; analysis of variance; sliced inverse regression.
Week 2: Sampling properties of sliced inverse regression; applications.
Week 3: Transformation; projection pursuit; classification.
Week 4: Support vector machines; clustering; neural networks.
Week 5: Multivariate SIR; aggregated time series and curves.
Week 6: Aggregated imaging data; independent component analysis; functional data analysis.
Week 7: Discrete regressors; errors in regressors; censored regression.
Week 8: Principal Hessian directions; regression trees.
Week 9: Linear design condition; quasi-helices.
Week 10: Data visualization for simulation data.
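Principal Hessian directions (Week 8) complement SIR: when the response depends on a direction symmetrically, the slice means carry little information, but the curvature does. The following is a minimal sketch, assuming NumPy, of the residual-based variant: standardize the predictors, remove the linear trend from the centered response, form the response-weighted second-moment matrix, and keep the eigenvectors with the largest absolute eigenvalues. The function name `phd_directions` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def phd_directions(X, y, n_directions=1, use_residuals=True):
    """Sketch of principal Hessian directions (PHD).

    Returns a (p, n_directions) matrix of directions on the original X
    scale and the eigenvalues with the largest absolute values.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize the predictors: z = Sigma^{-1/2} (x - mean).
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / n
    evals, evecs = np.linalg.eigh(sigma)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = Xc @ inv_sqrt

    # Centered response, optionally with its least-squares linear trend removed.
    r = y - y.mean()
    if use_residuals:
        coef, *_ = np.linalg.lstsq(Z, r, rcond=None)
        r = r - Z @ coef

    # Response-weighted second moment: M = (1/n) * sum_i r_i z_i z_i'.
    M = (Z * r[:, None]).T @ Z / n

    # Eigenvalues can be negative, so rank them by absolute value.
    w, v = np.linalg.eigh(M)
    top = np.argsort(np.abs(w))[::-1][:n_directions]
    return inv_sqrt @ v[:, top], w[top]


# Usage: a symmetric model y = (b'x)^2 + noise, where slice means carry
# almost no information but the curvature does.
rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 5))
b = np.array([1.0, -1.0, 0.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ b) ** 2 + 0.1 * rng.standard_normal(2000)
beta, lam = phd_directions(X, y)
d = beta[:, 0] / np.linalg.norm(beta[:, 0])
print(abs(d @ b))  # close to 1 when the direction is recovered
```

Ranking by absolute eigenvalue matters because negative curvature (for example, y = -(b'x)^2) is just as informative as positive curvature.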

(5) Textbooks:
None. The instructor's lecture notes will be available. Selected papers will be assigned for reading.