High Dimensional Data Analysis, Spring 2000

Instructor : Ker-Chau Li, MWF 12-12:50 PM, MS 5127, office hours : Wed 1-3 PM


      Starting April 5, the class will meet in the
      Instructional Computing Lab at Boelter 9413 (see map).

Lectures by weeks :

Week one   Week two   Week three   Week four   Week five

Week six   Week seven   Week eight   Week nine   Week ten

Programs used in this course.

Data Sets used in this course.

(1) Course Description :
Dimensionality is an issue that can arise in every scientific field. Generally speaking, the difficulty lies in how to visualize a high dimensional function or data set. People often ask : "What do they look like?", "What structures are there?", "What model should be used?" Despite the differences among the various scientific contexts, such questions share a common root in Statistics. This is the driving force behind the study of high dimensional data analysis.

This course will discuss several statistical methodologies useful for exploring voluminous data. They include Principal Component Analysis, Clustering and Classification, Tree-structured Analysis, Neural Networks, Hidden Markov Models, Sliced Inverse Regression (SIR), and Principal Hessian Directions (PHD).

SIR and PHD are two novel dimension reduction methods, useful for extracting the geometric information underlying noisy data of several dimensions. The theory of SIR/PHD will be discussed in depth and will serve as the backbone of the entire course.
Examples from various application areas will be given. They include social/economic problems like unemployment rates; biostatistics problems like clinical trials with censoring; machine learning problems like handwritten digit recognition; quality control problems like performance measurement of digital-to-analog converters; biomedical problems like functional Magnetic Resonance Imaging; and bioinformatics problems like microarray gene expression.

(2) Grading basis :
 There are no exams. The grade will be based on a term paper, which can be either a thorough analysis of a large-scale data set or a software-development project, using techniques discussed in the course.

(3) Prerequisites :
     Stat 100abc or equivalent.

(4) Tentative Course Outline :
Week 1 : Dimension reduction in regression; principal component analysis; analysis of variance; sliced inverse regression.
Week 2 : Sampling properties of sliced inverse regression; application.
Week 3 : Transformation; projection pursuit; classification.
Week 4 : Support vector machine; clustering; neural network.
Week 5 : Multivariate SIR; aggregated time series and curves.
Week 6 : Aggregated imaging data; independent component analysis; functional data.
Week 7 : Discrete regressors; error-in-regressor; censored regression.
Week 8 : Principal Hessian Directions; regression trees.
Week 9 : Linear design condition; quasi-helices.
Week 10 : Data visualization for simulation data.

(5) Textbooks :
  None. Instructor's lecture notes will be available. Selected papers for reading will be assigned.