High Dimensional Data Analysis, Spring 2000
Instructor : Ker-Chau Li, MWF 12-12:50 PM, MS 5127, office hours : Wed 1-3 PM
CLASSROOM CHANGE:
Starting April 5, the class will meet in the
Instructional Computing Lab at Boelter 9413 (see map).
Lectures by week :
Week one, Week two, Week three, Week four, Week five,
Week six, Week seven, Week eight, Week nine, Week ten
Data sets used in this course.
(1) Course Description :
Dimensionality is an issue that can arise in every scientific
field. Generally speaking, the difficulty lies in how to visualize
a high dimensional function or data set. People often ask : "What does
it look like?", "What structures are there?", "What model should be
used?" Despite the differences among the various scientific contexts,
such questions share a common root in statistics. This is the driving
force for the study of high dimensional data analysis.
This course will discuss several statistical methodologies useful for exploring voluminous data. They include principal component analysis, clustering and classification, tree-structured analysis, neural networks, hidden Markov models, sliced inverse regression (SIR), and principal Hessian directions (PHD).
SIR and PHD are two novel dimension reduction methods,
useful for extracting the geometric information underlying noisy
data of several dimensions. The theory of SIR/PHD
will be discussed in depth and will serve as the backbone for the entire
course.
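As a rough illustration of the idea behind SIR (a minimal sketch, not the instructor's code or lecture material), the basic procedure can be written in a few lines of Python with NumPy: standardize the regressors, slice the sorted response, average the standardized regressors within each slice, and take the leading eigenvectors of the weighted covariance of those slice means. The function name `sir_directions` and its parameters `n_slices` and `n_directions` are hypothetical names chosen for this illustration.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_directions=1):
    """Estimate dimension reduction directions by basic sliced inverse regression.

    Hypothetical helper for illustration only; assumes X has at least two
    columns and a nonsingular sample covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize X so that it has zero mean and identity covariance.
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ inv_sqrt

    # Slice the observations by sorted response into roughly equal groups.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance matrix of the slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        slice_mean = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(slice_mean, slice_mean)

    # Leading eigenvectors of M, mapped back to the original X scale.
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    top = vecs[:, ::-1][:, :n_directions]
    B = inv_sqrt @ top
    return B / np.linalg.norm(B, axis=0)
```

For data generated from a single-index model with a monotone link, the first returned column should be close, up to sign, to the true direction. The number of slices is a tuning choice; ten is used here only as a placeholder default.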
Examples from various application areas will be given.
They include social/economic problems like unemployment rates; biostatistics
problems like clinical trials with censoring; machine learning problems like
handwritten digit recognition; quality control problems like performance
measurement of digital-to-analog converters; biomedical problems like
functional Magnetic Resonance Imaging; and bioinformatics
problems like microarray gene expression.
(2) Grading basis :
There are no exams. The grade will be based on
a term paper, which can be either a thorough analysis
of a large-scale data set or a software-development project, using techniques
discussed in the course.
(3) Prerequisites :
Stat 100abc or equivalent.
(4) Tentative Course Outline :
Week 1 : Dimension reduction in regression; principal
component analysis; analysis of variance; sliced inverse regression.
Week 2 : Sampling properties of sliced inverse regression;
applications.
Week 3 : Transformation; projection pursuit; classification.
Week 4 : Support vector machine; clustering; neural network.
Week 5 : Multivariate SIR; aggregated time series and curves.
Week 6 : Aggregated imaging data; independent component
analysis; functional data analysis.
Week 7 : Discrete regressors; errors in regressors; censored
regression.
Week 8 : Principal Hessian Directions; regression trees.
Week 9 : Linear design condition; quasi-helices.
Week 10 : Data visualization for simulation data.
(5) Textbooks :
None. Instructor's lecture notes will be available.
Selected papers for reading will be assigned.