Statistics 202A: Syllabus
Overview
Computing has always been an essential ingredient of statistical
practice. While probability theory provides us with a mathematical
foundation for describing data and studying statistical inference,
computing technologies act as a medium
through which analyses are actually realized. Our ability to
manipulate data and to audition new methodologies depends on and is
limited by our familiarity with computing technologies. To some
extent, even our notion of what constitutes "data" is a product of
our background in computing.
Through a series of group projects, we will study tools for
"exploratory computing." We will emphasize programming and scripting
languages over point-and-click interfaces. We hope to instill a
problem-solving ability that lets you learn new languages on your own,
whether by culling online documentation and tutorials or by finding
books and manuals.
Organizational details
The course will be structured around a series of group
projects. Groups will be formed during the first lecture so that each
consists of students with different computing skills. While we will be
covering several programming languages and data technologies, our
approach will be motivated by the demands of the projects. Time
permitting, we will cover the following topics, roughly in the order
given below:
- Operating systems and Unix
- Unix tools, pipes, job control, editors
- Regular expressions, manipulating text
- Python
- R, data types, basic computations, objects and methods,
vectorized operations
- Code distribution, R and packages
- Databases, SQL, R interface to MySQL, XML
- Extending R, C basics and calling C from R
- Statistical computation and real-time systems
Through class discussions of the projects, we will evaluate the
strengths of each language and compare different approaches. By the
end of this quarter, we expect students to have developed a kind of
aesthetic when it comes to computing with data. More importantly, we
hope to initiate a culture in which students routinely discuss
computation, sharing experiences or reporting on new computing
platforms and emerging technologies.
There is no computing, probability or statistics prerequisite for this
course.
Grading
Grades will be based on class participation (20%) and a series of
group projects (80%). Some of these projects will culminate in
oral presentations. You are expected to be an active, contributing
member of your group. In some cases, tasks will be divided among group
members; in others, everyone will perform the same exercises and the
group will act as a forum to discuss various solutions.
Textbooks
There is no required textbook for this class. Instead, readings and
other materials will be made available throughout the quarter on the
course Web site. A few useful online references are listed below.