EDA in an expanded field of statistics

Announcing a new seminar in the non-professional practices
of data collection and analysis

UCLA Center for Statistical Computing
Academic year 2008-2009


Background. Almost every aspect of our daily lives is "rendered" in data. New data collection technologies have made it easy to record continuous, high-resolution measurements of our physical environment (weather patterns, seismic events, the human genome). We are also constantly monitoring our movements through and interactions with our physical surroundings (automobile and air traffic, large-scale land use, advanced manufacturing facilities). In computer-mediated settings, our activities either depend crucially on or consist entirely of complex digital data (networked games, peer-to-peer technologies, Web site and Internet usage).

As a result of our improved abilities to "observe," professional and research practices are becoming increasingly dependent on data and data processing: on drawing conclusions from, or in some way adapting to, rich flows of measurements taken from the physical or virtual worlds. These professional demands have given rise to a host of new analysis tools, methodologies and software for uncovering significant structures in data.

While many of these advances were initiated in industrial or academic settings, we are starting to see the (inevitable?) migration of these technologies from labs and specialized deployments into widespread usage by the general public. The most obvious case in point is the trajectory followed by Geographic Information Systems: powerful mapping and overlay tools are available on a variety of convenient platforms and have been quickly taken up by non-specialists and applied effectively for social, political and cultural ends. The same can be said for database technology, with new Web sites and services like Dabble DB and Swivel offering powerful, exceedingly user-friendly tools for storing, manipulating and, importantly, sharing data. Another example is Many Eyes, a site that offers a kind of "social data analysis" by making relatively sophisticated graphical tools easily available and applying a social network model to encourage interaction around the displays.

We should emphasize that this migration is not purely a "server-side" phenomenon affecting only storage and analysis tools. Powerful observation technologies, data collection platforms, are already in the hands (and pockets) of millions of Americans. The mobile phone network represents a sensing system with billions of "nodes" globally, capable of capturing text, audio, images and video. Mobile phone manufacturers are busy extending the sensing capabilities of these devices. In parallel, advances in academic sensor network research will soon provide a range of affordable, easy-to-use, low-power observing systems to the public.

An announcement. Just over 30 years ago, John Tukey literally wrote the book on Exploratory Data Analysis (EDA). In a way that Tukey could not have imagined, data collection and analysis technologies are moving quickly into the public realm, creating a new kind of statistics, one that has emerged without the obvious involvement of statisticians. As information technologies have brought "the network" into our homes and personal spaces, new kinds of non-professional data collection and analysis practices have developed, practices that invite participation and data sharing. In this expanded field of statistics, EDA is transformed.

In the 2008-2009 academic year, we will be sponsoring a seminar as well as a lecture series to examine how and why non-statisticians are grappling with the effects of large, complex data flows, and the implications (both technological and ethical) of their work. These events will build on our experiences with "Site-Specifics," a seminar, offered in 2005-2006 and 2006-2007, that investigated the effects of data and data processing by studying specific places within Los Angeles. While Site-Specifics focused mainly on professional applications (healthcare, education, environmental management), the new lecture series will examine the collection, presentation and discussion of data in the public sphere. Our ultimate goal is a book that will describe the "best practices" of a new EDA.