Project 1: Automated data collection

This project is about characterizing change: expressing how data evolve over time. Working in groups, students will collect data for approximately one month.

Each group will eventually design an automated collection system that "scrapes" data of some form from the Web. Groups are encouraged to be creative in their choice of data; the only restriction is that the data must relate to the upcoming Presidential election. Data collection will initially proceed manually, but we will very quickly begin to provide students with the tools to collect and process data in an automated fashion.
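
To give a sense of what one automated collection step might look like, here is a minimal sketch in Python using only the standard library. It is an illustration under assumptions, not the tooling the course will provide: the function names (fetch, count_mentions, append_observation), the choice of "mentions of a term" as the measurement, and the CSV layout are all hypothetical, and each group will choose its own data and storage.

```python
import csv
import io
import urllib.request
from datetime import datetime, timezone
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects page text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def fetch(url):
    """Download a page and return its HTML as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def count_mentions(html, term):
    """Count case-insensitive occurrences of `term` in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return text.count(term.lower())


def append_observation(out, source, term, html, when=None):
    """Write one timestamped row (time, source, term, count) to a CSV stream."""
    when = when or datetime.now(timezone.utc)
    row = [when.isoformat(), source, term, count_mentions(html, term)]
    csv.writer(out).writerow(row)
    return row
```

Run once a day against the same page (for example, from a scheduled job that opens the output file in append mode), a step like this turns a manual tally into a growing, timestamped record.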

Because groups will eventually automate their processing, I would not encourage students to spend a lot of time on their manual analyses. The goal of this exercise is not to make the (obvious) point that a computer can conduct analysis faster than a human can, but to show that computing can change what we view as data (perhaps equally obvious, but certainly less appreciated).

It is also important that students get a sense of what it means to "own" a data feed and to feel the daily pressure to keep it current.

At the end of the quarter, each group will present to the class the data they chose and the systems they developed to collect, store, and process the data. They will also present an analysis of the data, indicating how characteristics of the feed changed over time and in response to events like the scheduled debates or (unforeseen) incidents in the U.S. and abroad. Grading will be based on the decisions the students made while designing their system and on the analysis of the culled data.
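
As one possible starting point for the change-over-time analysis, the sketch below computes day-over-day differences in a collected series and picks out the largest shift, which a group could then compare against the dates of debates or other events. The function names and the (date, count) representation are assumptions made for illustration, and the numbers in the usage example are invented, not real data.

```python
def daily_changes(series):
    """Given chronological (date, count) pairs, return (date, change) pairs.

    Each change is the count on that date minus the count on the previous one.
    """
    return [(d2, c2 - c1) for (_, c1), (d2, c2) in zip(series, series[1:])]


def largest_shift(series):
    """Return the (date, change) pair with the biggest absolute change, or None."""
    changes = daily_changes(series)
    if not changes:
        return None
    return max(changes, key=lambda pair: abs(pair[1]))


# Invented example values, for illustration only.
sample = [("day 1", 12), ("day 2", 15), ("day 3", 40), ("day 4", 38)]
shift = largest_shift(sample)
```

A difference series is only one simple summary; groups may well find that rates, rankings, or text-based measures describe their feed better.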