Homework 1 (Translated)
Your homework consists of three parts. All of this information was given to you
either in lecture or in the lecture slides, but it can be difficult to remember
or find.
1. Compute the number of hits to the portions of the site owned by Song-Chun
Zhu, Vivian Lew, Brian Kriegler, Debbie Barrera and Ivo Dinov.
a. Who received the most hits last week?
b. What can you say about the kinds of files that were downloaded?
c. What was the most popular portion of each site?
2. Pull back a little and tell me about the site and the habits of its
visitors; specifically, think about:
a. When is the site active? When is it quiet?
b. Do the visitors stay for very long?
c. Do they download any of our papers or software?
d. What applications do they run?
e. On balance, is our traffic "real" or mostly the result of robots or
automated processes?
3. Take a look at the 1950s.txt dataset. Provide me with a description of some
of the problems that you encounter if you try to use the current tools we have
learned to parse this file. What types of tools would we need to better parse
the file?
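
For questions 1 and 2, one possible starting point is sketched below in Python.
It is only a sketch, and it rests on assumptions this handout does not confirm:
that the server log uses the Apache Combined Log Format, and that each person's
portion of the site lives under a path beginning with /~username/. The sample
lines, the directory names in them, and the helper names (parse_line, owner_of,
hour_of, looks_like_robot, summarize) are all made up for illustration; adapt
them to the actual log.

```python
import re
from collections import Counter

# ASSUMPTION: Apache Combined Log Format. The real log may be laid out differently.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def owner_of(path):
    """ASSUMPTION: personal areas look like /~username/...; return '~username' or None."""
    first = path.lstrip("/").split("/", 1)[0]
    return first if first.startswith("~") else None


def hour_of(timestamp):
    """Extract the hour from a timestamp such as '10/Oct/2005:13:55:36 -0700'."""
    return int(timestamp.split(":")[1])


def looks_like_robot(record):
    """Crude heuristic: a robots.txt request or a bot-like word in the user agent."""
    agent = (record.get("agent") or "").lower()
    words = ("bot", "crawler", "spider")
    return record["path"] == "/robots.txt" or any(w in agent for w in words)


def summarize(lines):
    """Count hits per owner and per hour, plus hits that look automated."""
    hits_by_owner = Counter()
    hits_by_hour = Counter()
    robot_hits = 0
    total = 0
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip lines that do not fit the assumed format
        total += 1
        owner = owner_of(record["path"])
        if owner is not None:
            hits_by_owner[owner] += 1
        hits_by_hour[hour_of(record["time"])] += 1
        if looks_like_robot(record):
            robot_hits += 1
    return {
        "total": total,
        "hits_by_owner": hits_by_owner,
        "hits_by_hour": hits_by_hour,
        "robot_hits": robot_hits,
    }


# Made-up sample lines, for illustration only (not taken from the course log).
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2005:13:55:36 -0700] "GET /~sczhu/papers/a.pdf HTTP/1.0" 200 2326 "-" "Mozilla/4.0"',
    '10.0.0.2 - - [10/Oct/2005:13:58:01 -0700] "GET /~dinov/index.html HTTP/1.0" 200 512 "-" "Googlebot/2.1"',
    '10.0.0.1 - - [10/Oct/2005:02:10:00 -0700] "GET /~sczhu/ HTTP/1.0" 200 100 "-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    summary = summarize(SAMPLE)
    print(summary["hits_by_owner"].most_common())
    print(sorted(summary["hits_by_hour"].items()))
    print(summary["robot_hits"], "of", summary["total"], "hits look automated")
```

To run this against a real log, pass an open file to summarize in place of
SAMPLE, since it only needs an iterable of lines. The robot check is
deliberately rough; many automated clients do not identify themselves in the
user-agent string, so treat its count as a lower bound.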