Homework 1 (Translated)

 

Your homework consists of two parts. All of the information you need was given either in lecture or in the lecture slides, though it can be difficult to remember or find.

 

1.      Compute the number of hits to the portions of the site owned by Song-Chun Zhu, Vivian Lew, Brian Kriegler, Debbie Barrera and Ivo Dinov.

         a.      Who received the most hits last week?

         b.      What can you say about the kinds of files that were downloaded?

         c.      What was the most popular portion of each site?

2.      Pull back a little and tell me about the site and the habits of its visitors; specifically, think about

         a.      When is the site active? When is it quiet?

         b.      Do the visitors stay for very long?

         c.      Do they download any of our papers or software?

         d.      What applications do they run?

         e.      On balance, is our traffic "real" or mostly the result of robots or automated processes?

3.      Take a look at the 1950s.txt dataset. Describe some of the problems you encounter when you try to parse this file with the tools we have learned so far. What kinds of tools would we need to parse it better?
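
As a starting point for questions 1 and 2, here is a minimal sketch of how the counting might be organized. It is not the method from lecture. It assumes the server writes Apache-style access logs (Common or Combined Log Format) and that each person's portion of the site lives under a path like /~username/; neither assumption is stated in this handout, so check both against the real log first. The names alice and bob, the sample lines, and the list of robot keywords are all made up for illustration.

```python
import posixpath
import re
from collections import Counter
from datetime import datetime

# ASSUMPTION: Apache-style log lines. The trailing referrer and
# user-agent fields (Combined format) are treated as optional.
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)
# ASSUMPTION: personal pages are served from /~username/...
USER_DIR = re.compile(r'^/~(?P<user>[^/]+)')
# ASSUMPTION: a crude keyword heuristic, not a complete robot list.
ROBOT_HINTS = ("bot", "crawl", "spider", "slurp")


def parse(lines):
    """Yield one dict per line that matches the assumed format."""
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            yield m.groupdict()


def hits_per_owner(lines):
    """Question 1: count requests under each /~username/ directory."""
    counts = Counter()
    for rec in parse(lines):
        m = USER_DIR.match(rec["path"])
        if m:
            counts[m.group("user")] += 1
    return counts


def file_types(lines):
    """Question 1b: count requests by file extension."""
    counts = Counter()
    for rec in parse(lines):
        path = rec["path"].split("?", 1)[0]  # drop any query string
        ext = posixpath.splitext(path)[1].lower()
        counts[ext or "(none)"] += 1
    return counts


def hits_by_hour(lines):
    """Question 2a: count requests by hour of day (server-local time)."""
    counts = Counter()
    for rec in parse(lines):
        try:
            stamp = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # skip timestamps that do not fit the assumed layout
        counts[stamp.hour] += 1
    return counts


def robot_share(lines):
    """Question 2e: fraction of parsed requests that look automated."""
    total = robots = 0
    for rec in parse(lines):
        total += 1
        agent = (rec["agent"] or "").lower()
        if rec["path"] == "/robots.txt" or any(h in agent for h in ROBOT_HINTS):
            robots += 1
    return robots / total if total else 0.0


# Made-up sample lines; replace with the lines of the real log file.
sample = [
    '10.0.0.1 - - [10/Oct/2005:03:15:00 -0700] "GET /~alice/paper.pdf HTTP/1.0" 200 2326 "-" "Mozilla/4.0"',
    '10.0.0.9 - - [10/Oct/2005:03:45:10 -0700] "GET /robots.txt HTTP/1.0" 200 50 "-" "Googlebot/2.1"',
    '10.0.0.2 - - [10/Oct/2005:14:02:33 -0700] "GET /~bob/index.html HTTP/1.0" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [10/Oct/2005:14:30:00 -0700] "GET /~alice/ HTTP/1.0" 200 128',
    'not a log line',
]

print(hits_per_owner(sample).most_common())
print(file_types(sample).most_common())
print(sorted(hits_by_hour(sample).items()))
print(robot_share(sample))
```

Lines that do not match the assumed format are skipped silently, so it is worth counting how many were dropped before trusting any totals. The sketch does not address how long visitors stay (question 2b); that would need requests grouped by host and ordered by time.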

 
