Homework 3 Solutions



In part "a" of this homework assignment, we feed the files from a single executive's sent-mail directory into the script and modify the code Mark gave in class so that it prints a count of the number of emails person A sent to person B.

#!/usr/local/bin/python
import email, sys
in_out = {}
receivers = []
for line in sys.stdin:
    msg = email.message_from_file(open(line.strip()))
    if msg['To'] and msg['From'] and msg['Message-ID']:
        sender = msg['From'].strip()
        if sender.endswith("enron.com"):
            msg_id = msg['Message-ID'].strip()
            receiver_list = [r.strip() for r in msg['To'].split(",") \
                        if r.endswith("enron.com")]
            receiver_set = set(receiver_list)
            for receiver in receiver_set:
                if not in_out.has_key(sender):
                    in_out[sender] = {}
                if not in_out[sender].has_key(receiver):
                    in_out[sender][receiver] = []
                in_out[sender][receiver].append(msg_id)
for sender in in_out.keys():
    for receiver in in_out[sender].keys():
        # One line per (sender, receiver) pair; the list length is the email count
        print sender, receiver, len(in_out[sender][receiver])

That's right folks! All you needed to change were two lines. Recall that in_out[sender][receiver] is a list containing the message IDs of the emails that sender sent to that receiver. Once we reach the part of the last block where we iterate through each receiver, we can access in_out[sender][receiver] and just print its length!

Many of you rewrote the code so that instead of appending message IDs to a list, you just keep a counter of messages. This is perfectly correct, and is actually more efficient because we do not need to store the individual message IDs to complete the assignment.
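
For reference, here is a minimal sketch of what such a counter-based version might look like. It is not any particular student's submission: the function name count_pairs and the three in-memory sample messages are made up for illustration, and it parses strings instead of reading file names from stdin so that it can be run on its own.

```python
import email
from collections import Counter

def count_pairs(messages):
    """Count emails per (sender, receiver) pair among enron.com addresses."""
    counts = Counter()
    for msg in messages:
        # Skip messages missing any of the headers we rely on
        if not (msg['To'] and msg['From'] and msg['Message-ID']):
            continue
        sender = msg['From'].strip()
        if not sender.endswith("enron.com"):
            continue
        # A set, so a receiver listed twice on one email is counted once
        receivers = set(r.strip() for r in msg['To'].split(","))
        for receiver in receivers:
            if receiver.endswith("enron.com"):
                counts[(sender, receiver)] += 1
    return counts

# Hypothetical sample messages (not taken from the real data set)
raw = [
    "From: a@enron.com\nTo: b@enron.com, c@enron.com\nMessage-ID: <1>\n\nhi",
    "From: a@enron.com\nTo: b@enron.com\nMessage-ID: <2>\n\nhi again",
    "From: x@other.org\nTo: b@enron.com\nMessage-ID: <3>\n\nignored",
]
demo_counts = count_pairs(email.message_from_string(r) for r in raw)
for (sender, receiver), n in sorted(demo_counts.items()):
    print("%s %s %d" % (sender, receiver, n))
```

The counter is keyed on the (sender, receiver) pair, so no nested dictionaries or has_key checks are needed, and no message IDs are kept in memory.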

To pass in the messages for one individual (say skilling-j), we just use the find command:

find /data/202A/maildir/skilling-j/*sent*/ -type f | ./pass_two.py

The asterisks (*) around the word sent mean "use any directory whose name contains the word sent." Different email clients and servers name the sent-mail folder differently, so the wildcard picks up all of the variants.
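
If you want to check what a pattern like *sent* will match before running find, Python's fnmatch module applies the same style of wildcard matching. The sketch below uses a made-up list of folder names purely as an illustration; the actual folder names under each user's directory may differ.

```python
from fnmatch import fnmatchcase

# Hypothetical folder names, chosen only to illustrate the pattern
folders = ["sent", "sent_items", "_sent_mail", "inbox", "all_documents"]

# Keep the names that contain "sent" anywhere, as the shell pattern *sent* would
matches = [name for name in folders if fnmatchcase(name, "*sent*")]
print(matches)
```

Only the three names containing "sent" survive the filter; inbox and all_documents are skipped.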


For part "b" of this assignment (some of you called it Homework 4), we modify our approach so that the program reads in ALL users' sent email directories. No changes to the Python code were necessary! All we needed to do was change the find statement:

find /data/202A/maildir/*/*sent*/ -type f | ./pass_two.py

The extra asterisk, which takes the place of skilling-j, is a wildcard and means "consider any directory here," so every user's sent folders are included.
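
To see the effect of the two wildcards without touching the real data, the sketch below roughly mimics the find command in Python on a throwaway directory tree. The helper name sent_files and the user and folder names (alice, bob, and so on) are invented for this example, and the match is only an approximation of what the shell and find do.

```python
import glob
import os
import tempfile

def sent_files(maildir):
    """Roughly mimic: find maildir/*/*sent*/ -type f"""
    found = []
    # First * matches any user directory, *sent* any folder containing "sent"
    for sent_dir in glob.glob(os.path.join(maildir, "*", "*sent*")):
        if not os.path.isdir(sent_dir):
            continue
        for dirpath, _dirnames, filenames in os.walk(sent_dir):
            for name in filenames:
                found.append(os.path.join(dirpath, name))
    return sorted(found)

# Build a temporary tree with made-up user and folder names
root = tempfile.mkdtemp()
for rel in ["alice/sent/1", "bob/sent_items/1", "bob/inbox/1"]:
    path = os.path.join(root, *rel.split("/"))
    os.makedirs(os.path.dirname(path))
    with open(path, "w") as handle:
        handle.write("placeholder")

relative = [os.path.relpath(p, root).replace(os.sep, "/") for p in sent_files(root)]
print(relative)
```

The file under bob/inbox is left out, while the sent folders of both users are picked up, which is the behavior part "b" relies on.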

