Homework 1 Solutions to *'d problems

3.9:
For full credit, your report should (a) address the "5 W's" as appropriate, (b) describe the charts, (c) answer the questions, and (d) use complete sentences. For example:
The data consist of oil spills from 50 tankers and carriers (who). We don't know when these occurred (when). The data record the cause of the spillage (what), which is a categorical variable. We don't know, precisely, where the data were recorded. (OK, the smart alecks will say "the ocean", but which oceans?) Also, the "how" is unknown: for example, were the causes self-reported, or determined by an independent investigator?

The bar chart shows that grounding was the most common cause, although there were not really dramatic differences in the number of spills caused by the other three reasons. Also, relatively few (one or two?) spills have no known cause. The pie chart is appropriate, as long as only one cause was reported for each spill. The pie chart shows, a little more clearly in my opinion, that all four causes (excluding "unknown") happened equally often, more or less.

3.26:
a) Pets are expensive and some pets are more expensive than others.  So, a priori, I would expect the distribution of pets to vary for different income levels.
b) These are column percents, because each column sums to 100.  (You need an explanation, as well as the correct answer, for full credit.)
c) Yes, more or less. Horses are rare in households making under $12,500. In fact, for each type of pet, the upper four income categories are (roughly) equally likely to have that pet, while the households making under $12,500 are considerably less likely to have that pet.

You might ponder this question (not required for full credit!):  why not use row percents to answer this question? What questions could we answer with row percents?
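
If it helps to see the difference between column and row percents concretely, here is a minimal Python sketch. The counts are made up for illustration (they are NOT the textbook's data), and the layout, with income brackets as rows and pet types as columns, is an assumption. The function names column_percents and row_percents are likewise just illustrative.

```python
# Hypothetical counts, NOT the textbook's data.
# Assumed layout: rows are income brackets, columns are pet types.
counts = {
    "Under $12,500":  {"Dog": 30,  "Cat": 50,  "Horse": 20},
    "$12,500 and up": {"Dog": 120, "Cat": 100, "Horse": 80},
}


def column_percents(table):
    """Express each cell as a percent of its column total."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100 * row[c] / totals[c] for c in columns}
        for r, row in table.items()
    }


def row_percents(table):
    """Express each cell as a percent of its row total."""
    return {
        r: {c: 100 * v / sum(row.values()) for c, v in row.items()}
        for r, row in table.items()
    }


col = column_percents(counts)
row = row_percents(counts)

# Column percents: each pet's column sums to 100.
for pet in ["Dog", "Cat", "Horse"]:
    print(pet, round(sum(col[r][pet] for r in col), 1))

# Row percents: each income bracket's row sums to 100.
for bracket, cells in row.items():
    print(bracket, round(sum(cells.values()), 1))
```

The check used in part (b) is the same one the loops perform: whichever direction sums to 100 tells you which kind of percent the table reports.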