Stat 210B: Homework Assignment 3

Due:  Monday Feb. 10


The data given in data frame `Insurance' in library(MASS) consist of the numbers of policyholders of an insurance company who were exposed to risk, and the numbers of car insurance claims made by those policyholders in the third quarter of 1973.  The data are cross-classified by District (four levels), Group of car (four levels), and Age of driver (four ordered levels).  The other variables are the numbers of Holders and Claims.

> library(MASS)
> data(Insurance)

Variables:

     `District' district of policyholder (1 to 4): 4 is major cities.

     `Group' group of car (1 to 4):  <1 litre, 1-1.5 litre, 1.5-2 litre, >2 litre.

     `Age' of driver in 4 ordered groups:  <25, 25-29, 30-35, >35.

     `Holders' numbers of policyholders

     `Claims' numbers of claims

  1. The relevant model is a rate model with Claims as response and offset(log(Holders)).  Report your final model and explain how you reached it.  Present your results as a table of estimated claim rates per policyholder for each category of holder.  Give a brief report on how claim rates are related to the covariates.
  2. It is not strictly valid to regard such data as having the obvious binomial distribution, since some policyholders may make multiple claims.  Nevertheless it should be a reasonable approximation.  Repeat the analysis with a binomial model and compare the resulting estimated claim rates (in this case, estimated probabilities of making a claim) with those from the rate model.
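
As a starting point only, the two model families can be set up as sketched below.  The main-effects formula District + Group + Age is an assumption made for illustration, not the final model the assignment asks you to find, and the object names fit_pois, fit_bin, rate_pois and rate_bin are hypothetical.  The Poisson fit uses the offset so that the linear predictor describes the log claim rate; the binomial fit treats Claims as successes out of Holders trials.

```r
library(MASS)  # provides the Insurance data frame

# Treatment contrasts for unordered and ordered factors
options(contrasts = c("contr.treatment", "contr.treatment"))

# Poisson rate model: log(E[Claims]) = log(Holders) + linear predictor
# (main-effects formula is an illustrative assumption)
fit_pois <- glm(Claims ~ District + Group + Age + offset(log(Holders)),
                family = poisson, data = Insurance)

# Binomial analogue: Claims "successes" out of Holders "trials"
fit_bin <- glm(cbind(Claims, Holders - Claims) ~ District + Group + Age,
               family = binomial, data = Insurance)

# Fitted values from the Poisson fit are expected counts, so divide by
# Holders to get a rate; the binomial fit returns probabilities directly
Insurance$rate_pois <- fitted(fit_pois) / Insurance$Holders
Insurance$rate_bin  <- fitted(fit_bin)

# Tables of estimated claim rates per policyholder, one cell per category
round(ftable(xtabs(rate_pois ~ District + Group + Age, Insurance)), 3)
round(ftable(xtabs(rate_bin  ~ District + Group + Age, Insurance)), 3)
```

Candidate models (for example, with or without interactions) can then be compared with anova(..., test = "Chisq") before the rate tables are produced from whichever model is finally chosen.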

Note:  For ease of interpretation, use treatment contrasts.

> options(contrasts=c("contr.treatment", "contr.treatment"))
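
The following sketch shows why both elements of the contrasts option are set.  The first element applies to unordered factors and the second to ordered factors; Age is stored as an ordered factor, so under R's default setting it would be coded with polynomial contrasts instead of indicators against a baseline level.  The names poly_cols and trt_cols are hypothetical.

```r
library(MASS)  # provides the Insurance data frame

# Check how Age is stored
is.ordered(Insurance$Age)

# R's default: treatment contrasts for unordered factors,
# polynomial contrasts for ordered factors
options(contrasts = c("contr.treatment", "contr.poly"))
poly_cols <- colnames(model.matrix(~ Age, Insurance))
poly_cols  # polynomial (linear, quadratic, cubic) terms for Age

# Setting used in this assignment: treatment contrasts for both kinds
options(contrasts = c("contr.treatment", "contr.treatment"))
trt_cols <- colnames(model.matrix(~ Age, Insurance))
trt_cols   # one indicator per non-baseline Age level
```

With treatment contrasts, each coefficient is a difference from the baseline level on the scale of the linear predictor, which is what makes the fitted model easy to read.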

The following commands are useful for viewing the data.

> ftable(xtabs(Holders ~ District + Group + Age, Insurance))
> ftable(xtabs(Claims ~ District + Group + Age, Insurance))
> xtab <- ftable(xtabs(Claims/Holders ~ District + Group + Age, Insurance))
> round(xtab, 2)