Our estimate of the proportion of households in Somerville, Brighton, and Beacon Hill (SBB, for short) with high lead levels in their water is $p_1 = .20$, based on an SRS of $n_1 = 248$ households, compared with an estimate for Cambridge of $p_2 = .05$, based on an independent SRS of $n_2 = 100$ households. Our estimate of the difference between the two communities' population proportions is $p_1 - p_2 = .15$, or 15%. The standard error for the SBB estimate is $\mathrm{SE}(p_1) = \sqrt{p_1(1 - p_1)/n_1} = \sqrt{0.2 \times 0.8/248} = .025$, and the SE for the Cambridge estimate is $\mathrm{SE}(p_2) = \sqrt{0.05 \times 0.95/100} = .022$. Because the two samples are independent, their variances add, so the standard error for the estimated difference is $\mathrm{SE}(p_1 - p_2) = \sqrt{.025^2 + .022^2} = .033$, or 3.3%.
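The standard-error arithmetic above can be checked with a minimal Python sketch that uses only the standard library. The helper name `se_proportion` is hypothetical, chosen for illustration and not taken from the original.

```python
import math

# Sample proportions and sample sizes quoted in the text
p1, n1 = 0.20, 248   # SBB
p2, n2 = 0.05, 100   # Cambridge


def se_proportion(p, n):
    """Standard error of a sample proportion from an SRS: sqrt(p(1 - p)/n)."""
    return math.sqrt(p * (1 - p) / n)


se1 = se_proportion(p1, n1)
se2 = se_proportion(p2, n2)

# Independent samples: the variances add, so the SEs combine in quadrature
se_diff = math.sqrt(se1**2 + se2**2)

print(f"SE(p1)      = {se1:.3f}")
print(f"SE(p2)      = {se2:.3f}")
print(f"SE(p1 - p2) = {se_diff:.3f}")
```

Rounded to three decimals, the three values match the hand calculation (.025, .022, and .033).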

We are 95% confident that the difference between SBB and Cambridge in the proportions of all households with high lead levels lies between $15\% - 1.96 \times 3.3\% = 8.53\%$ and $15\% + 1.96 \times 3.3\% = 21.47\%$.

For those who prefer a significance test, the null hypothesis is that the two population proportions $P_1$ and $P_2$ are equal, or equivalently that the population difference $P_1 - P_2 = 0$. The (quick-and-dirty) signal-to-noise ratio for testing this null is $z = (p_1 - p_2)/\mathrm{SE}(p_1 - p_2) = 15\%/3.3\% = 4.5$, with a P-value of essentially 0%. I say quick-and-dirty because the squeaky-clean standard error for testing the equality of two proportions is a bit different from the one used to build a confidence interval: under the null hypothesis the two population proportions are equal, so we pool both samples to estimate the common proportion, $p = (n_1 p_1 + n_2 p_2)/(n_1 + n_2) = (248 \times .20 + 100 \times .05)/348 \approx .157$, and use $\mathrm{SE}_0 = \sqrt{p(1 - p)(1/n_1 + 1/n_2)}$. Here the squeaky-clean standard error comes out to about 4.3%, giving $z = 15\%/4.3\% \approx 3.5$ and again a P-value of nearly 0%.
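The interval and both test statistics can be reproduced with a short Python sketch (standard library only; the variable names and the `two_sided_p` helper are illustrative, not from the original). It carries full precision instead of the rounded 3.3%, so the interval endpoints come out slightly different from the hand calculation, roughly 8.4% to 21.6%, while the two z-values still round to about 4.5 and 3.5.

```python
import math

# Figures quoted in the text
p1, n1 = 0.20, 248   # SBB
p2, n2 = 0.05, 100   # Cambridge
diff = p1 - p2       # estimated difference, .15

# Unpooled SE: used for the confidence interval and the quick-and-dirty z
se_ci = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# 95% confidence interval: estimate +/- 1.96 standard errors
z_star = 1.96
lower = diff - z_star * se_ci
upper = diff + z_star * se_ci

# Pooled SE: under H0 (P1 = P2) both samples estimate one common proportion
p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))


def two_sided_p(z):
    """Two-sided P-value P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


z_quick = diff / se_ci    # quick-and-dirty signal-to-noise ratio
z_clean = diff / se_pool  # squeaky-clean (pooled) version

print(f"95% CI for P1 - P2: {lower:.1%} to {upper:.1%}")
print(f"quick-and-dirty: z = {z_quick:.2f}, P = {two_sided_p(z_quick):.2g}")
print(f"squeaky-clean:   z = {z_clean:.2f}, P = {two_sided_p(z_clean):.2g}")
```

The two z-values differ only in the standard error in the denominator. Pooling is justified for the test because the null hypothesis says the two population proportions are equal; it is not used for the interval, which makes no such assumption.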

As for **causality**, we definitely have a strong **association**
between adding anticorrosives to the drinking water and lower lead
levels, but because this is an observational comparison of two communities
rather than a randomized experiment, other factors might be at work as well:
maybe the pipes in Cambridge contain less lead to begin with, or maybe
something else in the Cambridge water is protecting the supply from lead.