We might think of a sample as n independent trials. Each trial results in either "success" or "failure". The chance of a success on each trial is p (the parameter), and the proportion of successes observed in the sample is called p-hat (the statistic, the sample proportion).
In Chapter 7, we make inferences based on the sampling distribution of the sample proportion, p-hat.
The sample proportion p-hat is an unbiased estimator of the population proportion p.
The standard deviation of p-hat is sqrt( p (1-p) / n).
And for large samples, the distribution of p-hat will be approximately normal.
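The two facts above (p-hat is unbiased, and its standard deviation is sqrt( p (1-p) / n)) can be checked exactly from the binomial distribution. The sketch below is only an illustration: the function name phat_mean_and_sd and the values n = 50, p = 0.3 are made up for the example and do not come from the text.

```python
from math import comb, sqrt

def phat_mean_and_sd(n, p):
    """Exact mean and SD of p-hat = X/n, where X ~ Binomial(n, p)."""
    # Probability of each possible success count k = 0, 1, ..., n.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    # Mean and variance of the proportion k/n under that distribution.
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, sqrt(var)

# Illustrative values only (not from the notes).
n, p = 50, 0.3
mean, sd = phat_mean_and_sd(n, p)
formula_sd = sqrt(p * (1 - p) / n)

print(f"mean of p-hat   = {mean:.6f} (p = {p})")
print(f"exact SD        = {sd:.6f}")
print(f"sqrt(p(1-p)/n)  = {formula_sd:.6f}")
```

The mean should agree with p and the exact SD with the formula, up to floating-point rounding.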
Suppose a simple random sample of size n is drawn from a large population in which there are a proportion p of successes. Let p-hat be the observed proportion of successes in the sample.
Then, if the true p is unknown, the approximate standard error is sqrt( phat x (1-phat) / n), and this can be used for confidence intervals.
On the other hand, if the true p is given (such as by a null hypothesis), use the true p to compute the standard deviation.
Suppose a simple random sample of 100 students was drawn from UCLA. In this sample, 37 were women. Find a 95% confidence interval for the percent of women at UCLA.
The estimated percentage of women at UCLA is 37%; the estimated standard error is sqrt(.37 x .63 / 100) x 100% = 4.83%. For a 95% confidence interval, z = 1.96. Thus, the 95% interval is 37% +/- (1.96)(4.83)% = (27.5%, 46.5%).
Note that the data are from an SRS, the population (UCLA students) is at least 10 times as large as the sample, and the counts of successes and failures, n*p-hat = 37 and n*(1 - p-hat) = 63, are both larger than 10.
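The confidence interval calculation for the UCLA example can be sketched in a few lines of Python. The function name proportion_ci is hypothetical; the critical value z* is taken from the standard normal distribution rather than rounded to 1.96, so results may differ slightly in the last digit from hand arithmetic.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Large-sample confidence interval for a proportion: p-hat +/- z* x SE."""
    p_hat = successes / n
    # Estimated standard error, using p-hat because the true p is unknown.
    se = sqrt(p_hat * (1 - p_hat) / n)
    # Critical value z* for the requested confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# UCLA example: 37 women in a sample of 100 students.
successes, n = 37, 100
counts_ok = successes >= 10 and (n - successes) >= 10  # rule-of-thumb check
low, high = proportion_ci(successes, n)

print(f"counts large enough: {counts_ok}")
print(f"95% CI: ({low:.1%}, {high:.1%})")
```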
A researcher wants to test the hypothesis that 50% of UCLA students are women. What is the resulting p-value?
Here, the null hypothesis is that 50% of UCLA students are women (and that the difference between 37% and 50% is due to chance). If the null were true, the expected percentage would be 50%, and the standard error would be sqrt(.50 x .50 / 100) x 100% = 5%. Then, z = (37-50)/5 = -2.6. The one-sided p-value (the chance of a z this small or smaller) is about 1/2 of 1 percent; the two-sided p-value is twice that, a little under 1 percent.
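As a check on the test above, here is a minimal sketch of the z test for one proportion. The function name one_proportion_z_test is hypothetical. It returns both the lower-tail (one-sided) and the two-sided p-value, since which one applies depends on the alternative hypothesis.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """z statistic and p-values for H0: p = p0 (large-sample test)."""
    p_hat = successes / n
    # Under the null, the SD of p-hat uses the hypothesized p0, not p-hat.
    sd_null = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd_null
    std_normal = NormalDist()
    lower_tail = std_normal.cdf(z)            # P(Z <= z)
    two_sided = 2 * std_normal.cdf(-abs(z))   # P(|Z| >= |z|)
    return z, lower_tail, two_sided

# UCLA example: 37 women out of 100, null value p0 = 0.50.
z, lower_tail, two_sided = one_proportion_z_test(37, 100, 0.50)

print(f"z = {z:.2f}")
print(f"one-sided (lower tail) p-value = {lower_tail:.4f}")
print(f"two-sided p-value              = {two_sided:.4f}")
```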
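Sample size formula: n = (z*/m)^2 x p* x (1 - p*)
Here z* is the critical value for the confidence level (1.96 for 95%), p* is a guessed value for the sample proportion, and m is the margin of error expressed as a proportion (not a percentage). If no guess is available, p* = 0.5 gives the largest (most conservative) n.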
Note, smaller margins of error require bigger samples.
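The sample size formula can be tried out with a short sketch. The function name sample_size and the margins 5%, 3%, and 1% are illustrative choices, not values from the text. The result is rounded up, since a fractional observation is not possible.

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin, p_guess=0.5, level=0.95):
    """Smallest n giving the desired margin of error for a proportion."""
    # Critical value z* for the confidence level (about 1.96 for 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    n = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return ceil(n)  # round up to the next whole observation

# Illustrative margins of error, written as proportions (not percentages).
sizes = {m: sample_size(m) for m in (0.05, 0.03, 0.01)}
for m, n in sizes.items():
    print(f"margin {m:.0%}: n = {n}")
```

Cutting the margin of error by a factor of 5 (from 5% to 1%) multiplies the required sample size by about 25, because m enters the formula squared.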
a. Need SRS
b. Population >> sample
c. p-hat will be almost normally distributed
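d. confidence interval: p-hat +/- z* x SE, with the SE estimated from p-hat
e. significance test: use the z statistic, with the standard deviation computed from the null value of p.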
In two sample problems, you are comparing two independent populations or two responses based on two independent samples.
Notation (see top of page 500)
Suppose a simple random sample of size n1 is drawn from a population having a proportion p1 of successes. Let p1-hat be the proportion of successes in sample 1. Suppose an independent simple random sample of size n2 is drawn from a population having a proportion p2 of successes. Let p2-hat be the proportion of successes in sample 2.
Then, if the true p's are unknown, the approximate standard error for the difference between p1-hat and p2-hat can be estimated by sqrt( p1hat x (1-p1hat) / n1 + p2hat x (1-p2hat) / n2). This is useful for finding confidence intervals.
In a simple random sample of 100 Democrats, 56% favored increased taxes, while in an independent simple random sample of 150 Republicans, only 41.3% favored new taxes. Find a 95% confidence interval for the difference. Does it look likely the support rate is the same?
For the confidence interval, the observed difference is 56% - 41.3% = 14.7%, and the estimated standard error is sqrt( 0.56 x 0.44 / 100 + 0.413 x 0.587 / 150 ) x 100% = 6.4%.
Thus a 95% confidence interval for the difference is 14.7% +/- (1.96)(6.4%), or from about 2.2% to 27.2%. Because the interval does not include 0, it looks like there is a difference.
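The two-sample interval for the Democrat/Republican example can be reproduced with the sketch below. The function name two_proportion_ci is hypothetical. Because the code carries full precision through every step, the endpoints may differ in the last digit from hand computations that round intermediate values.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(p1_hat, n1, p2_hat, n2, level=0.95):
    """Large-sample CI for p1 - p2 from two independent samples."""
    diff = p1_hat - p2_hat
    # The variances of the two sample proportions add (unpooled SE).
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * se
    return diff, se, (diff - margin, diff + margin)

# 56% of 100 Democrats and 41.3% of 150 Republicans favored the taxes.
diff, se, (low, high) = two_proportion_ci(0.56, 100, 0.413, 150)

print(f"difference = {diff:.1%}, SE = {se:.1%}")
print(f"95% CI: ({low:.1%}, {high:.1%})")
print(f"interval excludes 0: {low > 0 or high < 0}")
```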
a. Need two independent SRS
b. p1-hat minus p2-hat will be almost normally distributed when the samples are large.
c. The variances of the two sample proportions add; the standard deviations do not.
d. confidence interval: estimate +/- z* x SE
For a test of significance, you are simply looking at the difference in the population proportions. The null is that there is really no difference and the alternative suggests that there is a difference.
Standardize, using a z statistic for a difference in two proportions. The standard error is somewhat different here: under the null hypothesis both populations share one proportion p, so you pool the samples to get a single estimate of it. The pooled sample proportion is:
p-hat = (count of successes in both samples combined) / (count of observations in both samples combined)
The z-statistic then will be the difference in the two sample proportions divided by a standard error constructed from this pooled sample proportion: z = (p1hat - p2hat) / sqrt( phat x (1-phat) x (1/n1 + 1/n2) ), where phat is the pooled proportion.
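A minimal sketch of the pooled two-proportion z test follows, applied to the Democrat/Republican data from the confidence interval example. The function name two_proportion_z_test is hypothetical, and the success counts are an assumption: 56 of 100 Democrats, and 62 of 150 Republicans (62/150 rounds to the reported 41.3%).

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z test of H0: p1 = p2 against a two-sided alternative."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    # Pooled proportion: all successes over all observations.
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error under the null, built from the pooled estimate.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value
    return pooled, z, p_value

# Assumed counts: 56 of 100 Democrats, 62 of 150 Republicans.
pooled, z, p_value = two_proportion_z_test(56, 100, 62, 150)

print(f"pooled p-hat = {pooled:.3f}")
print(f"z = {z:.2f}")
print(f"two-sided p-value = {p_value:.4f}")
```

Note that the test uses the pooled standard error, while the confidence interval for the difference uses the unpooled one, so the two standard errors are close but not identical.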
Last Update: 25 November 1996 by VXL