The following method is appropriate when doing a simple random sample and either of two conditions is true:
a. the population is known to be normally distributed and the population SD is known; or
b. the sample size is "large".
If either of these two conditions is met, the appropriate test statistic is z, and p-values are found using the normal curve.
The little blurb on Gosset on p. 406 is worth reading. Upshot: in practice, sigma is usually unknown, so we need slightly different assumptions and a slightly different test.
Assumption 1: Your data are an SRS of size n from a population.
Assumption 2: Observations are normally distributed with mean mu and SD sigma, both unknown.
The sample mean x-bar is normally distributed with mean mu and standard deviation sigma/SQRT(n). Since sigma is unknown, we estimate it by s, giving s/SQRT(n). When the standard deviation of a statistic (here x-bar) is estimated from the data instead of being known, the estimate is called the STANDARD ERROR of the statistic.
The z statistic uses the standard normal distribution (TABLE A), with mean 0 and standard deviation 1. When sigma is unknown, we substitute the standard error s/SQRT(n) for the standard deviation sigma/SQRT(n) of x-bar. The new statistic, the t statistic, is not normally distributed; its distribution is a t distribution (TABLE C). It looks normal, but it has fatter tails (see the picture on p. 410). The t statistic has the same interpretation as the z statistic: it says how far x-bar is from mu in standard deviation units. The only difference is that there is a different t distribution for each sample size. A particular t distribution is specified by its DEGREES OF FREEDOM. The degrees of freedom come from the sample standard deviation s: they equal n-1.
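The substitution above can be sketched in a few lines of Python (the helper name t_statistic is mine, not from the text; note that statistics.stdev uses the n-1 divisor, which matches the degrees of freedom):

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (t, df): how far x-bar is from mu0 in standard-error units."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)      # sample SD, divisor n-1
    se = s / math.sqrt(n)             # standard error of x-bar
    return (xbar - mu0) / se, n - 1
```

The only change from the z statistic is that se is computed from the data; the resulting t is then referred to the t distribution with n-1 degrees of freedom instead of the normal curve.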
The following method is used when examining a small simple random sample from a population that is known to be normally distributed with an unknown SD. In such cases, the appropriate method involves the following three steps:
a. Estimate the SD in the population by the SD of the sample.
b. Instead of using z, use the T-CURVE (Table C) with n-1 degrees of freedom.
c. Put a range of values on the p-value (exact p-values are usually unobtainable).
SUMMARY: the one-sample t procedures are very similar to the z procedures in Chapter 5.
Cockroach example from the book, p. 413. n=5, n-1=4, x-bar=44.44, s=20.741; the appropriate value of t for a 95% confidence interval is 2.776.
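As a quick check on those numbers (the arithmetic here is mine, not from the book), the 95% confidence interval is x-bar plus or minus t* times s/SQRT(n):

```python
import math

xbar, s, n = 44.44, 20.741, 5
t_star = 2.776                      # TABLE C: 95% confidence, 4 df
margin = t_star * s / math.sqrt(n)  # about 25.75
low, high = xbar - margin, xbar + margin
print(round(low, 2), round(high, 2))  # 18.69 70.19
```

The interval is wide because n is only 5 and t* = 2.776 is well above the z* = 1.96 we would use if sigma were known.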
Use modified problem set #5, problem 3. Cheese factory. n=5, n-1=4, x-bar=-.538. Suppose sigma is unknown but you know that milk freezes at -.545. Suppose s=.01; perform the test.
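A sketch of that test (taking the one-sided alternative mu > -.545 as an assumption here; the problem set may pose it differently):

```python
import math

xbar, mu0, s, n = -0.538, -0.545, 0.01, 5
se = s / math.sqrt(n)     # estimated standard error of x-bar
t = (xbar - mu0) / se     # about 1.565, with 4 df
# TABLE C, 4 df: 1.533 cuts off an upper tail of .10 and 2.132 cuts
# off .05, so the one-sided p-value is between .05 and .10.
print(round(t, 3))        # 1.565
```

Since t = 1.565 does not reach the .05 critical value 2.132, the null hypothesis is not rejected at the 5% level.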
In an experiment done by Mark Rosenzweig et al., a pair of rats was chosen from each of 11 litters. In each pair, one rat was given playthings; the other rat was isolated with no toys. After a month, the animals were dissected and their brain cortexes were weighed. For most of the pairs, the treatment rat had a heavier cortex. The differences in cortex weight (treatment minus control) were:
32 33 16 6 21 17 64 7 89 -2 -9
If playthings made no difference, we would expect the brain cortexes to have the same weight, so the expected difference would be zero. Since we don't expect every difference to be zero, we suspect the true differences would follow a normal curve with a mean of zero and with an unknown SD. Since this is a randomized experiment, these data can be treated as a simple random sample.
Does it look like playthings make a difference in cortex weight?
H0: the differences come from a normal curve with mean zero
Ha: the differences come from a normal curve with a mean greater than zero
The mean of the list is 24.91; the SD of the sample is 29.09. The estimated SD of the population is thus 29.09. The estimated standard error is 29.09/sqrt(11) = 8.77 with 10 df.
Now, t = (24.91-0.00)/8.77 = 2.84 with 10 df. According to Table C, this has a p-value between .005 and .01.
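The whole rat-cortex calculation can be reproduced with a short script (a sketch using only the Python standard library):

```python
import math
import statistics

diffs = [32, 33, 16, 6, 21, 17, 64, 7, 89, -2, -9]
n = len(diffs)                    # 11 pairs, so 10 df
xbar = statistics.mean(diffs)     # about 24.91
s = statistics.stdev(diffs)       # about 29.09 (divisor n-1)
se = s / math.sqrt(n)             # about 8.77
t = (xbar - 0.0) / se             # about 2.84
print(round(xbar, 2), round(s, 2), round(se, 2), round(t, 2))
```

This matches the hand calculation above; looking up t = 2.84 with 10 df in the t table gives the bracketed p-value.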
Therefore, reject the null hypothesis: the true mean difference looks like it is greater than zero (i.e., playthings make a difference).
The usefulness of the t procedures depends on how strongly the data depart from normality.
The t is strongly affected by outliers. But it is also a conservative test: outliers will make the test less significant and the margin of error larger.
The accuracy can be improved by increasing the sample size (assuming other things are OK).
Last Update: 18 November 1996 by VXL