Statistics 50 Lecture 11

INTRODUCTION TO INFERENCE AND CONFIDENCE INTERVALS

Inference
A. Overview
In PROBABILITY, the parameters are known, and we estimate the chance of various outcomes. In STATISTICAL INFERENCE, the parameters are unknown, and we draw conclusions from outcomes to make guesses about the parameters.
Statistical inference is related to probability as follows: we make assumptions about the parameters, and then test to see if those assumptions could have led to the outcome we observed. We then use probability to express the strength of our conclusions.
In this chapter we will examine CONFIDENCE INTERVALS (5.1), which estimate the value of a population parameter, and TESTS OF SIGNIFICANCE (5.2), which assess a claim about a population parameter. Both confidence intervals and tests of significance are based on the sampling distributions of statistics from chapter 4.
B. Remark
Remember, parameters such as _mu_, although unknown, are fixed; it is the OUTCOME (statistic) that is random. Randomness is an important prerequisite: your data must come either from a random sample or from a randomized experiment.
Basics
A. Definition
A CONFIDENCE INTERVAL is a range of values within which we think the parameter lies.
B. Properties
a. By the 68-95-99.7 rule, in about 68% of all samples the sample mean x-bar will be within one standard deviation of the population mean mu, where "standard deviation" here means the standard deviation of x-bar. In about 95% of all samples, x-bar will be within two standard deviations of mu. In about 99.7% of all samples, x-bar will be within three standard deviations of mu.
Example: Psychologists know that the typical IQ test has a standard deviation of 10. An SRS of 225 UCLA students is drawn from all UCLA students, and the sample mean is 105. On the basis of this sample, what can we say about the mean score for the population of UCLA students?
The standard deviation of x-bar is 10/sqrt(225) = 10/15 = .667. So x-bar will be within .667 points of mu in about 68% of all samples, within 1.333 points of mu in about 95% of all samples, and within 2 points of mu in about 99.7% of all samples.
b. Whenever the sample mean x-bar is within, say, 1.333 points of mu, mu is also within 1.333 points of x-bar. This will happen in about 95% of all samples.
c. So you can say, "In 95% of all samples, the unknown population parameter mu lies between x-bar minus 1.333 and x-bar plus 1.333." Or: "We are 95% confident that the true unknown mean mu for all UCLA students lies between 105 - 1.333 = 103.667 and 105 + 1.333 = 106.333."
d. On the other hand, you might say that it is possible that our SRS is one of the few samples where x-bar is not within 1.333 points of the true mean mu. Only 5% of all possible samples would give such inaccurate results.
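
The arithmetic in the UCLA example can be checked with a short Python sketch. The variable names are illustrative only, and the multipliers 1, 2 and 3 are the rounded values from the 68-95-99.7 rule rather than exact normal-table values.

```python
import math

sigma = 10.0   # population standard deviation given in the example
n = 225        # sample size
x_bar = 105.0  # observed sample mean

# Standard deviation of x-bar: sigma / sqrt(n) = 10 / 15
sd_of_mean = sigma / math.sqrt(n)

# Margins for 1, 2 and 3 standard deviations of x-bar (68-95-99.7 rule)
margins = {"68%": 1 * sd_of_mean, "95%": 2 * sd_of_mean, "99.7%": 3 * sd_of_mean}

for level, margin in margins.items():
    print(f"{level}: {x_bar - margin:.3f} to {x_bar + margin:.3f}")
```

The 95% line corresponds to the interval 105 minus 1.333 to 105 plus 1.333 described in (c).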

Method
Constructing a confidence interval for a population mean involves five steps:
1. Find the sample average x-bar. This is our POINT ESTIMATE of _mu_.
2. Compute the standard deviation of the sample mean; for simple random samples, this is SD/sqrt(n), where SD is the population standard deviation sigma.
3. Find the critical value z*. For an exact 95% confidence interval, z* = 1.96 (the 68-95-99.7 rule rounds this to 2).
4. Multiply (2) and (3).
5. Add (4) to and subtract (4) from (1). The product in (4) is your "margin of error", that is, how accurate you believe your statistic is based on the variability of the estimate.
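
The five steps above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the function name `z_confidence_interval` is made up for illustration, and it assumes sigma is known and the sample is a simple random sample. The value of z* is taken from the standard library's `statistics.NormalDist` instead of a printed normal table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (low, high) for a z confidence interval for the mean.

    Hypothetical helper: assumes sigma is the known population
    standard deviation and the data are a simple random sample.
    """
    # Step 1: x_bar is the point estimate (passed in by the caller).
    # Step 2: standard deviation of the sample mean.
    sd_of_mean = sigma / math.sqrt(n)
    # Step 3: z* leaves (1 - confidence)/2 in each tail (about 1.96 for 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 4: margin of error.
    margin = z_star * sd_of_mean
    # Step 5: add and subtract the margin from the point estimate.
    return x_bar - margin, x_bar + margin

# UCLA example: x-bar = 105, sigma = 10, n = 225
low, high = z_confidence_interval(105, 10, 225)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

Because this uses z* = 1.96 instead of the rounded value 2, the interval is slightly narrower than 105 plus or minus 1.333.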

Remarks

a. A typical confidence interval has the form "estimated value, plus or minus (z*, the number of standard deviations that corresponds to the confidence level) x (standard deviation of the sampling distribution)." In other words, an estimate plus and minus a margin of error.
b. If the original population is normally distributed with a known standard deviation, then the distribution of the sample mean is normal; if the sample size is "large", it is approximately normal. In either case the appropriate critical value is z* from the normal table. (If the original distribution is normal with an unknown standard deviation, the critical value is different.)
c. Your margin of error will depend on the choice of a confidence level. A lower confidence will give you a smaller margin of error. A higher confidence will give you a larger margin of error.
d. If your standard deviation is small, it is easier to get a more precise fix on mu. Your margin of error is smaller for populations with smaller sigma.
e. Increasing the sample size n reduces your margin of error; decreasing n increases it. Because the margin of error is proportional to 1/sqrt(n), you need four times as many observations to cut it in half.
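
Remarks (c), (d) and (e) can be seen numerically with a small Python sketch. The helper name `margin_of_error` and the particular confidence levels, sigmas and sample sizes tried here are illustrative choices, not values from the lecture, apart from sigma = 10 and n = 225 from the UCLA example.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, confidence):
    """Hypothetical helper: z* times sigma / sqrt(n), with sigma known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * sigma / math.sqrt(n)

# (c) Higher confidence gives a larger margin of error.
for confidence in (0.90, 0.95, 0.99):
    print(f"confidence {confidence:.2f}: {margin_of_error(10, 225, confidence):.3f}")

# (d) Smaller sigma gives a smaller margin of error.
for sigma in (5, 10, 20):
    print(f"sigma {sigma}: {margin_of_error(sigma, 225, 0.95):.3f}")

# (e) Larger n gives a smaller margin; quadrupling n halves it.
for n in (225, 900, 3600):
    print(f"n {n}: {margin_of_error(10, n, 0.95):.3f}")
```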

Summary
1. The CORRECT interpretation for a confidence interval is as follows: "We did a procedure of drawing a sample, computing, etc. This procedure will give us a correct interval 95% of the time and an incorrect interval 5% of the time. We hope this is one of the correct times. Thus, we believe that the true, fixed parameter is somewhere between..."
2. It is WRONG to talk about the chance a particular confidence interval contains the parameter. Any single confidence interval is either RIGHT or WRONG; there is no chance. It is WRONG to talk about the percentage of times the parameter value will lie within a particular interval: the parameter is fixed; it never changes.
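
The "procedure" interpretation can be illustrated with a simulation. In the sketch below the population mean mu = 100, the sample size n = 25 and the number of trials are arbitrary values chosen for illustration; in a real study mu is unknown, which is the whole reason for computing an interval. The function name `coverage_rate` is likewise hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, confidence=0.95, trials=2000, seed=0):
    """Fraction of simulated intervals x-bar +/- margin that contain mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one random sample from a normal population and average it.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        # Each individual interval is simply right or wrong about mu.
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate(mu=100, sigma=10, n=25)
print(f"Share of intervals that contained mu: {rate:.3f}")
```

The printed share should come out close to 0.95. The parameter mu never moves during the simulation; only the intervals change from sample to sample.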

Warnings
This formula for confidence intervals only applies to simple random samples or randomized experimental situations!
If the sample is known to be biased, the confidence interval can be calculated, but is meaningless.
Remember, means are strongly influenced by unusually large or small observations and they can affect the confidence interval.
You must know the population standard deviation sigma to use the z* to calculate a confidence interval. In practice we usually don't know sigma. Later, you will learn to estimate sigma with s (the sample standard deviation).

Last Update: 6 November 1996 by VXL