Statistics 50
Lecture 12


INTRODUCTION TO HYPOTHESIS TESTING

A. Overview

In the previous lecture you learned about confidence intervals; here you will learn about tests of significance. Recall that in STATISTICAL INFERENCE the parameters are usually not known, and we draw conclusions from outcomes (i.e. sampling outcomes) to make guesses about the underlying parameters.

Remember the basic idea: we make assumptions about the parameters, and then test whether those assumptions could plausibly have led to the outcome we observed. We then use probability, through an explicit probability calculation, to express the strength of our conclusions.

B. Example

You play a game of dice with Professor Lew. If you roll a 1, 2, or 3, she pays you $1. If you roll a 4, 5, or 6, you pay her $1. You roll the die 10 times and get:

6 5 6 4 4 1 2 1 6 6

A test of significance simply asks: does this die seem fair? That is, does the result ($7 for Professor Lew, $3 for you) show evidence of cheating, or could Professor Lew have won $7 just by chance?

The mean of these 10 rolls is 4.1; the expected mean for a fair die is 3.5; and the standard deviation of the rolls is about 1.9.

To answer the question, you rely on how the sample mean x-bar would be expected to behave if the sampling were repeated and if the true mean were really equal to 3.5.

If you do the calculation, z = (4.1 - 3.5)/(1.9/SQRT(10)), which is about 1. You can find the probability associated with a Z score of 1 from Table A.
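This calculation can be checked with a short Python sketch (not part of the original notes). It uses only the standard library; the normal-curve probability comes from math.erf rather than Table A, and it uses the same spread of 1.9 that the notes use.

```python
from math import erf, sqrt

rolls = [6, 5, 6, 4, 4, 1, 2, 1, 6, 6]
n = len(rolls)
xbar = sum(rolls) / n        # sample mean: 4.1
mu = 3.5                     # expected mean for a fair die
sd = 1.9                     # spread used in the notes

# Test statistic: how many standard errors the sample mean
# lies above the mean claimed by the null hypothesis.
z = (xbar - mu) / (sd / sqrt(n))

def normal_cdf(x):
    """Standard normal CDF, in place of looking up Table A."""
    return 0.5 * (1 + erf(x / sqrt(2)))

# One-sided p-value: chance of a sample mean of 4.1 or greater
# if the die were really fair.
p_value = 1 - normal_cdf(z)
print(round(z, 2), round(p_value, 2))  # about 1.0 and 0.16
```

A z score of about 1 corresponds to a one-sided probability of roughly 16%, so a result this extreme is not particularly unusual for a fair die.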

C. Definitions

  1. The NULL HYPOTHESIS is that the observed results are due to chance alone; in this case, the null hypothesis is that the die is fair, and getting 7 of the 10 rolls was a matter of luck. The null hypothesis must be translated into a statement about a parameter; here, the statement is that mu (the mean value of the distribution of all possible rolls) is 3.5.

  2. The ALTERNATIVE HYPOTHESIS is that the observed results are due to more than just chance. Mathematically, the (one-sided) alternative hypothesis is that mu>3.5. (Alternative hypotheses can be one-sided --- "the die is biased towards numbers 4, 5 and 6" --- or two-sided --- "the die is not fair.")

  3. The TEST STATISTIC measures how different the observed results are from what we would expect to get if the null hypothesis were true. When using the normal curve, the test statistic is z,

    where z = (observed - expected)/spread = (xbar - mu) / (sigma/sqrt(n)).
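The formula above translates directly into a small Python function (the function name is our own, not the notes'):

```python
from math import sqrt

def z_statistic(xbar, mu, sigma, n):
    """z = (observed - expected)/spread = (xbar - mu)/(sigma/sqrt(n))."""
    return (xbar - mu) / (sigma / sqrt(n))

# The dice example from section B:
print(z_statistic(4.1, 3.5, 1.9, 10))  # about 1
```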

  4. Compute the P-VALUE (the observed significance level). This is the chance of getting results as extreme or more extreme than what we got, IF the null hypothesis were true. P-values are always "if-then" statements:

    "If the null hypothesis were true, then there would be a p% chance to get these kind of results."

  5. If the p-value is less than 5%, we say the results are STATISTICALLY SIGNIFICANT; if p < 1%, the results are HIGHLY STATISTICALLY SIGNIFICANT. A "significant" result means that it would be unlikely to get such extreme observed values by chance alone.
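The 5% and 1% conventions in this definition can be sketched as a small helper function (our own illustration, not part of the notes):

```python
def describe_significance(p):
    """Apply the conventional 5% and 1% cutoffs from the definition above."""
    if p < 0.01:
        return "highly statistically significant"
    if p < 0.05:
        return "statistically significant"
    return "not statistically significant"

# The dice example (p about 0.16) versus a much smaller p-value:
print(describe_significance(0.16))
print(describe_significance(0.0008))
```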

D. Hypothesis Testing Summarized

1. Clearly identify the parameter and the outcome.

2. State the null hypothesis. This is what is being tested. A test of significance assesses the strength of the evidence (the outcomes) against the null hypothesis. Usually the null hypothesis is a statement of "no effect" or "no difference."

3. The alternative hypothesis is the claim about the population that we are trying to find evidence in favor of. In the dice example, you are seeking evidence of cheating. The null hypothesis says that the mean is 3.5; the alternative says it is larger than 3.5. Note this is a ONE-SIDED alternative, because you are only interested in deviations in one direction. (A two-sided situation occurs when you do not know the direction; you just think the evidence suggests something different from the null.)

4. The test statistic is built from the statistic that estimates the parameter of interest. In the example above, the parameter is mu, the outcome is x-bar, and the test statistic is Z.

The significance test assesses the evidence by examining how far the test statistic falls from the value proposed by the null. In other words, tosses of a fair six-sided die should have a distribution with mean 3.5. You got 4.1. Is 4.1 far enough from 3.5 to suggest that something is wrong here?

To answer that question, you find the probability of getting an outcome as extreme as, or MORE extreme than, what you actually observed. So to test this outcome, you would ask: "What is the chance of getting a sample mean of 4.1 or greater?"

5. The probability that you compute is called a P-VALUE. The smaller the p-value, the stronger the evidence against the null hypothesis. If instead you had rolled the die with Prof. Lew 100 times and paid out $70 (the same pattern of results, so the sample mean is still 4.1), the Z score would be 3.16, with a p-value of .0008.
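This 100-roll figure can be verified with the same kind of sketch used earlier (again only the standard library, with math.erf in place of a normal table, and the same mean 4.1 and spread 1.9 from the 10-roll example):

```python
from math import erf, sqrt

def one_sided_p(xbar, mu, sigma, n):
    """Return (z, p) for the one-sided alternative mu greater than 3.5."""
    z = (xbar - mu) / (sigma / sqrt(n))
    p = 1 - 0.5 * (1 + erf(z / sqrt(2)))
    return z, p

# Same sample mean (4.1) and spread (1.9), but n = 100 rolls:
z, p = one_sided_p(4.1, 3.5, 1.9, 100)
print(round(z, 2), round(p, 4))  # 3.16 and 0.0008
```

Note how the same observed mean becomes far more convincing with a larger sample: the standard error shrinks with sqrt(n), so the same 0.6 gap now lies more than three standard errors above 3.5.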

6. On significance levels. Sometimes, prior to calculating a test statistic and finding its p-value, we state in advance what we believe to be a decisive value of P. This is the significance level. The 5% and 1% significance levels are the most commonly used. If your p-value is as small as or smaller than the significance level you have chosen, then you would say that "the data are statistically significant at level --- ."

NOTE:

Significant is not the same as important. All it means is that the outcome you observed probably did not happen by chance alone.



Last Update: 11 November 1996 by VXL