Statistics 10
Lecture 17


Hypothesis Testing and Tests of Significance (Chapter 26.2-26.5)

A. Overview

Remember the basic idea: we make assumptions about the parameters, and then test to see if those assumptions could have led to the outcome we observed. We then use a probability calculation to express the strength of our conclusions.
What we are doing in Chapters 26 and 27 is learning about a "method" for making decisions. This involves identifying the parameter and statistic, constructing a test, and then using the results of the test to decide whether the outcome could easily have happened by chance or whether the difference is meaningful.

B. More about the Test Statistic

At the end of the last lecture, the IRS/Treasury made the statement that if the true parameter were zero dollars and the SE of the sample average were $72, then the chance of picking a sample of size 100 with an average of -$219 or lower is about 0.1 of 1%, which is the area to the left of about -3 under the normal curve.
1/10 of 1%, or .1%, means a result like this would turn up in only about 1 sample in 1,000. From this test result, the IRS concludes that the tax bill is not "revenue neutral": the sample result is so extreme that it strongly suggests the true parameter is not zero but something negative.
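For readers who want to check the arithmetic, here is a minimal Python sketch of the IRS calculation. It assumes, as the notes do, that $72 is the SE of the sample average, and it uses the standard library's `statistics.NormalDist` (Python 3.8+) for the area under the normal curve. The unrounded z is about -3.04, so the area comes out near 0.12%, consistent with "about 0.1 of 1%".

```python
from statistics import NormalDist

# IRS example: the null hypothesis says the average change is $0.
expected_value = 0.0      # average under the null hypothesis
observed_value = -219.0   # average of the sample of 100 returns
se_of_average = 72.0      # SE of the sample average (assumed, as in the notes)

# How many SEs the observed average is below the expected value.
z = (observed_value - expected_value) / se_of_average

# Area to the left of z under the standard normal curve.
p_left = NormalDist().cdf(z)

print(f"z = {z:.2f}")                              # z = -3.04
print(f"chance of -$219 or lower: {p_left:.2%}")
```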

C. Definitions (26.2)

  1. The NULL HYPOTHESIS is that the observed results are due to chance alone. That is, any difference between the parameter (the expected value) and the observed (or actual) outcome is due to chance alone. In the IRS example, the null hypothesis is a statement about a parameter: the population average is 0.
  2. The ALTERNATIVE HYPOTHESIS is that the observed results are due to more than just chance. It implies that the NULL is not correct and any observed difference is real, not luck.
  3. Usually, the ALTERNATIVE is what we're setting out to prove. The NULL is like a "straw man" that we might knock over with the alternative. As Freedman says, it's unfortunate that these rather confusing names are so standard.

  4. The TEST STATISTIC measures how different the observed results are from what we would expect to get if the null hypothesis were true. When using the normal curve, the test statistic is z.
  5. The formula is z = (observed value - expected value)/SE, where SE is the standard error of the observed value.

    All z does is tell you how many SEs the observed value is away from the expected value, where the expected value is calculated by assuming the NULL HYPOTHESIS is true.

  6. The OBSERVED SIGNIFICANCE LEVEL (or P-VALUE). This is the chance of getting results as extreme as, or more extreme than, what we got, IF the null hypothesis were true. P-VALUE could also be called "probability value", and it is simply the area under the normal curve beyond the calculated z.
  7. p-values are always "if-then" statements:

    "If the null hypothesis were true, then there would be a p% chance of getting results at least this extreme."
  8. If the p-value is less than 5%, we say the results are STATISTICALLY SIGNIFICANT; if p < 1%, the results are HIGHLY STATISTICALLY SIGNIFICANT. A "significant" result means that it would be unlikely to get such extreme observed values by chance alone.
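The 5% and 1% conventions can be written as a tiny labeling function. This is only a sketch of the rule of thumb stated above; the function name `describe_p_value` is hypothetical, and p-values are written as proportions (0.03 means 3%).

```python
def describe_p_value(p: float) -> str:
    """Label a p-value with the conventional 5% / 1% cutoffs (hypothetical helper)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value is a probability between 0 and 1")
    if p < 0.01:
        return "highly statistically significant"
    if p < 0.05:
        return "statistically significant"
    return "not statistically significant"


# A few example p-values, written as proportions rather than percents.
for p in (0.001, 0.03, 0.08):
    print(f"p = {p:.1%}: {describe_p_value(p)}")
```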

D. Hypothesis Testing Summarized

1. Clearly identify the parameter and the outcome.
2. State the null hypothesis. This is what is being tested. A test of significance assesses the strength of evidence (outcomes) against the null hypothesis. Usually the null hypothesis is a statement of "no effect" or "no difference".
3. The alternative hypothesis is the claim about the population that we are trying to find evidence in favor of. In the tax law example, you are seeking evidence that the law is not neutral. The null hypothesis would say that the average return will not change (i.e., it is 0); the alternative would say it is negative. Note this is a ONE-SIDED alternative because you are only interested in deviations in one direction. (A two-sided alternative is used when you do not know the direction in advance and only suspect that the truth differs from the null.)
4. The test statistic. It is computed from the statistic that estimates the parameter of interest. In the above example, the parameter is the population average, the outcome is the sample average, and the test statistic is z.
The significance test assesses evidence by examining how far the test statistic falls from what the null hypothesis predicts.
To answer that question, you find the probability of getting an outcome as extreme as, or MORE extreme than, the one you actually observed. So to test the outcome, you would ask "what is the chance of getting a sample average of -$219 or lower?"
5. This probability is called the P-VALUE. The smaller the p-value, the stronger the evidence against the null hypothesis. If instead you had gotten a sample average of -$100 with the same SE, the senator may well be right. (A z of about -1.4 has about 8% of the normal curve to its left, so there would be about an 8% chance of getting a sample with an average of -$100 or lower.)
6. On significance levels. Sometimes, before calculating a test statistic and finding its p-value, we state in advance what we believe to be a decisive value of P. This is the significance level. The 5% and 1% significance levels are the most commonly used. If your p-value is as small as or smaller than the significance level you have chosen, then you would say that "the results are statistically significant at the ---- (e.g. 1% or 5% or some other) level."
NOTE:
Significant is not the same as important. All it means is that the outcome you observed probably did not happen by chance.
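Steps 2 through 6 can be collected into one function. The sketch below is an illustration under the notes' assumptions (normal curve, known SE); the name `z_test` and the `alternative` strings are made up for this example. It reproduces the two tax-law cases from step 5: a sample average of -$219 and a sample average of -$100, both with an SE of $72 and a one-sided ("less") alternative.

```python
from statistics import NormalDist


def z_test(observed, expected, se, alternative="less"):
    """Return (z, p-value) for a z-test (hypothetical helper).

    alternative: "less" or "greater" for a one-sided test, "two-sided" otherwise.
    """
    z = (observed - expected) / se
    normal = NormalDist()  # standard normal curve
    if alternative == "less":
        p = normal.cdf(z)                    # area to the left of z
    elif alternative == "greater":
        p = 1 - normal.cdf(z)                # area to the right of z
    elif alternative == "two-sided":
        p = 2 * (1 - normal.cdf(abs(z)))     # area in both tails
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p


alpha = 0.05  # significance level chosen in advance
for sample_average in (-219, -100):
    z, p = z_test(sample_average, expected=0, se=72, alternative="less")
    verdict = "significant" if p <= alpha else "not significant"
    print(f"average {sample_average}: z = {z:.2f}, p = {p:.1%} ({verdict} at 5%)")
```

Choosing the alternative before looking at the data matters: for the same z, the two-sided p-value is twice the one-sided one.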

E. One more example

The following letter appeared in the "Dear Abby" column in the 1970s:

Dear Abby

You wrote in your column that a woman is pregnant for 266 days. Who said so? I carried my baby for 10 months and 5 days (Prof's note: that's 310 days) and there is no doubt about it because I know the exact date my baby was conceived. My husband is in the Navy and it couldn't have been conceived at any other time because I saw him only once for an hour, and I didn't see him again until the day before the baby was born.

I don't drink or run around, and there is no way this baby isn't his, so please print a retraction about the 266 day carrying time because otherwise I am in a lot of trouble.

San Diego Reader

OK....suppose it is known that pregnancy durations are normally distributed with a mean of 266.0 days and a standard deviation of 16.0 days.

A chapter 23-like question might be: what is the chance of observing a pregnancy duration of 310 days or more?

$$z = \frac{310 - 266}{(\sqrt{1} \times 16)/1} = \frac{44}{16} = 2.75$$

The area under the normal curve to the right of z = 2.75 is about .3%.

Remember, this is like a sample of one and you know the population parameters. Let's not pass judgment on the San Diego lady...a pregnancy as long as 310 days can happen, and the chance is about 3 in 1000 pregnancies.
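The same chapter-23 calculation in Python, as a quick check. It uses only the numbers assumed in the notes (normal durations, mean 266.0 days, SD 16.0 days); for a single observation the SE is just the SD.

```python
from statistics import NormalDist

mean_days = 266.0       # assumed population mean
sd_days = 16.0          # assumed population SD
observed_days = 310.0   # the San Diego reader's pregnancy

# Sample of one: SE = sqrt(1) * SD / 1 = SD.
z = (observed_days - mean_days) / sd_days     # 2.75
p_right = 1 - NormalDist().cdf(z)             # area to the right of z

print(f"z = {z:.2f}, chance of 310 days or more = {p_right:.2%}")
```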

In chapter 26, when we are calculating chances, we're thinking more along the lines of larger samples and of trying to make a decision -- choosing between hypotheses.

A chapter 26-like question: In a recent study of 100 pregnancy durations selected at random, the average pregnancy duration was 270 days with standard deviation of 20 days. Does this study provide evidence that pregnancy durations have increased since the 1970s? Perform a test of significance and state the p-value.

The null hypothesis: the average pregnancy duration is still 266 days.
The alternative: the average is longer than 266 days.
This is a one-sided test; we're only interested in whether durations are longer now.
The test statistic is

$$z = \frac{270 - 266}{(\sqrt{100} \times 16)/100} = \frac{4}{1.6} = 2.50$$

The area under the normal curve to the right of z = 2.50 is about .62%.

This says that if the true average were 266 days, the probability of getting a sample average of 270 days or more would be about 6 in 1000. This is evidence for the alternative, that is, that durations are getting longer.
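A sketch of the same one-sided test in Python. Like the notes, it uses the SD of 16 days from the null model (not the sample SD of 20) to compute the SE of the average of 100 durations.

```python
from math import sqrt
from statistics import NormalDist

null_mean = 266.0        # average duration under the null hypothesis
null_sd = 16.0           # SD used in the notes' calculation
n = 100                  # number of pregnancies in the sample
sample_average = 270.0

se_average = sqrt(n) * null_sd / n                  # 1.6 days
z = (sample_average - null_mean) / se_average       # about 2.5
p_value = 1 - NormalDist().cdf(z)                   # one-sided: area to the right

print(f"SE = {se_average:.1f} days, z = {z:.2f}, p = {p_value:.2%}")
```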

Things to note: why a SD of 16?

F. A note on counts (26.5)

I've been working through examples on averages, but counting situations work too. The z statistic is the same, but beware of the SE: if you are working with averages, make sure the SE is for the average; with counts, make sure the SE is for the number; with percentages, make sure the SE is for the percentage. Page 487 of your text makes for a good summary.
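The warning about matching the SE to the statistic can be made concrete with the square-root law for the sum of draws from a box. This is a sketch under the usual box-model assumptions (random draws with replacement); the helper names are hypothetical. The pregnancy numbers come from section E; the 100 tosses of a fair coin are a hypothetical illustration.

```python
from math import sqrt


def se_sum(n, box_sd):
    """SE for the sum of n draws: sqrt(n) x SD of the box."""
    return sqrt(n) * box_sd


def se_average(n, box_sd):
    """SE for the average of n draws: SE for the sum divided by n."""
    return se_sum(n, box_sd) / n


def se_count(n, p):
    """SE for a count: the sum of n draws from a 0-1 box with a fraction p of 1s."""
    box_sd = sqrt(p * (1 - p))   # SD of a 0-1 box
    return se_sum(n, box_sd)


def se_percent(n, p):
    """SE for a percentage: SE for the count, divided by n, expressed in percent."""
    return 100 * se_count(n, p) / n


# Averages: 100 pregnancy durations, SD 16 days (numbers from the notes).
print(se_sum(100, 16.0), se_average(100, 16.0))   # 160.0 1.6
# Counts and percentages: 100 tosses of a fair coin (hypothetical), p = 0.5.
print(se_count(100, 0.5), se_percent(100, 0.5))   # 5.0 5.0
```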

G. Another example

The Statistical Abstract of the United States reported that the net average earnings of all M.D.'s was $155,800 per year with a standard deviation of $23,400.
A random sample of income tax returns of 9 M.D.'s practicing in rural communities showed their net earnings to be:
93,700 110,500 173,600 123,300 136,800 142,700 129,900 153,400 140,200
Assuming that the earnings of these M.D.'s follow a distribution that is approximately normal, use a 1% level of significance to test the claim that the mean earnings of all rural M.D.'s is less than the national average.
1) What is the population? What are the parameters?
All MDs in the US. Average income is $155,800 and the SD is $23,400
2) What is the sample? What are the statistics?
9 rural MDs. Sample average = $133,789, sample SD = $22,045
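The sample statistics can be checked with Python's `statistics` module. Following the convention in these notes, the SD divides by n, which is what `statistics.pstdev` does; `statistics.stdev` divides by n - 1 and would give a somewhat larger value.

```python
from statistics import mean, pstdev

# Net earnings of the 9 rural M.D.'s in the sample.
earnings = [93_700, 110_500, 173_600, 123_300, 136_800,
            142_700, 129_900, 153_400, 140_200]

sample_average = mean(earnings)   # about 133,789
sample_sd = pstdev(earnings)      # divides by n: about 22,045

print(f"n = {len(earnings)}, average = ${sample_average:,.0f}, SD = ${sample_sd:,.0f}")
```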
3) What is the null hypothesis? (See definitions, section C)
Rural average earnings are not different from US average earnings, that is, $155,800.
4) What is the alternative hypothesis? (See definitions, section C)
Rural average earnings are less than US average earnings.
5) What is the appropriate test statistic? (See definitions, section C)
Ask yourself...what do I know about the population?
Although the sample size is very small (9), the population standard deviation is known and the earnings are assumed to be approximately normal, so a z-test is appropriate here. (Chapter 26.6 deals with situations where the population standard deviation is unknown.)
z = (observed value - expected value)/SE
(133,789 - 155,800) / SE of the average.
Why SE of the average?
$$\text{SE of the average} = \frac{\sqrt{9} \times 23{,}400}{9} = \$7{,}800$$
How far is 133,789 from 155,800? z = (133,789 - 155,800)/7,800 = -22,011/7,800, or about -2.82.
6) What is the interpretation of the test statistic given the stated significance level of 1%?
First, the person asking the question has decided in advance to treat only p-values of 1% or less as decisive. In other words, returning to concepts in Chapters 17, 19, and 20: if there is less than a 1% chance of getting a sample average this low when we expect $155,800, I will reject the null hypothesis in favor of the alternative.
The p-value associated with a z of -2.82 is about .24%. How did I get that? Table A105 goes up in steps of .05, so bracket 2.82 between 2.80 and 2.85. The chance of getting a z between -2.85 and +2.85 is 99.56%, so the chance of being outside that range is (100 - 99.56) = .44%; for 2.80 the table gives 99.49%, so the chance outside is .51%. Since the question states the alternative as one-tailed (one-sided), we're only interested in the left side of the curve, which is half of each: between .22% and about .26%, or roughly .24%. That is well below 1%, so we reject the null hypothesis: the results are statistically significant at the 1% level.
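Putting the M.D. example together in Python, as a sketch under the notes' assumptions (population SD of $23,400 known, earnings approximately normal, one-sided alternative, 1% significance level). Computing directly instead of reading the normal table gives z of about -2.82 and a one-sided p-value of about 0.24%; either way it is well under 1%.

```python
from math import sqrt
from statistics import NormalDist

national_average = 155_800   # expected value under the null hypothesis
national_sd = 23_400         # population SD, assumed known
n = 9
sample_average = 133_789     # rounded sample average of the 9 rural M.D.'s
alpha = 0.01                 # significance level chosen in advance

se_average = sqrt(n) * national_sd / n                   # $7,800
z = (sample_average - national_average) / se_average     # about -2.82
p_value = NormalDist().cdf(z)                            # one-sided: left tail only

print(f"SE = ${se_average:,.0f}, z = {z:.2f}, p = {p_value:.2%}")
if p_value <= alpha:
    print("Reject the null hypothesis at the 1% level.")
else:
    print("Do not reject the null hypothesis at the 1% level.")
```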