Economics 40/Statistics M11
Lecture 17


TESTS OF SIGNIFICANCE (12.1, 12.2, 12.3 -- and optionally 12.8)

A. Hypothesis Testing: Definitions

  1. The NULL HYPOTHESIS is that the observed results are due to chance alone. That is, any difference between the parameter (the expected value) and the observed (or actual) outcome is due to chance and not to a real difference.
  2. The ALTERNATIVE HYPOTHESIS is that the observed results from a sample of a given size are due to more than just chance. It implies that the NULL is not correct and any observed difference is real, not luck.
  3. Usually, the ALTERNATIVE is what we're setting out to prove. The NULL is like a "straw man" that we set up to knock down.

    The null and alternative hypotheses are usually stated formally as (for the business example):

    H0: μ = 1,200

    H1: μ < 1,200

    This is a one-sided test: we are only concerned about the average if it is less than what is claimed, because then we would be overpaying for his business. A two-sided test would have the alternative H1: μ not equal to 1,200, where the question becomes "is the outcome different, period?"

  4. The TEST STATISTIC measures how different the observed results are from what we would expect to get if the null hypothesis were true. When using the normal curve, the test statistic is z.
  5. The z statistic is z = (observed value - expected value)/spread, where the spread is the standard error of the observed value.

    for this example

    z = (950 - 1200) / (1000 / sqrt(150)) = -3.06

    All a Z does is tell you how many standard errors the observed value is from the expected value, when the expected value is calculated using the NULL HYPOTHESIS.

  6. The OBSERVED SIGNIFICANCE LEVEL (or P-VALUE). This is the chance of getting results as extreme as or more extreme than what we got, IF the null hypothesis were true. P-VALUE could also be called "probability value"; it is simply the area under the normal curve beyond the calculated Z.
  7. p-values are always "if-then" statements:

    "If the null hypothesis were true, then there would be a p% chance of getting these kinds of results."
    The less probable the outcome is under the null, the stronger the evidence for rejecting the null in favor of the alternative.
    For this example, Z = -3.06 is associated with a value of .0011 in your Table E. In other words, there is a .0011 (or .11%) chance of getting a sample average as extreme as or more extreme than $950 if the true average is $1,200.
     
  8. If the p-value is less than 5%, we say the results are STATISTICALLY SIGNIFICANT;
    if p < 1%, the results are HIGHLY STATISTICALLY SIGNIFICANT. A "significant" result means that it would be unlikely to get such extreme observed values by chance alone.
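
The arithmetic in items 4 through 8 can be checked with a short script. This is only a sketch and not part of the original notes: it assumes Python 3.8 or later, uses the standard library's NormalDist in place of Table E, and reads 950, 1,200, 1,000 and 150 from the z calculation above as the sample average, the null value, the SD and the sample size. All variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Business example, as read from the z calculation in the notes
null_mean = 1200      # average claimed under the null hypothesis
sample_mean = 950     # observed sample average
sd = 1000             # standard deviation used for the spread
n = 150               # sample size (assumed from the sqrt(150) term)

se = sd / sqrt(n)                      # standard error of the sample average
z = (sample_mean - null_mean) / se     # test statistic
p_value = NormalDist().cdf(z)          # one-sided: area to the left of z

print(f"SE = {se:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")

# Apply the 5% and 1% cutoffs from item 8
if p_value < 0.01:
    verdict = "highly statistically significant"
elif p_value < 0.05:
    verdict = "statistically significant"
else:
    verdict = "not statistically significant"
print(verdict)
```

The left tail is used because the alternative here is one-sided (average less than 1,200).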

B. Hypothesis Testing Summarized

1. Clearly identify the parameter and the outcome.
2. State the null hypothesis. This is what is being tested. A test of significance assesses the strength of evidence (outcomes) against the null hypothesis. Usually the null hypothesis is a statement of "no effect" or "no difference."
3. The alternative hypothesis is the claim about the population that we are trying to find evidence in favor of. In the business example, you are seeking evidence that the business owner is not being truthful about the value of his business. The null hypothesis would say that the average sales per client is $1,200; the alternative would say it is less than $1,200. Note this is a ONE-SIDED alternative because you are only interested in deviations in one direction. (A two-sided situation occurs when you do not know the direction; you just think the evidence suggests something different from the null. See p. 509-513.)
4. The test statistic. It is based on the statistic that estimates the parameter of interest. In the above example, the parameter is the population average, the outcome is the sample average, and the test statistic is Z.
The significance test assesses evidence by examining how far the test statistic falls from what the null hypothesis proposes.
To answer that question, you find the probability of getting an outcome as extreme as or MORE extreme than the one you actually observed. So to test the outcome, you would ask "what is the chance of getting a sample average of $950 or lower?"
5. The probability that you calculate is called a P-VALUE. The smaller the p-value, the stronger the evidence against the null hypothesis. If instead you had gotten a sample average of $1,150 with the same standard deviation, the owner may well be right. (A Z of about -0.6 has about 27% of the normal curve below it, so there would be about a 27% chance of getting a sample average of $1,150 or lower.)
6. On significance levels. Sometimes, prior to calculating a score and finding its P-value, we state in advance what we believe to be a decisive value of P. This is the significance level. The 5% and 1% significance levels are most commonly used. If your P-value is as small as or smaller than the significance level you have chosen, then you would say that "the data are statistically significant at level ---."
NOTE:
Significant is not the same as important. All it means is that the outcome you observed probably did not happen by chance.
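
The difference between one-sided and two-sided alternatives, and the comparison of a P-value to a significance level chosen in advance, can be sketched as follows. This is an illustration only: the function names p_value_from_z and is_significant are made up for this example, and the standard library's NormalDist (Python 3.8 or later) stands in for the normal table.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="less"):
    """Return the normal-curve p-value for a z statistic.

    alternative: "less"      (H1: average below the null value)
                 "greater"   (H1: average above the null value)
                 "two-sided" (H1: average differs from the null value)
    """
    std_normal = NormalDist()
    if alternative == "less":
        return std_normal.cdf(z)              # area to the left of z
    if alternative == "greater":
        return 1 - std_normal.cdf(z)          # area to the right of z
    if alternative == "two-sided":
        return 2 * std_normal.cdf(-abs(z))    # area in both tails
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

def is_significant(p_value, level=0.05):
    """True when the p-value is as small as or smaller than the chosen level."""
    return p_value <= level

z = -3.06  # z from the business example
one_sided = p_value_from_z(z, "less")
two_sided = p_value_from_z(z, "two-sided")
print(f"one-sided p = {one_sided:.4f}, significant at 1%: {is_significant(one_sided, 0.01)}")
print(f"two-sided p = {two_sided:.4f}, significant at 1%: {is_significant(two_sided, 0.01)}")
```

For the same Z, the two-sided P-value is twice the one-sided one, because extreme results in either direction count against the null.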

C. One more class example

The following letter appeared in the "Dear Abby" column in the 1970s:

Dear Abby

You wrote in your column that a woman is pregnant for 266 days. Who said so? I carried my baby for 10 months and 5 days (Prof's note: that's 310 days) and there is no doubt about it because I know the exact date my baby was conceived. My husband is in the Navy and it couldn't have been conceived at any other time because I saw him only once for an hour, and I didn't see him again until the day before the baby was born.

I don't drink or run around, and there is no way this baby isn't his, so please print a retraction about the 266 day carrying time because otherwise I am in a lot of trouble.

San Diego Reader

OK....suppose it is known that pregnancy durations are normally distributed with a mean of 266.0 days and a standard deviation of 16.0 days.

A chapter 5-like question might be: what is the chance of observing a pregnancy duration of 310 days or more?

    z = (310 - 266) / (16 / sqrt(1)) = 2.75, which corresponds to about a .3% chance

 

Remember, this is like a sample of one and you know the population parameters. Let's not pass judgment on the San Diego lady...a pregnancy as long as 310 days can happen, and the chance is about 3 in 1000 pregnancies.
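
As a check on the single-pregnancy calculation, here is a minimal sketch using Python's standard library (3.8 or later) instead of the normal table. It takes the mean of 266 days and SD of 16 days as given in the notes; the variable names are illustrative.

```python
from statistics import NormalDist

# Population values assumed in the notes
durations = NormalDist(mu=266.0, sigma=16.0)

observed = 310   # days reported in the letter
z = (observed - durations.mean) / durations.stdev
chance = 1 - durations.cdf(observed)   # area to the right of 310 days

print(f"z = {z:.2f}, chance of {observed} days or more = {chance:.4f}")
```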

In a 1994 study of 100 pregnancy durations selected at random, the average pregnancy duration was 270 days with standard deviation of 20 days. Does this study provide evidence that pregnancy durations have increased since the 1970s? Perform a test of significance and state the p-value.

The null hypothesis is that the average pregnancy duration is still 266 days.
The alternative is that the average is now longer than 266 days.
This is a one-sided test; we're only interested in whether durations are longer now.
The test statistic is

    z = (270 - 266) / (16 / sqrt(100)) = 2.50, which corresponds to a p-value of about .62%
 

This would suggest that the probability of getting a sample average of 270 or more, if the true average is 266, is about 6 in 1000. This is evidence for the alternative, that is, that durations are getting longer.
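
The test for the 1994 study can be reproduced with a few lines of Python (3.8 or later, standard library only). This sketch follows the notes in using the SD of 16 days for the spread; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

null_mean = 266.0     # average duration under the null hypothesis
sd = 16.0             # SD used in the notes for the spread
n = 100               # number of pregnancies in the study
sample_mean = 270.0   # observed sample average

se = sd / sqrt(n)                      # standard error of the sample average
z = (sample_mean - null_mean) / se     # test statistic
p_value = 1 - NormalDist().cdf(z)      # one-sided: area to the right of z

print(f"SE = {se:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

The right tail is used here because the alternative is that durations have become longer.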

Things to note: why a SD of 16?

D. A note on percentages (12.7 optional)

I've been working through examples on averages, but situations involving percentages work too. The z statistic is the same, but beware of the SE of the sampling distribution. If you are working with averages, make sure the SE is for the average; if you are working with percentages, make sure the SE is for the percentage (i.e. use p or p-hat).
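
A z test for a percentage might look like the sketch below. The numbers are hypothetical, made up purely for illustration (a null value of 50%, a sample of 400, and an observed 54%); they do not come from the textbook or the lecture. The SE is computed from the null value p, which is one common choice.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical numbers, for illustration only
null_p = 0.50     # proportion claimed under the null hypothesis
n = 400           # sample size
p_hat = 0.54      # observed sample proportion

# SE for a proportion (not for an average), computed under the null
se = sqrt(null_p * (1 - null_p) / n)
z = (p_hat - null_p) / se
p_value = 1 - NormalDist().cdf(z)   # one-sided: area to the right of z

print(f"SE = {se:.3f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

Multiply the proportions and the SE by 100 to express them as percentages; the z statistic is unchanged.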
 

E. Describing Relationships between Two Variables (3.1 and optionally 3.2)

A. Graphical Summary

1. Scatterplot
A scatterplot is a two dimensional plot of data; the horizontal dimension is called X, and the vertical dimension is called Y. Each point on a scatterplot shows two values, an X value and a Y value; each point represents a single case. Explanatory variables (also known as independent variables) are by convention identified as the X value and the response variable (the outcome variable or the dependent variable) is the Y.
2. Things to think about when looking at scatterplots
Form: does it have a shape (like an egg, a circle, a cigar, a "U")
Direction: does the data have a direction (is it generally running up from the lower left to the upper right or downwards from the upper left to the lower right)
Strength: Are the points close together or scattered?
 
3. Positive and negative relationships
There is a POSITIVE relationship if above-average values of X are associated with above-average values of Y; conversely, there is a NEGATIVE relationship if above-average values of X are associated with below average values of Y.
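4. Warning! Scatter diagrams only show association, and association is not causation. (Example: the more firefighters sent to a fire, the more fire damage there tends to be, but the firefighters do not cause the damage; bigger fires bring both.)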
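
The definition of positive and negative relationships can be turned into a small check: subtract each variable's average, multiply the paired deviations, and look at the sign of the average product. The sketch below assumes Python with only the standard library; the function name relationship_direction and the data are hypothetical, made up for illustration.

```python
from statistics import mean

def relationship_direction(xs, ys):
    """Classify the association between paired X and Y values."""
    x_bar, y_bar = mean(xs), mean(ys)
    # A product is positive when X and Y are on the same side of their
    # averages and negative when they are on opposite sides.
    products = [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]
    avg_product = mean(products)
    if avg_product > 0:
        return "positive"
    if avg_product < 0:
        return "negative"
    return "no clear direction"

# Hypothetical data, made up for illustration only
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 75]
hours_of_tv = [5, 4, 3, 2, 1]

print(relationship_direction(hours_studied, exam_score))
print(relationship_direction(hours_of_tv, exam_score))
```

This only reports the direction of the association; it says nothing about form, strength, or causation.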
 
F. HOMEWORK (Due 12/8/99)
Chapter 12.2 #1 and #3
Chapter 3.1 #5