1. Significance in a Nutshell, Again
Statistical significance is about deciding whether differences observed between a sample outcome and the population parameter are "real" or whether they might well just be due to chance. The samples can be groups of people who were assigned "treatment" or "control" in a randomized experiment. They can also be companies with different characteristics that you observed, rather than people who were treated differently. For instance, you might want to compare growth rates of firms in the Northeast vs. all firms.
2. Notes on Significance Again
A. Clearly identify the parameter and the outcome.
B. State the null hypothesis. This is what is being tested. A test of
significance assesses the strength of evidence (outcomes) against the null
hypothesis. Usually the null hypothesis is a statement of "no effect" or "no difference," and it is ALWAYS a statement about the parameter.
C. The alternative hypothesis is the claim about the population that we are trying to find evidence in favor of. In the tax law example, you are seeking evidence that the law is not revenue neutral. The null hypothesis would say that the average return will not change (i.e., μ = 0); the alternative would say it is negative (i.e., μ < 0). Note this is a ONE-SIDED alternative because you are only interested in deviations in one direction. (A two-sided situation occurs when you do not know the direction; you just think the evidence suggests something different from the null. See p. 455.)
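In standard textbook notation (H0 for the null, Ha for the alternative; this is just the example above restated in symbols):

\[ H_0\colon \mu = 0 \qquad \text{versus} \qquad H_a\colon \mu < 0 \]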
D. The test statistic. It is the statistic, computed from the sample, that is used to test the parameter of interest. In the above example, the parameter is the population average, the outcome is the sample average, and the test statistic is Z. We use Z because the distribution of sample outcomes (i.e., averages of many, many samples) will be normal if n is large or if the original population is normal.
The significance test assesses evidence by examining how far the test statistic falls from the value of the parameter proposed by the null.
The answer the Z test gives is ultimately the probability of getting an outcome as extreme as or MORE extreme than the one you actually observed. So to test the outcome, you would ask "what is the chance of getting -$219 or a lower number?"
E. The probability that you compute this way is called a P-VALUE. The smaller the P-value, the stronger the evidence against the null hypothesis. If instead you had gotten a sample average of -$100 with the same standard deviation, the senator may well be right. (A Z of about -1.4 cuts off about 8% of the normal curve below it, so here there was an 8% chance of getting a sample with an average of -$100 or lower.)
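As a sanity check on the arithmetic, here is a short Python sketch. The standard error of about $71 is an assumption backed out of the Z of about -1.4 quoted above for a -$100 average; it is illustrative, not a value from the original data.

    from scipy.stats import norm

    mu0 = 0.0   # null hypothesis: the law is revenue neutral
    se = 71.0   # ASSUMED standard error (sigma / sqrt(n)), implied by Z ~ -1.4 at -$100

    for xbar in (-219.0, -100.0):
        z = (xbar - mu0) / se    # one-sample Z statistic
        p = norm.cdf(z)          # one-sided P-value: P(Z <= z)
        print(f"x-bar = {xbar:7.1f}   Z = {z:5.2f}   one-sided P = {p:.3f}")

The -$100 case reproduces the 8% figure; the -$219 case gives a P-value near 0.001, which is why it counts as strong evidence against the null.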
F. On significance levels. Sometimes, prior to calculating a score and finding its P-value, we state in advance what we believe to be a decisive value of P. This is the significance level, and its symbol is α (alpha). Commonly used levels of alpha are 5% (.05) and 1% (.01). If your P-value is as small as or smaller than the significance level you have chosen, then you would say that "the data are statistically significant at level ---."
Regardless of the statistical technique used or the type of study (e.g., experimental vs. observational), P-values are often reported. The P-value (also sometimes called significance) is compared with the alpha level (α), which serves as the criterion for rejecting or failing to reject the null hypothesis. However, misinterpretations of the P-value are common, and part of this may stem from the lack of uniformity across texts in its definition. The definition used here is the one above: the P-value is the probability, computed assuming the null hypothesis is true, of an outcome as extreme as or more extreme than the one observed; it is then judged against a level of significance (alpha) designated beforehand.
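The comparison itself is mechanical; a minimal Python sketch (the function name is mine, not from the text):

    def reject_null(p_value, alpha=0.05):
        # Reject H0 when the P-value is as small as or smaller than alpha.
        return p_value <= alpha

    print(reject_null(0.001))   # True  -> "significant at level .05"
    print(reject_null(0.08))    # False -> fail to reject H0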
A. There is nothing "magical" about an alpha of .05. The cutoff is arbitrary and depends on the field of research. Ideally, one should ask at what level you are willing to believe that an outcome is "rare." 10%, 5%, and 1% are common, but we ask that you not use these blindly.
NOTE: THERE IS NO SHARP DISTINCTION BETWEEN "SIGNIFICANT" AND "INSIGNIFICANT"; WE JUST KNOW THAT AS THE P-VALUE DECREASES, THERE IS INCREASINGLY STRONG EVIDENCE AGAINST THE NULL.
B. Tests of significance, confidence intervals, and anything else that relies on the laws of probability require that you have randomization in sampling or experimentation. If you don't have a randomly drawn sample, or if you couldn't randomize and then assign treatment, you really cannot properly use the methods in Chapter 6. It is easy to perform the calculations, but think about the question and the quality of the data before proceeding. Also remember: to use Z, either
1. the population is known to be normally distributed and the population SD is known; or
2. the sample size is "large" (in this textbook, n > 15).
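To see why a "large" n rescues a non-normal population, you can simulate the sampling distribution of the mean. A sketch (the skewed exponential population and n = 40 are arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    draws = rng.exponential(scale=1.0, size=(10_000, 40))  # 10,000 samples of n = 40
    sample_means = draws.mean(axis=1)

    # The population is strongly right-skewed, yet the 10,000 sample means
    # pile up symmetrically around the true mean of 1.0 (the central limit
    # theorem at work).
    print(sample_means.mean(), sample_means.std())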
C. So you reject a null and state that "the data are statistically significant..." This is evidence that there is something going on, but be careful: the effect may be very small. This can come about when you have a large sample size.
Examine the formula Z = (x̄ − μ)/(σ/√n): as n gets larger, the denominator shrinks, so a very small difference between x-bar and mu can be made statistically significant. When your finding is statistically significant, all you know is that your result would be unlikely (e.g., it happens only 5% of the time) if the null hypothesis were true, and that you have therefore decided to reject your null hypothesis and go with your alternative hypothesis. Unfortunately, this does not tell you anything about how big an effect is present or how important the effect would be for practical purposes. That's why, once you determine that a finding is statistically significant, you must next decide if it is also practically significant!
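A quick numerical illustration of the large-n effect (all numbers here are made up): with σ = 100, a trivial true difference of $1 between x-bar and mu eventually produces a tiny P-value.

    import math
    from scipy.stats import norm

    diff, sigma = 1.0, 100.0               # ASSUMED: tiny real effect, population SD
    for n in (100, 10_000, 1_000_000):
        z = diff / (sigma / math.sqrt(n))  # Z if x-bar lands exactly $1 above mu0
        p = 1 - norm.cdf(z)                # one-sided P-value
        print(f"n = {n:>9,}   Z = {z:5.2f}   P = {p:.4f}")

At n = 100 the $1 difference is nowhere near significant; at n = 1,000,000 it is overwhelmingly "significant" while remaining practically meaningless.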
D. "Not statistically significant" doesn't mean that the observed differences are due to chance, only that it would not be surprising if they turned out to be due to chance. It may be that there is a difference you care about, but not enough people were included in the study for the difference to show up as statistically significant. If a sample is too small, it's possible that there is a big and meaningful difference between the means, but, just due to chance, hardly any difference was actually seen.
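The flip side of the sketch above (again with made-up numbers): a sizable true difference of $50 goes undetected when n is small.

    import math
    from scipy.stats import norm

    diff, sigma = 50.0, 100.0              # ASSUMED: large real effect, population SD
    for n in (4, 16, 64):
        z = diff / (sigma / math.sqrt(n))  # Z if x-bar lands exactly $50 above mu0
        p = 1 - norm.cdf(z)                # one-sided P-value
        print(f"n = {n:2d}   Z = {z:4.2f}   P = {p:.3f}")

With n = 4 the P-value is about 0.16, so even a real $50 effect would not register as significant.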
E. Searching for significance? Too many studies massage the data in various questionable ways that make the conclusions doubtful. An old saying I like is "If you torture data long enough, it'll confess." Thus, the results should not only be of statistical significance; they must also be of practical significance. This means that they should affect enough people in ways that have a meaningful effect on their lives.