1. Significance in a Nutshell, Again
Statistical significance is about deciding whether differences observed between a sample outcome and the population parameter are "real" or whether they might well just be due to chance. The samples can be groups of people who were assigned "treatment" or "control" in a randomized experiment. They can also be companies with different characteristics that you observed, rather than people who were treated differently. For instance, you might want to compare growth rates of firms in the Northeast vs. all firms.
2. Notes on Significance Again
A. Clearly identify the parameter and the outcome.
B. State the null hypothesis. This is what is being tested. A test of
significance assesses the strength of evidence (outcomes) against the null
hypothesis. Usually the null hypothesis is a statement of "no effect" or "no difference," and it is ALWAYS a statement about the parameter.
C. The alternative hypothesis is the claim about the population that we are trying to find evidence in favor of. In the tax law example, you are seeking evidence that the law is not revenue neutral. The null hypothesis would say that the average return will not change (i.e., μ = 0); the alternative would say it is negative (i.e., μ < 0). Note this is a ONE-SIDED alternative because you are only interested in deviations in one direction. (A two-sided situation occurs when you do not know the direction; you just think the evidence suggests something different from the null. See p. 455.)
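In standard textbook notation (H0 for the null, Ha for the alternative; this is just the example above restated in symbols):

\[ H_0\colon \mu = 0 \qquad \text{versus} \qquad H_a\colon \mu < 0 \]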
D. The test statistic. It is the statistic, computed from the sample, that is used to test the parameter of interest. In the above example, the parameter is the population average, the outcome is the sample average, and the test statistic is Z. We use Z because the distribution of sample outcomes (i.e., averages of many, many samples) will be normal if n is large or if the original population is normal.
The significance test assesses evidence by examining how far the test statistic falls from the value of the parameter proposed by the null.
The answer the Z test gives is ultimately the probability of getting an outcome as extreme as or MORE extreme than the one you actually observed. So to test the outcome, you would ask "what is the chance of getting -$219 or a lower number?"
E. The probability that you compute this way is called a P-VALUE. The smaller the P-value, the stronger the evidence against the null hypothesis. If instead you had gotten a sample average of -$100 with the same standard deviation, the senator may well be right. (A Z of about -1.4 cuts off about 8% of the normal curve below it, so here there was an 8% chance of getting a sample with an average of -$100 or lower.)
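As a sanity check on the arithmetic, here is a short Python sketch. The standard error of about $71 is an assumption backed out of the Z of about -1.4 quoted above for a -$100 average; it is illustrative, not a value from the original data.

    from scipy.stats import norm

    mu0 = 0.0   # null hypothesis: the law is revenue neutral
    se = 71.0   # ASSUMED standard error (sigma / sqrt(n)), implied by Z ~ -1.4 at -$100

    for xbar in (-219.0, -100.0):
        z = (xbar - mu0) / se    # one-sample Z statistic
        p = norm.cdf(z)          # one-sided P-value: P(Z <= z)
        print(f"x-bar = {xbar:7.1f}   Z = {z:5.2f}   one-sided P = {p:.3f}")

The -$100 case reproduces the 8% figure; the -$219 case gives a P-value near 0.001, which is why it counts as strong evidence against the null.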
F. On significance levels. Sometimes, prior to calculating a score and finding its P-value, we state in advance what we believe to be a decisive value of P. This is the significance level, and its symbol is α (alpha). Commonly used levels of alpha are 5% (.05) and 1% (.01). If your P-value is as small as or smaller than the significance level you have chosen, then you would say that "the data are statistically significant at level ---."
Regardless of the statistical technique used or the type of study (e.g., experimental vs. observational), P-values are often reported. The P-value (also sometimes called significance) is compared with the alpha level (α), which serves as the criterion for rejecting or failing to reject the null hypothesis. However, misinterpretations of the P-value are common, and part of this may stem from the lack of uniformity across texts in its definition. The definition used here is the one above: the P-value is the probability, computed assuming the null hypothesis is true, of an outcome as extreme as or more extreme than the one observed; it is then judged against a level of significance (alpha) designated beforehand.
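The comparison itself is mechanical; a minimal Python sketch (the function name is mine, not from the text):

    def reject_null(p_value, alpha=0.05):
        # Reject H0 when the P-value is as small as or smaller than alpha.
        return p_value <= alpha

    print(reject_null(0.001))   # True  -> "significant at level .05"
    print(reject_null(0.08))    # False -> fail to reject H0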
A. There is nothing "magical" about an alpha of .05. The cutoff is arbitrary and depends on the field of research. Ideally, one should ask at what level you are willing to believe that an outcome is "rare." 10%, 5%, and 1% are common, but we ask that you not use these blindly.
NOTE: THERE IS NO SHARP DISTINCTION BETWEEN "SIGNIFICANT" AND "INSIGNIFICANT"; WE JUST KNOW THAT AS THE P-VALUE DECREASES, THERE IS INCREASINGLY STRONG EVIDENCE AGAINST THE NULL.
B. Tests of significance, confidence intervals, and anything else that relies on the laws of probability require that you have randomization in sampling or experimentation. If you don't have a randomly drawn sample, or if you couldn't randomize and then assign treatment, you really cannot properly use the methods in Chapter 6. It is easy to perform the calculations, but think about the question and the quality of the data before proceeding. Also remember: to use Z, either
1. the population is known to be normally distributed and the population SD is known; or
2. the sample size is "large" (in this textbook, n > 15).
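To see why a "large" n rescues a non-normal population, you can simulate the sampling distribution of the mean. A sketch (the skewed exponential population and n = 40 are arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    draws = rng.exponential(scale=1.0, size=(10_000, 40))  # 10,000 samples of n = 40
    sample_means = draws.mean(axis=1)

    # The population is strongly right-skewed, yet the 10,000 sample means
    # pile up symmetrically around the true mean of 1.0 (the central limit
    # theorem at work).
    print(sample_means.mean(), sample_means.std())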
C. So you reject a null and state that "the data are statistically significant..." This is evidence that there is something going on, but be careful: the effect may be very small. This can come about when you have a large sample size.
Examine the formula Z = (x̄ − μ)/(σ/√n): as n gets larger, the denominator shrinks, so a very small difference between x-bar and mu can be made statistically significant. When your finding is statistically significant, all you know is that your result would be unlikely (e.g., it happens only 5% of the time) if the null hypothesis were true, and that you have therefore decided to reject your null hypothesis and go with your alternative hypothesis. Unfortunately, this does not tell you anything about how big an effect is present or how important the effect would be for practical purposes. That's why, once you determine that a finding is statistically significant, you must next decide if it is also practically significant!
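A quick numerical illustration of the large-n effect (all numbers here are made up): with σ = 100, a trivial true difference of $1 between x-bar and mu eventually produces a tiny P-value.

    import math
    from scipy.stats import norm

    diff, sigma = 1.0, 100.0               # ASSUMED: tiny real effect, population SD
    for n in (100, 10_000, 1_000_000):
        z = diff / (sigma / math.sqrt(n))  # Z if x-bar lands exactly $1 above mu0
        p = 1 - norm.cdf(z)                # one-sided P-value
        print(f"n = {n:>9,}   Z = {z:5.2f}   P = {p:.4f}")

At n = 100 the $1 difference is nowhere near significant; at n = 1,000,000 it is overwhelmingly "significant" while remaining practically meaningless.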
D. "Not statistically significant" doesn't mean that the observed differences are due to chance, only that it would not be surprising if they turned out to be due to chance. It may be that there is a difference you care about, but not enough people were included in the study for the difference to show up as statistically significant. If a sample is too small, it's possible that there is a big and meaningful difference between the means, but, just due to chance, hardly any difference was actually seen.
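The flip side of the sketch above (again with made-up numbers): a sizable true difference of $50 goes undetected when n is small.

    import math
    from scipy.stats import norm

    diff, sigma = 50.0, 100.0              # ASSUMED: large real effect, population SD
    for n in (4, 16, 64):
        z = diff / (sigma / math.sqrt(n))  # Z if x-bar lands exactly $50 above mu0
        p = 1 - norm.cdf(z)                # one-sided P-value
        print(f"n = {n:2d}   Z = {z:4.2f}   P = {p:.3f}")

With n = 4 the P-value is about 0.16, so even a real $50 effect would not register as significant.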
E. Searching for significance? Too many studies massage the data in various questionable ways that make the conclusions doubtful. An old saying I like is "If you torture data long enough, it'll confess." Thus, the results should not only be of statistical significance; they must also be of practical significance. This means that they should affect enough people in ways that have a meaningful effect on their lives.