© 2004, S. D. Cochran. All rights reserved.

Z-TESTS

  1. Z-tests

    1. The formula for a z test is:

        z = (observed - expected)/SE

    2. Where do we get the components of this equation?

    1. Observed refers to our data

      1. We can observe an average; this is the mean of X.

      2. We can also observe a sum.

    2. The expected part refers to our expectations under the null hypothesis

      1. If we think only chance is operating--and chance on average both adds and subtracts about the same amount from scores--then, if scores are centered at zero or if we are considering the difference between two scores (and we hypothesize there is no difference), our expected value is 0.

      2. The expected value need not be zero, however. Example: the average IQ is 100, so we might expect that any group's average IQ is 100; in this case the expected value would be 100. We could also expect, though very rarely, that the exact difference between two means is not zero but some other number.

      3. Here the expected is referred to as µ0 (mu sub zero), the population mean.

    3. The SE also comes from our data, via the SD. It is an estimate of the expected chance variation around the expected value. Dividing by the SE rescales the difference between the observed and expected values into units of chance variation.

      1. This is the standard error of the mean, written as sigma sub x-bar (sigma subscripted with the mean of X).

    4. Notice what we are doing here. We have created an equation where the denominator is our expected average variation due to chance. The numerator is something very similar--a deviation of an observation from our expected value. If only chance is operating, the expected value of the deviation is 0, and its departure from that expected value is, on average, the average deviation due to chance. By dividing in this way we are figuring out how deviant our observation is from what we expect: it should be zero, but will be bigger or smaller simply due to chance.

    5. It turns out that the quantity (observed - expected)/SE is distributed as a normal distribution with mean 0 and standard deviation 1 when the sample size of our data is relatively large, say more than 100 observations. This is referred to as the z-distribution, and it looks identical to the standard normal distribution. So about two thirds of the time the result should fall within ±1.
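
    This claim can be checked with a quick simulation (a sketch of my own, not part of the original notes): draw many samples, standardize each sample mean as z = (observed - expected)/SE, and count how often z lands within ±1.

    ```python
    import random

    random.seed(1)

    n = 100                 # observations per sample: "relatively large"
    mu = 0.5                # expected value of one Uniform(0, 1) draw
    sd = (1 / 12) ** 0.5    # SD of one Uniform(0, 1) draw
    se = sd / n ** 0.5      # standard error of the mean

    # Draw many samples and compute z = (observed - expected)/SE for each.
    zs = []
    for _ in range(10_000):
        xbar = sum(random.random() for _ in range(n)) / n
        zs.append((xbar - mu) / se)

    # About two thirds of the z values should fall within +/- 1.
    within_one = sum(abs(z) <= 1 for z in zs) / len(zs)
    print(round(within_one, 2))
    ```

    The printed fraction comes out close to 0.68, the familiar two-thirds of the normal curve.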

  2. How do we use the z-test?

    1. The z value that we obtain indicates how many SEs the observed value lies from the expected value. We can then use the Normal Table to attach a percentile to this z score.

    2. The percentile is referred to as P, and is the probability that we would observe this value, or one farther from the expected value, if in fact the expected value were true and what we are looking at is simply chance variation.
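
    In place of the printed Normal Table, the same P can be computed from the standard normal curve. This sketch (my own, not part of the notes) uses Python's built-in math.erf; the helper name one_tail_p is a hypothetical name of my choosing.

    ```python
    import math

    def one_tail_p(z):
        # Area below |z| under the standard normal curve (the CDF),
        # via the error function: Phi(x) = (1 + erf(x / sqrt(2))) / 2.
        phi = (1 + math.erf(abs(z) / math.sqrt(2))) / 2
        # One-tail P: the area beyond |z| in a single tail.
        return 1 - phi

    # A z of 1 leaves about 16% in one tail; a z near -3.1 leaves
    # roughly 1 in a thousand.
    print(one_tail_p(1.0), one_tail_p(-3.14))
    ```

    For a two-tail hypothesis you would double the returned value, since both tails count as "far from expected."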

  3. Example

    1. Your friend offers to play a shell game with you: three red cups with a spool under one. If you call it correctly, you get $3; if you call it wrong, you lose $1. You play the game 100 times and walk away having lost $27. Has he cheated you?

    2. What is our research hypothesis?  Your friend has cheated you.

    3. We need two mutually exclusive and exhaustive statistical hypotheses:

      1. The game is fair, the null hypothesis:

        In 100 trials you should win about 33 times and lose about 67 times, so you should end up with about 33*3 + 67*(-1) = $32

      2. The game is unfair--the alternative hypothesis

        You win less than $32.

      3. Notice that we have said nothing about the possibility that you might have won more than $32. That's the other tail. It doesn't make sense with this research hypothesis, so if you had won more than $32 we would have included it in the null hypothesis (which is that your friend did not cheat you). Can you think of a different research hypothesis that would be two-tailed?

    4. The SE is derived from the SD

      1. The SD = (3 - (-1)) * square root (.33 * .67) = 4 * .47 = 1.88

      2. The SE = square root (100) * 1.88 = 10 * 1.88 = 18.80

    5. So z = (-27 - 32)/18.8 = -59/18.8 = -3.14

    6. So the amount you lost is about 3 SEs below what you would expect to win

    7. To get the P associated with that, we look in the normal table. A z of 3 is associated with an area of 99.73%. That leaves 100 - 99.73 = 0.27%, or .0027, in the two tails. We have a one-tail hypothesis, so we divide that by 2 to get P = .0027/2 = .001 approximately. So the P, or chance probability, associated with the outcome we observe is about 1 in a thousand.

    8. P is the probability that losing $27, when you expect to win $32 at this game over 100 trials, could occur when the game is fair--that is, when your friend is not cheating you--and in this case it would occur, oh, about 1 in a thousand times.

    9. So what do you conclude about the plausibility of your null hypothesis? Because it is very, very unlikely that the null hypothesis is true, we REJECT it.

    10. Now, given that we have decided that our null hypothesis is very unlikely to be true, what do we conclude about the logical alternative? We conclude that it could be true--we accept our alternative hypothesis.

    11. This does not mean that the null hypothesis is wrong and the alternative correct. It could be, though it is very, very unlikely, that the null hypothesis is correct and that we are in error in accepting the alternative hypothesis. But that is a risk we are willing to take.

    12. Now we can cycle back to our original research hypothesis--did your friend cheat you? The statistical test suggests, unfortunately, that he did.
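
    The whole example can be reproduced in a few lines of Python. This is my own sketch of the arithmetic above (using the notes' rounded probabilities .33 and .67), plus a Monte Carlo check of how often a genuinely fair game does this badly.

    ```python
    import math
    import random

    # Analytic version of steps 4-7 above, using the rounded .33/.67.
    n = 100
    expected = n * (.33 * 3 + .67 * -1)       # $32 expected winnings
    sd = (3 - (-1)) * math.sqrt(.33 * .67)    # 1.88 per play
    se = math.sqrt(n) * sd                    # 18.8 for the sum
    z = (-27 - expected) / se                 # about -3.14
    one_tail_p = 1 - (1 + math.erf(abs(z) / math.sqrt(2))) / 2
    print(round(z, 2), round(one_tail_p, 4))

    # Monte Carlo check: play the *fair* game 100 times per experiment
    # and see how often fair play alone yields winnings of -$27 or worse.
    random.seed(2)
    trials = 50_000
    as_bad = sum(
        sum(3 if random.random() < 1 / 3 else -1 for _ in range(n)) <= -27
        for _ in range(trials)
    ) / trials
    print(as_bad)   # on the order of 1 in a thousand
    ```

    Both routes--the normal approximation and brute-force simulation--agree that a loss this large is roughly a 1-in-a-thousand event under a fair game.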

  4. Final thoughts on the z-test

    1. If what we observe is exactly what we expect under the null hypothesis, then the numerator is zero, the z-score is 0, and the probability, P, of achieving that result or one stronger is 1.

    2. The null hypothesis also dictates that the two values should be the same--but we know that other things might be going on, such as bias or chance. For this equation, we assume that bias is zero. That makes all the variation simply due to chance, if the null hypothesis is correct. The SE is used to scale for the size of chance variation.

    3. The null hypothesis is always possibly true, but we reject it as plausible when it is unlikely to be true. The point at which science has typically drawn the line is P < .05. That is, if a score as far from the expected score as ours, or farther, would occur less than 1 time in 20 when the null hypothesis is true, then we call the result statistically significant and we reject the null hypothesis. Rejecting the null hypothesis leaves us only one option: to accept the alternative (this is our research hypothesis).

    4. As z gets larger, P gets smaller.

    5. We can never state with certainty that our observed differs from our expected, because there are two reasons why that may happen (true difference and chance error) and no way to partition how much is due to true differences and how much is due to chance error. By hypothesizing there is no difference, all we are left to evaluate is the plausibility of chance error as the reason for any differences we observe.