© 2004, S. D. Cochran. All rights reserved.

MEASUREMENT ERROR
So far we have learned a few key concepts that are relevant to this next section
We've started to think about distributions
Sample distributions--the collections of values we observe in our sample
Normal distribution--the probability density curve
These distributions have a central point, a middle, that we can define in various ways
These distributions have variability around the middle point--the spread, the variance
Now we are going to carry the idea of distributions, and the uncertainty they capture, a step further. Whenever we measure or observe something, the value we obtain can also be thought of as one element in a distribution of all the possible observations we might have made
In that hypothetical distribution, there is a central point
Like all distributions, there is also variability
Example: You meet somebody for the first time and you make a judgment about how likeable a person she is. You meet her another time and she seems a little different. Someone else meets her and has a completely different take on her. She is the same person despite these different perceptions of her. Which observation is real? After meeting her several times, you no longer have the jarring sense that she is different each time; gradually she seems to become more stable in who she is.
You have intuitively divided your observations of her into two parts: that which is consistent over time (which you label the real her) and that which changes over time in trivial ways (like the backpack she carries once, the bandaid on her finger). We could say the first part you treat as real; the second as just random variation.
Let's say you end up liking her, seeing her quirks as pleasant. Your friend says that's just your bias--you tend to like everyone you meet no matter who they are. Your friend is adding a third component to your observations.
Let's outline them: Your perception of her is a function of who she really is + trivial variation + your general tendency or bias to like people
Statisticians also think about their observations, or measurements, as having three components.
Observed score = True value + Chance Error + Bias
The first part is the true score--the part of an observed value that is absolutely real
The second part is chance error
These are differences that show variation around a central point; the central point is the true score--if we measure something repeatedly, we won't get the same answer each time, but the answers will be centered around a particular score
Chance error is bidirectional--the perturbations it causes both inflate and deflate the observed score
Chance error itself can be thought of as having its own distribution
Sometimes error in a measurement is large, sometimes small
Sometimes it adds, sometimes it subtracts
You can think of the size of a chance error as a deviation from no error at all
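These ideas are easy to see in a small simulation. The sketch below is illustrative only: it assumes a hypothetical true score of 12 and chance errors drawn from a normal distribution centered at zero (no bias for now). Averaging many repeated measurements recovers something very close to the true score, and the errors fall on both sides of zero.

```python
import random

random.seed(42)

TRUE_SCORE = 12.0  # hypothetical true value of the thing we measure

# Simulate 10,000 repeated measurements. Each observed score is the
# true score plus a chance error drawn from a normal distribution
# centered at zero; there is no bias term in this sketch.
errors = [random.gauss(0, 1.5) for _ in range(10_000)]
observed = [TRUE_SCORE + e for e in errors]

mean_observed = sum(observed) / len(observed)
print(f"mean of observed scores: {mean_observed:.2f}")  # very close to 12
print(f"errors that inflate the score:  {sum(e > 0 for e in errors)}")
print(f"errors that deflate the score:  {sum(e < 0 for e in errors)}")
```

Because chance error is bidirectional, the positive and negative perturbations roughly cancel over many trials, which is why the mean of the observed scores lands near the true score.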
Observed scores in a distribution that are very far from the average of the distribution are referred to as outliers
Outliers have a disproportionate effect on summary statistics such as the mean, so statisticians disagree over whether or not outliers should be thrown out
Example: Imagine you are asked to sprint a 100 yard dash 10 times. Nine of the 10 times go just fine; you come in somewhere between 13 and 17 seconds. Once you trip and fall, skin your knee, and take 5 minutes to get across the finish line. The time for that trial is an outlier. Should we include it? Some would argue that it reflects the total range of your performance and should be included. Others would argue that it is such an aberration that including it distorts your true performance capabilities in the 100 yard dash.
There is no hard and fast rule
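The sprint example can be sketched numerically. The nine ordinary times below are made up to fall in the 13-17 second range from the example; the fall trial is 5 minutes (300 seconds). Including the outlier drags the mean far from the typical performance, while the median barely moves.

```python
from statistics import mean, median

# Nine ordinary trials (seconds) and one outlier trial where the
# runner fell; the specific numbers are invented for illustration.
times = [13.2, 14.1, 15.0, 16.3, 13.8, 14.9, 15.7, 16.8, 14.4]
outlier = 300.0  # 5 minutes to cross the finish line

print(f"mean without outlier: {mean(times):.1f}")            # about 14.9
print(f"mean with outlier:    {mean(times + [outlier]):.1f}")  # about 43.4
print(f"median with outlier:  {median(times + [outlier]):.1f}")
```

One outlier in ten trials roughly triples the mean, which is the "disproportionate effect" at stake in the debate over whether to keep it.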
The third part of an observed value is bias or systematic error
Bias is unidirectional
Positive bias inflates a score
Negative bias decreases a score
Example: Imagine that we measure inches with a ruler that is mismarked so that 13 inches is indicated as being 12 inches. The bias is one inch. Every measurement using that ruler will be biased by the same amount in the same direction.
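The mismarked ruler can be simulated to show how bias differs from chance error. In this sketch (the true length and error size are assumed for illustration), each reading is true value + chance error + bias. Averaging many readings cancels the chance error but leaves the one-inch bias untouched.

```python
import random

random.seed(0)

TRUE_LENGTH = 13.0  # inches; hypothetical true length of the object
BIAS = -1.0         # the mismarked ruler reads 13 inches as 12

# Observed score = true value + chance error + bias. Unlike chance
# error, the bias shifts every reading in the same direction.
readings = [TRUE_LENGTH + random.gauss(0, 0.05) + BIAS
            for _ in range(1_000)]

mean_reading = sum(readings) / len(readings)
print(f"mean reading: {mean_reading:.2f}")  # near 12, not 13:
# averaging cancels chance error but cannot remove systematic bias
```

This is why bias is the more dangerous component: repeating the measurement, no matter how many times, will not reveal it.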