In-class Quiz 2: Friday, March 12
Explain why it is better to have a random sample of size 10 than one
of size 5 for estimating the mean of a population.
The main reason: there is less variability in an estimate based on
a sample of size 10 than there is for one of size 5. This means the
estimator based on n=10 is more precise and is likely to be closer to the
population mean. For example, if you use the average to estimate the
mean, the standard error when n = 10 is sigma/sqrt(10), and this standard
error is roughly 70% the size of the standard error you'd get if
you used only 5 observations.
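A quick simulation makes the 70% figure concrete. Here is a minimal
sketch in Python with NumPy (the standard normal population is just a
convenient choice for illustration, not part of the problem):

    import numpy as np

    rng = np.random.default_rng(0)
    sigma, reps = 1.0, 100_000   # population SD and number of simulated samples

    # Draw many samples of each size and record each sample's average.
    means_5 = rng.normal(0.0, sigma, size=(reps, 5)).mean(axis=1)
    means_10 = rng.normal(0.0, sigma, size=(reps, 10)).mean(axis=1)

    print(means_5.std())                   # close to sigma/sqrt(5)  ~ 0.447
    print(means_10.std())                  # close to sigma/sqrt(10) ~ 0.316
    print(means_10.std() / means_5.std())  # close to sqrt(5/10)     ~ 0.707

The last line is the 70% ratio: the n=10 average wanders about 30%
less from sample to sample.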
Note that both estimators are unbiased. Many of you said that the estimator
with n=10 would be "more accurate". But accuracy means "tends to hit the
target" and they both tend to do that. The difference is that the n=10 estimator
tends to be closer.
Remember, being a statistician means never having to say you're certain.
In fact, claiming certainty is often a bad thing. So beware of writing something
like "the estimator for n=10 will be closer to the true population
mean than the estimator for n=5". It is possible, and in fact not highly
unlikely, for the estimator based on n=5 to produce an estimate closer to
the true value than the one based on n=10. However, it is more likely to happen
the other way around. Look at it like this: If the population
SD is sigma=1, then in 95% of all samples of size 5, the n=5 estimator
will produce a result that is within 1.96*(1/sqrt(5))= .88 units of
the true value. On the other hand, in 95% of all samples of size 10, the
estimate will be within 1.96*(1/sqrt(10)) = .62 units of the true value.
The n=5 estimator tends to stray a little bit further from home.
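If you want to check those numbers, a short simulation can estimate both
coverage fractions, and also how often the n=5 estimate happens to beat
the n=10 estimate (a sketch assuming a normal population with mu = 0 and
sigma = 1; that specific population is my choice for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, reps = 0.0, 1.0, 100_000

    means_5 = rng.normal(mu, sigma, size=(reps, 5)).mean(axis=1)
    means_10 = rng.normal(mu, sigma, size=(reps, 10)).mean(axis=1)

    # Fraction of samples whose average lands within 1.96*sigma/sqrt(n) of mu.
    print(np.mean(np.abs(means_5 - mu) <= 0.88))    # about 0.95
    print(np.mean(np.abs(means_10 - mu) <= 0.62))   # about 0.95

    # How often is the n=5 estimate the closer one? Under these assumptions,
    # roughly 39% of the time: a clear minority, but hardly a rare event.
    print(np.mean(np.abs(means_5 - mu) < np.abs(means_10 - mu)))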
Some of you mentioned the Central Limit Theorem. The CLT becomes a
factor to consider when we're computing probabilities. However, the two
main facts are true regardless of the probability distribution of the
population: (1) the average is an unbiased estimator of the true mean
whether n=5 or n=10, and (2) the average is more precise if n=10 than if n=5.
Still, it is worth noting that if the population is NOT normal, then the
normal distribution will provide a better approximation for the estimator
based on n=10 than for n=5 when doing probability calculations (for example,
determining the width of a confidence interval or calculating a p-value).
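To see this effect, you can simulate from a clearly skewed population and
watch the sampling distribution of the average straighten out as n grows.
This sketch uses an Exponential(1) population (my choice for illustration;
its skewness is 2) and estimates the skewness of the sample average, which
shrinks like 2/sqrt(n):

    import numpy as np

    rng = np.random.default_rng(2)
    reps = 200_000

    for n in (5, 10):
        # Sampling distribution of the average of n Exponential(1) draws.
        xbar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
        z = (xbar - xbar.mean()) / xbar.std()
        print(n, np.mean(z**3))   # skewness: ~0.89 for n=5, ~0.63 for n=10

Less skewness means the normal curve is a better stand-in, so normal-based
confidence intervals and p-values are more trustworthy at n=10.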
If I were grading this, you'd get full credit if you mentioned only that
the standard error is smaller, and you'd get partial credit if the only thing
you mentioned was that the central limit theorem says the sampling distribution
is better approximated by a normal distribution when n=10. (The reason
this is only worth partial credit is that it is only useful when (a) the
population distribution is not normal and (b) you need to calculate probabilities
concerning your estimator.)
Is a large, non-random sample of n=100 better than one of
size 10 for estimating the mean of a population?
No, it's not. Most of you got this one right. If the sample is
non-random, then it could be biased. And getting a large biased sample
is not helpful at all. One of you said it better than I could: "quality
is more important than quantity."
Some of you said that if the population is small, say close to 100, then
the n=100 estimator would be preferred since you're seeing almost all of the
population. This is true ONLY if the population size is exactly 100. Otherwise,
you can still get a biased sample. Think of it this way: since
the sample is non-random, it could exclude any minority group. So even
if you included 90% of the population, if you excluded that 10%, you'd have
a biased view of the population. Suppose you decided to take a survey
of whether people liked hamburgers for dinner, but deliberately excluded
vegetarians. Even if you asked ALL non-vegetarians, you'd have a biased
view of how the population felt about hamburgers.
Some of you gave an answer that was even further off the mark: bigger means
you're seeing more of the population and therefore it is always better. But
for the same reason described above, this is not true if the sample is biased.
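Here is a made-up numerical version of the hamburger story (all the
proportions and the population size are invented for illustration). The
non-random sample of 100 that excludes vegetarians stays biased no matter
how big it is, while the random sample of 10 is noisy but centered on the
truth:

    import numpy as np

    rng = np.random.default_rng(3)

    # Hypothetical population of 10,000 people; 20% are vegetarians.
    # Non-vegetarians like hamburgers with probability 0.80, vegetarians 0.05.
    vegetarian = rng.random(10_000) < 0.20
    likes = rng.random(10_000) < np.where(vegetarian, 0.05, 0.80)
    print(likes.mean())   # true proportion, around 0.65

    # Large non-random sample: 100 people, drawn only from non-vegetarians.
    nonveg = np.flatnonzero(~vegetarian)
    print(likes[rng.choice(nonveg, size=100, replace=False)].mean())  # near 0.80

    # Small random sample: 10 people from the whole population.
    print(likes[rng.choice(10_000, size=10, replace=False)].mean())   # noisy

Averaged over many repetitions, the random sample of 10 centers on the true
proportion while the non-random sample of 100 never does: quality beats
quantity.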