*Introduction to Statistical Methods for the Life and Health Sciences*

http://www.stat.ucla.edu/~dinov/

**Objective**:
You will see the Central Limit Theorem in action; you also will have fun (yes, fun) learning
about confidence intervals.

__Activity 1:__

Suppose we have a box that contains tickets with values that range from 0 to 1. There are many, many tickets in this box, and each ticket has its own unique value (no two tickets are identical). Each ticket has the same chance of being drawn from the box as any other ticket. This describes the uniform distribution on the interval from 0 to 1.

1. If you could compute the average of all of the tickets in the box, what would the average be?

2. Is the average of all of the tickets in the box a population parameter (population average) or a sample statistic (sample average)?

Let’s create the probability distribution for the tickets in the box.

Type

net from http://www.ats.ucla.edu/stat/stata/ado

net install clt

clt, samples(50000) n(1) dist(u) [draw 1 ticket with replacement from the box of tickets which has a uniform distribution, 50000 times]
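The `clt` command is a user-written add-on, so it is only available if the `net install` step above succeeds. If it does not, the same experiment can be approximated with built-in commands. The lines below are a minimal sketch under stated assumptions, not the code behind `clt`: the seed value and the variable name `ticket` are arbitrary choices made here, and `runiform()` is assumed to be available (older Stata releases use `uniform()` instead).

```stata
* Sketch only: 50,000 single draws from the uniform box, using built-in commands
clear
* arbitrary seed, chosen here only so the run can be repeated
set seed 12345
* one row per draw
set obs 50000
* each row holds one ticket drawn from the uniform distribution on 0 to 1
generate double ticket = runiform()
* mean and standard deviation of the 50,000 draws
summarize ticket
* histogram of the draws
histogram ticket
```

The comments use the `*` form so that the lines can be typed one at a time in the Command window as well as run from a do-file.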

3. Describe the shape of the distribution. Does it look like a normal curve? Is the mean of that probability histogram the same value as the population average? If not exactly, why not? Jot down the standard deviation of the distribution.

Let’s imagine that we are going to draw some tickets at random with replacement from this box and compute the average value of those tickets.

4. Is the average of the tickets in the sample a population parameter or a sample statistic?

Now let’s imagine that we are going to repeat the process of drawing some tickets at random with replacement from this box an infinite number of times, compute the sample average each time, and create a histogram of the probability distribution of all of those sample averages.

5. What is the name of that probability distribution?

Unfortunately, we are limited to repeating the process a maximum of 50,000 times rather than an infinite number of times. However, this is such a large number of repetitions that we can *approximate* a theoretical sampling distribution of sample averages from the box described above. We can choose to repeatedly draw as few as 2 tickets at random with replacement and compute the sample average (50,000 times), or we can choose to repeatedly draw many more tickets at random with replacement and compute the sample average (50,000 times).

Type

clt, samples(50000) n(2) dist(u) normal [repeat the process 50,000 times, draw 2 tickets at random with replacement, compute the sample average each time, plot the histogram of the 50,000 sample averages]
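For readers who want to see what this step does, or who cannot install `clt`, here is a rough equivalent built from standard commands. It is a sketch under assumptions rather than the actual implementation of `clt`: the names `total` and `xbar`, the seed, and the local macro `n` are all introduced here for illustration. Because it uses a local macro, run it as a single do-file.

```stata
* Sketch only: sampling distribution of the average of n uniform tickets
clear
* arbitrary seed, chosen here only so the run can be repeated
set seed 12345
* 50,000 repetitions, one per row
set obs 50000
* number of tickets drawn per repetition; change to 20 or 100 for later steps
local n = 2
* running sum of the n tickets drawn in each repetition
generate double total = 0
forvalues i = 1/`n' {
    quietly replace total = total + runiform()
}
* sample average for each of the 50,000 repetitions
generate double xbar = total / `n'
* mean and standard deviation of the simulated sample averages
summarize xbar
* histogram of the sample averages with a normal curve overlaid
histogram xbar, normal
```

Each row plays the role of one repetition, so the loop runs only `n` times rather than 50,000 times.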

6. Describe the shape of the distribution. Does it look like a normal curve? What is the mean of this distribution? Is it very close to the population mean? If it’s off a bit, why? Compute the standard deviation of this empirical sampling distribution using the information about the standard deviation from the original population distribution. Do you get the same standard deviation as the computer program got?
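As a reminder for the computation in question 6, the standard result for the average of $n$ independent draws made with replacement is given below. The symbols are introduced here for convenience: $\sigma_X$ is the standard deviation of the tickets in the box, $n$ is the number of tickets in each sample, and $\bar{X}$ is the sample average.

$$
\mathrm{SD}(\bar{X}) = \frac{\sigma_X}{\sqrt{n}}
$$

Plug in the standard deviation you jotted down in question 3 for $\sigma_X$, and compare the result with the value reported by the program.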

Type

clt, samples(50000) n(20) dist(u) normal [repeat the process 50,000 times, draw 20 tickets at random with replacement, compute the sample average each time, plot the histogram of the 50,000 sample averages]

7. Describe the shape of the distribution. Does it look like a normal curve? What is the mean of this distribution? Is it very close to the population mean? If it’s off a bit, why? Compute the standard deviation of this empirical sampling distribution using the information about the standard deviation from the original population distribution. Do you get the same standard deviation as the computer program got?

Type

clt, samples(50000) n(100) dist(u) normal [repeat the process 50,000 times, draw 100 tickets at random with replacement, compute the sample average each time, plot the histogram of the 50,000 sample averages]

9. Describe the shape of the distribution. Does it look like a normal curve? What is the mean of this distribution? Is it very close to the population mean? If it’s off a bit, why? Compute the standard deviation of this distribution using the information about the standard deviation from the original population distribution. Do you get the same standard deviation as the computer program got?

10. At what sample size did the sampling distribution of sample means look like the normal curve?

11. Please describe:

a) how this lab illustrates the central limit theorem

b) the conclusion that you draw about the value of the population mean and the mean of the sampling distribution of sample averages.

__Activity 2:__

We will be using the thatch ant dataset in this lab. We will assume that the 1199 ants in this dataset constitute the entire population of thatch ants, with unknown population mean $\mu_X$ and unknown population standard deviation $\sigma_X$. Your goal will be to construct confidence intervals for the population mean mass of the thatch ants ($\mu_X$) from simple random samples.

12. What is the population parameter in words?

Repeat the following instructions 10-15 times.

A. Type

use http://www.stat.ucla.edu/projects/datasets/thatch-ant.dta

sample 1 [SRS of 1% of the dataset]

summarize mass [get sample average and sample SD of the variable “mass”]

clear

B. Record the sample average, sample SD, and sample size.

13. For each repetition, compute the 68% CI and 95% CI for the population mean mass. Discuss with your classmates how to do this.
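One way to check your hand calculations is to let Stata do the arithmetic from the results that `summarize` stores. The sketch below assumes the usual large-sample approximation, in which the 68% interval is the sample average plus or minus 1 standard error and the 95% interval is plus or minus 2 standard errors. Your instructor may prefer different multipliers, so treat this as an illustration rather than the required method. The local macro names are invented here. Run it as part of a do-file, after `sample 1` and before `clear`.

```stata
* Sketch only: run while the 1% sample is still in memory (before the clear step)
quietly summarize mass
* sample average
local xbar = r(mean)
* standard error of the sample average: sample SD divided by sqrt(sample size)
local se = r(sd) / sqrt(r(N))
* assumed multipliers: 1 SE for the 68% interval, 2 SE for the 95% interval
display "68% CI: " (`xbar' - `se') " to " (`xbar' + `se')
display "95% CI: " (`xbar' - 2*`se') " to " (`xbar' + 2*`se')
```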

14. Which interval provides a bigger range of values for the population mean: the 68% CI or the 95% CI?

15. Imagine that you conducted a single study and obtained the very first sample average that you observed above. Please provide an interpretation of the 68% and 95% CI that you constructed for the population mean mass. Do you find these intervals informative for getting a handle on the value of the population mean mass?

When everyone is finished with #2 above, your TA will tell you the value of the population mean mass.

16. Count up the number of your 68% CIs that cover the population mean mass and the number of 68% CIs that do not cover the population mean mass. Do the same thing for the 95% CIs. Tally up this information for everyone in the lab. Collectively, you should have about 100 68% CIs and 100 95% CIs. What percentage of the 68% CIs covered the population mean mass? What percentage of the 95% CIs covered the population mean mass?
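The class tally can also be imitated by simulation. The sketch below is an illustration under several assumptions, not part of the original lab procedure: it draws 100 samples of 1% each, builds intervals with the 1 SE and 2 SE multipliers, and counts how many cover the population mean. It computes the population mean internally but does not display it. The seed, the count of 100, and the macro names are choices made here, and the dataset URL is the one given in step A, which must be reachable. Run it as a single do-file.

```stata
* Sketch only: simulated coverage of 68% and 95% intervals over 100 samples
use http://www.stat.ucla.edu/projects/datasets/thatch-ant.dta, clear
* population mean of mass, kept in a macro and not displayed
quietly summarize mass
local mu = r(mean)
local cover68 = 0
local cover95 = 0
* arbitrary seed, chosen here only so the run can be repeated
set seed 2024
forvalues i = 1/100 {
    * keep a copy of the full dataset so each sample starts from all ants
    preserve
    quietly sample 1
    quietly summarize mass
    local xbar = r(mean)
    local se = r(sd) / sqrt(r(N))
    * an interval covers the population mean if the mean lies within the margin
    if abs(`xbar' - `mu') <= `se' {
        local ++cover68
    }
    if abs(`xbar' - `mu') <= 2*`se' {
        local ++cover95
    }
    restore
}
display "68% CIs covering the population mean: `cover68' of 100"
display "95% CIs covering the population mean: `cover95' of 100"
```

Because each sample contains only about a dozen ants, the observed percentages need not match 68% and 95% exactly.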

17. How would the CIs change if you increased the sample size?