Statistics M11/Economics M40 Lab 3: Probability Exercises and Distributions
DUE FEBRUARY 16, 2001
Purpose: The purpose of this lab is to become comfortable with random processes using Stata.
Data: There is no data for this assignment. You will generate your own data. Unfortunately, you will need to work either on a computer in the Statistics Lab (which allows you to save things to its hard drive) or on your own home computer.
Introduction: In Chapter 4, you are introduced to the concepts of randomness and chance in statistics. In this lab, you will see illustrations of probability and construct probability distributions using simulations.
Start up Stata and issue the command:
heads2 1000 .5
If the program "heads2" is installed on your computer, a graph should appear and it should look something like Figure 4.1 in your textbook (page 291). Ask yourself "what is this graphic trying to teach me?"
This is a graph of the cumulative (running) proportion of tosses that land "heads" for a fair coin, if you were to toss it 1,000 times.
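If you would like to see how such a running proportion is built up, here is a minimal sketch in Python. It is not the heads2 program (whose internals are not shown in this lab); the function name running_proportion and the seed value are assumptions made only for illustration.

```python
import random

def running_proportion(n_tosses, p_heads, seed=None):
    """Return the cumulative proportion of heads after each toss."""
    rng = random.Random(seed)
    heads_so_far = 0
    proportions = []
    for toss in range(1, n_tosses + 1):
        # A toss counts as heads with probability p_heads.
        if rng.random() < p_heads:
            heads_so_far += 1
        proportions.append(heads_so_far / toss)
    return proportions

# 1,000 tosses of a fair coin (the seed is arbitrary, for repeatability).
props = running_proportion(1000, 0.5, seed=2001)
for toss in (1, 10, 100, 1000):
    print(f"after {toss:4d} tosses: proportion of heads = {props[toss - 1]:.3f}")
```

The early proportions can be far from .5, while the later ones tend to settle near it, which is the pattern the graph is meant to show.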
If "heads2" is not working on your computer and you are either in the Statistics Lab or at home, follow these instructions:
ASSIGNMENT
heads2 1000 .5
When this command runs, four variables are created (see your variable window). We're only interested in the variable heads. I am going to have you issue the command "tabulate heads", but before you do, answer this question:
(a) Question: If I toss a coin 1,000 times, how many heads do I expect or what percentage do I expect to be heads?
Hopefully you wrote 500 or 50%. Now issue the command:
tabulate heads
For the variable heads, 1 = heads and 0 = tails. Note how many heads you got (you will not get the same as your classmates) and issue this command:
heads2 10 .5, nograph
This does the same thing as before, but no graph is produced. I want you to issue the command "tabulate heads" again, but before you do, answer this question:
(b) Question: If I toss a coin 10 times, how many heads do I expect or what percentage do I expect to be heads?
Hopefully you wrote 5 or 50%. Now issue the command:
tabulate heads
Note how many heads you got (you will not get the same as your classmates). Issue this command:
heads2 100 .5 10, nograph
This is like having 100 people each toss a fair coin 10 times. The results are stored in the variable heads.
Now issue the command:
tabulate heads
What you will see is a distribution of the count of heads. That is, of the 100 people who each tossed a coin 10 times, some got only 1 head, some got 2, some got 3, some got 5, some got 10, etc.
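The idea of "100 people each tossing a coin 10 times" can also be sketched outside Stata. The Python below is only an illustration under my own assumptions (the function name simulate_head_counts and the seed are hypothetical, and this is not how heads2 itself is written); it produces a table of head counts similar in spirit to what tabulate shows.

```python
import random
from collections import Counter

def simulate_head_counts(n_people, p_heads, n_tosses, seed=None):
    """For each person, count the heads in n_tosses tosses of one coin."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_heads for _ in range(n_tosses))
        for _ in range(n_people)
    ]

# 100 people, fair coin, 10 tosses each (arbitrary seed for repeatability).
counts = simulate_head_counts(100, 0.5, 10, seed=2001)
table = Counter(counts)

# Frequency table: how many people got 0, 1, ..., 10 heads.
for k in range(11):
    pct = 100 * table[k] / len(counts)
    print(f"{k:2d} heads: {table[k]:3d} people ({pct:.0f}%)")
```

Your own numbers will differ from run to run unless you fix the seed, just as your Stata results differ from your classmates'.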
(c) Question: What percentage of the 100 people got exactly 5 heads? What percentage got exactly 4 heads or exactly 6 heads? What percentage got 3 or 7? What percentage got 2 or 8? What percentage got 1 or 9 heads? What percentage got 0 heads or all 10 heads?
And before you go on to the next part, issue the command
graph heads, bin(11) xlab t1("Simulation of 100 people tossing coins") normal
(d) Print out this graphic. Question: Does it look at all familiar?
heads2 1000 .5 10, nograph
tabulate heads
graph heads, bin(11) xlab t1("Simulation of 1000 people tossing coins") normal
heads2 10000 .5 10, nograph
tabulate heads
graph heads, bin(11) xlab t1("Simulation: 10000 people tossing coins") normal
heads2 100000 .5 10, nograph
tabulate heads
graph heads, bin(11) xlab t1("Simulation: 100000 people tossing coins") normal
The last one might take a while. Make sure "normal" is on the same line as the word "graph".
(a) For each of the simulations, please print out the histogram of results and include the percentages for:
Exactly 5 heads
Exactly 4 heads or Exactly 6 heads
Exactly 3 heads or Exactly 7 heads
Exactly 2 heads or Exactly 8 heads
Exactly 1 head or Exactly 9 heads
Exactly 0 heads or Exactly 10 heads
(b) Compare these percentages to a set of numbers on page T-9 (back of your book). Check the rightmost column, the one titled ".50", and look at the set starting with the number .0010. This is the theoretical probability distribution of how a fair coin behaves in 10 tosses. How does your final table, involving 100,000 "samples", compare?
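The theoretical distribution for 10 tosses of a fair coin can be reproduced from the binomial formula, P(k heads) = C(n, k) p^k (1 - p)^(n - k). The Python sketch below is offered only as a cross-check on the table lookup; the helper name binomial_pmf and the grouping into pairs are my own choices for illustration, matching the list of percentages requested in part (a).

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Individual probabilities, rounded to four places like a printed table.
for k, prob in enumerate(pmf):
    print(f"{k:2d} heads: {prob:.4f}")

# Grouped percentages in the same order as the list in part (a).
groups = [(5,), (4, 6), (3, 7), (2, 8), (1, 9), (0, 10)]
group_pct = {g: 100 * sum(pmf[k] for k in g) for g in groups}
for g, pct in group_pct.items():
    label = " or ".join(str(k) for k in g)
    print(f"exactly {label} heads: {pct:.1f}%")
```

The first probability printed rounds to .0010, the number the table set starts with, so the rest of the output should line up with the ".50" column.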
In the lecture examples, the market behaves like a biased coin: it is "up" 60% of the time and "not up" 40% of the time. We can use the heads2 program to see how our theoretical discrete probabilities match up with a simulation.
Issue the command:
heads2 1 .6 3, nograph
tabulate heads
You would interpret the result as: in a single 3-day trading period, there were (result of the tabulate) up days.
To see how the market might behave in a little more than a year:
heads2 100 .6 3, nograph
tabulate heads
And to see how the market might behave over a very long period of time:
heads2 100000 .6 3, nograph
tabulate heads
Now issue a summarize command and tell me the mean and standard deviation of the number of "up days" in 3 days:
summarize heads
(a) Question: How does your simulated result compare with the theoretical result calculated during lecture? Is it very similar or very different?
You should start thinking about any single 3-day period as a "single sample" and the theoretical probability distribution as the population of "all possible samples".
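To make the "single sample" versus "all possible samples" idea concrete, the following Python sketch compares the theoretical mean and standard deviation of the number of up days in 3 days (with a 60% chance of "up" each day) against a simulation of 100,000 three-day periods. It assumes the days are independent, which is what the biased-coin model implies; the variable names and the seed are mine, and this is an illustration rather than the heads2 program.

```python
import random
from math import comb, sqrt

n_days, p_up = 3, 0.6

# Theoretical distribution of the number of up days (0, 1, 2, or 3).
pmf = [
    comb(n_days, k) * p_up**k * (1 - p_up) ** (n_days - k)
    for k in range(n_days + 1)
]
theo_mean = sum(k * prob for k, prob in enumerate(pmf))
theo_sd = sqrt(sum((k - theo_mean) ** 2 * prob for k, prob in enumerate(pmf)))

# Simulation: each 3-day period is one "sample" (arbitrary seed).
rng = random.Random(2001)
n_periods = 100_000
up_days = [
    sum(rng.random() < p_up for _ in range(n_days))
    for _ in range(n_periods)
]
sim_mean = sum(up_days) / n_periods
# Sample standard deviation with an n - 1 denominator.
sim_sd = sqrt(sum((x - sim_mean) ** 2 for x in up_days) / (n_periods - 1))

print(f"theoretical: mean = {theo_mean:.3f}, sd = {theo_sd:.3f}")
print(f"simulated:   mean = {sim_mean:.3f}, sd = {sim_sd:.3f}")
```

With this many periods, the simulated values should land very close to the theoretical ones, which are n*p for the mean and the square root of n*p*(1 - p) for the standard deviation.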
RECAP