Statistics 10
Lecture 9


LECTURE 9: BOX MODELS, EXPECTED VALUES AND STANDARD ERRORS

A. Background

Can we predict the future? In a way, yes. For example, it's Friday, and if you commute you'd probably predict that the drive home tonight will take longer than it does on other nights. Say your average commute is 20 minutes. Sometimes it's longer, sometimes it's shorter.
A problem with our predictions is that there is always a chance of error. So even though we know tonight's commute will take longer, it's tough to say exactly how much longer for any single commute. Over many commutes, though, we could get a sense of how frequently (what percentage of all our commutes) we're close to the average, how frequently it's a short commute, and how frequently it's a long commute.
A better word for prediction in the context of Stat 10 is Expectation. The term Expected Value is like the average. The term Standard Error is like the standard deviation and we'll use it to help us determine what percentage of time we're close to the expected value or far away from it.

B. Box Models (16.4)

This is just a tool to help you understand Chapter 17.
Three things to think about:
    1. What are the possible outcomes?
    2. How many (or what probability or percentage) of each outcome?
    3. How many draws do I get?
Example from Chapter 16 p.286, problem 7. The score will be like the sum of 25 draws from a box with tickets that read +4, -1, -1, -1, -1.
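
To make the box idea concrete, here is a small Python sketch of the three questions above for this example. The function name sum_of_draws and the seed are made up for illustration; they are not from the textbook.

```python
import random

# 1. Possible outcomes and 2. how many of each:
#    one +4 ticket and four -1 tickets.
BOX = [4, -1, -1, -1, -1]

# 3. How many draws do I get?
N_DRAWS = 25

def sum_of_draws(box, n_draws, rng):
    """Draw n_draws tickets at random WITH replacement and return their sum."""
    return sum(rng.choice(box) for _ in range(n_draws))

rng = random.Random(0)  # fixed seed (an arbitrary choice) so the run is repeatable
score = sum_of_draws(BOX, N_DRAWS, rng)
print("one simulated score from 25 draws:", score)
```

Each run with a different seed gives a different score; that variation is the chance process the box stands for.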

C. The Expected Value (17.1)

Definition.
The EXPECTED VALUE in Chapter 17 is for the sum of draws from a box: it is the number of draws times the average of the box. The draws must be made at random with replacement for this to work.
Associate it with the idea of a "most likely outcome": the sum you actually get should be somewhere around it.
A basic example: a coin toss -- it has 2 outcomes, Heads or Tails. Suppose we're interested in the count of heads in some number of tosses. We expect 50% for each outcome (i.e. half heads, half tails). Think of a box with two tickets, 1 for heads and 0 for tails; the average of the box is .50 or 1/2 or 50%.
In a situation of 10 tosses (draws), you wind up with an expected value of 5 (10 times 1/2). Or think of it this way: 5 heads is the most likely outcome.
Example: Suppose you are a sales executive and market research information suggests that these are your sales estimates for this month.
                UNITS SOLD      300     500     750
                PROBABILITY     1/3     1/3     1/3

The expected number sold (average) is 516.67 units. That is (300 x 1/3) + (500 x 1/3) + (750 x 1/3) = 516.67, the average of the box, multiplied by 1 (a single draw, i.e. one month).

The expected number sold in 12 months would be
516.67 * 12 = 6,200 (approximately)

Possible interpretations: in a given month (assuming all months are the same) you'd expect to sell about 517 units. In 12 months, you'd expect to sell about 6,200 units.
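
The sales arithmetic can be checked with a short Python sketch. The names box_average and expected_sum are made up for illustration, and exact fractions are used only to avoid rounding until the end.

```python
from fractions import Fraction

# Sales estimates from the table: units sold -> probability.
sales = {300: Fraction(1, 3), 500: Fraction(1, 3), 750: Fraction(1, 3)}

def box_average(outcomes):
    """Probability-weighted average of the outcomes (the average of the box)."""
    return sum(value * prob for value, prob in outcomes.items())

def expected_sum(outcomes, n_draws):
    """Expected value of the sum of n_draws draws: n_draws times the box average."""
    return n_draws * box_average(outcomes)

one_month = expected_sum(sales, 1)
twelve_months = expected_sum(sales, 12)
print("one month:", round(float(one_month), 2))
print("twelve months:", float(twelve_months))
```

Treating each month as one draw from the same box is the "all months are the same" assumption stated above.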

Pushing the example from Chapter 16 p.286, problem 7 forward to make it a Chapter 17 question: the score will be like the sum of 25 draws from a box with tickets that read +4, -1, -1, -1, -1. What is the expected value (most likely outcome)? Well, if you have 25 draws, it's like getting each of the 5 tickets 5 times, or (5 * +4) + (5 * -1) + (5 * -1) + (5 * -1) + (5 * -1) = 0. Equivalently, the average of the box is 0, and 25 x 0 = 0. So the expected value is zero.

 

D. The Standard Error (17.2)

What is suggested is this:

Actual Outcome (observed value) from some draws
= expected value + chance error

chance error is the amount above or below the expected value.

Think about tossing a coin ten times. If I toss it ten times and get 9 heads, you might think I'm extremely lucky or I'm cheating.

If I toss a coin ten times and get 6 heads, you probably wouldn't think I was extremely lucky or that I was cheating. 6 seems reasonable, 9 doesn't. This is where the chance error component enters.

The standard error is the likely size of the chance error. An outcome (sum) from some number of draws will be around the expected value, but off by a chance error that should be close in magnitude to the standard error.

Formula: standard error = square root (number of draws)
multiplied by the standard deviation (SD) of the box.

Remember, the standard deviation is a measure of spread. What the formula suggests is that the more draws you make, the larger the standard error of the sum, although it grows only like the square root of the number of draws.

Example: 4 draws, the multiplier is 2, 9 draws it is 3, 25 draws it is 5 and so forth.

Note: the standard error is not the same as the standard deviation. The SD is calculated for lists, but the SE is for some kind of chance process, like a lottery. The SD is part of an SE though.

Pushing the example from Chapter 16 p.286, problem 7 forward to make it a Chapter 17 question related to the standard error: the score will be like the sum of 25 draws from a box with tickets that read +4, -1, -1, -1, -1. The average of the box is 0, so the expected value is 25 x 0 = 0. What is the standard error for this random (or chance) process? We need to take the square root of the number of draws, so the square root of 25, which is 5, and multiply it by the SD of the tickets in the box. In this case, the numbers are +4, -1, -1, -1, -1. Their average is 0, so the squared deviations are 16, 1, 1, 1, 1; the average of those is 20/5 = 4, and the square root of 4 is 2. So the SD of the box is 2. Therefore, the standard error is 5 times 2, or 10.
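
The same calculation as a Python sketch. The function names are made up for illustration; the SD divides by the number of tickets, as in the calculation above.

```python
import math

def box_sd(box):
    """SD of the tickets in the box (divide by the number of tickets, not n - 1)."""
    avg = sum(box) / len(box)
    return math.sqrt(sum((ticket - avg) ** 2 for ticket in box) / len(box))

def expected_value(box, n_draws):
    """Expected value of the sum: number of draws times the average of the box."""
    return n_draws * sum(box) / len(box)

def standard_error(box, n_draws):
    """SE of the sum: square root of the number of draws times the SD of the box."""
    return math.sqrt(n_draws) * box_sd(box)

box = [4, -1, -1, -1, -1]
ev = expected_value(box, 25)
se = standard_error(box, 25)
print("expected value:", ev)
print("standard error:", se)
```

Changing 25 to 4, 9, or 100 shows the multiplier pattern: the SE goes up with the square root of the number of draws.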

Interpretation: we expect a score of around zero, give or take 10 or so.
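
One way to see what "around zero, give or take 10 or so" means is to simulate many scores and look at their average and SD. This is only a sketch; the number of repeats and the seed are arbitrary choices, not part of the textbook method.

```python
import random
import statistics

BOX = [4, -1, -1, -1, -1]
N_DRAWS = 25
N_REPEATS = 10_000  # number of simulated scores (an arbitrary choice)

rng = random.Random(17)  # fixed seed, chosen arbitrarily, so runs are repeatable

# Each score is the sum of 25 draws made at random with replacement.
scores = [sum(rng.choices(BOX, k=N_DRAWS)) for _ in range(N_REPEATS)]

sim_average = statistics.fmean(scores)   # should land near the expected value, 0
sim_spread = statistics.pstdev(scores)   # should land near the standard error, 10
print("average of simulated scores:", round(sim_average, 2))
print("SD of simulated scores:", round(sim_spread, 2))
```

The simulated numbers will not match 0 and 10 exactly, but they should be close, which is the connection between the SE (for the chance process) and the SD (for a list of actual results).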

E. Using the Normal Curve (17.3)

This section ties it all together. You can borrow the normal curve to make statements about random processes (such as draws from a box).

All that is required is that you (a) calculate the expected value and (b) calculate the SE.

Then, you can calculate standard units with a familiar formula:

   Z = (observed value - expected value)
        --------------------------------
                 SE

Example. Let's go back to the 9 heads in 10 tosses of a coin idea. The SE for the coin toss situation is SQRT(10) x SQRT(.5 x .5) (read 17.5 for the specifics). Let's see how likely it is to get 9 heads in 10 tosses.

SE = 1.5811 and Z = (9 - 5) / 1.5811 = 2.53, or about 2.55 on the table. The area between -2.55 and +2.55 is 98.92%, which leaves about 1% total outside of that area. So the chance of getting 9 heads or more is about 1/2 of a percent. By the same method (Z is about 0.63), the chance of getting 6 heads or more is about 25%.
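
A Python sketch of the same steps, using the normal curve directly instead of the table. The helper names upper_tail and chance_at_least are made up for illustration. Because this uses the exact Z rather than a rounded table entry, the percentages come out slightly different from the table-based figures (roughly 0.6% and 26%).

```python
import math

def upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n_tosses = 10
expected = n_tosses * 0.5                        # expected number of heads
se = math.sqrt(n_tosses) * math.sqrt(0.5 * 0.5)  # SE for the number of heads

def chance_at_least(heads):
    """Normal-curve estimate of the chance of getting this many heads or more."""
    z = (heads - expected) / se
    return upper_tail(z)

for heads in (9, 6):
    z = (heads - expected) / se
    print(heads, "heads or more: Z =", round(z, 2),
          "chance is about", round(100 * chance_at_least(heads), 1), "%")
```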

Your intuitive sense works well. The combination of the expected value, standard error, and normal curve validates your suspicions.

This same method can be used to figure out chances in all kinds of situations.

Pushing the example from Chapter 16 p.286, problem 7 forward once more, this time using the normal curve: the score will be like the sum of 25 draws from a box with tickets that read +4, -1, -1, -1, -1. The expected value is zero and the standard error is 10. Suppose someone "draws 25 tickets" from this box and gets a sum of 20. What is the chance (or percentage of times) we would get a 20 or more if we were expecting to get a 0?

Z = (20 - 0) / 10 = 2, so a Z score of 2. If you look up Z = 2 on your table A-105, this corresponds to 95.45% between Z = -2 and Z = +2, so the percentage in the "upper tail" (that is, a Z of 2 or more) would be about 2.275% (one half of 100 - 95.45).
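
The table lookup can be double-checked with Python's standard library, which has the normal curve built in (statistics.NormalDist, Python 3.8 or later). The variable names are made up for illustration; the expected value and standard error are the ones worked out above.

```python
from statistics import NormalDist

expected_value = 0    # 25 draws times the box average of 0
standard_error = 10   # sqrt(25) times the box SD of 2
observed = 20

z = (observed - expected_value) / standard_error

normal = NormalDist()                       # standard normal curve
middle = normal.cdf(z) - normal.cdf(-z)     # area between -Z and +Z
upper = 1 - normal.cdf(z)                   # area to the right of +Z

print("Z =", z)
print("area between -Z and +Z:", round(100 * middle, 2), "%")
print("upper tail:", round(100 * upper, 3), "%")
```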

 

F. Homework (Due 2/18/00) for Chapter 17