Expected Values Continued


We might also start with elements in a sample and recode them from one type of scale into another, so that we track whether or not something occurred (as opposed to how much occurred).

Example: We might originally have a box representing the roll of a die, with six possible outcomes, but all we are interested in is whether or not we roll a six. We can recode our box into a new box that contains 6's versus everything else. An outcome with only two possibilities like this is called a binomial outcome.
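The recoding step can be sketched in a few lines of Python. This is a minimal illustration, not the book's method; the helper name `recode` is hypothetical. Each face of the die becomes a ticket marked 1 (a six) or 0 (anything else), and the average of the new 0-1 box is the chance of rolling a six.

```python
from fractions import Fraction

# Original box: the six faces of a fair die.
die_box = [1, 2, 3, 4, 5, 6]

def recode(box, is_success):
    """Hypothetical helper: mark each ticket 1 if it is a 'success', else 0."""
    return [1 if is_success(ticket) else 0 for ticket in box]

# New box: 6's versus everything else.
six_box = recode(die_box, lambda face: face == 6)
print(six_box)  # [0, 0, 0, 0, 0, 1]

# The average of a 0-1 box is the fraction of 1's, i.e. the chance of a six.
p_six = Fraction(sum(six_box), len(six_box))
print(p_six)  # 1/6
```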

Binomial outcomes: When there are only two outcomes of interest

Given a binomial random variable with n trials and each trial having probability p of success:

The expected value is µ = np

The standard error is $\text{SE} = \sqrt{np(1 - p)}$

Example: Suppose we toss a fair coin 10 times. The expected number of heads is 10(.5) = 5. The SE is the square root of 10(.5)(1 - .5) = 2.5, which is about 1.6. On a single toss, the SD would be the square root of (.5 × .5) = .5.
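The two binomial formulas, µ = np and SE = √(np(1 - p)), can be checked directly against the coin example. This is a small sketch; the function name `binomial_ev_se` is a hypothetical choice for illustration.

```python
import math

def binomial_ev_se(n, p):
    """Expected number of successes and its SE for n independent trials with success chance p."""
    ev = n * p
    se = math.sqrt(n * p * (1 - p))
    return ev, se

ev, se = binomial_ev_se(10, 0.5)
print(ev, round(se, 2))  # 5.0 1.58

# With a single toss (n = 1), the SE reduces to the SD of one toss.
print(binomial_ev_se(1, 0.5)[1])  # 0.5
```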

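Using the approach from the book:

The expected value is 10 × (average of box = .5) = 5 heads.

The SD of the box (one ticket marked 1 for heads, one marked 0 for tails) is

$\text{SD} = \sqrt{\frac{(1 - .5)^2 + (0 - .5)^2}{2}} = .5$

or, using the shortcut for a box with only two kinds of tickets:

$\text{SD} = (1 - 0)\sqrt{.5 \times .5} = .5$

And the SE = $\sqrt{10} \times .5 \approx 1.6$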
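The book's box approach can be sketched the same way: compute the average and SD of the box, then scale by the number of draws. The helper names below are hypothetical. `box_average_and_sd` uses the root-mean-square deviation; `shortcut_sd` uses (big - small) × √(fraction big × fraction small), which should agree for a two-ticket-type box.

```python
import math

def box_average_and_sd(box):
    """Average and SD (root-mean-square deviation from the average) of a box."""
    avg = sum(box) / len(box)
    sd = math.sqrt(sum((t - avg) ** 2 for t in box) / len(box))
    return avg, sd

def shortcut_sd(big, small, frac_big):
    """Shortcut SD for a box holding only two kinds of tickets."""
    return (big - small) * math.sqrt(frac_big * (1 - frac_big))

coin_box = [0, 1]   # 1 = heads, 0 = tails
n = 10

avg, sd = box_average_and_sd(coin_box)
ev_sum = n * avg              # expected value of the sum = number of draws x average of box
se_sum = math.sqrt(n) * sd    # SE of the sum = sqrt(number of draws) x SD of box

print(avg, sd, shortcut_sd(1, 0, 0.5))  # 0.5 0.5 0.5
print(ev_sum, round(se_sum, 2))         # 5.0 1.58
```

Both routes give the same answer as the binomial formula, since the sum of 0-1 tickets is just the count of heads.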

The book refers to the situation where you are interested in only one of several possible outcomes as classifying and counting.

Example: In 60 draws at random (with replacement) from a bag containing 3 wild balls and 2 yellow balls, how many times can we expect to draw a wild ball? This can be viewed as 60 draws from a bag where the probability of drawing a wild ball on each draw is 3/5.

The expected value is np = 60(3/5) = 36.

The SE is the square root of np(1 - p) = the square root of (60 × .6 × .4), which is about 3.8.

In 10 draws, the expected value is 10(3/5) = 6, and the SE is the square root of (10 × .6 × .4), which is about 1.5. So in 10 draws, about two thirds of the time we expect to observe between 4.5 and 7.5 wild balls (the expected value plus or minus one SE).
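Here is the same classifying-and-counting arithmetic for both sample sizes, as a short sketch (the helper name `count_ev_se` is hypothetical). It prints the expected count, the SE, and the "within one SE" range that holds roughly two thirds of the time.

```python
import math

def count_ev_se(n, p):
    """Expected count and SE for n draws with replacement, success chance p per draw."""
    return n * p, math.sqrt(n * p * (1 - p))

p_wild = 3 / 5  # 3 wild balls out of 5 in the bag
for n in (60, 10):
    ev, se = count_ev_se(n, p_wild)
    # Roughly two thirds of the time, the count lands within one SE of the expected value.
    print(n, round(ev, 1), round(se, 1), (round(ev - se, 1), round(ev + se, 1)))
```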

Combining estimates with what we know from the normal curve

Just as we did before with observed scores and their standard deviations, where we translated scores into z-scores in order to use the Normal Table and estimate percentiles, we can treat expected values and their standard errors in a similar fashion.

Example: Remember our 3-shell game, where we would win $10 for picking the right shell and lose $5 for picking a wrong one. There we calculated that with 30 tries we would on average have a net gain of $0, with an SE of $38.72. We're feeling lucky. What is the chance that we will win $50.00 or more in 30 tries?

z = (50-0)/38.72 = 1.29

In the Normal Table, the nearest entry is z = 1.30, and the area between -1.30 and 1.30 is approximately 80.64%.

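The table gives the area between -z and z, but we want only the area to the right of z, so we need one more calculation: take away half of the middle area and the 50% to the left of 0.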

100 - ((80.64/2) + 50) = 9.68

So we have about a 9.68% chance of winning $50.00 or more if we were to play.
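The whole shell-game calculation can be sketched in Python, assuming each play means one right shell out of three picked at random (a box of one $10 ticket and two -$5 tickets). Instead of the printed table, this sketch computes normal areas with `math.erf`. With the unrounded z of about 1.291, the tail area comes out near 9.8%, a little above the 9.68% obtained by rounding z to 1.30.

```python
import math

def normal_cdf(z):
    """Area under the standard normal curve to the left of z, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Box for one play: win $10 once, lose $5 twice (assumes a 1-in-3 chance of the right shell).
box = [10, -5, -5]
n_plays = 30

avg = sum(box) / len(box)
sd = math.sqrt(sum((t - avg) ** 2 for t in box) / len(box))
ev = n_plays * avg             # expected net gain
se = math.sqrt(n_plays) * sd   # SE of the net gain

z = (50 - ev) / se
chance = 1 - normal_cdf(z)     # area to the right of z

print(round(ev, 2), round(se, 2), round(z, 2), round(100 * chance, 2))
# Table-style version, rounding z to 1.30:
print(round(100 * (1 - normal_cdf(1.30)), 2))
```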

The probability density histogram

In Chapter 18, the book demonstrates probability histograms and the normal curve.

You have previously used histograms to graph outcomes. But histograms can also be used to graph chance.

As the number of draws grows large, the probability histogram for the sum of draws from a box approximates the normal curve. (This does not hold for products of draws from a box.)
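To see the normal approximation for a sum, we can build the exact probability histogram for the number of heads in 100 coin tosses and compare one area with the normal curve. This sketch uses hypothetical helper names and assumes integer tickets drawn with replacement; the normal area runs from 44.5 to 55.5 so that it covers the full rectangles for 45 through 55 heads. It illustrates sums only, not products.

```python
import math

def sum_distribution(box, n):
    """Exact probability histogram for the sum of n draws (with replacement) from a box of integer tickets."""
    dist = {0: 1.0}
    for _ in range(n):
        new = {}
        for total, prob in dist.items():
            for ticket in box:
                new[total + ticket] = new.get(total + ticket, 0.0) + prob / len(box)
        dist = new
    return dist

def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

box = [0, 1]   # coin: 1 = heads
n = 100
dist = sum_distribution(box, n)

# Exact chance of 45 to 55 heads versus the normal approximation
exact = sum(p for s, p in dist.items() if 45 <= s <= 55)
ev, se = n * 0.5, math.sqrt(n) * 0.5
approx = normal_cdf((55.5 - ev) / se) - normal_cdf((44.5 - ev) / se)
print(round(exact, 4), round(approx, 4))
```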

3. Points to remember

Even if we are uncertain about the outcome of a particular event, over the long run, if repeated trials occur, the probability histogram for the sum of the outcomes will approximate a normal curve.

This is the basic tenet of frequentist statistics, the philosophy of statistics that you are learning: each event may not be predictable when taken alone, but collections of random events can take on predictable forms.

There are different schools of statistical philosophy and if you continue on in your statistical training you will learn about them too.

Recap

We can create hypothetical distributions based on our knowledge of the probability of an event occurring.

Example: We don’t have to actually play the shell game to estimate what we could expect to win or lose if we did play.

Example: In developing countries, where financial resources are scarce, health ministries try to forecast which is less expensive: immunization for tetanus among pregnant women or medical care for tetanus cases in newborns. To do this, they have to estimate the number of tetanus cases that would occur in the absence of immunization.

The distribution that we create has a center, or mean: its expected value, which is based on the probability of the event occurring (for a count, the number of draws times that probability).

This distribution also has spread or variance.

The spread is due to chance.

The spread depends on two things:

The spread or S.D. of the box

The number of tries or draws from the box

With very few draws, our observations can swing wildly from what we expect to see (if we flip a coin twice, we expect one head but are not surprised to get two).

With many draws, our observations should cluster closely around what we expect (if we flip a coin 10 times, we would be surprised to get the same outcome as above, all heads, and instead expect to see about 4 to 6 heads clustered around the expected value of 5).
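The coin-flip comparison can be made exact with binomial counts. The helper `chance_of_heads_between` is a hypothetical name; it adds up the exact chances of each number of heads for a fair coin. The loop at the end shows that the SE of the count grows like the square root of the number of tosses, while the SE as a percentage of the number of tosses shrinks.

```python
import math

def chance_of_heads_between(n, low, high):
    """Exact chance of getting low..high heads in n tosses of a fair coin."""
    return sum(math.comb(n, k) for k in range(low, high + 1)) / 2 ** n

# Two tosses: all heads is not surprising.
print(chance_of_heads_between(2, 2, 2))     # 0.25
# Ten tosses: all heads is very unlikely, while 4 to 6 heads is common.
print(chance_of_heads_between(10, 10, 10))  # 0.0009765625
print(chance_of_heads_between(10, 4, 6))    # 0.65625

# SE of the count of heads (SD of box = .5), and that SE as a percentage of the tosses.
for n in (2, 10, 100, 1000):
    se = math.sqrt(n) * 0.5
    print(n, round(se, 2), round(100 * se / n, 2))
```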

We can use these estimates of spread due to chance to decide how much spread we should expect to see if we have specified the box correctly.

We can also use these estimates of spread for forecasting (how much would I win or lose on average if I played this game 1000 times?)
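As a sketch of such a forecast, the code below applies the expected value and SE formulas to 1000 plays of the shell game, again assuming a box of one $10 ticket and two -$5 tickets. It then checks the forecast with a simulation of repeated 1000-play sessions. The fixed seed and the number of simulated sessions are arbitrary illustrative choices; simulated values will be close to, but not exactly equal to, the formula values.

```python
import math
import random

box = [10, -5, -5]   # one play of the 3-shell game (assumed 1-in-3 chance of winning)
n_plays = 1000

# Forecast from the formulas
avg = sum(box) / len(box)
sd = math.sqrt(sum((t - avg) ** 2 for t in box) / len(box))
ev = n_plays * avg
se = math.sqrt(n_plays) * sd
print(ev, round(se, 1))

# Check the forecast by simulating many sessions of 1000 plays each.
rng = random.Random(0)   # fixed seed so the run is repeatable
sessions = [sum(rng.choice(box) for _ in range(n_plays)) for _ in range(500)]
sim_mean = sum(sessions) / len(sessions)
sim_sd = math.sqrt(sum((s - sim_mean) ** 2 for s in sessions) / len(sessions))
print(round(sim_mean, 1), round(sim_sd, 1))
```

The simulated average net gain should sit near the expected $0, and the simulated spread near the SE of about $224.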