Statistics 10

1. Recall: Random Variables

Definition -- a random variable, usually denoted X or Y, is the numerical outcome of a random process or experiment.

Examples: tosses of a coin, rolls of a die, the market close, a lottery, the amount of fluid in a bottle of beer, the number of students present in a class.

2. Finishing Lecture 9: Continuous Random Variables

A continuous random variable can assume an infinite number of values in an interval. So for example, our beer bottles can contain any amount of beer in an interval between 0 and 300 ml.

The most commonly used continuous probability distribution is the NORMAL distribution (Chapter 1.3). The probability distribution is described by a curve, and the probability of any event is the area under the curve over that event. We are always interested in the probability of an interval rather than the probability of an exact value, simply because the area under the curve at any exact point is zero.

Notation: the Greek letter mu, μ, is the symbol for the mean of the normal distribution; the Greek letter sigma, σ, is the symbol for the standard deviation. In the Standard Normal that we use in Table A, the mean is μ = 0 and the standard deviation is σ = 1.

Example: Suppose an automobile manufacturer claims their new SUV has mean in-city mileage of 16 miles per gallon. Suppose you write to the manufacturer and you find out that the standard deviation around that mean is 2 miles per gallon. This information allows you to formulate a probability model. So you think that the random variable "in city gas mileage" can be approximated by a normal distribution with a mean of 16 and a standard deviation of 2.

ASK: What does the distribution of the in-city gas mileage look like for the population of these vehicles? What percentage do we expect to be between 14 and 18 miles per gallon? What percentage do we expect to be between 12 and 20 miles per gallon?
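The interval questions above can be checked numerically. Here is a minimal sketch using `NormalDist` from Python's standard `statistics` module (Python 3.8 or later), assuming the normal model with mean 16 and standard deviation 2 from the example; the variable names are only illustrative.

```python
from statistics import NormalDist

# Probability model from the example: in-city mileage ~ Normal(mean=16, sd=2)
mileage = NormalDist(mu=16, sigma=2)

# P(14 < X < 18): within one standard deviation of the mean
p_14_to_18 = mileage.cdf(18) - mileage.cdf(14)

# P(12 < X < 20): within two standard deviations of the mean
p_12_to_20 = mileage.cdf(20) - mileage.cdf(12)

print(f"Between 14 and 18 mpg: {p_14_to_18:.1%}")
print(f"Between 12 and 20 mpg: {p_12_to_20:.1%}")
```

The two areas come out to roughly 68% and 95%, the familiar rule of thumb for one and two standard deviations around the mean of a normal distribution.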

Example: Suppose you work for a magazine that tests new autos and trucks. If you were to test this SUV, what is the probability that the one you purchased averages less than 13 miles per gallon? What is the probability that you would purchase one that gets more than 20 miles per gallon? Suppose you were to purchase one that gets better than 20 miles per gallon; is your probability model necessarily wrong?
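The tail probabilities in this example can be sketched the same way. This assumes the Normal(16, 2) model from the SUV example and uses `NormalDist` from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

# Same probability model: in-city mileage ~ Normal(mean=16, sd=2)
mileage = NormalDist(mu=16, sigma=2)

# Left tail: area under the curve below 13 mpg
p_below_13 = mileage.cdf(13)

# Right tail: area under the curve above 20 mpg
p_above_20 = 1 - mileage.cdf(20)

print(f"P(X < 13) = {p_below_13:.4f}")
print(f"P(X > 20) = {p_above_20:.4f}")
```

Under this model about 6.7% of vehicles fall below 13 mpg and about 2.3% exceed 20 mpg. A vehicle above 20 mpg is unusual but not impossible, so a single such observation does not by itself show that the model is wrong.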

Example: Suppose you are thinking about investing some money in a mutual fund. Past data shows that the fund returned a mean of 19.8% with a standard deviation of 13.40%. Suppose we know it is normally distributed. Based on this information, what is the probability that you will experience a loss (get a return of less than zero)? Year-to-date, the fund has returned -9.07% (a loss). What is the probability of getting a return that low or lower? Is the model necessarily wrong?
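A minimal sketch of the mutual fund calculation, assuming returns really are normal with mean 19.8 and standard deviation 13.4 (both in percent), as the example supposes. It standardizes each value to a z-score by hand and then looks it up on the standard normal, which mirrors what you would do with Table A.

```python
from statistics import NormalDist

mean_return = 19.8   # percent, from the example
sd_return = 13.4     # percent, from the example

standard_normal = NormalDist(mu=0, sigma=1)

# z-score: how many standard deviations a value lies from the mean
z_loss = (0 - mean_return) / sd_return
z_ytd = (-9.07 - mean_return) / sd_return

p_loss = standard_normal.cdf(z_loss)   # P(return < 0)
p_ytd = standard_normal.cdf(z_ytd)     # P(return <= -9.07)

print(f"z = {z_loss:.2f}, P(loss) = {p_loss:.3f}")
print(f"z = {z_ytd:.2f}, P(return <= -9.07) = {p_ytd:.3f}")
```

The z-scores are about -1.48 and -2.15, so a loss has a probability of roughly 7%, and a return of -9.07% or lower has a probability between 1% and 2%.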

3. Random Variables have means too (4.4)

The mean of a list (from Chapter 1.2) is x-bar. It's an ordinary average that gives every value in the list equal weight. The mean of a random variable is also an average, but slightly different: it weights the outcomes by their probabilities, and these do not need to be equal.

The examples above give a mean (and standard deviation) for a random variable.

Symbols: the mean of a probability distribution is denoted by the Greek letter mu, μ, and its standard deviation by sigma, σ. Where have you seen this before? (Chapter 1.3)

Generally, the mean of a random variable is written μ_x, pronounced "mu sub x", to represent the mean of any random variable X, not just normal ones. What would the symbol μ_y mean to you?

4. The mean of a discrete random variable: The Expected Value

A discrete random variable has a countable number of possible values; in this class the list is usually finite. Recall that some outcomes are naturally discrete: you can't roll a 3.1, you can't have 351.7 employees at a firm, and you can't have 3.91 customer complaints today. Discrete variables jump from one value to the next.

So one can "list" the outcomes or what are known as the possible values some random variable X can take. And one can calculate the probabilities of each outcome.

Discrete Random variables have probability distributions -- they are just a way of organizing outcomes and representing them graphically. A table or a graph might suffice. There are only 2 requirements:

1) probabilities must be greater than or equal to zero

2) the sum of the probabilities must be 1.

Example: What are the possible outcomes for the market close over a period of 3 consecutive days? Let random variable X represent the number of days on which the market closed above its previous day's high. Suppose the probability of an "up" close is .6 and of a "not up" close is .4, and that each day's result is independent of the others.

If we table the outcomes and calculate their probabilities:

Value of X	0	1	2	3
Probability	.064	.288	.432	.216

Note that .064 + .288 + .432 + .216 = 1.0, and remember that .4³ = .064, 3*(.4²*.6) = .288, 3*(.6²*.4) = .432, and .6³ = .216.
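The table can be reproduced with a short sketch. It uses the binomial formula, which is the general pattern behind the products above, and it assumes each day's close is independent of the others, with an "up" probability of .6. The names `p_up`, `n_days` and `distribution` are illustrative.

```python
from math import comb

p_up = 0.6    # probability of an "up" close, from the example
n_days = 3    # number of consecutive trading days

# P(X = k) = C(n, k) * p^k * (1 - p)^(n - k), assuming independent days
distribution = {
    k: comb(n_days, k) * p_up**k * (1 - p_up)**(n_days - k)
    for k in range(n_days + 1)
}

# The two requirements for a probability distribution
all_nonnegative = all(p >= 0 for p in distribution.values())
total = sum(distribution.values())

for k, p in distribution.items():
    print(f"P(X = {k}) = {p:.3f}")
print(f"All probabilities >= 0: {all_nonnegative}")
print(f"Sum of probabilities: {total:.3f}")
```

The factor `comb(3, k)` counts the number of orderings, for example the 3 ways to have exactly one "up" day out of three.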

What would the mean of this random variable be? We know that the market can behave in this manner in any 3 consecutive trading days, but what is "most likely to happen?"

To find the mean of this probability distribution, or the mean of this random variable X, multiply each possible outcome by its probability and add up the products:

The formula (see page 327): μ_x = x₁p₁ + x₂p₂ + x₃p₃ + … + xₙpₙ

So for the example above, μ_x is (0*.064) + (1*.288) + (2*.432) + (3*.216) = 1.8

So in any 3 day trading period, you expect to see a little less than 2 of them closing "up". Since this is a discrete random variable, you expect between 1 and 2 up days in every 3 examined.
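The "multiply each outcome by its probability and add" recipe is short enough to sketch directly. The helper name `expected_value` and the dictionary layout are illustrative choices, not notation from the text.

```python
def expected_value(distribution):
    """Mean of a discrete random variable given as {value: probability}."""
    return sum(x * p for x, p in distribution.items())

# Number of "up" days in 3 trading days, from the table above
up_days = {0: 0.064, 1: 0.288, 2: 0.432, 3: 0.216}

mu_x = expected_value(up_days)
print(f"mu_x = {mu_x:.1f}")
```

The same helper works for any table of values and probabilities, as long as the probabilities meet the two requirements listed earlier.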

5. The mean of a continuous random variable

You usually need some calculus to calculate the mean for a continuous random variable unless it comes from a very simple symmetric distribution, such as a uniform distribution (it looks like a brick). In this class, it is generally given to you as the mean of a normal distribution. Remember, the normal distribution is a continuous probability distribution and a normal random variable is a continuous random variable. So in the examples on the SUVs and the mutual funds above, you would be expected to make statements about the distribution based on information given about the mean and standard deviation of the variable.
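For the curious, the calculus can be imitated numerically. The sketch below approximates the mean as the sum of x times the density over many thin slices. It assumes, purely for illustration, a uniform distribution on the 0 to 300 ml interval from the beer bottle example; nothing in these notes says the fill amounts are actually uniform, and the function names are hypothetical.

```python
def continuous_mean(density, low, high, steps=10_000):
    """Approximate the integral of x * f(x) over [low, high] with a midpoint sum."""
    width = (high - low) / steps
    total = 0.0
    for i in range(steps):
        x = low + (i + 0.5) * width    # midpoint of the i-th slice
        total += x * density(x) * width
    return total

low, high = 0.0, 300.0   # assumed interval, in ml

def uniform_density(x):
    # Flat ("brick-shaped") density: the same height everywhere on the interval
    return 1.0 / (high - low)

mean_fill = continuous_mean(uniform_density, low, high)
print(f"Approximate mean: {mean_fill:.1f} ml")
```

For a symmetric distribution like this one, the result is simply the midpoint of the interval, which is why no calculus is needed in that case.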

6. The Law of Large numbers

The law of large numbers was developed by Jacob Bernoulli, a Swiss mathematician. He wrote "In any chance event, when the event happens repeatedly, the statistics will tend to prove the probabilities."

Most people, when they hear the terms probabilities and statistics, want to run away. It's not that bad. Probabilities, in his mind, are simply theoretical results, or in this class, PARAMETERS. Statistics are nothing more than actual results coming from our samples. Inserting the definitions, we have:

"In any chance event, when the event happens repeatedly, the majority of sample outcomes will tend to be near the theoretical parameter."

This is basically common sense converted to mathematics. So if an experiment (like a sample, or like a coin toss, or a roll of a die) is performed repeatedly under identical conditions, the relative frequency of an event approaches its probability of occurrence with increasing accuracy as the number of trials (or the sample size) becomes “large.” For example, the experiment could be a coin toss, and the event is the occurrence of a head. The experiment could be observing the close of the Dow Jones Industrial Average and noting the frequency with which it closes "up".
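A small simulation can make this concrete. The sketch below repeats a chance event many times and prints the relative frequency for growing numbers of trials. The probability of .6 is borrowed from the earlier market example as an assumption, the seed is arbitrary, and the function name is illustrative.

```python
import random

def relative_frequency(n_trials, p_event, rng):
    """Fraction of n_trials in which an event with probability p_event occurs."""
    hits = sum(1 for _ in range(n_trials) if rng.random() < p_event)
    return hits / n_trials

rng = random.Random(10)   # fixed, arbitrary seed so that runs are repeatable
p_up = 0.6                # assumed probability of an "up" close

for n in (10, 100, 1_000, 10_000, 100_000):
    freq = relative_frequency(n, p_up, rng)
    print(f"n = {n:>7}: relative frequency = {freq:.4f}")
```

With only 10 trials the relative frequency can land well away from .6; as the number of trials grows, it tends to settle closer to the theoretical value.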

7. Rules and properties of the mean of a random variable

Rule 1. μ_{a+bX} = a + b(μ_x)

If a and b are constants, then the mean of a constant plus a random variable is the constant plus the mean of the random variable, and the mean of a constant times a random variable is the constant times the mean of the random variable.

Example: You sell real estate and this is your chance of selling a certain number of homes in a given week:

Homes Sold	0	1	2	3	4
Probability	.1	.5	.3	.1	.0

The expected number sold (mean) is 1.4 homes. We get that from (0*.1) + (1*.5) + (2*.3) + (3*.1) + (4*.0). If you must pay $2,500 per week to the firm regardless of what you sell and you get $10,000 for each home sold, your expected net earnings are (-2500) + 10000(1.4), or $11,500.
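Rule 1 can be checked against this example in two ways: apply the rule to the mean, or compute the net earnings for every outcome first and then average. The sketch below does both; the variable names are illustrative.

```python
# Probability of selling each number of homes in a week, from the table
homes_sold = {0: 0.1, 1: 0.5, 2: 0.3, 3: 0.1, 4: 0.0}

fee = -2500          # a: fixed weekly payment to the firm
commission = 10000   # b: dollars earned per home sold

# Mean number of homes sold
mean_homes = sum(x * p for x, p in homes_sold.items())

# Rule 1: mean of a + bX is a + b * (mean of X)
earnings_via_rule = fee + commission * mean_homes

# Direct route: transform every outcome, then take the weighted average
earnings_direct = sum((fee + commission * x) * p for x, p in homes_sold.items())

print(f"Mean homes sold: {mean_homes:.1f}")
print(f"Expected net earnings (Rule 1): ${earnings_via_rule:,.0f}")
print(f"Expected net earnings (direct): ${earnings_direct:,.0f}")
```

Both routes give the same figure, which is what the rule promises.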

Rule 2. Given random variables X and Y, μ_{x+y} = μ_x + μ_y. You can add the means of two different random variables together.

You fall in love and your partner also learns to sell real estate. Your partner's chance of selling a certain number of homes in a given week is:

Homes Sold	0	1	2	3	4
Probability	.1	.1	.5	.2	.1

Your partner's expected number sold is 2.1 homes per week. Your partner does not pay anything to the firm but gets only $4,000 for each home sold. What are your expected combined weekly earnings?

11,500 + (2.1*4,000) or $19,900

before taxes of course.
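The combined figure can be sketched by applying Rule 1 to each person's earnings and then Rule 2 to add the two means. The tables and dollar amounts come from the examples above; the helper `mean` and the variable names are illustrative.

```python
def mean(distribution):
    """Mean of a discrete random variable given as {value: probability}."""
    return sum(x * p for x, p in distribution.items())

# Weekly sales distributions from the two tables
your_sales = {0: 0.1, 1: 0.5, 2: 0.3, 3: 0.1, 4: 0.0}
partner_sales = {0: 0.1, 1: 0.1, 2: 0.5, 3: 0.2, 4: 0.1}

# Rule 1 for each person: a + b * (mean homes sold)
your_earnings = -2500 + 10000 * mean(your_sales)
partner_earnings = 0 + 4000 * mean(partner_sales)

# Rule 2: the mean of a sum is the sum of the means
combined_earnings = your_earnings + partner_earnings

print(f"Your expected earnings:    ${your_earnings:,.0f}")
print(f"Partner expected earnings: ${partner_earnings:,.0f}")
print(f"Combined expected earnings: ${combined_earnings:,.0f}")
```

Adding means this way does not require your sales and your partner's sales to be independent of each other.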