Statistics 10

1. Finishing Lecture 8: The Basic Probability Rules

1. The sample space S of some random process is the set of all possible outcomes.

2. A probability is a number between 0 and 1: for any event A, 0 ≤ P(A) ≤ 1. Probabilities are never negative and never greater than 1.

3. The probabilities of all possible outcomes must sum to 1: P(S) = 1.

4. The probability that an event does not occur is 1 minus the probability that it does occur. P(A^c) = 1 - P(A)

5. If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities: P(A or B) = P(A) + P(B). Two events with no outcomes in common are called MUTUALLY EXCLUSIVE ("disjoint").

Note: when A and B are disjoint, knowing that A happens completely determines what happens to B (B can't happen if A does). The reverse is also true: if B happens, A will not happen.
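
The rules above can be checked on a small example. The sketch below assumes one roll of a fair six-sided die; the events A and B are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Sample space S for one roll of a fair six-sided die (an assumed example).
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

A = {1, 2}  # hypothetical event: roll a 1 or a 2
B = {5, 6}  # hypothetical event: roll a 5 or a 6 (no outcomes in common with A)

print(prob(S))                          # rule 3: P(S) = 1
print(prob(S - A), 1 - prob(A))         # rule 4: P(A^c) = 1 - P(A)
print(prob(A | B), prob(A) + prob(B))   # rule 5: disjoint events add
print(prob(A & B))                      # disjoint: A and B never happen together
```

Using Fraction keeps every probability exact, so the two sides of each rule can be compared directly.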

Assigning Probabilities

The key to assigning probabilities is knowing all of your possible outcomes and knowing two rules:

The probabilities of all possible outcomes must total 1, or 100%. (Where have we talked about 100% being important?)

A probability must take a value 0 ≤ P(A) ≤ 1 (or 0% to 100%).

The best way to get comfortable doing this is to do it. Study the examples on pages 299-301.

In-class example: this is the composition of Statistics 11/Economics 40 as of last week.

	Freshman	Sophomore	Junior	Senior	Total
Male	7	20	14	3	44
Female	2	25	14	2	43
Total	9	45	28	5	87
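
The notes do not record which questions were asked in class about this table, so the events below are assumptions chosen for illustration. The sketch picks one student at random from the 87 and assigns probabilities using only the rules above.

```python
from fractions import Fraction

# Class composition copied from the table above.
counts = {
    "Male":   {"Freshman": 7, "Sophomore": 20, "Junior": 14, "Senior": 3},
    "Female": {"Freshman": 2, "Sophomore": 25, "Junior": 14, "Senior": 2},
}
total = sum(sum(row.values()) for row in counts.values())  # 87 students

def p_year(year):
    """Probability that a randomly chosen student is in the given year."""
    return Fraction(sum(row[year] for row in counts.values()), total)

# Every cell gets probability count/total; together the cells must total 1.
cell_probs = [Fraction(n, total) for row in counts.values() for n in row.values()]
print(sum(cell_probs))

p_female = Fraction(sum(counts["Female"].values()), total)
p_not_sophomore = 1 - p_year("Sophomore")                      # complement rule
p_freshman_or_senior = p_year("Freshman") + p_year("Senior")   # disjoint events add

print(p_female, p_not_sophomore, p_freshman_or_senior)
```

Freshman and Senior are disjoint (no student is both), which is what allows the simple addition in the last step.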

The Multiplication Rule and Independence (pp. 301-304)

If two events can happen together but do not influence each other, they are independent. For example, if I roll two dice, the result of one die should have no effect on the result of the other. In these situations you are trying to figure out the chance of two things happening together. For independent events, the chance that both will happen equals the chance that the first will happen multiplied by the chance that the second will happen.

RULE: Two events A and B are INDEPENDENT if knowing whether or not A happens does not help in predicting whether or not B happens. If A and B are independent, then P(B|A) = P(B). The vertical line is read "given", and the relationship says that knowing A has happened leaves P(B) unchanged. We can then rewrite the relationship as P(A and B) = P(A) x P(B).

Note: under INDEPENDENCE, A & B can happen together, they just don't affect each other. True independence is a rarity, an ideal situation.

Example: if I roll two dice and the first one is a "1", what's the probability that the second one will also be a "1"? 1/6. The probability of rolling two "1"s is 1/6 * 1/6 = 1/36. Suppose instead the first one is a "1" and I ask for the probability that the sum of the two dice will be 8 or more. Rolling a 1 on the first die AFFECTS or INFLUENCES that probability: the largest possible sum is now 1 + 6 = 7, so the probability is zero, whereas without knowing the first roll it is 15/36.
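
One way to check independence is to list all 36 equally likely outcomes for two dice and compare P(A and B) with P(A) x P(B). The sketch below does this for the two "1"s. The second pair of events (first die is a 1, sum is 8 or more) is an illustrative choice of events that do influence each other.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (die 1, die 2) for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a function that is True for outcomes in the event."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_is_1(o):
    return o[0] == 1

def second_is_1(o):
    return o[1] == 1

def sum_at_least_8(o):
    return o[0] + o[1] >= 8

# Independent pair: the joint probability equals the product.
p_two_ones = prob(lambda o: first_is_1(o) and second_is_1(o))
print(p_two_ones, prob(first_is_1) * prob(second_is_1))

# Dependent pair: the joint probability differs from the product.
p_one_and_big_sum = prob(lambda o: first_is_1(o) and sum_at_least_8(o))
print(p_one_and_big_sum, prob(first_is_1) * prob(sum_at_least_8))
```

When the two printed numbers on a line match, the multiplication rule holds for that pair; when they differ, the events are not independent.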

Example: Roll 2 dice. What can happen to die 1? What can happen to die 2?

Example: What happens to the Dow Jones on 3 consecutive trading days?

Summary -- Why do you need to know this?

Well, you might decide to become a professional gambler. Just kidding. Probability (or chance) is a tool, an important tool, for the understanding of statistics. Variability is related to chance (or we might say that in life there is chance variation in sample outcomes). As a result we are able to make generalizations from sample outcomes to population parameters with some calculated degree of certainty. In other words, we will learn to discuss sample outcomes in the language of chance.

THE RANDOM VARIABLE (4.3)

Definition -- a random variable, usually denoted X or Y, is the numerical outcome of a random process or experiment.

Example: We manufacture beer. We draw one bottle at random from the conveyor belt. That's a random experiment. The contents could be measured (e.g. volume from 0 to 300 ml, taste on a scale of 1 to 5), and these numerical variables describe the characteristics of the randomly selected bottle. They can be thought of as random variables because they vary with each bottle chosen.

Can you think of other examples? Random variables can be DISCRETE or CONTINUOUS.

Discrete Random Variables

A discrete random variable takes a countable number of possible values (in our examples, a finite list). Recall that some outcomes are naturally discrete: you can't roll a 3.1, you can't have 351.7 employees at a firm, and you can't have 3.91 customer complaints today. Discrete variables jump from one whole value to the next. So one can "list" the outcomes, that is, the possible values some random variable X can take, and one can list the probability of each outcome.

Discrete Random variables have probability distributions -- they are just a way of organizing outcomes and representing them graphically. A table or a graph might suffice. There are only 2 requirements:

a. probabilities must be greater than or equal to zero

b. the sum of the probabilities must be 1.
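
As a sketch of such a table, take X to be the sum of two fair dice (an assumed example, not one from the textbook pages). The helper is_valid_distribution is a hypothetical name; it simply checks requirements a and b.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely rolls give each sum.
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability distribution of X: each possible value paired with its probability.
dist = {x: Fraction(n, 36) for x, n in sorted(ways.items())}

def is_valid_distribution(d):
    """Requirement a: no negative probabilities. Requirement b: they sum to 1."""
    return all(p >= 0 for p in d.values()) and sum(d.values()) == 1

for x, p in dist.items():
    print(x, p)
print(is_valid_distribution(dist))
```

A table that fails either check, for example one whose probabilities total less than 1, is missing outcomes or contains an arithmetic error.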

Continuous Random Variables

A continuous random variable can assume an infinite number of values in an interval. So for example, our beer bottles can contain any amount of beer in an interval between 0 and 300 ml.

The most commonly used continuous probability distribution is the NORMAL distribution (Chapter 1.3). The probability distribution is described by a curve, and the probability of any event is the area under the curve over that event. We are always interested in the probability of an interval rather than the probability of an exact value, simply because the area under the curve above a single point is zero.

Notation: the Greek letter mu (μ) symbolizes the mean of the normal distribution, and the Greek letter sigma (σ) the standard deviation. In the Standard Normal that we use in Table A, the mean is 0 and the standard deviation is 1.
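
To see why only intervals carry probability, the sketch below uses Python's built-in statistics.NormalDist as a stand-in for Table A and shrinks an interval around the value 1. The centre value and the interval widths are arbitrary choices for illustration.

```python
from statistics import NormalDist

# Standard Normal: mean 0, standard deviation 1 (the distribution tabulated in Table A).
Z = NormalDist(mu=0, sigma=1)

def interval_prob(dist, a, b):
    """P(a <= X <= b): the area under the curve between a and b."""
    return dist.cdf(b) - dist.cdf(a)

# The area shrinks toward zero as the interval narrows around the single point 1.
for half_width in (1.0, 0.1, 0.01, 0.001):
    print(half_width, interval_prob(Z, 1 - half_width, 1 + half_width))

# An interval of zero width has zero area.
print(interval_prob(Z, 1, 1))
```

Here cdf(x) plays the role of a Table A lookup: it returns the area under the curve to the left of x.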

Example: Question 4.49 from your textbook. An opinion poll asks an SRS of 1,500 adults "Do you happen to jog?" Suppose that the population proportion who jog (a parameter) is p = 0.15 (or 15%). To estimate p, we use the proportion in the sample who answer "YES", written as p with a caret and called "p-hat". The statistic p-hat is a random variable that is approximately normally distributed with mean μ = 0.15 and standard deviation σ = 0.0092. Find the following probabilities:

a. P(p-hat ≥ 0.16)

b. P(0.14 ≤ p-hat ≤ 0.16)
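
A sketch of how parts a and b could be worked, using statistics.NormalDist in place of a Table A lookup. Only the mean and standard deviation come from the problem; the variable names are illustrative, and the answers may differ slightly from table values because z is not rounded to two decimals here.

```python
from statistics import NormalDist

# Approximate sampling distribution of p-hat given in the problem.
p_hat = NormalDist(mu=0.15, sigma=0.0092)

# Standardize 0.16: z = (value - mean) / standard deviation.
z = (0.16 - p_hat.mean) / p_hat.stdev

# a. P(p-hat >= 0.16) is the area to the right of 0.16.
p_a = 1 - p_hat.cdf(0.16)

# b. P(0.14 <= p-hat <= 0.16) is the area between the two values.
p_b = p_hat.cdf(0.16) - p_hat.cdf(0.14)

print(round(z, 2), round(p_a, 4), round(p_b, 4))
```

Because 0.14 and 0.16 sit the same distance on either side of the mean, the answer to b is 1 minus twice the answer to a.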

Example: Suppose an automobile manufacturer claims their new SUV has mean in-city mileage of 16 miles per gallon. Suppose you write to the manufacturer and you find out that the standard deviation around that mean is 2 miles per gallon. This information allows you to formulate a probability model. So you think that the random variable "in city gas mileage" can be approximated by a normal distribution with a mean of 16 and a standard deviation of 2.

Example: Suppose you work for a magazine that tests new autos and trucks. If you were to test this SUV, what is the probability that the one you get averages less than 13 miles to the gallon? What is the probability that you would get one that gets more than 20 miles to the gallon? Suppose you get one that gets better than 20 miles per gallon. Is your probability model necessarily wrong?
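
A sketch of the two SUV calculations under the stated model (normal, mean 16, standard deviation 2), again using statistics.NormalDist instead of Table A. The variable names are illustrative.

```python
from statistics import NormalDist

# Probability model from the notes: in-city mileage ~ Normal(mean 16 mpg, sd 2 mpg).
mileage = NormalDist(mu=16, sigma=2)

# P(X < 13): area to the left of 13, which is z = (13 - 16) / 2 = -1.5.
p_below_13 = mileage.cdf(13)

# P(X > 20): area to the right of 20, which is z = (20 - 16) / 2 = 2.
p_above_20 = 1 - mileage.cdf(20)

print(round(p_below_13, 4), round(p_above_20, 4))
```

Under this model roughly 2 SUVs in 100 would beat 20 miles per gallon, so drawing one such vehicle is unusual but does not by itself show the model is wrong.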

Example: Suppose you are thinking about investing some money in a mutual fund. Past data shows that the fund returned a mean of 19.8% with a standard deviation of 13.40%. Suppose we know returns are normally distributed. Based on this information, what is the probability that you will experience a loss (get a return of less than zero)? This year to date, the fund has returned -9.07% (a loss). What is the probability of getting a return that low or lower? Is the model necessarily wrong?
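
A sketch of the mutual fund calculations, treating returns as percentages and assuming the normal model stated in the example. The variable names are illustrative, and statistics.NormalDist again stands in for Table A.

```python
from statistics import NormalDist

# Model from the example: annual return in percent ~ Normal(mean 19.8, sd 13.40).
fund = NormalDist(mu=19.8, sigma=13.40)

# P(return < 0): the chance of a loss.
z_loss = (0 - fund.mean) / fund.stdev
p_loss = fund.cdf(0)

# P(return <= -9.07): the chance of a return as low as this year's, or lower.
z_this_low = (-9.07 - fund.mean) / fund.stdev
p_this_low = fund.cdf(-9.07)

print(round(z_loss, 2), round(p_loss, 4))
print(round(z_this_low, 2), round(p_this_low, 4))
```

A return this low has a probability of under 2% in the model. That makes it rare, not impossible, so one bad year alone does not prove the model wrong.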