1. From last time: Bias, Variability, and Statistical Inference
Recall that ultimately we work with statistics and make inferences (definition: the act of passing from statistical sample data to generalizations, as to the value of population parameters, usually with calculated degrees of certainty), and that bias and variability are two things that make this difficult, for different reasons. See the handout on the freshman survey of 269,413 freshmen designed to represent 1.1 million.
Introducing
random chance into our sample selection methods helps us combat bias. Variability can be reduced by increasing the
sample size (knowing what you know from Chapter 1.2, can you guess why?)
Keep
in mind that a single sample could be thought of as originating from a larger,
unseen, theoretical sampling distribution (page 270)
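A quick sketch of why larger samples reduce variability, using simulation. The population here is invented for illustration (assume 60% of it shares some trait); we draw many samples of two different sizes and compare how much the sample proportions bounce around -- each set of draws is one glimpse of that unseen sampling distribution.

```python
import random

random.seed(1)

def sample_proportions(n, num_samples=2000, p=0.60):
    """Draw many samples of size n from a population where a share p
    has some trait; return each sample's observed proportion."""
    props = []
    for _ in range(num_samples):
        hits = sum(1 for _ in range(n) if random.random() < p)
        props.append(hits / n)
    return props

def spread(props):
    """Crude measure of variability: distance between the smallest
    and largest sample proportion observed."""
    return max(props) - min(props)

small = sample_proportions(n=25)    # small samples
large = sample_proportions(n=2500)  # samples 100x larger

print(spread(small), spread(large))  # the larger samples vary far less
```

Every sample proportion estimates the same population value, but the small-sample estimates scatter much more widely -- that scatter is the variability that a bigger sample size shrinks.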
2. Probability, the Study of Chance
The reason we need probability in statistics goes back to the importance of randomly assigning treatments or randomly selecting samples from a population. RANDOM in this class means that an exact outcome is not predictable in advance, but a predictable long-run pattern will emerge after many repetitions. Examples: your commute home, how the stock market behaves, how people will respond to survey questions, how many are in class today.
PROBABILITY, THEN, is synonymous with the word CHANCE: it is the percentage or proportion of the time some event is expected to happen, PROVIDED we have randomness and we can repeat the event (whatever it may be) many times under the same conditions (in other words -- replicate).
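A minimal sketch of that "predictable long-run pattern": no single coin flip is predictable, but the relative frequency of heads settles near 0.5 as the repetitions pile up.

```python
import random

random.seed(42)

def relative_frequency_of_heads(flips):
    """Flip a fair coin `flips` times; return the proportion of heads."""
    heads = sum(random.choice(["H", "T"]) == "H" for _ in range(flips))
    return heads / flips

after_10 = relative_frequency_of_heads(10)          # could be almost anything
after_100_000 = relative_frequency_of_heads(100_000)  # very close to 0.5

print(after_10, after_100_000)
```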
3. Models of Probability (4.2)
The whole point of a probability model is to take some relatively complicated random process, say, how the stock market behaves, and represent that behavior with a model -- a mathematical description -- that we understand well. (For later: you could think of the normal curve as an ideal type of model.)
A model consists of (a) listing the possible outcomes (notation f(x))
(b) assigning a probability to
each outcome (notation p(x))
The set of all possible outcomes is THE SAMPLE SPACE. Examples: 5 students; the stock market on 3 days. First we need to identify what can happen. Then assign chances to each outcome.
We think of each outcome as an EVENT. An event can be a single outcome or a
combination of outcomes.
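To make (a) and (b) concrete, here is a sketch of the stock-market example in Python, under the simplifying (and unrealistic) assumption that the market is equally likely to go Up or Down each day:

```python
from itertools import product

# (a) List the possible outcomes: Up or Down on each of 3 days.
sample_space = list(product(["Up", "Down"], repeat=3))

# (b) Assign a probability p(x) to each outcome.  Assuming Up and Down
# are equally likely each day, each of the 8 outcomes gets 1/8.
probabilities = {outcome: 1 / 8 for outcome in sample_space}

print(len(sample_space))            # 8 possible outcomes
print(sum(probabilities.values()))  # the probabilities total 1

# An EVENT can be a single outcome or a combination of outcomes,
# e.g. "the market goes Up on all 3 days":
event = [o for o in sample_space if o == ("Up", "Up", "Up")]
print(sum(probabilities[o] for o in event))  # 1/8 = 0.125
```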
4. The Basic Probability Rules
1. The sample space S of some random process is the set of all possible outcomes.
2. A probability is a number between 0 and 1: for any event A, 0 ≤ P(A) ≤ 1. Probabilities are never negative and never greater than 1.
3. The sum of the probabilities of all possible outcomes must equal 1, or P(S) = 1.
4. The probability that an event does not occur is 1 minus the probability that it does occur: P(Ac) = 1 - P(A).
5. If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities: P(A or B) = P(A) + P(B). Two such events A and B are MUTUALLY EXCLUSIVE ("disjoint").
Note: for disjoint events, knowing that A happens completely determines the chance of B happening (B can't happen if A does), and the reverse is true: if B happens, A cannot.
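The rules above can be checked on a single fair die (the equal 1/6 chances are the fairness assumption):

```python
# Probability model for one fair die: six outcomes, 1/6 each.
outcomes = [1, 2, 3, 4, 5, 6]
p = {x: 1 / 6 for x in outcomes}

# Rule 3: the probabilities of all outcomes sum to 1, P(S) = 1.
total = sum(p.values())

# Rule 4 (complement): P(not rolling a 6) = 1 - P(6) = 5/6.
p_not_six = 1 - p[6]

# Rule 5 (disjoint events): "roll a 1" and "roll a 2" share no
# outcomes, so P(1 or 2) = P(1) + P(2) = 2/6.
p_one_or_two = p[1] + p[2]

print(total, p_not_six, p_one_or_two)
```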
5. Assigning Probabilities
The key to assigning probabilities is knowing all of your possible outcomes and knowing two rules:
The probabilities of all possible outcomes must total 1, or 100% (where else have we talked about 100% being important?)
A probability must take a value 0 ≤ P(A) ≤ 1 (or 0% to 100%)
The best way to get comfortable doing this is to
do it. Study the examples on pages
299-301.
In-class example. This is the composition of Statistics 11/Economics 40 as of last week:

          Freshman   Sophomore   Junior   Senior   Total
Male          7          20         14       3       44
Female        2          25         14       2       43
Total         9          45         28       5       87
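Reading probabilities off the class table above, assuming each of the 87 students is equally likely to be picked at random:

```python
# Counts from the Statistics 11/Economics 40 class-composition table.
counts = {
    "Male":   {"Freshman": 7, "Sophomore": 20, "Junior": 14, "Senior": 3},
    "Female": {"Freshman": 2, "Sophomore": 25, "Junior": 14, "Senior": 2},
}
total = sum(sum(row.values()) for row in counts.values())  # 87 students

# P(Male): 44 of the 87 students are male.
p_male = sum(counts["Male"].values()) / total

# P(Sophomore): combine the male and female sophomore counts.
p_sophomore = (counts["Male"]["Sophomore"]
               + counts["Female"]["Sophomore"]) / total

print(total, round(p_male, 3), round(p_sophomore, 3))
```

Note that the probabilities across all eight cells of the table total 1, as rule 3 requires.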
6. The Multiplication Rule and Independence (pp. 301-304)
If two events do not influence each other (even though they may both be able to happen), they are independent. For example, if I roll two dice, one die should not have an effect on the roll of the second die. In these situations you are trying to figure out the chances of two things happening together: the chance that both will happen equals the chance that the first will happen multiplied by the chance that the second will happen.
RULE: Two events A and B are INDEPENDENT if knowing whether or not A happens does not help in predicting whether or not B happens. If A and B are independent, then P(B|A) = P(B). The vertical line is read "given," and the relationship says that knowing whether A happened leaves P(B) unchanged. It lets us rewrite the chance of both events happening as P(A and B) = P(A) x P(B).
Note: under INDEPENDENCE, A & B can happen
together, they just don't affect each other. True independence is a rarity, an
ideal situation.
Example: if I roll two dice and the first one is a "1", what's the probability that the second one will also be a "1"? Still 1/6. The probability of rolling two "1"s is 1/6 * 1/6 = 1/36. Suppose instead I ask: given that the first die is a "1", what is the probability that the sum of the two dice will be 12? Rolling a "1" on the first die AFFECTS or INFLUENCES the probability that the sum will be 12 (in fact, in this case, it is zero, since the largest possible sum is then 7), so those two events are not independent.
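The dice numbers above can be verified by brute-force enumeration of all 36 equally likely pairs:

```python
from itertools import product

# All 36 equally likely (die 1, die 2) outcomes.
pairs = list(product(range(1, 7), repeat=2))

# P(both dice show 1), counted directly from the sample space.
p_two_ones = sum(1 for a, b in pairs if a == b == 1) / len(pairs)

# P(first die is 1) and P(second die is 1) are each 1/6.
p_first_is_one = sum(1 for a, b in pairs if a == 1) / len(pairs)
p_second_is_one = sum(1 for a, b in pairs if b == 1) / len(pairs)

# Multiplication rule for independent events:
# P(A and B) = P(A) x P(B) = 1/6 x 1/6 = 1/36.
print(p_two_ones, p_first_is_one * p_second_is_one)
```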
Example: Roll 2 dice. What can happen to die 1? What can happen to die 2?
Example: What happens to the Dow Jones on 3
consecutive trading days?
7. Summary -- Why do you need to know this?
Well, you might
decide to become a professional gambler.
Just kidding. Probability (or
chance) is a tool, an important tool, for the understanding of statistics. Variability is related to chance (or we
might say that in life there is chance variation in sample outcomes). As a
result we are able to make generalizations from sample outcomes to population
parameters with some calculated degree of certainty. In other words, we will learn to discuss sample outcomes in the
language of chance.