1. From last time: Bias, Variability, and Statistical Inference
Recall that ultimately we work with statistics and make inferences (definition: the act of passing from statistical sample data to generalizations, as to the value of population parameters, usually with calculated degrees of certainty), and that bias and variability are two things that make this difficult, for different reasons. See the handout on the freshman survey of 269,413 freshmen designed to represent 1.1 million.
Introducing
random chance into our sample selection methods helps us combat bias. Variability can be reduced by increasing the
sample size (knowing what you know from Chapter 1.2, can you guess why?)
Keep
in mind that a single sample could be thought of as originating from a larger,
unseen, theoretical sampling distribution (page 270)
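A quick sketch of why larger samples reduce variability, using simulation. The population here is invented for illustration (assume 60% of it shares some trait); we draw many samples of two different sizes and compare how much the sample proportions bounce around -- each set of draws is one glimpse of that unseen sampling distribution.

```python
import random

random.seed(1)

def sample_proportions(n, num_samples=2000, p=0.60):
    """Draw many samples of size n from a population where a share p
    has some trait; return each sample's observed proportion."""
    props = []
    for _ in range(num_samples):
        hits = sum(1 for _ in range(n) if random.random() < p)
        props.append(hits / n)
    return props

def spread(props):
    """Crude measure of variability: distance between the smallest
    and largest sample proportion observed."""
    return max(props) - min(props)

small = sample_proportions(n=25)    # small samples
large = sample_proportions(n=2500)  # samples 100x larger

print(spread(small), spread(large))  # the larger samples vary far less
```

Every sample proportion estimates the same population value, but the small-sample estimates scatter much more widely -- that scatter is the variability that a bigger sample size shrinks.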
2. Probability, the Study of Chance
The reason we need probability in statistics goes back to the importance of randomly assigning treatments or randomly selecting samples from a population. RANDOM in this class means that an exact outcome is not predictable in advance, but a predictable long-run pattern will emerge after many repetitions. Examples: your commute home, how the stock market behaves, how people will respond to survey questions, how many are in class today.
PROBABILITY, THEN, is synonymous with the word CHANCE: it is the percentage or proportion of the time some event is expected to happen, PROVIDED we have randomness and we can repeat the event (whatever it may be) many times under the same conditions (in other words -- replicate).
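A minimal sketch of that "predictable long-run pattern": no single coin flip is predictable, but the relative frequency of heads settles near 0.5 as the repetitions pile up.

```python
import random

random.seed(42)

def relative_frequency_of_heads(flips):
    """Flip a fair coin `flips` times; return the proportion of heads."""
    heads = sum(random.choice(["H", "T"]) == "H" for _ in range(flips))
    return heads / flips

after_10 = relative_frequency_of_heads(10)          # could be almost anything
after_100_000 = relative_frequency_of_heads(100_000)  # very close to 0.5

print(after_10, after_100_000)
```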
3. Models of Probability (4.2)
The whole point of a probability model is to take some relatively complicated random process, say, how the stock market behaves, and represent that behavior with a model -- a mathematical description -- that we understand well. (For later: you could think of the normal curve as an ideal type of model.)
A model consists of (a) listing the possible outcomes (notation f(x))
(b) assigning a probability to
each outcome (notation p(x))
The set of all possible outcomes is THE SAMPLE SPACE. Examples: 5 students; the stock market on 3 days. First we need to identify what can happen. Then assign chances to each outcome.
We think of each outcome as an EVENT. An event can be a single outcome or a
combination of outcomes.
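To make (a) and (b) concrete, here is a sketch of the stock-market example in Python, under the simplifying (and unrealistic) assumption that the market is equally likely to go Up or Down each day:

```python
from itertools import product

# (a) List the possible outcomes: Up or Down on each of 3 days.
sample_space = list(product(["Up", "Down"], repeat=3))

# (b) Assign a probability p(x) to each outcome.  Assuming Up and Down
# are equally likely each day, each of the 8 outcomes gets 1/8.
probabilities = {outcome: 1 / 8 for outcome in sample_space}

print(len(sample_space))            # 8 possible outcomes
print(sum(probabilities.values()))  # the probabilities total 1

# An EVENT can be a single outcome or a combination of outcomes,
# e.g. "the market goes Up on all 3 days":
event = [o for o in sample_space if o == ("Up", "Up", "Up")]
print(sum(probabilities[o] for o in event))  # 1/8 = 0.125
```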
4. The Basic Probability Rules
1. The sample space S of some random process is the set of all possible outcomes.
2. A probability is a number between 0 and 1: for any event A, 0 ≤ P(A) ≤ 1. Probabilities are never negative and never greater than 1.
3. The sum of the probabilities of all possible outcomes must equal 1, or P(S) = 1.
4. The probability that an event does not occur is 1 minus the probability that it does occur: P(Ac) = 1 - P(A).
5. If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities: P(A or B) = P(A) + P(B). Two such events A and B are MUTUALLY EXCLUSIVE ("disjoint").
Note: for disjoint events, knowing that A happens completely determines the chance of B happening (B can't happen if A does), and the reverse is true: if B happens, A cannot.
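The rules above can be checked on a single fair die (the equal 1/6 chances are the fairness assumption):

```python
# Probability model for one fair die: six outcomes, 1/6 each.
outcomes = [1, 2, 3, 4, 5, 6]
p = {x: 1 / 6 for x in outcomes}

# Rule 3: the probabilities of all outcomes sum to 1, P(S) = 1.
total = sum(p.values())

# Rule 4 (complement): P(not rolling a 6) = 1 - P(6) = 5/6.
p_not_six = 1 - p[6]

# Rule 5 (disjoint events): "roll a 1" and "roll a 2" share no
# outcomes, so P(1 or 2) = P(1) + P(2) = 2/6.
p_one_or_two = p[1] + p[2]

print(total, p_not_six, p_one_or_two)
```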
5. Assigning Probabilities
The key to assigning probabilities is knowing all of your possible outcomes and knowing two rules:
The probabilities of all possible outcomes must total 1, or 100% (where else have we talked about 100% being important?)
A probability must take a value 0 ≤ P(A) ≤ 1 (or 0% to 100%)
The best way to get comfortable doing this is to
do it. Study the examples on pages
299-301.
In-class example. This is the composition of Statistics 11/Economics 40 as of last week:

          Freshman   Sophomore   Junior   Senior   Total
Male          7          20         14       3       44
Female        2          25         14       2       43
Total         9          45         28       5       87
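Reading probabilities off the class table above, assuming each of the 87 students is equally likely to be picked at random:

```python
# Counts from the Statistics 11/Economics 40 class-composition table.
counts = {
    "Male":   {"Freshman": 7, "Sophomore": 20, "Junior": 14, "Senior": 3},
    "Female": {"Freshman": 2, "Sophomore": 25, "Junior": 14, "Senior": 2},
}
total = sum(sum(row.values()) for row in counts.values())  # 87 students

# P(Male): 44 of the 87 students are male.
p_male = sum(counts["Male"].values()) / total

# P(Sophomore): combine the male and female sophomore counts.
p_sophomore = (counts["Male"]["Sophomore"]
               + counts["Female"]["Sophomore"]) / total

print(total, round(p_male, 3), round(p_sophomore, 3))
```

Note that the probabilities across all eight cells of the table total 1, as rule 3 requires.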
6. The Multiplication Rule and Independence (pp. 301-304)
If two events do not influence each other (even though they may both be able to happen), they are independent. For example, if I roll two dice, one die should not have an effect on the roll of the second die. In these situations you are trying to figure out the chances of two things happening together: the chance that both will happen equals the chance that the first will happen multiplied by the chance that the second will happen.
RULE: Two events A and B are INDEPENDENT if knowing whether or not A happens does not help in predicting whether or not B happens. If A and B are independent, then P(B|A) = P(B). The vertical line is read "given," and the relationship says that knowing whether A happened leaves P(B) unchanged. It lets us rewrite the chance of both events happening as P(A and B) = P(A) x P(B).
Note: under INDEPENDENCE, A & B can happen
together, they just don't affect each other. True independence is a rarity, an
ideal situation.
Example: if I roll two dice and the first one is a "1", what's the probability that the second one will also be a "1"? Still 1/6. The probability of rolling two "1"s is 1/6 * 1/6 = 1/36. Suppose instead I ask: given that the first die is a "1", what is the probability that the sum of the two dice will be 12? Rolling a "1" on the first die AFFECTS or INFLUENCES the probability that the sum will be 12 (in fact, in this case, it is zero, since the largest possible sum is then 7), so those two events are not independent.
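The dice numbers above can be verified by brute-force enumeration of all 36 equally likely pairs:

```python
from itertools import product

# All 36 equally likely (die 1, die 2) outcomes.
pairs = list(product(range(1, 7), repeat=2))

# P(both dice show 1), counted directly from the sample space.
p_two_ones = sum(1 for a, b in pairs if a == b == 1) / len(pairs)

# P(first die is 1) and P(second die is 1) are each 1/6.
p_first_is_one = sum(1 for a, b in pairs if a == 1) / len(pairs)
p_second_is_one = sum(1 for a, b in pairs if b == 1) / len(pairs)

# Multiplication rule for independent events:
# P(A and B) = P(A) x P(B) = 1/6 x 1/6 = 1/36.
print(p_two_ones, p_first_is_one * p_second_is_one)
```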
Example: Roll 2 dice. What can happen to die 1? What can happen to die 2?
Example: What happens to the Dow Jones on 3
consecutive trading days?
7. Summary -- Why do you need to know this?
Well, you might
decide to become a professional gambler.
Just kidding. Probability (or
chance) is a tool, an important tool, for the understanding of statistics. Variability is related to chance (or we
might say that in life there is chance variation in sample outcomes). As a
result we are able to make generalizations from sample outcomes to population
parameters with some calculated degree of certainty. In other words, we will learn to discuss sample outcomes in the
language of chance.