Basic idea: Sampling Distributions show the behavior of some statistic over a large number of samples, or over the "long run." Remember REPLICATION? In practice, statisticians cannot gather a large number of samples, so they rely on probability theory to tell them what would happen if they actually had many samples to work with and observe.
Something to consider: nothing in life is certain (except death and taxes), and at least some of us like to have a sense of certainty, or at least of the chances, in life. Think about economics: we want to know the chance of success, the likelihood of earning a profit, the chance that the relationship between an explanatory variable and a response variable is real. Probability is the formal study of chance.
Consider some questions about chance. What are the chances of rain tomorrow? What are the chances of winning the California Lottery? Probability theory was developed to answer questions like these (in fact, most of the early work was done by gamblers).
PROBABILITY is the study of CHANCE: a certain random process/phenomenon/experiment is given (such as rolling a die or spinning a roulette wheel), and we want to know the chance of various outcomes.
The basic idea is simple: if you have some kind of random phenomenon, such as a coin toss, you might not be able to say how the next toss will fall, but you might be able to say with some confidence how many heads you will get if you toss the coin 20 times (about 10 heads). Similarly
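The "about 10 heads in 20 tosses" idea can be checked with a quick simulation. The sketch below repeats the experiment "toss a fair coin 20 times" many times and records the number of heads in each repetition; the collection of counts is a simulated sampling distribution of the number of heads. The coin is assumed fair, and the number of repetitions (10,000) and the random seed are arbitrary choices for illustration.

```python
import random
from collections import Counter

def heads_in_tosses(n_tosses: int, rng: random.Random) -> int:
    """Count heads in n_tosses flips of a fair coin (heads when random() < 0.5)."""
    return sum(rng.random() < 0.5 for _ in range(n_tosses))

def simulate_head_counts(n_tosses: int = 20, n_reps: int = 10_000, seed: int = 1) -> list[int]:
    """Repeat the n_tosses experiment n_reps times and return every head count."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [heads_in_tosses(n_tosses, rng) for _ in range(n_reps)]

counts = simulate_head_counts()
mean_heads = sum(counts) / len(counts)
print(f"average number of heads: {mean_heads:.2f}")

# Relative frequency of each possible count; counts near 10 dominate
tally = Counter(counts)
for k in sorted(tally):
    print(f"{k:2d} heads: {tally[k] / len(counts):.4f}")
```

Any single run of 20 tosses is unpredictable, but the average over many runs settles near 10, and extreme counts such as 0 or 20 heads almost never appear.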
Definition of Random from the text: Individual outcomes are uncertain but there is a regular distribution of outcomes in a large number of repetitions.
Definition of Probability: given some random phenomenon, the probability of an outcome is the proportion of times that outcome would occur over many repetitions. This is a long-term relative frequency. Again, the CHANCE or PROBABILITY of a particular event is the percentage of time that event is expected to occur if the same random process is repeated over and over under the same circumstances. The ability to replicate is absolutely critical.
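The "long-term relative frequency" definition can be seen directly: toss a coin many times and watch the proportion of heads. The following is a minimal sketch assuming a fair coin; the checkpoints (10, 100, 1,000, ...) and the seed are arbitrary illustration choices.

```python
import random

def running_relative_frequency(n_tosses: int, seed: int = 2) -> dict[int, float]:
    """Toss a fair coin n_tosses times and record the proportion of heads
    after 10, 100, 1,000, ... tosses (powers of ten up to n_tosses)."""
    rng = random.Random(seed)
    checkpoints = {10 ** k for k in range(1, 7) if 10 ** k <= n_tosses}
    heads = 0
    proportions = {}
    for i in range(1, n_tosses + 1):
        heads += rng.random() < 0.5  # True adds 1, False adds 0
        if i in checkpoints:
            proportions[i] = heads / i
    return proportions

proportions_by_n = running_relative_frequency(100_000)
for n, p in proportions_by_n.items():
    print(f"after {n:>7,} tosses: proportion of heads = {p:.4f}")
```

After only 10 tosses the proportion can be far from 0.5; as the number of repetitions grows it settles close to 0.5, which is what "probability as a long-run relative frequency" means.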
Definition of an Event: a single outcome or some set of outcomes associated with some random phenomenon or random process. Events can be combined.
1. The sample space S of some random process is the set of all possible outcomes
2. A probability is a number between 0 and 1: for any event A, 0 ≤ P(A) ≤ 1. Probabilities are never negative and never greater than 1.
3. The probabilities of all possible outcomes must sum to 1, or P(S) = 1.
4. The probability that an event does not occur is 1 minus the probability that it does occur: P(A^c) = 1 - P(A), where A^c (the complement of A) is the event that A does not occur.
5. If two events have no outcomes in common, the probability that one or the other occurs is the sum of their individual probabilities. P(A or B) = P(A) + P(B)
· Two events A and B are MUTUALLY EXCLUSIVE ("disjoint") if they cannot happen simultaneously (they have no outcomes in common). If A and B are mutually exclusive, P(A and B) = 0, and thus P(A or B) = P(A) + P(B).
Note: knowing that A happens completely affects the chance of B happening (B can't happen if A does), and the reverse is also true: if B happens, A cannot happen.
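The rules above can be checked mechanically on a small sample space. This sketch assumes a fair six-sided die, so each outcome has probability 1/6; the events A, B and C are illustrative choices. Python's `Fraction` keeps the arithmetic exact, and note that in Python `|` is set union ("or") and `&` is intersection ("and"), not the "given" bar used later in these notes.

```python
from fractions import Fraction

# Sample space S for one roll of a six-sided die, ASSUMED fair
S = {1, 2, 3, 4, 5, 6}
P_OUTCOME = {s: Fraction(1, 6) for s in S}

def P(event: set) -> Fraction:
    """Probability of an event = sum of the probabilities of its outcomes."""
    return sum((P_OUTCOME[s] for s in event), Fraction(0))

A = {2, 4, 6}  # "the roll is even"
B = {1}        # "the roll is a 1" (no outcomes in common with A)
C = {1, 2}     # "the roll is 1 or 2" (shares the outcome 2 with A)

assert all(0 <= P_OUTCOME[s] <= 1 for s in S)       # rule 2
assert P(S) == 1                                    # rule 3
assert P(S - A) == 1 - P(A)                         # rule 4: complement
assert A & B == set() and P(A | B) == P(A) + P(B)   # rule 5: disjoint events
assert P(A | C) != P(A) + P(C)                      # rule 5 fails: A and C overlap

print(P(A), P(S - A), P(A | B), P(A | C))
```

The last check shows why rule 5 requires events with no outcomes in common: adding P(A) and P(C) counts the shared outcome 2 twice.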
D. Assigning Probabilities To Events
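The key to assigning probabilities is knowing all of your possible outcomes and knowing two rules: every probability must be between 0 and 1 (rule #2), and the probabilities of all possible outcomes must sum to 1 (rule #3).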
Simple example: I toss a coin. What are the possible outcomes? Assuming the coin is fair, what are the probabilities?
Now let's make the simple example more complicated. I toss 4 coins. What are the possible outcomes if I'm just counting the number of heads that show? 0, 1, 2, 3, or 4 heads.
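The 4-coin question can be answered by brute force: list every head/tail sequence and count the heads in each. This minimal sketch assumes fair coins tossed independently, so all 2^4 = 16 sequences are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 16 equally likely head/tail sequences for 4 fair coins
sequences = list(product("HT", repeat=4))
heads_counts = Counter(seq.count("H") for seq in sequences)

# Probability of each possible number of heads (0 through 4)
distribution = {k: Fraction(heads_counts[k], len(sequences)) for k in range(5)}
for k, p in distribution.items():
    print(f"{k} heads: {p}")
```

Although the 16 sequences are equally likely, the five possible head counts are not: 2 heads can happen in 6 ways (probability 6/16 = 3/8), while 0 heads can happen in only 1 way (1/16).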
Simple example: historically, the Dow Jones Industrial Average (DJIA) has closed "up" on 65% of trading days. On any given trading day, what are the possible outcomes for the DJIA? It either closes up or does not close up. If it closes up 65% of the time, it must close not up 35% of the time. This is rule #4 (complementarity) in effect.
In a 3-day period, then, what are the possible outcomes for the DJIA? Writing U for "up" and D for "not up": UUU, UUD, UDU, DUU, UDD, DUD, DDU, DDD. Now, how do we work with this kind of information?
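One way to work with it is sketched below. It takes the 65% "up" figure from the text and uses the complement rule for "not up," but it also ASSUMES that each day's close is independent of the other days' closes, an assumption the notes do not make and that real markets need not satisfy. Under that assumption, the probability of a 3-day sequence is the product of the daily probabilities.

```python
from itertools import product

P_UP = 0.65          # historical share of "up" closes, from the text
P_DOWN = 1 - P_UP    # complement rule: P(not up) = 1 - P(up)

# ASSUMPTION: daily closes are independent, so probabilities multiply
probabilities = {}
for days in product("UD", repeat=3):
    p = 1.0
    for day in days:
        p *= P_UP if day == "U" else P_DOWN
    probabilities["".join(days)] = p

for outcome, p in probabilities.items():
    print(f"{outcome}: {p:.4f}")
print(f"total: {sum(probabilities.values()):.4f}")
```

Under this assumption UUU has probability 0.65^3 ≈ 0.2746 while DDD has 0.35^3 ≈ 0.0429, so the eight outcomes are far from equally likely, yet their probabilities still sum to 1 (rule #3).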
Note: in general, probabilities are not equal.
Note on types of probabilities: classical/theoretical (based on theories), relative frequency (based on the long run), and personal (some events are not repeatable, so individual probabilities are assigned, like a gut feeling; this is outside the scope of this course).
E. Independence and Independent Events
If two events can happen together but do not influence each other, they are independent. For example, when you roll two dice, the first die should not have any effect on the roll of the second die. In these situations you are trying to figure out the chance of two things happening simultaneously: the chance that both will happen equals the chance that the first will happen multiplied by the chance that the second will happen. If the outcomes are dependent (for example, when sampling without replacement), then multiply conditional probabilities (probabilities that depend on previous outcomes).
· Two events A and B are INDEPENDENT if knowing whether or not A happens does not help in predicting whether or not B happens. If A and B are independent, then P(B|A) = P(B). The vertical line is read "given," and the relationship says that knowing whether A happened does not change the probability of B; equivalently, we can rewrite the relationship as P(A and B) = P(A) x P(B).
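The dependent case can be made concrete with a small example. The jar below (3 red and 2 blue marbles, 2 drawn without replacement) is HYPOTHETICAL and is not from the notes. The sketch multiplies conditional probabilities, checks the answer by listing every ordered pair of draws, and contrasts it with drawing with replacement, where the draws are independent.

```python
from fractions import Fraction
from itertools import permutations

# HYPOTHETICAL jar: 3 red and 2 blue marbles
jar = ["R", "R", "R", "B", "B"]

# Without replacement: P(red, then red) = P(first red) * P(second red | first red)
by_rule = Fraction(3, 5) * Fraction(2, 4)

# Check by counting every ordered pair of two different marbles
# (positions are used so marbles of the same color stay distinct)
pairs = list(permutations(range(len(jar)), 2))
both_red = sum(1 for i, j in pairs if jar[i] == "R" and jar[j] == "R")
by_counting = Fraction(both_red, len(pairs))

# With replacement the draws are independent: P(red) * P(red)
with_replacement = Fraction(3, 5) * Fraction(3, 5)

print(by_rule, by_counting, with_replacement)
```

Both the conditional-probability rule and the direct count give 3/10; treating the draws as independent (9/25) would overstate the chance because removing a red marble lowers the chance of a second red.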
Note: under INDEPENDENCE, A & B can happen together, they just don't affect each other. True independence is a rarity, an ideal situation.
Example: if I roll two dice and the first one is a "1", what's the probability that the second one will also be a "1"? 1/6. The probability of rolling two "1"s is 1/6 * 1/6 = 1/36. If, however, I had asked for the probability that the sum of the two dice will be 3, given that the first one is a "1", then the fact that I've rolled a 1 AFFECTS or INFLUENCES the probability that the sum will be 3 (the second die must now show a 2).
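Both dice questions can be answered by counting the 36 equally likely (first, second) results, assuming fair dice. In this sketch, the helper `prob` (a name chosen here for illustration) restricts the outcomes to those satisfying the "given" condition and then counts how many of them satisfy the event.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) results for two fair dice
ROLLS = list(product(range(1, 7), repeat=2))

def prob(event, given=None) -> Fraction:
    """P(event | given) by counting equally likely outcomes.
    `event` and `given` are functions taking a roll and returning True/False."""
    space = [r for r in ROLLS if given is None or given(r)]
    return Fraction(sum(1 for r in space if event(r)), len(space))

def first_is_1(roll):
    return roll[0] == 1

def second_is_1(roll):
    return roll[1] == 1

def both_are_1(roll):
    return first_is_1(roll) and second_is_1(roll)

def sum_is_3(roll):
    return roll[0] + roll[1] == 3

# Independent: knowing the first die is a 1 leaves P(second is 1) unchanged
print(prob(second_is_1), prob(second_is_1, given=first_is_1))
# Multiplication rule for independent events: 1/6 * 1/6
print(prob(both_are_1))
# Dependent: knowing the first die is a 1 changes P(sum is 3)
print(prob(sum_is_3), prob(sum_is_3, given=first_is_1))
```

The last line prints 1/18 and then 1/6: without any information, only (1, 2) and (2, 1) out of 36 give a sum of 3, but once the first die is known to be a 1, one of the 6 remaining possibilities works.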
F. Examples
Suppose you work for a bank and survey 1,000 customers selected at random from the client database and get these results:
                    Uses internet banking   No internet banking   Total
Uses ATM                              175                   525     700
Does not use ATM                       75                   225     300
Total                                 250                   750    1000
If this were a representative sample, you could use it, together with the laws of probability, to make some generalizations about the whole client base. For example, suppose you went back to the client database and selected a person at random. Given what you know from this sample, what is the probability that he/she neither uses the ATM nor uses internet banking services? 225/1000, or 22.5%, or .225.
What is the probability P(select a person who currently uses the ATM)? It is .175 + .525 = .70, or 70%. You use the addition rule because these two categories are mutually exclusive: you can use the ATM and use internet banking, OR you can use the ATM and NOT use internet banking, but you can't be in both categories simultaneously.
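The two calculations above can be reproduced from the survey counts. This sketch stores the four cells of the table in a dictionary (the key labels are illustrative) and treats each probability as a count divided by the 1,000 customers surveyed.

```python
from fractions import Fraction

# Survey counts from the bank example (1,000 customers)
counts = {
    ("ATM", "internet"): 175,
    ("ATM", "no internet"): 525,
    ("no ATM", "internet"): 75,
    ("no ATM", "no internet"): 225,
}
total = sum(counts.values())

def p(cell):
    """Probability that a randomly chosen customer falls in one cell."""
    return Fraction(counts[cell], total)

# Neither ATM nor internet banking: a single cell of the table
p_neither = p(("no ATM", "no internet"))

# Uses the ATM: the two ATM cells are mutually exclusive, so add them
p_atm = p(("ATM", "internet")) + p(("ATM", "no internet"))

print(float(p_neither), float(p_atm))
```

Because every customer falls in exactly one cell, adding cells is always valid here; the addition rule would double count only if the categories overlapped.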
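Finally, on independence: is internet banking use independent of ATM use? It is if the multiplication rule holds:
P(uses internet and uses ATM) = P(uses internet) x P(uses ATM) = .25 x .70 = .175. The sample gives P(uses internet and uses ATM) = 175/1000 = .175, so independence holds. Equivalently, P(uses internet | uses ATM) = 175/700 = .25 = P(uses internet).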
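The independence check can be sketched the same way, computing both forms of the test from the four survey counts. `Fraction` is used so the equality comparisons are exact; with floating-point numbers a tiny rounding difference could make equal values compare as unequal.

```python
from fractions import Fraction

# Survey counts from the bank example (1,000 customers)
atm_internet, atm_only, internet_only, neither = 175, 525, 75, 225
total = atm_internet + atm_only + internet_only + neither

p_atm = Fraction(atm_internet + atm_only, total)            # 700/1000
p_internet = Fraction(atm_internet + internet_only, total)  # 250/1000
p_both = Fraction(atm_internet, total)                      # 175/1000

# Check 1: P(internet and ATM) == P(internet) * P(ATM)
print(p_both == p_atm * p_internet)

# Check 2 (equivalent): P(internet | ATM) == P(internet)
p_internet_given_atm = Fraction(atm_internet, atm_internet + atm_only)  # 175/700
print(p_internet_given_atm, p_internet, p_internet_given_atm == p_internet)
```

Both checks agree: 7/10 x 1/4 = 7/40 = 175/1000, and among ATM users the share using internet banking (1/4) is the same as in the whole sample.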
G. Summary -- why do you need to know this?
Ultimately, a sense of probabilities or chance will allow you to make informed decisions about the information/data in front of you. Probability is the link between the statistics you calculated in previous chapters and the ability to make inferences/generalizations about a larger population.