Lecture 3

Estimation

Let's talk about how statistics are used to estimate parameters.

Last time we talked about populations.  Populations are large, abstract collections of objects.  We care only about one attribute (for now) of these objects, and so it makes sense to try to model what the distribution of that variable/attribute would be.

The probability distribution (or just population distribution) is a mathematical description of the values and their relative frequencies.  If the values of the variable are discrete, then this description could be as straightforward as a function like this: f(x) = Prob of seeing the value x (or, in other words, the proportion of the population with the value x).  If the values are continuous, this approach leads to mathematical difficulties, and so f(x) is instead a density function, so that the area under the curve represents probabilities or relative frequencies.

These distributions can be summarized by a variety of parameters: the mean, the variance, the median, etc.  Which parameters are interesting to us depends on our investigation and on the type of variable.

Random variables are functions that randomly choose a numerical value from the population.  RVs have probability distributions -- the population distribution, in fact.  And RVs can also be summarized by parameters.  In particular:
E(X) = sum over x of x p(x)  (discrete)  or  integral of x f(x) dx  (continuous)  is the mean or expected value, and
Var(X) = E[(X - E(X))^2] = sum over x of (x - E(X))^2 p(x)  (or the corresponding integral)  is the variance.  The square root of this is the standard deviation.

These are sometimes called the population mean and the population standard deviation. 

They are analogous to the sample means and sample sd's.

It's called the "expected value" because it's supposed to tell us what value to "expect" (or "predict"?) for the RV X.

Example: Roulette:
the pdf is f(-1) = 20/38, f(1) = 18/38
E(X) = -.0526
Sd(X) = .998614
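(These numbers are easy to check by computer.  Here's a short sketch in Python, not part of the lecture itself, that computes E(X) and SD(X) straight from the definitions:)

```python
import math

# pmf for one $1 bet on roulette: lose $1 with prob 20/38, win $1 with prob 18/38
pmf = {-1: 20/38, 1: 18/38}

# E(X) = sum of x * p(x)
mean = sum(x * p for x, p in pmf.items())

# Var(X) = sum of (x - E(X))^2 * p(x); SD(X) is its square root
var = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = math.sqrt(var)

print(round(mean, 4))  # -0.0526
print(round(sd, 6))    # 0.998614
```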

In practice we don't stop at one observation: we draw several observations from the population, X1, X2, ..., Xn.  We combine these into a function, called a statistic.  Statistics are random and have their own distributions.

Example: T = X1 + X2 represents the amount of money won by playing roulette twice.
Values of T are  -2, 0, +2
P(T = -2) = (20/38)^2 = .2770083
P(T = 0) = 2*(18/38)*(20/38) = .498615
P(T = 2) = (18/38)^2 =  .2243767

So E(T) = -2*P(T = -2) + 0 + 2*P(T = 2) =  -.10526
And we can figure out Var(T) too.
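(The whole table above, plus Var(T), can be built by enumerating the four possible (x1, x2) pairs; independence means the joint probability is p(x1)*p(x2).  A sketch in Python:)

```python
import itertools

pmf = {-1: 20/38, 1: 18/38}

# Sampling distribution of T = X1 + X2: enumerate all (x1, x2) pairs,
# multiplying probabilities because the two plays are independent.
T = {}
for x1, x2 in itertools.product(pmf, pmf):
    T[x1 + x2] = T.get(x1 + x2, 0) + pmf[x1] * pmf[x2]

print({t: round(p, 6) for t, p in sorted(T.items())})
# {-2: 0.277008, 0: 0.498615, 2: 0.224377}

mean_T = sum(t * p for t, p in T.items())
var_T = sum((t - mean_T) ** 2 * p for t, p in T.items())
print(round(mean_T, 5))  # -0.10526
```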

And we can continue this way for T = X1 + .... + Xn.
But it gets tedious.  Wouldn't it be nice to just know the mean, SD, and distribution of T?

Terminology:  the pdf of a statistic is called the sampling distribution.  The sampling distribution of T = X1 + X2 was given above.

With computers, we can often rely on computation to tell us what this is.  But math helps, too.  You'll see many examples of sampling distributions, and in a math stats class you'll see why they are what they are.  The most famous example is this:
The sum of a large number of independent RVs is approximately Normally distributed, with mean E(T) and standard deviation SD(T).  (This is the Central Limit Theorem.)  The same is true if you multiply by a constant, and so Xbar is approximately Normally distributed, too.

Hence, if n is large, T approximately follows a normal distribution.  But what are E(T) and Var(T)?

Rules for linear combinations of RVs.
    E(a1 X1 + ... + an Xn) = a1 E(X1) + ... + an E(Xn)   always
    Var(a1 X1 + ... + an Xn) = a1^2 Var(X1) + ... + an^2 Var(Xn)   only if the X's are independent.
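(The variance rule can be checked against the two-play roulette example: Var(X1 + X2) computed from the exact sampling distribution should equal Var(X1) + Var(X2).  A sketch in Python:)

```python
import itertools

pmf = {-1: 20/38, 1: 18/38}

def mean(p):
    return sum(x * px for x, px in p.items())

def var(p):
    m = mean(p)
    return sum((x - m) ** 2 * px for x, px in p.items())

# Exact sampling distribution of T = X1 + X2 for independent plays.
T = {}
for x1, x2 in itertools.product(pmf, pmf):
    T[x1 + x2] = T.get(x1 + x2, 0) + pmf[x1] * pmf[x2]

# Rule: Var(X1 + X2) = Var(X1) + Var(X2) when the X's are independent.
print(abs(var(T) - 2 * var(pmf)) < 1e-12)  # True
```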

(We haven't defined independence and will only do so informally: observing the value of one of the X's has no effect on the others, and gives us no information about them.)

So E(T) = n*(-.0526)
Var(T) = .998614^2 * n, and hence SD(T) = .998614 * sqrt(n).

If I play 100 times, I expect to lose 5.26, give or take 9.986.
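(A quick Monte Carlo check of these numbers, again just an illustrative sketch: simulate 100 plays many times and compare the empirical mean and SD of the totals with -5.26 and 9.986.)

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is reproducible

# Total winnings T for n = 100 plays of roulette.
def play_total(n=100):
    return sum(random.choices([-1, 1], weights=[20, 18], k=n))

totals = [play_total() for _ in range(50_000)]

print(round(statistics.mean(totals), 2))   # close to -5.26
print(round(statistics.stdev(totals), 2))  # close to 9.99
```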