Lecture 3
Estimation
Let's talk about how statistics are used to estimate parameters.
Last time we talked about populations. Populations are large, abstract
collections of objects. We care only about one attribute (for now)
of these objects, and so it makes sense to try to model what the distribution
of that variable/attribute would be.
The probability distribution (or just population distribution) is a mathematical
description of the values and their relative frequencies. If the values
of the variable are discrete, then this description could be as straightforward
as a function like this: f(x) = Prob of seeing the value x ( or in other
words, the proportion of the population with the value x). If the values
are continuous, this leads to mathematical difficulties, and so f(x) is usually
a density function so that the area under the curve represents probabilities
or relative frequencies.
These distributions can be summarized by a variety of parameters: the mean,
the variance, the median, etc. Which parameters are interesting to
us depends on our investigation and on the type of variable.
Random variables are functions that randomly choose a numerical value from
the population. RVs have probability distributions -- the population
distribution in fact. And RVs can also be summarized by parameters.
In particular:
E(X) = sum x p(x) (discrete case) or int x f(x) dx (continuous case) is the mean
or expected value, and
Var(X) = E[(X - E(X))^2] = sum (x - E(X))^2 p(x) or int (x - E(X))^2 f(x) dx is the
variance. The square root of this is the standard deviation.
These are sometimes called the population mean and the population standard
deviation.
They are analogous to the sample means and sample sd's.
It's called the "expected value" because it's supposed to tell us what value
to "expect" (or "predict"?) for the RV X.
Example: Roulette.
The pdf is f(-1) = 20/38, f(1) = 18/38
E(X) = -.0526
SD(X) = .998614
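These two numbers can be checked directly from the pdf; a minimal Python sketch (the -1/+1 payoffs and the 20/38, 18/38 probabilities are the ones above):

```python
# A minimal sketch of the roulette computation above: a $1 bet pays +1
# with probability 18/38 and -1 with probability 20/38.
from math import sqrt

pmf = {-1: 20/38, 1: 18/38}

# E(X) = sum of x * p(x)
mean = sum(x * p for x, p in pmf.items())
# Var(X) = sum of (x - E(X))^2 * p(x)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd = sqrt(var)

print(round(mean, 4))  # -0.0526
print(round(sd, 6))    # 0.998614
```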
In practice we don't stop with one observation; we draw several observations from the
population: X1, X2, ..., Xn. We combine these into a function,
called a statistic. Statistics are random and have their own distributions.
Example: T = X1 + X2 represents the amount of money won by playing
roulette twice.
Values of T are -2, 0, +2
P(T = -2) = (20/38)^2 = .2770083
P(T = 0) = 2*(18/38)*(20/38) = .498615
P(T = 2) = (18/38)^2 = .2243767
So E(T) = -2*P(T = -2) + 0 + 2*P(T = 2) = -.10526
And we can figure out Var(T) too.
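The whole calculation, including Var(T), can be done by brute-force enumeration of the four outcomes of two independent plays; a sketch (using the -1/+1 values and probabilities above):

```python
# A sketch of the enumeration above: the sampling distribution of
# T = X1 + X2 for two independent roulette plays (values -1 and +1).
from itertools import product

pmf = {-1: 20/38, 1: 18/38}

t_pmf = {}
for x1, x2 in product(pmf, repeat=2):
    # independence: the joint probability is the product of the marginals
    t_pmf[x1 + x2] = t_pmf.get(x1 + x2, 0) + pmf[x1] * pmf[x2]

mean_t = sum(t * p for t, p in t_pmf.items())
var_t = sum((t - mean_t) ** 2 * p for t, p in t_pmf.items())

print({t: round(p, 6) for t, p in sorted(t_pmf.items())})
# {-2: 0.277008, 0: 0.498615, 2: 0.224377}
print(round(mean_t, 5))  # -0.10526
print(round(var_t, 5))   # 1.99446, which is 2 * Var(X)
```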
And we can continue if T = X1 + ... + Xn.
But this gets tedious. Wouldn't it be nice to just know what the mean, SD, and distribution
of T are?
Terminology: the probability distribution of a statistic is called its sampling distribution.
The sampling distribution of T = X1 + X2 was given above.
With computers, we can often rely on computation to tell us what this is.
But math helps, too. You'll see many examples of sampling distributions,
and in a math stats class you'd see why they are what they are. The
most famous example is this:
A sum of many independent RVs is approximately Normally distributed, with mean
E(T) and standard deviation SD(T). The same is true if you multiply by a constant, and so
Xbar is approximately Normally distributed, too.
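A simulation sketch makes this concrete: draw many values of T = X1 + ... + Xn and compare the simulated mean and SD to n*E(X) and sqrt(n)*SD(X). (The seed, n, and number of repetitions here are arbitrary choices, not from the lecture.)

```python
# Simulation sketch: many simulated values of T = X1 + ... + Xn, where
# each Xi is one roulette play. The seed, n, and reps are arbitrary choices.
import random
from math import sqrt

random.seed(1)
n = 100        # plays per gambling session
reps = 20_000  # number of simulated sessions

def play():
    # one $1 bet: +1 with probability 18/38, -1 with probability 20/38
    return 1 if random.random() < 18 / 38 else -1

ts = [sum(play() for _ in range(n)) for _ in range(reps)]
mean_t = sum(ts) / reps
sd_t = sqrt(sum((t - mean_t) ** 2 for t in ts) / reps)

print(mean_t)  # should be close to n * (-0.0526) = -5.26
print(sd_t)    # should be close to sqrt(n) * 0.998614 = 9.986
```

A histogram of ts would also look roughly bell-shaped, which is the Normal approximation at work.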
Hence, if n is large, T approximately follows a normal distribution. But what are E(T) and Var(T)?
Rules for linear combinations of RVs.
E(a1 X1 + ... + an Xn) = a1 E(X1) + ... + an E(Xn)
always
Var(a1 X1 + ... + an Xn) = a1^2 Var(X1) + ... + an^2 Var(Xn)
only if the X's are independent.
(We haven't defined independence and will do so only informally: observing the
value of one of the X's has no effect on the others, and gives us no information
about the others.)
So E(T) = n*(-.0526)
Var(T) = n*(.998614)^2, so SD(T) = .998614*sqrt(n)
If I play 100 times, I expect to lose 5.26, give or take 9.986.
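Spelled out in code (using the exact per-play mean -2/38 and the per-play SD computed earlier):

```python
# The same arithmetic for n = 100 plays, using the rules for linear
# combinations: E(T) = n*E(X), Var(T) = n*Var(X) (independent plays).
from math import sqrt

mu = -2 / 38      # E(X) for one play
sigma = 0.998614  # SD(X) for one play
n = 100

e_t = n * mu
sd_t = sqrt(n * sigma ** 2)  # same as sigma * sqrt(n)

print(round(e_t, 2))   # -5.26
print(round(sd_t, 3))  # 9.986
```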