1. A Bernoulli random variable, X, is one for which there are only
two outcomes: 1 or 0. The probability of observing a 1 is given by
p. That is, P(X = 1) = p, and therefore P(X = 0) = 1 - p.
Such a random variable could be used to model the outcome of a single flip
of a coin, for example.
a) What is the expected value of X?
b) What is the variance? The standard deviation?
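One way to check the answers to (a) and (b) is by simulation. The following is a minimal sketch in R, not a derivation: the value p = 0.3, the seed, and the number of draws are hypothetical choices made only for illustration. It relies on the fact that a Bernoulli draw is a binomial draw with size = 1.

```r
# Simulation check for a Bernoulli random variable (illustrative only).
set.seed(1)           # arbitrary seed so the run is reproducible
p <- 0.3              # hypothetical success probability
x <- rbinom(100000, size = 1, prob = p)  # Bernoulli draws: binomial with size = 1

sim_mean <- mean(x)   # estimates the expected value of X
sim_var  <- var(x)    # estimates the variance of X
sim_sd   <- sd(x)     # estimates the standard deviation of X

print(c(mean = sim_mean, var = sim_var, sd = sim_sd))
```

Compare the printed values with your formulas evaluated at the same p, then try a few other values of p.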
5) According to the Central Limit Theorem, the normal distribution approximates
the binomial distribution for "sufficiently large" n. How large is
sufficiently large? For each distribution below, compute the probability of
observing a value within 1 SD of the mean: P(mu - sigma < X < mu + sigma).
Do so first for the distribution given, and then for the normal approximation
with the same mean and SD.
a) X is binomial, n = 10, p = .1
b) binomial, n = 25, p = .1
c) binomial, n = 50, p = .1
d) binomial, n = 100, p = .1
e) binomial, n = 500, p = .1
f) binomial, n = 1000, p = .1
Help with R:
Type help("pbinom") and help("pnorm") for assistance with calculating these
probabilities.
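As a starting point, here is a minimal sketch of how pbinom and pnorm might be combined for the probability of falling within 1 SD of the mean. The function name within_one_sd is hypothetical, and the sketch assumes strict inequalities and no continuity correction; it is one reasonable reading of the problem, not the only one.

```r
# Probability that a binomial(n, p) count lies strictly within 1 SD of its mean,
# computed exactly and with the normal approximation.
within_one_sd <- function(n, p) {
  mu    <- n * p
  sigma <- sqrt(n * p * (1 - p))
  lower <- mu - sigma
  upper <- mu + sigma

  # X takes integer values, so lower < X < upper means
  # floor(lower) + 1 <= X <= ceiling(upper) - 1.
  exact <- pbinom(ceiling(upper) - 1, n, p) - pbinom(floor(lower), n, p)

  # Normal approximation with the same mean and SD (no continuity correction).
  normal_approx <- pnorm(upper, mean = mu, sd = sigma) -
    pnorm(lower, mean = mu, sd = sigma)

  c(n = n, exact = exact, normal = normal_approx)
}

print(within_one_sd(10, 0.1))
```

Calling the function with the other values of n gives the remaining rows. When mu - sigma or mu + sigma is a whole number (n = 100, for example), the strict inequalities exclude that endpoint, so it is worth checking by hand that floating-point rounding has not shifted it.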
You can further explore the Central Limit Theorem (and I strongly encourage
you to take a look at this) via:
http://www.ruf.rice.edu/~lane/stat_sim/index.html
This page lets you choose a "parent" distribution, and then take a random
sample, form a statistic, and repeat. The CLT says that the distribution
of a sum (or average) of n independent, identically distributed RVs with
finite variance is approximately normal, and this
approximation improves as n increases. (The further the parent distribution
is from normal, the bigger n will need to be.) Because a binomial RV
is a sum of n independent Bernoulli RVs, the CLT applies. This web page
requires Java 1.1.
The CLT is particularly useful when applied to averages (an average is a sum
of observations divided by n).
Compare the sampling distributions of averages with those of medians, for example.
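If the Java page is not available, a similar comparison can be sketched directly in R. The parent distribution (exponential with rate 1), the sample size n = 30, the number of repetitions, and the seed are all assumptions chosen for illustration; swap in any parent distribution you like.

```r
# Sampling distributions of the mean and the median (illustrative sketch).
set.seed(2)    # arbitrary seed so the run is reproducible
n    <- 30     # hypothetical sample size
reps <- 5000   # hypothetical number of repeated samples

# Parent distribution: exponential with rate 1 (right-skewed, mean 1).
means   <- replicate(reps, mean(rexp(n, rate = 1)))
medians <- replicate(reps, median(rexp(n, rate = 1)))

# Side-by-side histograms of the two sampling distributions.
par(mfrow = c(1, 2))
hist(means, main = "Sample means", xlab = "mean")
hist(medians, main = "Sample medians", xlab = "median")

# Numerical summaries to go with the pictures.
print(c(mean_of_means = mean(means), sd_of_means = sd(means)))
print(c(mean_of_medians = mean(medians), sd_of_medians = sd(medians)))
```

Rerunning with smaller and larger n shows how quickly each histogram starts to look normal.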
Let X be a random variable that counts the number of events that
occur in some time interval. Suppose these assumptions hold:
i) the time can be divided into small sub-intervals so that the probability
of two or more events happening in any one sub-interval is 0. (Two events
can't happen simultaneously.)
ii) the events in any sub-interval are independent. (An event happening
at one time does not influence whether or not an event will happen in any
other time interval.)
iii) the probability of an event occurring is the same in each sub-interval.
(The rate at which events occur is constant. Call this rate lambda, the
expected number of events in the time interval.)
Then the probability distribution of X is the Poisson distribution:
P(X = k) = (lambda^k / k!) * exp(-lambda) for k = 0, 1, 2, ...
A study in 1898 found that Prussian cavalry officers were kicked to death
at the rate of 0.61 per year.
a) Explain why the assumptions hold and this might be a good model.
b) Find P(X = 0), P(X = 3), P(X = 10)
c) Suppose we're observing cars passing beneath an overpass on a one-lane
road. If we let X represent the number of cars that pass through in
an hour, would the Poisson distribution be a good model? Explain.
What if X represented the number of cars in 24 hours? Would the model
hold in heavy traffic?
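Help with R for the Poisson distribution: the sketch below evaluates the Poisson formula directly and compares it with R's built-in dpois. The function name poisson_pmf and the rate lambda = 2 are hypothetical, chosen only for illustration; substitute the rate from the problem.

```r
# Poisson probability written directly from the formula
# P(X = k) = (lambda^k / k!) * exp(-lambda).
poisson_pmf <- function(k, lambda) {
  lambda^k / factorial(k) * exp(-lambda)
}

lambda <- 2   # hypothetical rate, for illustration only
k <- 0:5      # values of k to evaluate

by_formula <- poisson_pmf(k, lambda)
built_in   <- dpois(k, lambda)   # R's built-in Poisson probability function

# The two columns of probabilities should agree.
print(cbind(k, by_formula, built_in))
```

Type help("dpois") for more detail on the built-in function.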