Homework 2 Due Friday Jan 18

Some probability theory practice:

1)  A Bernoulli random variable, X, is one for which there are only two outcomes: 1 or 0.  The probability of observing a 1 is given by p.  That is, P(X = 1) = p, and therefore P(X = 0) = 1 - p.
Such a random variable could be used to model the outcome of a single flip of a coin, for example.
a) What is the expected value of X?
b) What is the variance? The standard deviation?

2)  A binomial random variable, Y, represents the number of successes in n independent experiments in which the outcome is either "success" (1) or "failure" (0).  The probability of success at each trial is the same: p.  Y can be represented as a sum of n independent Bernoulli random variables.  A classic use of the binomial distribution is to model the number of heads in a fixed number, n, of coin tosses.
a) Derive the expected value for Y.
b) Derive the variance and standard deviation.

3) Suppose a random variable X follows a normal distribution with mean 10 and SD 3.
a) Convert these values to standard units:  13, 14.5, 7, 8
b) Convert these standard units to values:  1.2, -3.0, 2.5
c) Find P(7 < X < 13), P(8 < X < 15), P(X < 13), P(X > 8).  (Hint: you can use published tables or R. Don't do the integral.)
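As a hint for working in R, here is a minimal sketch using a different, made-up normal distribution (mean 50, SD 8; these numbers are illustrative and are not part of the assignment).  It shows one way to convert between values and standard units, and how pnorm can give interval and upper-tail probabilities.

```r
# Illustrative values only: a normal distribution with mean 50 and SD 8.
mu <- 50
sigma <- 8

# Value -> standard units: subtract the mean, then divide by the SD.
z <- (62 - mu) / sigma

# Standard units -> value: multiply by the SD, then add the mean.
x <- mu + (-0.5) * sigma

# P(a < X < b) is the difference of two cumulative probabilities.
p_interval <- pnorm(58, mean = mu, sd = sigma) - pnorm(42, mean = mu, sd = sigma)

# P(X > a) is the upper tail; lower.tail = FALSE asks pnorm for it directly.
p_upper <- pnorm(42, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(z = z, x = x, p_interval = p_interval, p_upper = p_upper))
```

pnorm(q, mean, sd) returns P(X <= q), so every probability in part (c) can be written as a difference of pnorm values or as an upper tail.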


4) Suppose a random variable has a discrete uniform distribution, assuming values x1, x2, ..., xn, each with probability 1/n.  Find the formula for the variance and standard deviation of this RV.

5) According to the Central Limit Theorem, the normal distribution approximates the binomial distribution for "sufficiently large" n.  How large is sufficiently large?  For each case below, compute the probability of observing a value within 1 SD of the mean:  P(mu - sigma < X < mu + sigma), where mu and sigma are the mean and SD of X.  Do so first for the distribution given, and then for the normal approximation with the same mean and SD.
a) X is binomial, n = 10, p = .1
b) X is binomial, n = 25, p = .1
c) X is binomial, n = 50, p = .1
d) X is binomial, n = 100, p = .1
e) X is binomial, n = 500, p = .1
f) X is binomial, n = 1000, p = .1

Help with R:
Type help("pbinom") and help("pnorm") at the R prompt for assistance with calculating these probabilities.
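A minimal sketch of how these two functions might be used together, with made-up values n = 40 and p = 0.3 (not one of the cases above).  It assumes the standard binomial mean n*p and SD sqrt(n*p*(1-p)).  Because a binomial RV takes only integer values, the strict inequality mu - sigma < X < mu + sigma has to be turned into a range of integers before calling pbinom.

```r
# Illustrative values only; not one of the assigned cases.
n <- 40
p <- 0.3

mu <- n * p
sigma <- sqrt(n * p * (1 - p))
lo <- mu - sigma
hi <- mu + sigma

# pbinom(q, n, p) returns P(X <= q).  The integers strictly between lo and hi
# run from floor(lo) + 1 up to ceiling(hi) - 1.
exact <- pbinom(ceiling(hi) - 1, n, p) - pbinom(floor(lo), n, p)

# Normal approximation with the same mean and SD.
approx <- pnorm(hi, mean = mu, sd = sigma) - pnorm(lo, mean = mu, sd = sigma)

print(c(exact = exact, approx = approx))
```

The floor and ceiling step matters: whether the endpoints are counted changes the binomial answer noticeably when n is small.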


You can further explore the central limit theorem (and I strongly encourage you to take a look at this) via: http://www.ruf.rice.edu/~lane/stat_sim/index.html
This page lets you choose a "parent" distribution, and then take a random sample, form a statistic, and repeat.  The CLT says that the distribution of a sum (or average) of n independent RVs drawn from the same parent distribution is approximately normal, and this approximation improves as n increases.  (The further the parent distribution is from normal, the bigger n will need to be.)  Because a binomial RV is a sum of Bernoulli RVs, the CLT applies.  This web page requires Java 1.1.  The CLT is particularly useful when applied to averages (an average is a sum of observations divided by n).  Compare the sampling distributions of averages with medians, for example.
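If the applet will not run, the same experiment can be sketched in R.  The version below is only one possible setup: the exponential parent distribution, the sample size n = 30, and the 5000 repetitions are all arbitrary choices made for illustration.

```r
set.seed(1)   # fixed seed so the run is reproducible

n <- 30       # size of each sample (illustrative choice)
reps <- 5000  # number of repeated samples (illustrative choice)

# Parent distribution: exponential with rate 1, which is strongly skewed
# and has mean 1 and SD 1.  Each row of the matrix is one sample of size n.
samples <- matrix(rexp(n * reps, rate = 1), nrow = reps, ncol = n)

# Form the statistic for every sample.
means <- rowMeans(samples)
medians <- apply(samples, 1, median)

# For the averages, the CLT suggests a center near 1 and an SD near 1/sqrt(n).
print(c(mean_of_means = mean(means), sd_of_means = sd(means),
        predicted_sd = 1 / sqrt(n)))

# Compare the two sampling distributions side by side.
par(mfrow = c(1, 2))
hist(means, main = "Sample means", xlab = "mean")
hist(medians, main = "Sample medians", xlab = "median")
```

Try changing n, or replacing rexp with another generator such as runif, to see how quickly the histogram of the averages starts to look normal.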

6) Let X be a random variable that counts the number of events that occur in some time interval.  Suppose these assumptions hold:
i) the time can be divided into small sub-intervals so that the probability of two events happening in any one sub-interval is 0.  (Two events can't happen simultaneously.)
ii) the events in any sub-interval are independent of the events in every other sub-interval.  (An event happening at one time does not influence whether or not an event will happen in any other time interval.)
iii) the probability of an event occurring is the same in each sub-interval.  (The rate at which events occur is constant.  Call this rate lambda.)

Then the probability distribution of X is the Poisson distribution: P(X = k) = (lambda^(k) / k! ) * exp(-lambda) for k = 0,1,2,...
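To see how this formula lines up with R's built-in Poisson function, here is a small sketch with a made-up rate, lambda = 2 (an illustrative value, not one used in this assignment).  It evaluates the formula directly and compares the result with dpois.

```r
# Illustrative rate only.
lambda <- 2
k <- 0:4

# The formula exactly as written above.
by_formula <- lambda^k / factorial(k) * exp(-lambda)

# R's built-in Poisson probability function: dpois(k, lambda) = P(X = k).
by_dpois <- dpois(k, lambda)

print(cbind(k, by_formula, by_dpois))
```

The two columns should agree; type help("dpois") for details.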

A study in 1898 found that Prussian cavalry officers were kicked to death at the rate of 0.61 per year.  Let X be the number of such deaths in one year.

a) Explain why the assumptions hold and this might be a good model.
b) Find P(X = 0), P(X = 3), P(X = 10)
c) Suppose we're observing cars passing beneath an overpass on a one-lane road.  If we let X represent the number of cars that pass through in an hour, would the Poisson distribution be a good model?  Explain.  What if X represented the number of cars in 24 hours?  Would the model hold in heavy traffic?
 

7) In American Roulette, if you bet "red", then the probability of winning is 18/38 and the probability of losing is 20/38.  To play, you bet $1 on red and spin the wheel.  If the ball lands on red, you win a dollar, otherwise you lose a dollar.  Let X_i represent the amount you win (or lose) on the ith spin of the wheel.  We'll use X to represent a generic spin of the wheel.
a) Find E(X).  Plot the probability distribution of X (X is discrete, so this is a probability mass function), indicating the mean.
b) Find SD(X).  In infinitely many plays, in what percentage of outcomes will your "winnings" be within one standard deviation of the mean?
c) Suppose you play 3 times.  What's the expected value of your total winnings?  That is, find E(X1+X2+X3).
d) Find the SD of the total winnings after 3 plays.
e) Graph the probability distribution of the total winnings after 3 plays.  What percent of the time will you be within one SD of the mean?
f) Let Y = X1 + ... + X100, that is, the total of 100 plays.  Find the mean and SD.  Use the Central Limit Theorem to approximate the probability that you'll be within one SD of the mean.
g) Let Z = (X1 + ... + X100)/100  (the average).  Find the mean and SD of Z.  Note that the mean of Z is the same as the mean of X.  What's the probability that Z will be within one SD(X) of the mean?  What's the probability that Z will be within one SD(Z) of the mean?  (SD(X) means "standard deviation of X".  Note that this should be a larger number than SD(Z).)
The point of this is that the average is likely to be closer to the mean than any single observation.
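The same point can be checked by simulation.  The sketch below deliberately uses a different example, rolls of a fair six-sided die (mean 3.5, variance 35/12), so it does not give away the roulette answers; the 10000 repetitions and the seed are arbitrary choices.

```r
set.seed(2)   # fixed seed so the run is reproducible

reps <- 10000 # number of repeated experiments (illustrative choice)
n <- 100      # rolls per experiment

# Each row holds n rolls of a fair die.
rolls <- matrix(sample(1:6, n * reps, replace = TRUE), nrow = reps, ncol = n)

single <- rolls[, 1]      # one observation from each experiment
avg <- rowMeans(rolls)    # the average of all n observations

sd_x <- sqrt(35 / 12)     # SD of a single roll

# Fraction of single rolls, and of averages, within one SD(X) of the mean 3.5.
print(c(single_within = mean(abs(single - 3.5) < sd_x),
        average_within = mean(abs(avg - 3.5) < sd_x)))

# The SD of the average should be close to SD(X) / sqrt(n).
print(c(sd_of_average = sd(avg), predicted = sd_x / sqrt(n)))
```

Adapting this to the roulette bet only requires replacing the sample() call with draws of +1 and -1 using the probabilities given above.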