Lab 7
Due Friday, December 7
Data
The Labor Force Participation Rate (LFPR) measures the percent of a population of eligible workers who are actually working.  The LFPR for women can be used to track changes in the participation of women in the U.S. workforce.

The data are based on surveys in 19 US cities and consist of the LFPR for women in 1968 and 1972.  So each line of the database has a city, the LFPR in 1968 for that city, and the LFPR for 1972 for that city.

Research Question
Did the LFPR for women change from 1968 to 1972?

Method
There are a variety of approaches to answer this question.  I'll suggest a few, and leave you to decide which you prefer.

Approach 1

Assume that the observed average LFPR across the 19 cities in 1968 is exactly equal to the true mean LFPR in 1968, and that the observed SD is exactly equal to the true SD.  Then ask yourself: if we were to randomly sample 19 observations from a distribution with this mean and this SD, is it likely we would get the average that we got in 1972?

Approach 2

We can take into account the fact that the observed average LFPR in 1968 is not EXACTLY equal to the true mean, but in fact has some error in it.  (Which is just another way of saying that it lies close to the true mean, but is not exactly equal to it.)  We then take the average of the cities in 1972 and subtract the average of the cities in 1968.  If there was no change in the true LFPR, then the difference of these averages will be close to 0.  It won't be exactly 0 because of sampling error, but it will be close to 0.  The question is, how close?  Well, that depends on how much differences of averages can vary.

Approach 3

We can take into account the fact that the observed LFPR in one city might change in different ways than in other cities.  This means that we can't just lump all of the cities into one big pool, but instead should look at the change in each city.  To do this, subtract the observed LFPRs for each city:
LFPR(1972) - LFPR(1968).  We then have 19 differences.  If there was no change in the true LFPR, these differences should be normally distributed with a mean of 0.  So we can take the AVERAGE of the differences, and see if it's "close to" 0.

Descriptive Statistics

Before you begin, you need to understand what the data look like.

Find the average and SD of the 1968 data and the 1972 data separately.  Are they very different?
Make histograms of the two years and compare them.
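
If you'd rather do this step outside of Excel, here is a minimal Python sketch of the same descriptive statistics.  The function names (describe, histogram_counts, print_histogram) and the numbers are made up for illustration; the placeholder values are NOT the lab data, so substitute the 19 city values for each year.  I'm assuming the SD is computed with the n-1 divisor.

```python
import statistics
from collections import Counter

def describe(values):
    """Return (average, SD) of a list of numbers; the SD uses the n-1 divisor."""
    return statistics.mean(values), statistics.stdev(values)

def histogram_counts(values, bin_width):
    """Count how many values fall in each bin; keys are the left bin edges."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def print_histogram(values, bin_width):
    """Print a crude text histogram, one row of stars per bin."""
    for left_edge, count in histogram_counts(values, bin_width).items():
        print(f"{left_edge:>3}-{left_edge + bin_width:<3} {'*' * count}")

# Made-up placeholder values (in percent), NOT the lab data.
placeholder_1968 = [42, 44, 51]
placeholder_1972 = [45, 47, 56]

for year, values in (("1968", placeholder_1968), ("1972", placeholder_1972)):
    avg, sd = describe(values)
    print(year, "average:", round(avg, 2), "SD:", round(sd, 2))
    print_histogram(values, bin_width=5)
```

Comparing the two printed summaries side by side is the point: similar SDs but shifted averages would show up as similar shapes in different places.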

Inference

Approach 1

Assume that the average and SD for 1968 are the mean and the SD of the population.  If we were to take a random sample of 19 observations and find their average, what would be the mean and SD of the average?  (So imagine that you and infinitely many friends do this experiment.  What does the distribution of your averages look like?)  Remember that this SD of the averages is called the Standard Error (SE).  How many SEs away from the 1968 mean is the 1972 average?

If the 1972 average is close to the presumed 1968 mean, then we would conclude that nothing had changed.  But if it were far, we would say that things had changed.  Would you say that the 1972 average is far?  If so, why?  If not, how far would it have to be before you would conclude that a change had occurred?
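
The Approach 1 arithmetic can be sketched in a few lines of Python.  This is only an illustration: the function name ses_from_baseline and the placeholder numbers are made up (they are NOT the lab data), and I'm assuming the 1968 SD is computed with the n-1 divisor.

```python
import math
import statistics

def ses_from_baseline(baseline, later):
    """How many SEs is the later average from the baseline mean?

    Treats the baseline average and SD as the true population mean and SD.
    """
    mu = statistics.mean(baseline)          # presumed population mean
    sigma = statistics.stdev(baseline)      # presumed population SD (n-1 divisor)
    se = sigma / math.sqrt(len(later))      # SD of an average of len(later) draws
    return (statistics.mean(later) - mu) / se

# Made-up placeholder values (in percent), NOT the lab data.
placeholder_1968 = [40, 50, 60]
placeholder_1972 = [52, 56, 60]

z = ses_from_baseline(placeholder_1968, placeholder_1972)
print("SEs away from the presumed mean:", round(z, 2))
```

With the real data, len(later) is 19, so the SE is the 1968 SD divided by sqrt(19).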

Approach 2

We don't really know the true mean of the LFPR in 1968 or 1972.  We don't even know the true SDs for that matter.  Let's make the following assumptions, though:
1) LFPR(1968) is a random variable from a normal distribution.
    a) the Mean of LFPR(1968) is a fixed number called mu1, the SD is a fixed number called sigma1.  (These are fixed, but unknown.)
2) LFPR(1972) is a random variable from a normal distribution.
    a) the Mean of LFPR(1972) is a fixed number called mu2, the SD is a fixed number called sigma2.
3) sigma1 = sigma2

This last assumption is pretty extreme.  But I think you can see from your Descriptive Statistics that it doesn't look all that outlandish.

(This next bit uses some math that you don't have to worry about following. You have all the tools necessary, but it's a little tricky.  Just make sure you get the bottom line.)
If there really was no change from 1968 to 1972, then mu2 - mu1 = 0, so Avg(1972) - Avg(1968) will be approximately 0, and
MEAN(avg1972 - avg1968) = 0.  We can also figure out what the Standard Error (SE) would be.  The variance of an average of n observations is sigma^2/n, so (treating the two years as independent samples):
SD(avg2 - avg1) = sqrt( SD(avg2)^2 + SD(avg1)^2 ) = sqrt( sigma2^2/n + sigma1^2/n ) = sqrt( (sigma1^2 + sigma2^2)/n ).
Because we are assuming sigma1 = sigma2 = sigma, we get that SE(avg1972 - avg1968) = sqrt(2)*sigma/sqrt(n).

This means we can estimate the SE by estimating the SD of all of the data pooled together, multiplying by sqrt(2), and dividing by sqrt(19).

The bottom line:
If there was no change between 1972 and 1968, then the difference in averages will be (approximately) normally distributed with mean 0 and SD sqrt(2)*sigma/sqrt(19).

1. What's the SD of all of the data pooled together?
2. What's the SE (sqrt(2)*SD/sqrt(19))?

How many SEs away from 0 is Avg(1972) - Avg(1968)?  Would you conclude that a change had occurred?
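
Here is a rough Python sketch of the Approach 2 calculation.  The function name two_sample_t and the placeholder numbers are made up for illustration (NOT the lab data).  One assumption to flag: I estimate the common SD with the usual pooled-variance formula, which averages the two sample variances weighted by their degrees of freedom; this is slightly different from lumping all 38 numbers into one list and taking the SD.

```python
import math
import statistics

def two_sample_t(sample1, sample2):
    """Number of SEs that avg(sample2) - avg(sample1) lies from 0,
    assuming independent samples with a common SD."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)     # sample variance, n-1 divisor
    var2 = statistics.variance(sample2)
    # Pooled estimate of the common variance sigma^2.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    # SE of the difference of averages; equals sqrt(2)*sigma/sqrt(n) when n1 == n2.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(sample2) - statistics.mean(sample1)) / se

# Made-up placeholder values (in percent), NOT the lab data.
placeholder_1968 = [40, 50, 60]
placeholder_1972 = [46, 56, 66]

t = two_sample_t(placeholder_1968, placeholder_1972)
print("SEs away from 0:", round(t, 2))
```

Notice that a 6-point shift in every city gives a small number of SEs here, because this approach compares the shift to the city-to-city spread.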

You can do this whole operation quickly with Excel.  Under Tools:Data Analysis, select "t-Test: Two-Sample Assuming Equal Variances".  For "variable 1 range" input the range for the 1968 data.  For "variable 2 range" input the range for the 1972 data.  For "hypothesized mean difference" put 0 (this means that we will assume there is no difference between the two groups and will then see how likely this assumption seems).

Do this.

The output will be a table with several entries.  The first few you should be able to identify.  The one that says "t Stat" gives a number (approximately 1.50 here) which is an estimate of how many SEs away from 0 the difference in averages was.  So avg(1972) - avg(1968) was 1.5 SEs above 0.  If the distribution of the difference of two averages is normal and centered at 0, the probability that a random sample will produce a difference that's 1.5 SEs above 0 or more is pretty large.  The exact value is given under P(T <= t), one-tail.  Excel has a misprint here.  The probability shown is actually P(T >= t); in other words, the probability that a t-stat from a random sample will be bigger than the one you saw (1.5) is about 0.07.  The probability that a t-stat from a random sample will be bigger than 1.5 OR less than -1.5 is .14 (given on the two-tail line).  From this we conclude that, while the 1972 average seems to be above the 1968 average, it is not very far above.  In fact, differences of this size occur about 14 percent of the time.  So I would conclude that there is no evidence of a change.
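
To see where tail probabilities like these come from, here is a small Python sketch that uses the normal curve.  This is an approximation, not what Excel does: Excel uses the t distribution, which has slightly fatter tails, so the normal curve gives about 0.07 and 0.13 for 1.5 SEs rather than Excel's 0.07 and .14.  The function name tail_probabilities is made up for illustration.

```python
from statistics import NormalDist

def tail_probabilities(num_ses):
    """Return (one-tail, two-tail) probabilities under a standard normal curve.

    one-tail: chance of landing at least |num_ses| SEs above 0.
    two-tail: chance of landing at least |num_ses| SEs from 0 in either direction.
    """
    upper = 1.0 - NormalDist().cdf(abs(num_ses))
    return upper, 2.0 * upper

one_tail, two_tail = tail_probabilities(1.5)
print("one-tail:", round(one_tail, 3), "two-tail:", round(two_tail, 3))
```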

Approach 3

I'm not going to go through the theory here, but Excel can quickly give you the output you need.  Under "Tools: Data Analysis" choose "t-Test: Paired Two Sample for Means".  You'll get an output table similar to the last one.  Look at the line that says "P(T<=t) two-tail 0.024352597".  This means that the probability of getting a t-stat as far away from 0 as the one you got (which was about 2.45 here) is only about 2%.  So it is possible that no change occurred in the LFPR variable, but if so, data at least as extreme as what we saw would happen only about 2% of the time.  So we must either conclude that there was a change, or that we were just unlucky.
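
For the curious, the t-stat in the paired test can be sketched in Python along the lines described under Approach 3 in the Method section: take the 19 city-by-city differences and ask how many SEs their average is from 0.  The function name paired_t and the placeholder numbers are made up for illustration (NOT the lab data), and I'm assuming the SD of the differences uses the n-1 divisor.

```python
import math
import statistics

def paired_t(before, after):
    """Number of SEs that the average city-by-city difference lies from 0."""
    if len(before) != len(after):
        raise ValueError("each city needs a value in both years")
    diffs = [a - b for b, a in zip(before, after)]   # LFPR(1972) - LFPR(1968)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # SE of the average difference
    return mean_diff / se

# Made-up placeholder values (in percent), NOT the lab data.
placeholder_1968 = [40, 50, 60]
placeholder_1972 = [43, 55, 61]

t = paired_t(placeholder_1968, placeholder_1972)
print("SEs away from 0:", round(t, 2))
```

Because each city is compared only with itself, the large city-to-city spread drops out, which is why this approach can detect a change that Approach 2 misses.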