Economics 40/Statistics M11 Lab 3: Populations and Samples
Due Wednesday November 24, 1999

Purpose: The purpose of this lab is to use Excel to generate samples from a population.

Data: We will use data on the S & P 500. This is the data set used in lab 2.

If you do not have the data, please go to the class web page at www.sscnet.ucla.edu/99F/econm40 -1/ and click on the Lab Page link. Then go to the instructions for Lab 2 and click any of data files.

Regardless of which data link you select from the web page, you should have 9 columns of data and 500 rows. The values of the variables for all the companies are as of September 30, 1999. The source was Value Line Publishing.

ASSIGNMENT

  1. First, treat the S & P stocks as a population and calculate the population parameters (print out just three of them -- the mean, the median and the standard deviation) for the TOTAL RETURN Y-T-D (it should be the 8th column). We could use the "Descriptive Statistics" tool that you worked with in Lab 2, but most of those statistics are really meant for samples. So an alternative is to use a few Excel Functions. Functions are found under INSERT on the menu bar.
  2. I would like you to calculate (have Excel calculate) the AVERAGE (mean), MEDIAN, and STANDARD DEVIATION (for the population). You can find all of these under the Statistical functions. Here is a screen capture for the Standard Deviation of the population function:

     

     

    The other two, AVERAGE and MEDIAN have the proper names, only the Standard Deviation of the population is abbreviated as STDEVP. Once you click "OK" it will ask you for a cell range. For my example here, the cell range of the 500 stocks was H2:H501, yours could be different.

    You can also choose to go around this insert function helper and simply type in formulas. For those of you who are brave (or who dislike these pop-up windows), you could just type the following functions in 3 separate cells:

    @AVERAGE(H2:H501)

    @MEDIAN(H2:H501)

    @STDEVP(H2:H501)

    and you will get the same results as if you had used the insert function helper. Remember, there are many ways to do things in Excel, so do what is most comfortable and gives you the right answer.

    Print these statistics out on a piece of paper and then construct a histogram of the population. If you saved the histogram from your previous lab, you can just make a copy of it. To make a copy of any worksheet :

    If your graphic is in a separate workbook, open up that workbook and make a copy of the appropriate worksheet to your lab 3 workbook:

     

     

     

     

     

  3. Drawing Samples from the Population. You are going to draw random samples from the population of S & P 500 stocks. To do this, you will need the Random Number Generator tool in the Data Analysis kit and you will need to assign a probability to each one of the stocks. (HINT: The probabilities are equal. In other words, you should assign probabilities so that each stock has the same chance of being chosen as the next stock.)
  4. To keep the whole thing tidy, I'd recommend that you copy the column of "Total Return YTD" to a new worksheet. To insert a new worksheet in any Excel workbook, click on INSERT on the menu bar and then click "WORKSHEET".

    Then go back to your worksheet with the data and highlight the column with the data of interest by clicking on the letter at the very top of the column. On my sheet, it's "H", clicking on it will select your column -- it should be highlighted.

     

    If so, click on the word "EDIT" on the menu bar select "COPY". If you successfully copy the column, the highlight should look as if the edges are "racing around". Anyway, try it, then go back to the new worksheet, click on the "A" of the first column to select it. Then click on the word "EDIT " on the menu bar and choose "PASTE" to put the column of returns in the "A" column.

    Column B should contain your probabilities. Again, the probabilities are equal and they sum to 1 or 100%. So if you have 500 stocks…what are the individual probabilities?

    When you've got your column "A" of outcomes and column "B" of probabilities, you are ready to go to the RANDOM NUMBER GENERATOR TOOL found under TOOLS - DATA ANALYSIS. The boxes in the random number generator should be empty (it's OK if it's not).

     

     

    What you will do is draw 100 random samples of size 50 from the population. In the box above, the number of variables is the sample size and the number of random numbers is the number of repetitions (i.e. number samples drawn). Actually, in this assignment, it doesn't matter which one you call sample size and which one you call repetitions AS LONG AS YOU ARE CLEAR ON WHAT COLUMN OR ROW IS USED TO GENERATE THE SAMPLE AVERAGES for the returns -- remember it's 100 samples of size 50. Draw these samples from the population and save it to a New Worksheet (see how the options is checked under Output Options). The action should result in a new page with 5000 numbers on it, something like this:

     

    DO NOT PRINT THE 5000 NUMBERS OUT PLEASE.

  5. Calculate, using a function, the 100 sample means in a separate column (or row) of the worksheet. For example:
  6.  

    Then construct a relative frequency histogram of the 100 sample means and print it out. And while you are at it, calculate the standard deviation for the 100 sample means, once again, you should use STDEVP as it uses the correct formula (if you look at the help for this function, you will see the formula)

  7. Finally, convert each of the 100 sample means into Z scores using the formula (you can do this by hand if you wish, but to save time, you should learn to program this into Excel):



 

(remember, x-bar is a sample average, m (mu) is the population parameter (mean), s (sigma) is the population standard deviation and n is the sample size, that's a square root sign before the n). Please answer the following questions:

 

A. What were your highest and lowest Z scores and what percentage of your Z scores fell between +1 and -1 Z? What percentage fell between +2 and -2 Z?

B. What was the mean of your 100 sample means?

C. What was the standard deviation of your 100 sample means?

D. What is the theoretical standard error for samples of 100? The formula is on page 450 of your text.

 

 

 

RECAP

To summarize, a complete assignment requires that you turn in these things:

  1. The population parameters and a graphic of the population distribution of TOTAL RETURN Y-T-D. Any kind of histogram or bar chart of the population is acceptable for this assignment (you can use the one from your lab 2 if you wish).
  2. A relative frequency histogram of the 100 sample means calculated from your 100 samples of size 50.
  3. The answers to questions 4A through 4D above. No other write up is needed.
  4. Staple it together, put your TA's name on it (if you are in enrolled in one and go to the other, please indicate that) and turn it by the end of lecture on November 24, 1999. If you are leaving town early for the holidays, please leave it my mailbox in Math Sciences 8130 or e-mail to me as an attachment.