
Economics 40/Statistics M11 Lab 5: Testing Hypotheses
Due Friday March 9, 2001

Purpose: The purpose of this lab is to use Stata to test a hypothesis.

Data: We will use the data on the S&P 500 from Lab 4. If you saved it during the last lab, please get rid of your saved copy and "re-use" it from the internet. The command is:

use http://www.stat.ucla.edu/~vlew/stat11/labs/wi01sp500

ASSIGNMENT

1. Treat the S&P 500 as a population of 500 stocks and calculate two important population parameters: the mean and the standard deviation for variable pr4weeks. You do not need to print these out for this assignment.
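If it helps, here is a minimal sketch of one way to do step 1. It assumes the S&P 500 data are already in memory. Keep in mind that summarize reports the standard deviation with an N-1 divisor; the rescaled divide-by-N version is shown only for comparison, and with 500 stocks the two are nearly identical.

```stata
* Sketch: population mean and standard deviation of pr4weeks (all 500 stocks)
summarize pr4weeks
* r(mean), r(sd) and r(N) are left behind by summarize
display "mean = " r(mean)
display "sd (N-1 divisor) = " r(sd)
display "sd (N divisor)   = " r(sd) * sqrt((r(N) - 1) / r(N))
```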

2. Here is a novel investing theory. My investment advisor, "The Oracle", believes that short company names are bad because they do not inspire confidence. Therefore, he believes investors are less likely to buy stocks from companies whose names are short. He might say, "Listen, I never want to see a stock in your portfolio whose company name has 10 or fewer letters in it. Those stocks are for losers and USC graduates!" (He's a UCLA alum.) For example, he is suggesting that an investor should never buy a stock named "Xerox" (XRX). Instead, if you really want a retail stock, buy "The Colgate-Palmolive Company" (CL), because it has a very long name. To him, stocks of companies with short names will generate lower returns on average than the market as a whole. And of course, the whole idea of investing, as opposed to indexing, is to beat the market.

3. Please test his theory for me and help me decide -- should I fire my advisor or is he onto something? I will get you started. First, state the null hypothesis. From page 455 in your book, the null is a statement of "no difference" or "no effect" or a statement about a population.

A statement of no difference or no effect would look something like this:

H0: μ = .75318 (this is the overall average, the population mean of pr4weeks, for the S&P 500)

In other words, the null suggests that one could treat the stocks with short company names as a random sample of stocks, and that there should be no relationship between the 4-week percentage returns and the length of the company name.

The alternative hypothesis is, as stated on p. 455 of your text, "a name to the statement we hope is true instead of the null".

For this assignment:

Ha: μ < .75318

because The Oracle's theory is that stocks of companies with short names do worse on average than the average stock traded on the market.

4. The appropriate test statistic is Z. To calculate it, you will need to find the sample average x-bar. To do this, first generate a new variable called "howlong":

generate howlong = length(name)

All this will do is generate a new variable called "howlong". length() is a Stata function that takes the variable called "name", counts how many characters are in each company's name (spaces and punctuation included), and returns that number. So for a company like Xerox, the result should be 5.
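Before relying on howlong, it may be worth a quick look at what it produced. The following is only a sketch, assuming the variables are named "name" and "howlong" as above:

```stata
* Sketch: eyeball the short-name companies and count them
list name howlong if howlong <= 10
count if howlong <= 10
```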

Then issue this command:

summarize pr4weeks if howlong <= 10

This gives summary statistics on the "sample" of short names. The mean for these stocks will serve as your sample average. You should have 15 values. Treat the 15 as a random sample of 15 stocks, since there SHOULD be no relationship between the length of a company's name and its percentage return, even though The Oracle thinks there is.
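For reference, the Z statistic for this setup can be written out as below. The symbol \mu_0 is notation introduced here for the hypothesized mean (.75318), \sigma is the population standard deviation from step 1, and n = 15; check the form against your text before using it.

```latex
z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}},
\qquad \mu_0 = 0.75318, \quad n = 15

% One-sided alternative Ha: mu < mu_0, so the P-value is the lower tail:
P\text{-value} = P(Z \le z) = \Phi(z)
```

Here \Phi is the standard normal cumulative distribution function, so a large negative z gives a small P-value.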

5. Perform your test by calculating Z. What did you get for Z, what is the P-value associated with Z and how would you interpret that P-value (review pp. 457-460 in your text)? Is there evidence to support the Oracle's theory of stock selection?
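One way to carry out the arithmetic for step 5 inside Stata is sketched below. It assumes the S&P 500 data and the howlong variable are in memory. The scalar names (mu, sigma, xbar, n, z) are illustrative, and they are wrapped in scalar() so that Stata does not mistake a name such as n for an abbreviation of a variable like "name". The sketch uses normal() for the standard normal CDF; if your release does not recognize it, older releases named this function norm() or normprob().

```stata
* Sketch: one-sample z test of H0: mu = population mean vs. Ha: mu < population mean
quietly summarize pr4weeks
scalar mu    = r(mean)
scalar sigma = r(sd)

* the 15 short-name stocks serve as the "sample"
quietly summarize pr4weeks if howlong <= 10
scalar xbar = r(mean)
scalar n    = r(N)

* z = (x-bar - mu) / (sigma / sqrt(n))
scalar z = (scalar(xbar) - scalar(mu)) / (scalar(sigma) / sqrt(scalar(n)))

* lower-tail P-value, because the alternative is mu < .75318
display "z = " scalar(z) "   P-value = " normal(scalar(z))
```

You should still be able to reproduce the same numbers by hand from the two summarize outputs.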

6. Finally, you can see how The Oracle's "sample" fits into the distribution of possible samples of size 15 by using the "bootstrap" procedure in Stata. For example, the command

bs "summarize pr4weeks" "r(mean)", reps(1000) size(15) saving(lab5)

will bootstrap the population of 500 by drawing 1000 samples of size 15 (with replacement) and then save the results to a new data file. Beware, this might take a while to do. When it finishes, use the new data file of 1000 samples of size 15:

use lab5, clear

And run some summary statistics on the saved dataset:

summarize bs1

See how the standard deviation of this distribution of sample means differs from the theoretical value (the population standard deviation divided by the square root of 15). How does The Oracle's sample compare with this set of 1,000 samples? That is, what percentage of samples are as extreme as or more extreme than his? (You could find this out with several methods you've already learned this quarter.) Is the result of #5 above consistent with this part? And how well does the simulated distribution fit the normal? You could plot the observations using qnorm or just a histogram, your choice. The commands are:

qnorm bs1     or     graph bs1, bin(10) normal
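One of the "several methods" for finding the empirical percentage is simply to count. The sketch below is self-contained: it starts from the S&P 500 data with howlong already generated, stores the two numbers it needs in scalars (the names xbar and sigma are illustrative), and then switches to the saved bootstrap file. It assumes scalars are kept when "use lab5, clear" replaces the data, which holds in current releases; if yours drops them, type the numbers in by hand instead.

```stata
* Sketch: empirical percentage of bootstrap means at or below the short-name mean
* start with the S&P 500 data (and howlong) in memory
quietly summarize pr4weeks
scalar sigma = r(sd)
quietly summarize pr4weeks if howlong <= 10
scalar xbar = r(mean)

* switch to the 1000 saved bootstrap means
use lab5, clear
count if bs1 <= scalar(xbar)
display "percent at or below x-bar = " 100 * r(N) / _N

* compare the simulated standard error with the theoretical sigma / sqrt(15)
quietly summarize bs1
display "simulated SE = " r(sd) "   theoretical SE = " scalar(sigma) / sqrt(15)
```

"At or below" is used because the alternative is one-sided (short names do worse), so that is the direction that counts as extreme.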

SUMMARY

A complete assignment requires the following:

1. A restatement of the null and alternative given above and results from a test (i.e. Z and P-values) using the S&P 500 data.

2. Your interpretation, please. What do you recommend? Is my investment advisor right, or is the difference between stocks with short company names and the S&P 500 average return just due to random chance? Provide the P-value from the Z test using the theoretical standard error, and then provide a percentage (the empirical, as opposed to theoretical, equivalent of a P-value) from the 1,000 simulated samples.

3. Include the summary statistics for the original data and the distribution of 1000 samples of size 15. Also include a normal probability plot or a histogram of the 1000. If you use a histogram, please use the "normal" option too.

The stocks mentioned in the handout are not meant to be a solicitation for purchase of any sort. I own none of those stocks, nor does The Oracle as far as I know.