Economics 40/Statistics M11 Lab 5: Testing Hypotheses
Due Friday March 9, 2001
Purpose: The purpose of this lab is to use Stata to test a hypothesis.
Data: We will use the data on the S&P 500 from lab 4. If you saved it during the last lab, please delete your saved copy and load the data again from the internet with this command:
use http://www.stat.ucla.edu/~vlew/stat11/labs/wi01sp500
ASSIGNMENT
1. Treat the S&P 500 as a population of 500 stocks and calculate two important population parameters: the mean and the standard deviation of the variable pr4weeks. You do not need to print these out for this assignment.
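As a language-neutral illustration (not part of the Stata workflow), the Python sketch below computes the two population parameters for a list of returns. The numbers are made-up placeholders, not values from the S&P 500 file. It uses the population standard deviation (dividing by N); Stata's summarize reports the N - 1 version, but with N = 500 the two differ by a factor of only about 0.999.

```python
import statistics

def population_parameters(returns):
    """Return (mean, standard deviation), treating `returns` as the whole population.

    statistics.pstdev divides by N (population SD);
    statistics.stdev would divide by N - 1 (sample SD).
    """
    mu = statistics.fmean(returns)
    sigma = statistics.pstdev(returns)
    return mu, sigma

# Made-up 4-week percentage returns, for illustration only.
toy_returns = [2.0, -1.0, 0.5, 1.5]
mu, sigma = population_parameters(toy_returns)
print(mu, sigma)
```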
2. Here is a novel investing theory. My investment advisor, "The Oracle", believes that short company names are bad because they do not inspire confidence. Therefore, he believes investors are less likely to buy stocks from companies whose names are short. For example, he might say, "Listen, I never want to see a stock in your portfolio whose company name has 10 or fewer letters in it; those stocks are for losers and USC graduates!" (He's a UCLA alum.)
In other words, he is suggesting that an investor should never buy a stock named "Xerox" (XRX). Instead, if you really want a retail stock, buy "The Colgate-Palmolive Company" (CL), because it has a very long name. To him, stocks of companies with short names will generate lower returns on average than the market as a whole. And of course, the whole idea of investing, as opposed to indexing, is to beat the market.
3. Please test his theory for me and help me decide: should I fire my advisor, or is he onto something? I will get you started. First, state the null hypothesis. According to page 455 of your book, the null is a statement of "no difference" or "no effect", or a statement about a population parameter.
A statement of no difference or no effect would look something like this:
H0: μ = .75318 (this is the overall average, the population mean of pr4weeks, for the S&P 500)
In other words, the null says that the stocks with short company names can be treated as a random sample of stocks, with no relationship between the 4-week percentage returns and the length of the company name.
The alternative hypothesis, as described on p. 455 of your text, is the name given to "the statement we hope is true instead of the null."
For this assignment:
Ha: μ < .75318
because The Oracle's theory is that stocks of companies with short names do worse on average than the average stock traded on the market.
4. The appropriate test statistic is Z. To calculate it, you will need to find the sample average x-bar. To do this, first generate a new variable called howlong:
generate howlong = length(name)
All this does is create a new variable called "howlong". length() is a Stata function that takes the variable "name", counts how many characters (including any spaces and punctuation) are in each company's name, and returns that number. So for a company like Xerox, the result should be 5.
Then issue this command to get summary statistics on the "sample" of short names:
summarize pr4weeks if howlong <= 10
The mean for these stocks will serve as your sample average. You should have 15 values. Treat the 15 as a random sample of 15 stocks: there SHOULD be no relationship between the length of a company's name and its percentage return, but The Oracle thinks there is.
5. Perform your test by calculating Z. What did you get for Z, what is the P-value associated with Z, and how would you interpret that P-value (review pp. 457-460 in your text)? Is there evidence to support The Oracle's theory of stock selection?
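For a one-sided test of H0: μ = μ0 against Ha: μ < μ0 with known population standard deviation σ, the statistic is Z = (x-bar - μ0) / (σ / √n), and the P-value is the area under the standard normal curve to the left of Z. The Python sketch below is one way to do that arithmetic; the function name is illustrative, and you would plug in μ0 = .75318, σ from step 1, x-bar from step 4, and n = 15.

```python
import math

def z_test_lower(x_bar, mu0, sigma, n):
    """One-sided Z test of H0: mu = mu0 against Ha: mu < mu0.

    Returns (z, p_value), where p_value = P(Z <= z) for a standard normal Z.
    """
    se = sigma / math.sqrt(n)                              # theoretical standard error of x-bar
    z = (x_bar - mu0) / se
    p_value = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF at z
    return z, p_value

# Sanity check: an x-bar equal to mu0 gives z = 0 and a P-value of 0.5.
z, p = z_test_lower(x_bar=0.75318, mu0=0.75318, sigma=1.0, n=15)
print(z, p)
```

A small P-value (for example, below 0.05) would count as evidence for The Oracle's theory; a large one means the short-name average is consistent with chance.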
6. Finally, you can see how The Oracle's "sample" fits into the distribution of possible samples of size 15 by using the "bootstrap" procedure in Stata. For example:
bs "summarize pr4weeks" "r(mean)", reps(1000) size(15) saving(lab5)
will bootstrap the population of 500 by drawing 1000 samples of size 15 (with replacement) and then saving the mean of each sample to a new data file.
Beware: the bs command might take a while to run. When it finishes, load the new data file of 1000 sample means:
use lab5, clear
Then run some summary statistics on the saved dataset:
summarize bs1
Compare the standard deviation of this distribution of sample means with the theoretical value, σ/√15 (the population standard deviation from step 1 divided by the square root of the sample size). How does The Oracle's sample compare with this set of 1,000 samples; that is, what percentage of the sample means are as extreme as or more extreme than his (for this one-sided alternative, at or below his mean)? You could find this out with several methods you've already learned this quarter. Is the result of #5 above consistent with this part? And how well does the simulated distribution fit the normal? You could plot the observations using qnorm or just a histogram, your choice. The commands are:
qnorm bs1
or
graph bs1, bin(10) normal
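The simulation in step 6 can also be sketched outside Stata. The Python below is an illustration of the same idea, not a reimplementation of Stata's bs command: it draws samples with replacement, records each sample mean, reports the share of means at or below an observed mean (the empirical counterpart of the one-sided P-value), and sets the simulated spread beside σ/√n. The function names are hypothetical, and because the random draws differ from Stata's, the exact percentage will differ too.

```python
import math
import random
import statistics

def bootstrap_means(population, reps=1000, size=15, seed=None):
    """Draw `reps` samples of `size` values WITH replacement; return each sample's mean."""
    rng = random.Random(seed)  # seeded generator so a run can be reproduced
    return [statistics.fmean(rng.choices(population, k=size)) for _ in range(reps)]

def empirical_lower_tail(boot_means, observed_mean):
    """Share of simulated sample means at or below the observed mean.

    This is the simulated counterpart of the one-sided P-value for Ha: mu < mu0.
    """
    at_or_below = sum(1 for m in boot_means if m <= observed_mean)
    return at_or_below / len(boot_means)

def compare_spread(boot_means, sigma, n=15):
    """Return (SD of the simulated means, theoretical standard error sigma / sqrt(n)).

    statistics.stdev divides by N - 1, like the SD that Stata's summarize reports.
    """
    return statistics.stdev(boot_means), sigma / math.sqrt(n)
```

With the 500 pr4weeks values as `population`, the x-bar from step 4 as `observed_mean`, and the population standard deviation from step 1 as `sigma`, the three functions cover the bootstrap draw, the percentage requested in the summary, and the standard deviation comparison.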
SUMMARY
A complete assignment requires the following:
1. A restatement of the null and alternative given above, and results from a test (i.e., Z and P-value) using the S&P 500 data.
2. Your interpretation, please. What do you recommend? Is my investment advisor right, or is the difference between stocks with short company names and the S&P 500 average return just due to random chance? Provide the P-value from the Z test using the theoretical standard error, and then provide a percentage from the 1,000 simulated samples (the empirical, as opposed to theoretical, equivalent of a P-value).
3. Include the summary statistics for the original data and for the distribution of 1000 samples of size 15. Also include a normal probability plot or a histogram of the 1000 sample means. If you use a histogram, please use the "normal" option too.
The stocks mentioned in the handout are not meant to be a solicitation for purchase of any sort. I own none of those stocks, nor does The Oracle as far as I know.