Introduction to Statistical Methods for the Life and Health Sciences
This lab involves hypothesis testing.
Data: We will use the same S&P 500 data that we used previously. Load it with:
use "http://www.stat.ucla.edu/~darlene/datasets/sp500.dta", clear
The null hypothesis (NULL) is a statement of no difference or no effect. For this lab it is:
NULL: μ = 9.79622 (the overall average, that is, the population mean, of year-to-date total return for the S&P 500 stocks).
In other words, the NULL implies that one could treat the stocks whose company names begin with U, V, W, X, Y and Z as a random sample of the 500 stocks, because there should be no relationship between year-to-date percentage return and company name.
The alternative (ALT) hypothesis is the name given to the statement we hope is true instead of the NULL.
For this lab:
ALT: μ < 9.79622
because the Oracle's theory is that stocks of companies whose names fall at the end of the alphabet do worse, on average, than the average stock in the S&P 500.
summarize totalret in 462/500
The mean reported by the summarize command is your sample average. There should be 39 stocks that qualify (observations 462 through 500); treat them as a sample of size 39.
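For reference, here is a sketch of the one-sided test in symbols. The notation is assumed rather than taken from the handout: x̄ is the mean from summarize totalret in 462/500, σ is the standard deviation of totalret across all 500 stocks, and Φ is the standard normal cumulative distribution function.

```latex
\begin{aligned}
H_0 &: \mu = 9.79622, \qquad H_A : \mu < 9.79622,\\
Z &= \frac{\bar{x} - 9.79622}{\sigma / \sqrt{n}}, \qquad n = 39,\\
\text{P-value} &= \Pr(Z_0 \le Z) = \Phi(Z), \qquad Z_0 \sim N(0, 1).
\end{aligned}
```

Because ALT points to the left, only small (negative) values of Z count against the NULL, so the P-value is the area under the standard normal curve to the left of Z.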
Perform your test by calculating Z. What value of Z did you get, what is the P-value associated with it, and how would you interpret that P-value (review pp. 456-457 in your text)? Is there evidence to support the Oracle's theory of stock selection?
In calculating Z, use the theoretical (population) standard error, σ/√n, where σ is the standard deviation of totalret across all 500 stocks.
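One way to get Z and its P-value in Stata is sketched below. It assumes the data file is in the order the handout implies (so that observations 462 through 500 are the U through Z companies) and treats the standard deviation of totalret over all 500 stocks as σ. The local macro names are illustrative choices, not part of the lab.

```stata
* Load all 500 stocks (the population)
use "http://www.stat.ucla.edu/~darlene/datasets/sp500.dta", clear

* Population mean and SD of year-to-date total return.
* r(sd) uses the N-1 divisor, so rescale by sqrt((N-1)/N) for the population SD.
quietly summarize totalret
local mu    = r(mean)
local sigma = r(sd) * sqrt((r(N) - 1) / r(N))

* The Oracle's sample: observations 462 through 500
quietly summarize totalret in 462/500
local xbar = r(mean)
local n    = r(N)

* Z with the population standard error, then the left-tail P-value
local z = (`xbar' - `mu') / (`sigma' / sqrt(`n'))
display "mu = " %9.5f `mu' "   xbar = " %9.5f `xbar' "   n = " `n'
display "Z = " %9.4f `z' "   P-value (left tail) = " %6.4f normal(`z')
```

normal() is Stata's standard normal CDF, so normal(Z) is the left-tail area that matches ALT: μ < 9.79622. The computed population mean should agree with the 9.79622 given in the NULL; using r(sd) without the rescaling would change σ only by the factor √(499/500).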
bs "summarize totalret" "r(mean)", reps(1000) size(39) saving(myresults)
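This command "bootstraps" the population of 500 stocks: it draws 1,000 samples of size 39 (with replacement), records the mean of totalret for each sample, and saves those 1,000 means to a new data file, myresults.dta. In effect, you are building from the data the sampling distribution of the sample mean assuming that the NULL is true, because every sample is drawn from all 500 stocks regardless of company name. If the Central Limit Theorem has taken effect, this distribution should look approximately normal.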
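In more recent Stata releases, bootstrap is written as a prefix command, and the quoted-command form of bs shown above may be rejected. The following is a sketch of an equivalent run under that assumption. The file name myresults2, the variable name boot_mean and the seed 12345 are hypothetical choices; using a separate file name leaves the handout's myresults.dta (with its variable bs1) untouched.

```stata
* Load all 500 stocks; bootstrap resamples from the data in memory
use "http://www.stat.ucla.edu/~darlene/datasets/sp500.dta", clear

* 1,000 samples of size 39 drawn with replacement, keeping each sample mean.
* seed() makes the draws reproducible; 12345 is an arbitrary choice.
bootstrap boot_mean=(r(mean)), reps(1000) size(39) seed(12345) nodots saving(myresults2, replace): summarize totalret

* myresults2.dta holds one observation per replication
use myresults2, clear
summarize boot_mean
```

Stata may warn that summarize does not set e(sample) and that all observations are assumed to be used; as long as totalret has no missing values, that assumption holds here and the warning is expected.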
Load the new data file, which holds the means of the 1,000 samples of size 39:
use myresults, clear
Then run summary statistics on the saved means:
summarize bs1
How does the Oracle's sample compare with these 1,000 samples? That is, what percentage of the samples have a mean as extreme as or more extreme than his (tabulate bs1 is one way to find out)? Is this result consistent with the P-value from your Z test in #4 above? And how well does the simulated distribution fit the normal? (You could plot the observations with qnorm or a histogram; your choice.)
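A sketch of the comparison, assuming (as the handout does) that the saved bootstrap means are in a variable named bs1 in myresults.dta. It reloads the original data only to recover the Oracle's sample mean, then counts the bootstrap means at or below it, since ALT is left-tailed. The graph names are hypothetical.

```stata
* The Oracle's sample mean, from the original data
use "http://www.stat.ucla.edu/~darlene/datasets/sp500.dta", clear
quietly summarize totalret in 462/500
local xbar = r(mean)

* Bootstrap means saved earlier by the bs command
use myresults, clear

* Share of bootstrap means at or below the Oracle's mean (left tail)
quietly count if bs1 <= `xbar'
display "Percent of bootstrap means at or below the Oracle's mean: " %5.1f 100 * r(N) / _N

* Compare the simulated distribution with the normal
qnorm bs1, name(qq_bs1, replace)
histogram bs1, normal name(hist_bs1, replace)
```

The printed percentage plays the role of a simulated P-value, so it can be set beside the normal-theory P-value from the Z test. In the plot from qnorm, points lying close to the straight line indicate a good fit to the normal.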
The stocks mentioned in the handout are not meant to be a solicitation for purchase of any sort. I own none of those stocks, nor does The Oracle as far as I know!