STAT 13

(Sec. 1a-1c)

Introduction to Statistical Methods for the Life and Health Sciences

Instructor: Ivo Dinov, Asst. Prof.

Departments of Statistics & Neurology

http://www.stat.ucla.edu/~dinov/

Lab 7

Also see this for an additional tutorial.

Thursday, Nov. 15, 2001

This lab involves hypothesis testing.

Data: We will use the same S & P 500 data that was used previously. Load it with:

use "http://www.stat.ucla.edu/~darlene/datasets/sp500.dta"

First, treat the S & P stocks as a population and calculate two important population parameters for the total return variable: the mean and the standard deviation (use the summarize command).

Here is a novel investing theory. An investment advisor, "The Oracle", believes that most investors are basically greedy, lazy and not very smart (he is old and cranky). He thinks that people consider things alphabetically, so investors are less likely to buy stocks from companies whose names are at the end of the alphabet than from those at the beginning. He might say, "Listen, I never want to see a stock in your portfolio whose company name begins with U, V, W, X, Y, or Z." In other words, an investor should never buy a stock named "Wells Fargo"; if you really want a bank stock, buy "Bank of America", which is at the beginning of the alphabet. To him, stocks of companies with names at the end of the alphabet will generate lower returns on average than the average stock in the S & P 500.

Please test his theory for me and help me decide -- should I fire my advisor or is he onto something? I will get you started. First, state the NULL hypothesis. (The NULL is a statement of "no difference" or "no effect".)

A statement of no difference or no effect would look something like this:

NULL: μ = 9.79622

Here μ is the mean total return of stocks whose company names begin with U through Z, and 9.79622 is the overall average (the population mean) of year-to-date total return for the S & P 500 stocks.

In other words, the null suggests that one could treat the stocks with company names that begin with U, V, W, X, Y and Z as a random sample of S & P 500 stocks, because there should be no relationship between year-to-date percentage returns and company name.

The alternative (ALT) hypothesis is the name given to the statement we hope is true instead of the NULL.

For this lab:

ALT: μ < 9.79622

because The Oracle's theory is that stocks of companies with names at the end of the alphabet do worse on average than the average stock in the S & P 500.

The appropriate test statistic is Z. To calculate it, you will need the sample average for the 39 stocks whose company names begin with U through Z. I would issue a summarize command with the "in" option. For example:

summarize totalret in 462/500

The mean reported by the summarize command will serve as your sample average. There should be 39 stocks that qualify (observations 462 through 500). Treat them as a sample of size 39.

Perform your test by calculating Z. What did you get for Z, what is the P-value associated with Z and how would you interpret that P-value (review pp. 456-457 in your text)? Is there evidence to support the Oracle's theory of stock selection?

Please use the theoretical standard error, computed from the population standard deviation, in your calculation of Z.
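If you want to check your hand calculation outside Stata, here is a minimal Python sketch of the lower-tail Z test, assuming Z = (sample mean - population mean) / (population SD / sqrt(n)). Only the population mean 9.79622 comes from this handout; the population SD and the sample mean below are hypothetical placeholders that you should replace with the values from your own summarize output, and the function name is made up for illustration.

```python
import math

def z_test_lower(sample_mean, pop_mean, pop_sd, n):
    """One-sided (lower-tail) Z test for a sample mean, population SD known."""
    # Theoretical standard error of the mean of n observations
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # Standard normal CDF evaluated at z, written with the error function
    p_value = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return z, p_value

POP_MEAN = 9.79622   # population mean of total return, from the handout
POP_SD = 40.0        # HYPOTHETICAL placeholder: use your summarize result
SAMPLE_MEAN = 5.0    # HYPOTHETICAL placeholder: mean of observations 462-500
N = 39               # number of stocks in The Oracle's "sample"

z, p = z_test_lower(SAMPLE_MEAN, POP_MEAN, POP_SD, N)
print(f"Z = {z:.3f}, one-sided P-value = {p:.3f}")
```

A small P-value would mean that a sample average this low is unlikely if the NULL is true; a large one means the data give no real support to the theory.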

Finally, you can see how The Oracle's "sample" fits into the distribution of possible samples of size 39. Use the "bootstrap" procedure in Stata, for example:

bs "summarize totalret" "r(mean)", reps(1000) size(39) saving(myresults)

will "bootstrap" the population of 500 by drawing 1000 samples of size 39 (with replacement) and then saving the results to a new data file. What you are doing here is building up from the data the distribution of the test statistic assuming that the NULL is true (this should look like the normal distribution if the CLT has kicked in).
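To see what this resampling does, here is a minimal Python sketch of the same idea, not the Stata procedure itself. It assumes a synthetic stand-in population of 500 values (the real returns are not reproduced here), and the function name, seeds and population parameters are all hypothetical. If the CLT has kicked in, the SD of the resampled means should be close to the population SD divided by sqrt(39).

```python
import random
import statistics

def resample_means(population, size, reps, seed=0):
    """Draw `reps` samples of `size` with replacement; return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=size)) for _ in range(reps)]

# SYNTHETIC stand-in for the 500 total returns (hypothetical values, not the real data)
data_rng = random.Random(1)
population = [data_rng.gauss(10.0, 40.0) for _ in range(500)]

means = resample_means(population, size=39, reps=1000)

pop_sd = statistics.pstdev(population)  # population SD (divides by N)
print("mean of resampled means:", round(statistics.fmean(means), 2))
print("SD of resampled means:  ", round(statistics.stdev(means), 2))
print("pop SD / sqrt(39):      ", round(pop_sd / 39 ** 0.5, 2))
```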

Use the new data file of 1000 samples of size 39:

use myresults, clear

And run some summary statistics on the saved dataset:

summarize bs1

How does The Oracle's sample compare with this set of 1,000 samples? That is, what percentage of samples are as extreme or more extreme than his (you could use tabulate bs1 to find this out)? Is the result of #4 above consistent with this part? And how well does the simulated distribution fit the normal (you could plot the observations using qnorm or just a histogram, your choice)?
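The "as extreme or more extreme" percentage can be sketched in a few lines of Python. Because the ALT says the mean is lower, "more extreme" here is taken to mean "at or below the observed sample mean". The list of simulated means and the observed value are tiny hypothetical stand-ins for the 1,000 values in bs1 and for The Oracle's sample average; the function name is also made up.

```python
def fraction_at_or_below(simulated_means, observed_mean):
    """Share of simulated sample means that are as low as or lower than the observed one."""
    count = sum(1 for m in simulated_means if m <= observed_mean)
    return count / len(simulated_means)

# HYPOTHETICAL stand-ins: eight made-up means instead of the 1,000 in bs1
simulated = [2.0, 5.5, 8.1, 9.8, 10.4, 12.7, 15.0, 18.3]
observed = 5.5  # HYPOTHETICAL observed sample mean

share = fraction_at_or_below(simulated, observed)
print(f"{100 * share:.1f}% of simulated means are at or below the observed mean")
```

This share plays the role of an empirical P-value, so it should land near the P-value from the Z test if the normal approximation is reasonable.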

The stocks mentioned in the handout are not meant to be a solicitation for purchase of any sort. I own none of those stocks, nor does The Oracle as far as I know!

You may want to try some additional examples and compare your work to the template solution (this is a template, not the only valid solution).

Ivo D. Dinov, Ph.D., Departments of Statistics and Neurology, UCLA School of Medicine