Due Friday, February 23
I.
Return to the Ozone data:
Below are the sample correlation and sample covariance
matrices for the ozone data predictor variables. They are in table form
here.
CORRELATIONS (Numeric Variables)
VARIABLES | TEMP | INVERSIONHT | PRESSURE | VISIBILITY | HEIGHT | HUMIDTY | TEMP2 | WINDSPEED |
TEMP | 1.00 | -0.58 | 0.41 | -0.27 | 0.76 | 0.46 | 0.84 | 0.29 |
INVERSIONHT | -0.58 | 1.00 | -0.05 | 0.37 | -0.54 | -0.33 | -0.84 | 0.07 |
PRESSURE | 0.41 | -0.05 | 1.00 | -0.07 | 0.03 | 0.71 | 0.07 | 0.42 |
VISIBILITY | -0.27 | 0.37 | -0.07 | 1.00 | -0.19 | -0.44 | -0.34 | 0.00 |
HEIGHT | 0.76 | -0.54 | 0.03 | -0.19 | 1.00 | 0.10 | 0.80 | 0.09 |
HUMIDTY | 0.46 | -0.33 | 0.71 | -0.44 | 0.10 | 1.00 | 0.31 | 0.35 |
TEMP2 | 0.84 | -0.84 | 0.07 | -0.34 | 0.80 | 0.31 | 1.00 | 0.10 |
WINDSPEED | 0.29 | 0.07 | 0.42 | 0.00 | 0.09 | 0.35 | 0.10 | 1.00 |
COVARIANCES (Numeric Variables)
VARIABLES | TEMP | INVERSIONHT | PRESSURE | VISIBILITY | HEIGHT | HUMIDTY | TEMP2 | WINDSPEED |
TEMP | 160.98 | -13410.14 | 165.61 | -269.87 | 780.78 | 127.30 | 130.55 | 8.25 |
INVERSIONHT | -13410.14 | 3344832.43 | -2716.72 | 53281.29 | -81108.33 | -13240.81 | -18889.12 | 290.19 |
PRESSURE | 165.61 | -2716.72 | 1029.90 | -180.01 | 87.17 | 490.83 | 27.97 | 29.71 |
VISIBILITY | -269.87 | 53281.29 | -180.01 | 6146.79 | -1200.22 | -753.59 | -329.02 | 0.46 |
HEIGHT | 780.78 | -81108.33 | 87.17 | -1200.22 | 6631.37 | 184.34 | 799.23 | 15.98 |
HUMIDTY | 127.30 | -13240.81 | 490.83 | -753.59 | 184.34 | 468.12 | 81.81 | 16.49 |
TEMP2 | 130.55 | -18889.12 | 27.97 | -329.02 | 799.23 | 81.81 | 149.74 | 2.68 |
WINDSPEED | 8.25 | 290.19 | 29.71 | 0.46 | 15.98 | 16.49 | 2.68 | 4.86 |
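For reference, matrices like these come straight from the raw data matrix. Here is a minimal Python sketch (Python is used for illustration only; the array below is a placeholder, since the raw ozone data are not reproduced in this handout):

import numpy as np

# Hypothetical stand-in for the raw ozone predictors: an n-by-8 array with columns in the
# order TEMP, INVERSIONHT, PRESSURE, VISIBILITY, HEIGHT, HUMIDTY, TEMP2, WINDSPEED.
rng = np.random.default_rng(0)
X = rng.normal(size=(330, 8))

cov = np.cov(X, rowvar=False)        # sample covariance matrix (divisor n-1)
corr = np.corrcoef(X, rowvar=False)  # sample correlation matrix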
a) Which matrix, correlation or covariance, should
you perform your principal components analysis on, and why?
b) Suppose you base your PCA (Principal Components
Analysis) on the covariance matrix. Which variable will have the largest weight
in the first principal component? Put differently, the first principal
component is a linear combination of these 8 variables. Which variable
will have the biggest coefficient in the linear combination? Why?
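One way to check your reasoning for (b): the principal components are the eigenvectors of whichever matrix you analyze, so a PCA based on the covariance matrix can be sketched as follows (Python used for illustration, not the course software; the matrix below is a placeholder):

import numpy as np

# S: the 8-by-8 sample covariance matrix tabulated above. A placeholder is used here;
# type in the tabulated values to reproduce the analysis.
S = np.eye(8)

eigvals, eigvecs = np.linalg.eigh(S)              # eigendecomposition of a symmetric matrix
order = np.argsort(eigvals)[::-1]                 # sort components by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvecs[:, 0])                              # coefficients of the first principal component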
Here is the output of a PCA on the correlation matrix:
FIT MEASURES
COMPONENT | Eigenvalue | Proportion | Cumulative Proportion |
PC1 | 3.68398 | 0.46050 | 0.46050 |
PC2 | 1.82334 | 0.22792 | 0.68841 |
PC3 | 1.07382 | 0.13423 | 0.82264 |
PC4 | 0.63570 | 0.07946 | 0.90210 |
PC5 | 0.42500 | 0.05313 | 0.95523 |
PC6 | 0.17152 | 0.02144 | 0.97667 |
PC7 | 0.15307 | 0.01913 | 0.99580 |
PC8 | 0.03358 | 0.00420 | 1.00000 |
COMPONENTS
VARIABLES | PC1 | PC2 | PC3 | PC4 | PC5 | PC6 | PC7 | PC8 |
TEMP | 0.4751 | 0.0064 | 0.2366 | 0.0620 | -0.2652 | 0.0560 | 0.6718 | 0.4355 |
INVERSIONHT | -0.4056 | 0.2655 | 0.1807 | -0.1991 | -0.6500 | -0.2584 | 0.2481 | -0.3766 |
PRESSURE | 0.2097 | 0.5906 | 0.0501 | 0.4405 | -0.2067 | 0.5354 | -0.2240 | -0.1764 |
VISIBILITY | -0.2505 | -0.0100 | 0.7143 | 0.5183 | 0.3004 | -0.2604 | 0.0166 | 0.0025 |
HEIGHT | 0.3951 | -0.2880 | 0.3234 | -0.0762 | -0.4644 | -0.2077 | -0.6171 | 0.1043 |
HUMIDTY | 0.3180 | 0.4670 | -0.3239 | 0.1815 | 0.1092 | -0.7273 | -0.0406 | 0.0191 |
TEMP2 | 0.4740 | -0.2602 | 0.0899 | -0.0046 | 0.1680 | -0.0086 | 0.2154 | -0.7904 |
WINDSPEED | 0.1455 | 0.4602 | 0.4256 | -0.6746 | 0.3440 | 0.0588 | -0.0878 | 0.0367 |
c) Make a scree plot. How many principal components
should be retained according to this plot?
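If you would rather draw the scree plot with software than by hand, here is a minimal matplotlib sketch using the eigenvalues from the FIT MEASURES table above:

import matplotlib.pyplot as plt

# Eigenvalues of the correlation matrix, copied from the FIT MEASURES table.
eigenvalues = [3.68398, 1.82334, 1.07382, 0.63570, 0.42500, 0.17152, 0.15307, 0.03358]

plt.plot(range(1, 9), eigenvalues, marker="o")
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()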
d) Can you place any physical interpretation on the
first PC? The second?
e) Verify that the length of PC1 is 1 (allowing for round-off error, of course).
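To check (e) numerically, sum the squares of the PC1 column from the loadings table; for example:

# PC1 coefficients, copied from the loadings table above.
pc1 = [0.4751, -0.4056, 0.2097, -0.2505, 0.3951, 0.3180, 0.4740, 0.1455]
print(sum(c ** 2 for c in pc1))  # should be close to 1, up to round-off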
f) Calculate the first PC loadings vector. (In other
words, rescale the eigenvector so that its length is sqrt(first eigenvalue).)
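A sketch of that rescaling, with the numbers taken from the tables above:

import math

eigenvalue_1 = 3.68398   # first eigenvalue, from the FIT MEASURES table
pc1 = [0.4751, -0.4056, 0.2097, -0.2505, 0.3951, 0.3180, 0.4740, 0.1455]

loadings_1 = [math.sqrt(eigenvalue_1) * c for c in pc1]  # eigenvector rescaled to length sqrt(eigenvalue)
print(loadings_1)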
g) Suppose we retain just the first two dimensions. The first observation in this data set was
80, 1298, 32, 40, 5860, 80, 75.2, 3, in the order
given in the tables above (i.e., TEMP is the first variable and WINDSPEED the last).
What is the score of this observation for the first two principal
components?
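Because this PCA was done on the correlation matrix, the observation must first be standardized (subtract each variable's sample mean, divide by its sample standard deviation) before taking its inner product with the PC coefficient vectors. A sketch, with the means and standard deviations left as placeholders since they are not reproduced in this handout:

import numpy as np

obs = np.array([80, 1298, 32, 40, 5860, 80, 75.2, 3])   # first observation, in the order of the tables

# Sample means and standard deviations of the eight predictors: placeholders here,
# compute them from the raw ozone data.
means = np.zeros(8)
sds = np.ones(8)

z = (obs - means) / sds                                   # standardized observation

pc1 = np.array([0.4751, -0.4056, 0.2097, -0.2505, 0.3951, 0.3180, 0.4740, 0.1455])
pc2 = np.array([0.0064, 0.2655, 0.5906, -0.0100, -0.2880, 0.4670, -0.2602, 0.4602])
print(z @ pc1, z @ pc2)                                   # scores on the first two principal components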
h) Make part of a bi-plot: plot each variable as
a vector in the space defined by the first two PCs. Which variables contribute
most to the first PC? Which to the second?
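A sketch of the variable half of a biplot, drawing each variable as an arrow from the origin to its (PC1, PC2) coefficients from the table above:

import matplotlib.pyplot as plt

names = ["TEMP", "INVERSIONHT", "PRESSURE", "VISIBILITY", "HEIGHT", "HUMIDTY", "TEMP2", "WINDSPEED"]
pc1 = [0.4751, -0.4056, 0.2097, -0.2505, 0.3951, 0.3180, 0.4740, 0.1455]
pc2 = [0.0064, 0.2655, 0.5906, -0.0100, -0.2880, 0.4670, -0.2602, 0.4602]

fig, ax = plt.subplots()
for name, x, y in zip(names, pc1, pc2):
    ax.arrow(0, 0, x, y, head_width=0.02, length_includes_head=True)
    ax.text(x, y, name)
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
plt.show()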
II. For the following, assume Z(t) is a discrete white-noise time series with E(Z(t)) = 0, Var(Z(t)) = sigma-squared, and Cov(Z(t), Z(t+k)) = 0 for all k not equal to 0.
a) Find the auto-correlation function of the MA process given by
X(t) = Z(t) + 0.7 Z(t-1) - 0.2 Z(t-2)
b) Find the auto-correlation function -- rho(k) -- of the first-order AR process defined by
X(t) = 0.7 X(t-1) + Z(t)
Plot rho(k) for k = -6, -5, ..., -1, 0, 1, ..., 6.
c) Use the time-series
package in xlispstat to simulate the processes in (a) and (b). Make
plots of these processes against time. (Choose the menu item Series:
Create series.) See the note at the bottom of the page. A Python sketch of
the same simulations is given below the download links.
Mac version (a .sit file)
PC or Unix (a .zip file)
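If you cannot get the xlispstat package running, the same simulations can be sketched in Python (this is only a substitute for the assignment's intended software):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n + 2)                        # white noise Z(t) with sigma = 1

# (a) MA(2): X(t) = Z(t) + 0.7 Z(t-1) - 0.2 Z(t-2)
x_ma = z[2:] + 0.7 * z[1:-1] - 0.2 * z[:-2]

# (b) AR(1): X(t) = 0.7 X(t-1) + Z(t)
x_ar = np.zeros(n)
for t in range(1, n):
    x_ar[t] = 0.7 * x_ar[t - 1] + z[t + 2]

fig, axes = plt.subplots(2, 1, sharex=True)
axes[0].plot(x_ma)
axes[0].set_title("MA(2) simulation")
axes[1].plot(x_ar)
axes[1].set_title("AR(1) simulation")
axes[1].set_xlabel("t")
plt.show()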
d) Make your own time series! Get a coin and some graph paper. Put the origin in the lower-left corner of the page. The x-axis is "time", which in this case will really be "number of coin flips". Put a mark at (0, 0). Flip a coin. If it lands "heads", then y increases by 1. If it lands "tails", then y stays the same. So if I throw a heads on the first toss, I put a mark at (1,1). If I throw tails, I put the mark at (1,0). Connect the points. Continue for 30 flips or so.
a) What is the probability that your time series will reach (x,20) or beyond? (x is any integer > 0).
b) Suppose you flip many, many times. What's the probability you will reach (x,20) or beyond?
c) This is a "random walk" time series:
X(t) = X(t-1) + Z(t). What type of random variable is Z(t)?
d) Use the xlisp software to simulate this time series. Suppose you were to model it with a regression. What would be the values of the parameters?
e) Suppose, now, that your coin lands heads with
p = .25. Now what would be the values of the parameters in the regression?
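For those not using the xlisp software, here is a rough Python sketch of the coin-flip walk (p is the probability of heads; the function name is my own):

import numpy as np
import matplotlib.pyplot as plt

def coin_walk(n_flips, p, seed=0):
    """Random walk X(t) = X(t-1) + Z(t), where Z(t) is 1 with probability p (heads) and 0 otherwise."""
    rng = np.random.default_rng(seed)
    steps = rng.binomial(1, p, size=n_flips)
    return np.concatenate(([0], np.cumsum(steps)))   # start the walk at (0, 0)

plt.plot(coin_walk(30, 0.5), drawstyle="steps-post", label="p = 0.5")
plt.plot(coin_walk(30, 0.25), drawstyle="steps-post", label="p = 0.25")
plt.xlabel("number of flips")
plt.ylabel("X(t)")
plt.legend()
plt.show()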
Note: I admit, this is a bit risky. I'm not sure
how this will work on a non-Mac machine. If necessary, I can email
it to you, but the file is big. Still, if you click, here is what
should happen:
a) You should download a file called timeseries.sit
(Macintosh) or timeseries.zip
(other).
b) You should "unstuff" it with Stuffit Expander
or some other expanding software. (Note: some browsers will unstuff
automatically, in which case you can skip this step.)
c) Place it in the Vista folder.
d) You don't need to run Vista. Just double-click
on the xlstsp.lsp icon. This will load all of the appropriate
files.
These are just text files. So if worse comes to
worst, you can open up xlispstat (or ViSta) and load them in by cutting
and pasting. But there are many of them, and that would be tedious.
If worse comes to worst, skip those exercises; they are primarily to help give you some experience "seeing" the models.
1) A random walk is given by X(t) = X(t-1) + Z(t), with X(0) = 0. Show that X(t) = Z(1) + Z(2) + ... + Z(t).
2) Define the difference operator of order d to be Dd. (This is what was
written with an upside-down delta in class.) Hence
D1(X(t)) = X(t) - X(t-1) (first-order difference), and
D2(X(t)) = D1(D1(X(t))) = D1(X(t) - X(t-1)) = [X(t) - X(t-1)] - [X(t-1)
- X(t-2)] = X(t) - 2X(t-1) + X(t-2)
a) What is D3(X(t))?
b) Define W(t) = Dd(X(t)). Then W(t) = alpha1 W(t-1) + alpha2 W(t-2)
+ ... + alphap W(t-p) + Z(t) + Beta1 Z(t-1) + Beta2 Z(t-2) + ... + Betaq Z(t-q)
is the definition of an ARIMA(p,d,q) process. Show that a random
walk is an ARIMA(0,1,0) process.
c) What is another name for an ARIMA(p,0,0) process?
d) What is another name for an ARIMA(0,0,q) process?
3) Suppose X(t) = Z(t) + Beta Z(t-1). This is a moving average (MA) process. As we said in class, moving average processes can be difficult to estimate (there is no closed-form solution). As a result, some analysts prefer to rewrite MA processes as AR processes. Rewrite this as an AR process.