Let the sample size tend to infinity:
the sample covariance matrix converges to the population covariance matrix (by the law of large numbers).
The rest of the steps remain the same.
We shall use the population version for the theoretical discussion.
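A quick numerical illustration (a minimal numpy sketch; the 2-by-2 population covariance Sigma below is an arbitrary choice for the demo):

    import numpy as np

    # Hypothetical population covariance, chosen only for this illustration.
    Sigma = np.array([[2.0, 0.8],
                      [0.8, 1.0]])
    rng = np.random.default_rng(0)

    for n in (100, 10_000, 1_000_000):
        X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=n)
        S = np.cov(X, rowvar=False)          # sample covariance matrix
        print(n, np.max(np.abs(S - Sigma)))  # entrywise error shrinks with n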
Basic facts:
1. Variance of a linear combination of random variables:
   var(a x + b y) = a^2 var(x) + b^2 var(y) + 2 a b cov(x, y)
2. Life is easier with matrix representation:
   (B.1)   var(m'X) = m' Cov(X) m,
   where m is a p-vector and X consists of the p random variables (x_1, ..., x_p)'.
 
From (B.1), it follows that
maximizing var(m'X) subject to ||m|| = 1 is the same as maximizing m' Cov(X) m subject to ||m|| = 1
(here ||m|| denotes the length of the vector m).
3. Eigenvalue decomposition:
   (B.2)   M v_i = lambda_i v_i,   where lambda_1 >= lambda_2 >= ... >= lambda_p.
Basic linear algebra tells us that the first eigenvector will do:
   the solution of max m' M m subject to ||m|| = 1 must satisfy M m = lambda_1 m
(checked numerically in the sketch after these facts).
4. The covariance matrix is degenerate (i.e., some eigenvalues are zero) if the data are confined to a lower-dimensional space S:
   rank of the covariance matrix = number of non-zero eigenvalues = dimension of the space S.
This explains why PCA works for our first example.
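A minimal numerical check of facts 1-4 (a numpy sketch; the covariance matrix Sigma, the vector m, and the subspace below are arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Facts 1-2: var(m'X) = m' Cov(X) m, checked by simulation.
    Sigma = np.array([[2.0, 0.8],
                      [0.8, 1.0]])            # arbitrary population covariance
    m = np.array([0.6, -0.3])
    X = rng.multivariate_normal([0.0, 0.0], Sigma, size=200_000)
    print(np.var(X @ m), m @ Sigma @ m)       # the two numbers agree closely

    # Fact 3: the unit vector maximizing m' Sigma m is the first eigenvector.
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
    v1 = eigvecs[:, -1]                       # eigenvector for lambda_1
    print(v1 @ Sigma @ v1, eigvals[-1])       # maximized variance equals lambda_1

    # Fact 4: data confined to a plane in R^3 give a rank-2 covariance matrix.
    A = rng.standard_normal((3, 2))           # columns span a 2-d subspace S
    Y = rng.standard_normal((50_000, 2)) @ A.T
    C = np.cov(Y, rowvar=False)
    print(np.linalg.eigvalsh(C))              # one eigenvalue is (numerically) zero
    print(np.linalg.matrix_rank(C))           # rank 2 = dim(S)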
Why can small errors be tolerated?
Large i.i.d. errors are fine too.
Heterogeneity is harmful, and so are correlated errors.
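The reason, in one line: i.i.d. noise with variance sigma^2 adds sigma^2 I to the covariance matrix, which shifts every eigenvalue by sigma^2 but leaves the eigenvectors unchanged; unequal or correlated noise variances break this. A small numpy sketch (the subspace and the noise levels are arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 1))              # data confined to a line in R^3
    signal = (3.0 * rng.standard_normal((100_000, 1))) @ A.T

    for label, scales in [("small i.i.d.", [0.1, 0.1, 0.1]),
                          ("large i.i.d.", [1.0, 1.0, 1.0]),
                          ("heteroscedastic", [0.1, 0.1, 5.0])]:
        noise = rng.standard_normal(signal.shape) * scales
        vals, vecs = np.linalg.eigh(np.cov(signal + noise, rowvar=False))
        print(label, np.round(vals[::-1], 2), np.round(vecs[:, -1], 2))
    # With i.i.d. noise the top eigenvector stays aligned with the line A;
    # with heteroscedastic noise it can swing toward the noisy coordinate.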
Further Discussion:
There is no guarantee of finding nonlinear structure such as clusters, curves, etc.
In fact, sampling properties for PCA are mostly developed for normal data
(Mardia, Kent, and Bibby 1979, Multivariate Analysis, New York: Academic Press).
Still useful in practice.
Scaling problem (use the correlation matrix or not?): options are available
in PCA-model.lsp (a small sketch of the effect follows).
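Why the choice matters (a numpy sketch, not the PCA-model.lsp code; the scale factor of 100 is an arbitrary illustration): covariance-based PCA is dominated by whichever variable has the largest variance, while correlation-based PCA standardizes the variables first.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.standard_normal(10_000)
    y = 0.5 * x + rng.standard_normal(10_000)
    data = np.column_stack([x, 100.0 * y])   # second variable on a much larger scale

    # Covariance PCA: first PC points almost entirely along the rescaled variable.
    _, vec_cov = np.linalg.eigh(np.cov(data, rowvar=False))
    print(vec_cov[:, -1])

    # Correlation PCA: after standardizing, both variables get an equal say.
    _, vec_cor = np.linalg.eigh(np.corrcoef(data, rowvar=False))
    print(vec_cor[:, -1])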
 
Projection pursuit: a guided or a random search for interesting projections.
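A toy version of the random variant (a hypothetical sketch, not the course's code; the |excess kurtosis| index and the function name are my own choices for illustration):

    import numpy as np

    def random_projection_pursuit(X, n_dirs=1000, rng=None):
        """Try random unit directions and keep the one whose 1-d projection
        looks least normal, scored here by |excess kurtosis|."""
        if rng is None:
            rng = np.random.default_rng()
        X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize the variables
        best_dir, best_score = None, -np.inf
        for _ in range(n_dirs):
            d = rng.standard_normal(X.shape[1])
            d /= np.linalg.norm(d)                 # random unit direction
            z = X @ d
            z = (z - z.mean()) / z.std()
            score = abs(np.mean(z**4) - 3.0)       # excess kurtosis of projection
            if score > best_score:
                best_dir, best_score = d, score
        return best_dir, best_score

A guided search would replace the random draws with steps that improve the chosen index directly.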