Basic Definition :
SS (sum of squares) : the sum of squares of any list of
n real numbers, a_1, ..., a_n, refers to
(a_1 - \bar a)^2 + ... + (a_n - \bar a)^2
where \bar a = (a_1 + ... + a_n)/n is the average of the a_i.
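[Note : if \bar a = 0, then SS is nothing but the
squared length of the vector a = (a_1, ..., a_n)'.]
[Geometrically, the vector (\bar a, ..., \bar a)'
is the projection of the vector a onto the direction of
(1, ..., 1)'.]
[The vector (a_1 - \bar a, ..., a_n - \bar a)' is the component of a
orthogonal to (1, ..., 1)', that is, the projection of a onto the
orthogonal complement of that direction.]
[From the Pythagorean theorem, we can see why
||a||^2 = SS + n (\bar a)^2 must hold : the squared length of
(\bar a, ..., \bar a)' is n (\bar a)^2, and the squared length of the
orthogonal component is SS.]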
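The identity in the last note can be checked numerically. The following is a minimal sketch in plain Python; the helper names mean and ss and the three-number list are made up for illustration and are not part of the notes.

```python
def mean(values):
    """Average of a list of numbers."""
    return sum(values) / len(values)


def ss(values):
    """Sum of squared deviations from the average (the SS defined above)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


# Hypothetical toy list, chosen so the average is a whole number.
a = [2.0, 4.0, 9.0]
n = len(a)
a_bar = mean(a)

squared_length = sum(v ** 2 for v in a)   # ||a||^2
decomposed = ss(a) + n * a_bar ** 2       # SS + n (\bar a)^2

print(squared_length, decomposed)
```

For this list \bar a = 5, SS = 9 + 1 + 16 = 26 and n (\bar a)^2 = 75, so both sides equal 101.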
One-way ANOVA :
The data are grouped (classified, partitioned)
into several groups (classes, clusters) :
Group 1 : (Y_11, Y_12, ..., Y_1n_1)
Group 2 : (Y_21, ..., Y_2n_2)
...............
Group k : (Y_k1, ..., Y_kn_k)
In total we have n = n_1 + ... + n_k observations.
Total Sum of Squares : Apply SS to the list
of n Y values
Between group Sum of squares :
(1) Create a list of n values by replacing each Y_ij with the corresponding
group mean; call this the group-mean vector
(remember that it is an n-dimensional vector, not a k-dimensional
vector).
(2) Apply SS to this new list of n values.
Within group Sum of squares (also called error sum of squares, residual
sum of squares) :
(1) Create a list of n values by replacing each Y_ij with the difference
between Y_ij and its group mean
(2) Apply SS to this new list of n values.
The ANOVA identity,
Total sum of squares = Between group sum of squares + Within group sum of squares,
again holds because of the Pythagorean theorem.
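The three sums of squares defined above can be computed by following the listed steps literally. This is only an illustrative sketch in plain Python: the three small groups are invented data, and the names groups, group_mean_vector and residuals are assumptions made for the example.

```python
def ss(values):
    """Sum of squared deviations from the average of the list."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


# Hypothetical data: k = 3 groups with n_1 = 3, n_2 = 2, n_3 = 3.
groups = [[1.0, 2.0, 3.0], [5.0, 7.0], [9.0, 10.0, 11.0]]

# The list of all n = 8 Y values, group by group.
y = [v for g in groups for v in g]

# Between, step (1): replace each Y_ij by its group mean (still n values).
group_mean_vector = [sum(g) / len(g) for g in groups for _ in g]

# Within, step (1): replace each Y_ij by Y_ij minus its group mean.
residuals = [yi - mi for yi, mi in zip(y, group_mean_vector)]

total_ss = ss(y)
between_ss = ss(group_mean_vector)
within_ss = ss(residuals)

print(total_ss, between_ss, within_ss)
```

With these numbers the group means are 2, 6 and 10 and the grand mean is 6, giving Total = 102, Between = 96 and Within = 6, so Total = Between + Within.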
There are three spaces :
(1) S_1 : The 1-dimensional space generated by the vector of ones,
(1, 1, ..., 1)'.
(2) S_2 : The k-dimensional space generated by the k group
indicator vectors, e_1, ..., e_k. For instance,
e_2 = (0, ..., 0, 1, 1, ..., 1, 0, 0, ..., 0)',
where the 1s occupy the places corresponding to the
second group.
(3) S_3 : The entire n-dimensional space.
[Note : S_1 is within S_2, because the vector of ones equals
e_1 + ... + e_k, and S_2 is within S_3. Thus these spaces
are nested.]
The projection of the Y vector onto the space S_1 is the grand-mean
vector (each of its n entries equals the grand mean).
The projection of Y onto S_2 is the group-mean vector, as
created in step (1) of the Between group sum of squares.
The projection of the group-mean vector onto S_1 is again the
grand-mean vector.
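These three projection statements can be verified directly. The sketch below is plain Python and uses the fact that the indicator vectors e_1, ..., e_k are mutually orthogonal, so projecting onto S_2 reduces to adding up the projections onto each e_j. The data, the group sizes and the helper names dot and project_onto_orthogonal are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def project_onto_orthogonal(y, basis):
    """Project y onto the span of mutually orthogonal, nonzero vectors."""
    proj = [0.0] * len(y)
    for b in basis:
        coef = dot(y, b) / dot(b, b)
        proj = [p + coef * bi for p, bi in zip(proj, b)]
    return proj


# Hypothetical data: n = 8 values in k = 3 groups of sizes 3, 2, 3.
sizes = [3, 2, 3]
y = [1.0, 2.0, 3.0, 5.0, 7.0, 9.0, 10.0, 11.0]
n = len(y)

ones = [1.0] * n            # generates S_1

indicators = []             # e_1, ..., e_k generate S_2
start = 0
for size in sizes:
    e = [0.0] * n
    for i in range(start, start + size):
        e[i] = 1.0
    indicators.append(e)
    start += size

grand_mean_vector = project_onto_orthogonal(y, [ones])
group_mean_vector = project_onto_orthogonal(y, indicators)
reprojected = project_onto_orthogonal(group_mean_vector, [ones])

print(grand_mean_vector)
print(group_mean_vector)
print(reprojected)
```

Here the first and third printed vectors agree (every entry is the grand mean, 6), and the second holds the group means 2, 6 and 10 repeated according to the group sizes.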
Now, to see how the Pythagorean theorem comes in, we need a 3-D diagram with the following three vectors :
vector C = vector Y - vector of grand mean
vector A = group-mean vector - vector of grand mean
vector B = vector Y - group-mean vector
Geometry shows that A and B are orthogonal to each other (A lies in S_2,
while B is orthogonal to S_2), while algebraically
C = A + B ; so the Pythagorean theorem applies :
||C||^2 = ||A||^2 + ||B||^2
from which we can easily see the ANOVA identity holds because
||C||^2 = total sum of squares
||A||^2 = Between group sum of squares
||B||^2 = Within group sum of squares.
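As a check on the geometric argument, the vectors A, B and C can be built explicitly and tested for orthogonality. This is a sketch in plain Python on made-up data; the variable names follow the notes, and everything else (groups, dot) is an assumption of the example.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical data: k = 3 groups, n = 8 observations.
groups = [[1.0, 2.0, 3.0], [5.0, 7.0], [9.0, 10.0, 11.0]]
y = [v for g in groups for v in g]
n = len(y)

grand_mean = sum(y) / n
group_mean_vector = [sum(g) / len(g) for g in groups for _ in g]

C = [yi - grand_mean for yi in y]                       # Y - grand mean
A = [mi - grand_mean for mi in group_mean_vector]       # group mean - grand mean
B = [yi - mi for yi, mi in zip(y, group_mean_vector)]   # Y - group mean

# A and B should be orthogonal, and C should equal A + B entry by entry.
a_dot_b = dot(A, B)
c_equals_a_plus_b = all(abs(c - (a + b)) < 1e-12 for a, b, c in zip(A, B, C))

total_ss = dot(C, C)     # ||C||^2
between_ss = dot(A, A)   # ||A||^2
within_ss = dot(B, B)    # ||B||^2

print(a_dot_b, c_equals_a_plus_b)
print(total_ss, between_ss, within_ss)
```

For this data the inner product of A and B is zero, and the squared lengths are 102, 96 and 6, which matches ||C||^2 = ||A||^2 + ||B||^2.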