Analysis of variance (ANOVA)
This is one of the most fundamental tools in statistics.
It appears very simple; indeed, it is just a way of expressing the
Pythagorean theorem. Yet it has far-reaching implications, both in
real data applications and in concept/theory development. ANOVA ideas
reappear in such seemingly unrelated areas as power spectrum analysis
in time series and the study of large-sample properties of
U-statistics (the Efron-Stein identity) and the bootstrap.
ANOVA, when used creatively, can lead to a powerful system for
exploring high-dimensional data.
Unfortunately, from standard textbooks one gets the impression that
"what ANOVA does is mainly to provide the F-test of significance."
In this course, you will see why this is such a misleading conclusion.
+++++++++++++++++++++++++++++++++++++++++++
(1) One-way (Program in xlisp-stat: oneway.lsp; usage: (oneway-model y); Show me)
    Variation measure: sum of squares (SS) (Tell me more in class)
    Partitioning of SS:
        Total = Between group (class, cluster) + Within group (class, cluster)
    Degrees of freedom
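The one-way partitioning above can be checked numerically. The course program is oneway.lsp in xlisp-stat; the sketch below is a Python illustration on made-up data (the numbers and group labels are hypothetical, chosen only to show the identity).

```python
import numpy as np

# Hypothetical data: measurements y with group labels g (3 groups of 3).
y = np.array([4.1, 3.8, 4.4, 5.0, 5.3, 4.9, 6.2, 6.0, 6.4])
g = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

grand_mean = y.mean()
ss_total = ((y - grand_mean) ** 2).sum()

ss_between = 0.0   # variation of the group means around the grand mean
ss_within = 0.0    # variation of the observations around their group mean
for level in np.unique(g):
    yk = y[g == level]
    ss_between += len(yk) * (yk.mean() - grand_mean) ** 2
    ss_within += ((yk - yk.mean()) ** 2).sum()

# The Pythagorean identity: Total = Between + Within
print(ss_total, ss_between + ss_within)
```

The two printed numbers agree to floating-point precision, which is exactly the "Total = Between + Within" decomposition.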
(2) Random variable version of the ANOVA identity (Tell me more)
    Var(Y) = Var(E(Y|X)) + E(Var(Y|X))
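The random-variable version of the identity (the law of total variance) can be seen in a quick simulation. The model below is hypothetical: Y | X = x is Normal(2x, 1) with X uniform on {0, 1, 2}, so Var(E(Y|X)) = Var(2X) and E(Var(Y|X)) = 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate (X, Y): X uniform on {0, 1, 2}, and Y | X = x ~ Normal(2x, 1).
n = 200_000
x = rng.integers(0, 3, size=n)
y = 2 * x + rng.standard_normal(n)

# Var(E(Y|X)): the conditional mean of Y given X is 2X
var_cond_mean = np.var(2 * x)
# E(Var(Y|X)): the conditional variance is 1 for every x
mean_cond_var = 1.0

# The two sides of Var(Y) = Var(E(Y|X)) + E(Var(Y|X))
print(np.var(y), var_cond_mean + mean_cond_var)
```

Both sides come out near 11/3, matching the identity up to simulation error.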
(3) Two-way (Download the two-way program; usage: (twoway-model-additive y); Show me)
    Additive model: response = grand mean + row effect + column effect (Tell me more in class)
    Interaction model: response = grand mean + row effect + column effect + interaction
    Degrees of freedom (use or abuse)
    Single replicate: interaction cannot be separated from error;
    no degrees of freedom are left for interactions. What to do?
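Fitting the additive model by row and column means is a few lines of code. The sketch below uses a hypothetical single-replicate table and verifies the two-way SS partition Total = Row + Column + Residual (where, with a single replicate, the residual confounds interaction and error).

```python
import numpy as np

# Hypothetical single-replicate two-way table (3 rows x 3 columns).
y = np.array([[10.0, 12.0, 14.0],
              [11.0, 13.0, 15.5],
              [ 9.0, 11.5, 13.0]])

grand = y.mean()
row_eff = y.mean(axis=1) - grand      # row effects (sum to 0)
col_eff = y.mean(axis=0) - grand      # column effects (sum to 0)

fitted = grand + row_eff[:, None] + col_eff[None, :]
resid = y - fitted                    # interaction + error, confounded here

ss_total = ((y - grand) ** 2).sum()
ss_row = y.shape[1] * (row_eff ** 2).sum()
ss_col = y.shape[0] * (col_eff ** 2).sum()
ss_resid = (resid ** 2).sum()

# Two-way Pythagorean identity: Total = Row + Column + Residual
print(ss_total, ss_row + ss_col + ss_resid)
```

The residual matrix `resid` has zero row and column sums, which is why the cross terms vanish and the partition is exact.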
(4) Creative ways of using ANOVA :
    4.1. Tukey's one degree of freedom test:
        response = grand mean + row effect + column effect
                   + b (row effect times column effect) (Tell me more in class)
        It is very easy to fit this model. Tukey showed that to test
        whether b = 0, all you have to do is pretend that the grand mean
        and the row and column effects are known; you then have a simple
        linear regression problem of testing whether the slope is zero.
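Tukey's recipe can be sketched in a few lines: fit the additive model, form the single regressor z_ij = (row effect)_i x (column effect)_j, and regress the residuals on z. The table below is hypothetical, built with a known Tukey-style interaction coefficient of 0.5 and no noise, so the slope recovers it exactly.

```python
import numpy as np

# Hypothetical table: grand mean 10, given row/column effects, and a
# Tukey-style interaction 0.5 * (row effect) * (column effect).
row0 = np.array([-1.5, -0.5, 0.5, 1.5])       # true row effects (sum to 0)
col0 = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # true column effects (sum to 0)
y = 10.0 + row0[:, None] + col0[None, :] + 0.5 * row0[:, None] * col0[None, :]

grand = y.mean()
a = y.mean(axis=1) - grand            # estimated row effects
b = y.mean(axis=0) - grand            # estimated column effects
resid = y - (grand + a[:, None] + b[None, :])

# Tukey's single regressor: products of the estimated effects.
z = a[:, None] * b[None, :]

# "Pretend the effects are known": no-intercept least squares of resid on z.
slope = (resid * z).sum() / (z ** 2).sum()
print(slope)  # recovers the interaction coefficient, here 0.5
```

With real data one would also compute the one-degree-of-freedom SS for non-additivity, slope^2 * sum(z^2), and compare it to the remaining residual SS with an F-test.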
    4.2. Many rows or columns: more fun exploring the residual patterns.
    4.2a. More to plot: for example, plot the sum of squares of residuals
        in the additive model for each row (each column) against the
        corresponding row effect (column effect, respectively).
    4.2b. Combined use with a PCA model for dimension reduction:
        Since the residuals are naturally arranged in matrix form
        (let's call it the residual matrix), we can treat it as a data
        matrix and apply PCA to reduce the dimension.
        Rubber's Data, revisited: (Show me)
        It turns out that the residual matrix is nearly degenerate:
        close to a rank-one matrix.
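A near-rank-one residual matrix is easy to detect with the singular value decomposition: if the first singular value carries almost all of the residual sum of squares, one rank-one component describes the residual pattern. The matrix below is hypothetical, constructed as a rank-one outer product plus small noise to mimic that situation (it is not the Rubber data).

```python
import numpy as np

# Hypothetical residual matrix: exactly rank one plus small noise.
rng = np.random.default_rng(2)
u = np.array([1.0, -2.0, 0.5, 0.5])
v = np.array([0.3, -0.1, 0.4, -0.6])
resid = np.outer(u, v) + 0.01 * rng.standard_normal((4, 4))

# Squared singular values partition the residual sum of squares
# (Frobenius norm squared) among orthogonal rank-one components.
s = np.linalg.svd(resid, compute_uv=False)
share = s[0] ** 2 / (s ** 2).sum()
print(share)  # close to 1 => the residual matrix is nearly rank one
```

The leading singular vectors then give one row pattern and one column pattern that together summarize the residuals, which is the dimension reduction 4.2b is after.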
Automatic Basis curve finding (to be discussed later)
(5) Singular value decomposition of data matrix
(6) Nested structure and crossing structure (To be discussed more later)