gene-expression.lsp

Gene expression data:

In particular, suppose all genes have already been grouped into several blocks according to some structural
or functional (metabolic pathway) properties, with the unknown ones placed in a separate group. Then
we can fit a two-way ANOVA, examine residuals, and perform PCA, either separately for each block or jointly across blocks.
How such a framework ties into the SIR methodology will become clearer as the course develops.
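
As a rough illustration of these steps for a single block, here is a minimal Python sketch (not the xlisp-stat implementation). The function names `two_way_residuals` and `first_pc` and the toy data are made up for illustration. It fits the additive two-way model, takes the residuals, and finds the first principal direction of the residual matrix by power iteration.

```python
import math

def two_way_residuals(y):
    """Residuals of the additive two-way fit:
    y[i][j] - row mean - column mean + grand mean."""
    n, p = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (n * p)
    row_means = [sum(row) / p for row in y]
    col_means = [sum(y[i][j] for i in range(n)) / n for j in range(p)]
    return [[y[i][j] - row_means[i] - col_means[j] + grand for j in range(p)]
            for i in range(n)]

def first_pc(x, iters=200):
    """Leading eigenvector and eigenvalue of x'x, found by power iteration."""
    p = len(x[0])
    # Cross-product matrix x'x (p by p).
    s = [[sum(row[a] * row[b] for row in x) for b in range(p)] for a in range(p)]
    # Arbitrary starting vector; it must not be orthogonal to the answer.
    v = [1.0 + 0.1 * j for j in range(p)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iters):
        w = [sum(s[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # residual matrix is all zeros
            break
        v = [c / norm for c in w]
    eigenvalue = sum(v[a] * s[a][b] * v[b] for a in range(p) for b in range(p))
    return v, eigenvalue

# Toy block: 3 genes by 4 conditions, additive part plus a rank-one interaction.
a = [1.0, -1.0, 0.0]
b = [1.0, -1.0, 1.0, -1.0]
y = [[5.0 + i + 2.0 * j + a[i] * b[j] for j in range(4)] for i in range(3)]
resid = two_way_residuals(y)
direction, eigenvalue = first_pc(resid)
```

In this toy example the residuals recover the interaction term a[i] * b[j], and the leading direction is proportional to b. The sign of the direction is arbitrary.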

What can you do to help?

1. Software development. A user-friendly implementation in xlisp-stat that takes a general blocking structure and runs
several analysis modules, each combining two-way analysis and PCA in a different way, will hopefully be ready soon. See the attachment at the end.
To use that protocol for gene expression, additional customization would be important, such as links to gene data banks and to other analysis methods. Implementations in other environments,
such as C++, Matlab, or S+, would be desirable too, as some of you told me earlier.

2. Work on specific examples of blocking (needs people who know biochemistry and know how to search gene/protein banks). In addition to the micro-array data, I have an Affymetrix data set
   on aging genes from a research group at Scripps.
   2.1. structural
   2.2. pathway
   2.3. phylogenetic profile
   2.4. multiple indexing

3. K-means and clustering. Blocking can be provided by k-means or another clustering method.
   3.1. incorporate existing methods
   3.2. landscape description
   3.3. re-grouping, collapsing, or dividing

4. Multiple indexing. It is OK to allow more than one way of blocking. How do we synthesize the results from different indexing methods?

5. Classifying an unknown case.

6. Between-block analysis.

7. Crossing many data resources.
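
For item 3, here is a minimal Python sketch of how k-means could produce a blocking (plain Lloyd iterations, not a particular existing package). The function name `kmeans_blocks` and the `init_rows` argument, which picks the starting centers so the result is reproducible, are assumptions made for illustration. The result has the same shape as a block index: k lists of row numbers.

```python
def kmeans_blocks(y, k, init_rows, iters=100):
    """Lloyd's k-means on the rows of y.

    y         : list of n rows (genes), each a list of p values
    k         : number of blocks
    init_rows : k row numbers whose rows serve as the starting centers
    Returns a list of k lists of row numbers (zero-based).
    """
    centers = [list(y[i]) for i in init_rows]
    assign = None
    for _ in range(iters):
        # Assign every row to its nearest center (squared Euclidean distance).
        new_assign = [
            min(range(k),
                key=lambda c: sum((u - v) ** 2 for u, v in zip(row, centers[c])))
            for row in y
        ]
        if new_assign == assign:  # assignments stopped changing
            break
        assign = new_assign
        # Move each center to the mean of its current members.
        for c in range(k):
            members = [y[i] for i in range(len(y)) if assign[i] == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [[i for i in range(len(y)) if assign[i] == c] for c in range(k)]

# Toy data: two well-separated groups of expression profiles.
profiles = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.2]]
blocks = kmeans_blocks(profiles, 2, [0, 2])
```

On this toy data the rows fall into the blocks (0 1 4) and (2 3). Re-grouping, collapsing, or dividing (3.3) would then operate on this list of lists.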

+++++++++++++++++++++++++
As a first step, the following will be implemented in xlisp-stat (hopefully in a couple of days, by Rober Yuan, adapting
it partly from his project in OIS imaging analysis):

(two-way-block-model y block-index)

where y is a list of n lists, each with p elements (as in the two-way model), and

block-index is a list of k lists; each list represents a block and holds the row indices of y that belong to it, so with zero-based indexing the values must be between 0 and n-1. Block sizes may differ.
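
To make the input layout concrete, here is a small Python sketch of the checks such a function might run on its arguments. The name `check_block_index` is hypothetical. It assumes zero-based row indices (valid values 0 to n-1) and assumes that every row belongs to exactly one block, which the description above does not state explicitly.

```python
def check_block_index(y, block_index):
    """Validate the inputs and return (n, p, k)."""
    n, p = len(y), len(y[0])
    if any(len(row) != p for row in y):
        raise ValueError("every row of y must have the same length p")
    flat = [i for block in block_index for i in block]
    if any(not (0 <= i < n) for i in flat):
        raise ValueError("row indices must lie between 0 and n-1")
    # Assumption: the blocks partition the rows (no row missing or repeated).
    if sorted(flat) != list(range(n)):
        raise ValueError("each row must belong to exactly one block")
    return n, p, len(block_index)

# Five genes measured under three conditions, split into blocks of size 2 and 3.
y = [[1.0, 2.0, 4.0],
     [2.0, 3.0, 7.0],
     [5.0, 5.0, 6.0],
     [8.0, 6.0, 9.0],
     [4.0, 7.0, 3.0]]
block_index = [[0, 1], [2, 3, 4]]
n, p, k = check_block_index(y, block_index)
```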

output:

This is similar to your imaging problem.
0. grand mean.
1. column effect
2. block effect
3. block X column effect
4. row effect within block
5. row X column within block

6. provide the sums of squares for items 1, ..., 5.

7. provide a dialog box (button clicks) asking which follow-up analysis is wanted:
7.1. pca for the block X column effect (use the weighted version, weighted by block sizes; equivalently, expand the k by p matrix into an n by p matrix by repeating each block's row once for every row in that block).
7.2. pca for row X column within the i-th block, where i is an input number between 1 and k.

7.3. sir model for block X column (this is just the SIR analysis you do to find basis curves, using the blocks as slices).

8. provide plotting options

9. allow sending messages for further analysis.

10. you can provide a menu bar if you want.
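
For those working outside xlisp-stat, output items 0 to 6 can be sketched in Python as follows. This is one plausible reading of the decomposition, not the actual implementation: the effect definitions, the dictionary keys, and the reuse of the name `two_way_block_model` are assumptions. Column means are taken over all n rows, so larger blocks carry more weight.

```python
def two_way_block_model(y, block_index):
    """Split y into grand mean, column, block, block X column,
    row-within-block and row X column-within-block effects,
    and return them with their sums of squares."""
    def mean(values):
        return sum(values) / len(values)

    n, p = len(y), len(y[0])
    grand = mean([x for row in y for x in row])
    col_mean = [mean([row[j] for row in y]) for j in range(p)]
    row_mean = [mean(row) for row in y]
    # Mean of each column within each block, and overall block means.
    bc_mean = [[mean([y[i][j] for i in blk]) for j in range(p)]
               for blk in block_index]
    b_mean = [mean(m) for m in bc_mean]

    column = [c - grand for c in col_mean]
    block = [m - grand for m in b_mean]
    block_x_col = [[bc_mean[b][j] - b_mean[b] - col_mean[j] + grand
                    for j in range(p)] for b in range(len(block_index))]
    row_in_block, row_x_col = {}, {}
    for b, blk in enumerate(block_index):
        for i in blk:
            row_in_block[i] = row_mean[i] - b_mean[b]
            row_x_col[i] = [y[i][j] - row_mean[i] - bc_mean[b][j] + b_mean[b]
                            for j in range(p)]

    # Each sum of squares runs over all n * p cells of the data matrix.
    sizes = [len(blk) for blk in block_index]
    ss = {
        "column": n * sum(c * c for c in column),
        "block": p * sum(sizes[b] * block[b] ** 2 for b in range(len(sizes))),
        "block_x_column": sum(sizes[b] * sum(v * v for v in block_x_col[b])
                              for b in range(len(sizes))),
        "row_in_block": p * sum(v * v for v in row_in_block.values()),
        "row_x_column_in_block": sum(v * v for r in row_x_col.values() for v in r),
    }
    return {"grand_mean": grand, "column": column, "block": block,
            "block_x_column": block_x_col, "row_in_block": row_in_block,
            "row_x_column_in_block": row_x_col, "ss": ss}

y = [[1.0, 2.0, 4.0], [2.0, 3.0, 7.0], [5.0, 5.0, 6.0],
     [8.0, 6.0, 9.0], [4.0, 7.0, 3.0]]
fit = two_way_block_model(y, [[0, 1], [2, 3, 4]])
```

With these definitions the six pieces add back to y cell by cell, and the five sums of squares add up to the total sum of squares about the grand mean. The follow-up analyses in item 7 would take `fit["block_x_column"]` or one block's rows of `fit["row_x_column_in_block"]` as input.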