In particular, suppose all genes have already been grouped
into several blocks according to some structural
or functional (metabolic pathway) properties,
with the unknown ones forming a separate group. Then
we can perform two-way ANOVA, examine residuals, and
perform PCA, either separately for each block or jointly across blocks.
How this framework ties to the SIR methodology will become
clearer as the course develops.
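To make the per-block analysis concrete, here is a minimal sketch in plain Python (not xlisp-stat) of an additive two-way fit applied separately to each block. The function names, the block labels, and the toy numbers are all hypothetical; the residual tables it returns are what one would then examine or pass on to PCA.

```python
def two_way_fit(y):
    """Additive two-way fit for a complete table y (a list of rows).

    Returns (grand mean, row effects, column effects, residual table).
    """
    n, p = len(y), len(y[0])
    grand = sum(sum(row) for row in y) / (n * p)
    row_eff = [sum(row) / p - grand for row in y]
    col_eff = [sum(y[i][j] for i in range(n)) / n - grand for j in range(p)]
    resid = [[y[i][j] - grand - row_eff[i] - col_eff[j] for j in range(p)]
             for i in range(n)]
    return grand, row_eff, col_eff, resid


def fit_by_block(y, blocks):
    """Apply the two-way fit separately to the rows (genes) of each block."""
    return {name: two_way_fit([y[i] for i in rows])
            for name, rows in blocks.items()}


# Hypothetical toy data: 5 genes (rows) by 3 conditions (columns).
y = [[1.0, 2.0, 3.0],
     [2.0, 3.0, 4.0],
     [0.0, 5.0, 1.0],
     [4.0, 1.0, 1.0],
     [2.0, 2.0, 5.0]]

# Hypothetical blocking; the genes of unknown function form their own block.
blocks = {"pathway A": [0, 1], "pathway B": [2, 3], "unknown": [4]}

fits = fit_by_block(y, blocks)
```

In this toy example the two genes in "pathway A" differ only by a constant shift, so their residuals vanish, while "pathway B" leaves a clear residual pattern for follow-up.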
What can you do to help?
1. Software development. A user-friendly implementation
environment in xlisp-stat, which takes a general blocking structure
and runs several analysis modules that combine two-way analysis and
PCA in various ways, will hopefully be ready soon. See the attachment
at the end.
To use that protocol for gene expression, additional customization
will be important, such as links to gene data banks and to other
analysis methods. Implementations in other environments,
such as C++, Matlab, or S+, would be desirable too, as some
of you told me earlier.
2. Work on specific examples of blocking. (This needs people
who know biochemistry and know how to search gene/protein
banks.) In addition to the micro-array data, I have an Affymetrix data set
on aging genes
from a research group at Scripps.
2.1. structural
2.2. pathway
2.3. phylogenetic profile
2.4. multiple indexing
3. K-means and clustering. Blocking can be provided
by k-means or another clustering method.
3.1. incorporate existing methods
3.2. landscape description
3.3. re-grouping, collapsing or dividing
4. Multiple indexing. It is fine to allow more than one way of blocking. How should the results from different indexing methods be synthesized?
5. Classifying an unknown case.
6. Between-block analysis.
7. Crossing many data resources.
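For item 3, here is a minimal sketch in plain Python of how a k-means run could supply the blocking. It returns a list of k lists of row indices, which is the shape a block index needs. The function names are hypothetical, the starting centers are simply the first k rows (an assumption made to keep the example deterministic), and rows are indexed from 0.

```python
def assign(y, centers):
    """Put each row index into the block of its nearest center."""
    blocks = [[] for _ in centers]
    for i, row in enumerate(y):
        dist = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centers]
        blocks[dist.index(min(dist))].append(i)
    return blocks


def kmeans_blocks(y, k, n_iter=20):
    """Plain Lloyd iterations on the rows of y; returns k lists of row indices."""
    centers = [list(row) for row in y[:k]]   # assumption: first k rows as start
    for _ in range(n_iter):
        blocks = assign(y, centers)
        # Move each center to the mean of its block; keep it if the block is empty.
        centers = [
            [sum(y[i][j] for i in b) / len(b) for j in range(len(c))] if b else c
            for b, c in zip(blocks, centers)
        ]
    return assign(y, centers)


# Hypothetical toy data: two well-separated groups of expression profiles.
y = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0], [1.0, 0.0]]
block_index = kmeans_blocks(y, k=2)
print(block_index)   # [[0, 2, 4], [1, 3]]
```

Any other clustering method would do equally well, as long as its output is converted to the same list-of-lists form.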
+++++++++++++++++++++++++
As a first step, the following will be implemented in xlisp-stat
(hopefully in a couple of days by Rober Yuan, adapting
partly from his project in OIS imaging analysis):
(two-way-block-model y block-index)
where y is a list of n lists, each with p elements (as in the two-way model), and
block-index is a list of k lists, each representing a block by the row indices of y it contains, so the values must be between 0 and n; block sizes may differ
output (this is similar to your imaging problem):
0. grand mean.
1. column effect
2. block effect
3. block X column effect
4. row effect within block
5. row X column within block
6. provide sum of squares for 1, ..., 5.
7. provide a dialog box asking which follow-up analysis is wanted (button-click):
7.1. pca for block X column effect (using the weighted version,
weighted by block sizes; equivalently, you can expand the k by p matrix into
an n by p matrix by repeating each block's row once for every row in that block).
7.2. pca for row X column within the i-th block, where i is an input number
between 1 and k.
7.3. sir model for block X column (this is just the SIR analysis you do to find basis curves, using the blocks as slices)
8. provide plotting options
9. allow sending messages for further analysis.
10. you can provide a menu bar if you want.
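Output items 0 through 6 can be sketched as follows. This is a plain Python stand-in, not the xlisp-stat implementation, and it rests on several assumptions: rows are indexed from 0 to n - 1, every row belongs to exactly one non-empty block, all means are unweighted averages over cells, and the dictionary keys and toy data are hypothetical.

```python
def two_way_block_model(y, block_index):
    """Nested decomposition of y (n rows, p columns) with rows grouped in blocks."""
    n, p = len(y), len(y[0])
    if (sorted(i for rows in block_index for i in rows) != list(range(n))
            or any(not rows for rows in block_index)):
        raise ValueError("every row must lie in exactly one non-empty block")

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    grand = mean(v for row in y for v in row)                         # item 0
    col_eff = [mean(y[i][j] for i in range(n)) - grand for j in range(p)]  # 1

    # Block-by-column cell means; their average over columns is the block mean.
    cell = [[mean(y[i][j] for i in rows) for j in range(p)]
            for rows in block_index]
    block_eff = [mean(c) - grand for c in cell]                       # item 2
    block_col = [[cell[b][j] - grand - block_eff[b] - col_eff[j]
                  for j in range(p)] for b in range(len(block_index))]  # item 3

    row_eff = [0.0] * n    # item 4: row effect within its block
    resid = [None] * n     # item 5: row X column within block
    for b, rows in enumerate(block_index):
        block_mean = grand + block_eff[b]
        for i in rows:
            row_eff[i] = mean(y[i]) - block_mean
            resid[i] = [y[i][j] - cell[b][j] - row_eff[i] for j in range(p)]

    sizes = [len(rows) for rows in block_index]
    ss = {                                                            # item 6
        "column": n * sum(c ** 2 for c in col_eff),
        "block": p * sum(m * e ** 2 for m, e in zip(sizes, block_eff)),
        "block X column": sum(m * sum(v ** 2 for v in r)
                              for m, r in zip(sizes, block_col)),
        "row within block": p * sum(r ** 2 for r in row_eff),
        "row X column within block": sum(v ** 2 for r in resid for v in r),
    }
    return {"grand": grand, "column": col_eff, "block": block_eff,
            "block X column": block_col, "row": row_eff,
            "residual": resid, "ss": ss}


# Hypothetical toy data: n = 5 rows, p = 3 columns, k = 3 blocks of unequal size.
y = [[1.0, 2.0, 3.0],
     [2.0, 3.0, 4.0],
     [0.0, 5.0, 1.0],
     [4.0, 1.0, 1.0],
     [2.0, 2.0, 5.0]]
block_index = [[0, 1], [2, 3], [4]]
fit = two_way_block_model(y, block_index)
```

With these definitions the five sums of squares add up to the total sum of squares about the grand mean, which gives a quick check on any implementation. The "block X column" table is the k by p matrix that items 7.1 and 7.3 would take as input.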