``To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.'' (R.A. Fisher, Indian Statistical Congress, Sankhya, ca 1938)
Fisher started working as a statistician in 1919 at the Rothamsted Experimental Station. This research center in England was conducting a series of experiments to measure the effects of different fertilizers on various crops (and still is, see http://www.iacr.bbsrc.ac.uk/res/treshome.html). It is in this context that much of modern statistical theory was born and that he studied ``The design of experiments''. The context of gene arrays (where different DNA fragments are spotted in a plane and groups of them are ``treated'' with different hybridizations) is actually very similar to the one in which Fisher was working.
In the terminology of experimental design, each outcome (an observation of the variable of interest, which is the expression level in our case) is measured at a particular combination of levels of different factors that may influence it. For example, in the case of arrays, the factors present in the experiment include cell line, gene, dye, position, pen, and array.
Some of these factors are of interest, and the experimenter specifically wants to study how the outcome depends on their variation. Others correspond to inevitable variability in the experimental setting, and the researcher mainly wants to control for their presence when interpreting the results. In the typical microarray experiment, cell line and gene are the effects one wants to estimate, and the remaining ones are factors one wants to control for. However, the situation might be different when a laboratory is setting up the protocols and the machinery for microarray experiments: then dye, position, pen, and array are the effects one wants to measure, in order to identify the procedure that minimizes their variability.
To try to control for both dye and hybridization experiment, researchers in cDNA array technology often hybridize the same array to the cDNA from two cell lines, each colored with a different dye. The ratio of the expression levels in the two dyes is measured: this is supposed to correct for the different amount of hybridization from one array to another. To correct for the dye effect, the ratios are then renormalized so that their logarithms have mean zero.
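The ratio-and-renormalize step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `centered_log_ratios`, the channel names `red` and `green`, the base-2 logarithm, and the intensity values are all hypothetical and not taken from any particular array protocol.

```python
import math


def centered_log_ratios(red, green):
    """Per-spot log2(red / green), shifted so the mean over the array is zero."""
    if len(red) != len(green) or not red:
        raise ValueError("need two non-empty intensity lists of equal length")
    # The ratio of the two channels removes a factor common to both dyes,
    # such as the overall amount of hybridization on this array.
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    # Subtracting the mean forces the log ratios to average zero.
    mean = sum(log_ratios) / len(log_ratios)
    return [value - mean for value in log_ratios]


# Hypothetical intensities for four spots on one array (assumed positive).
red = [200.0, 400.0, 800.0, 1600.0]
green = [100.0, 100.0, 100.0, 100.0]
centered = centered_log_ratios(red, green)
```

Note that centering removes any overall shift between the two channels, whatever its origin: with a single array it cannot tell a dye bias from a global difference between the two cell lines.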
This is one particular way of addressing the problem. There are, however, others that the statistical design of experiments can suggest and that may be more effective. In particular, notice that in the setting described above, the dye and cell line effects are completely confounded: each cell line is always colored with the same dye, so a difference between the two channels cannot be attributed to one rather than the other. Is it really necessary for this to be the case?
Different methods are already explored in the literature: for example, it is often recommended that two arrays be hybridized to the same pair of cell lines, switching the dyes across experiments. This is only one example of the type of suggestion that could come from the design of experiments; another easy example involves multiple spotting of the same gene on an array.
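To see why switching the dyes helps, consider a minimal sketch that assumes the cell line and dye effects are additive on the log scale and that there is no noise. Under that assumption (which is ours, for illustration only), half the difference of the two log ratios recovers the cell line effect and half their sum recovers the dye effect. The function name `dye_swap_estimates` and all numbers are hypothetical.

```python
def dye_swap_estimates(forward, swapped):
    """Separate cell line and dye effects from a pair of dye-swapped arrays.

    forward: per-gene log(red / green) with line A in red, line B in green.
    swapped: per-gene log(red / green) with line B in red, line A in green.
    Assumes the two effects add on the log scale (an illustrative assumption).
    """
    if len(forward) != len(swapped):
        raise ValueError("both arrays must report the same genes")
    # The cell line effect changes sign under the swap; the dye effect does not.
    cell_line = [(f - s) / 2 for f, s in zip(forward, swapped)]
    dye = [(f + s) / 2 for f, s in zip(forward, swapped)]
    return cell_line, dye


# Hypothetical noise-free effects for three genes, on the log scale.
true_cell_line = [1.0, -0.5, 0.0]  # log(A / B)
true_dye = [0.3, 0.3, 0.1]         # log(red / green) bias, possibly gene-specific

forward = [c + d for c, d in zip(true_cell_line, true_dye)]
swapped = [-c + d for c, d in zip(true_cell_line, true_dye)]
cell_line_hat, dye_hat = dye_swap_estimates(forward, swapped)
```

With a single array only the sum of the two effects is observed; the second, swapped array is what allows them to be estimated separately.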
In general, the following are a few basic principles of the design of experiments. Replication is the repetition of the basic experiment and is necessary to estimate the error and to obtain a precise estimate of the effects of interest. Randomization allows one to control for the effects of extraneous factors that are not modeled. Blocking groups the experimental units into subsets that are more homogeneous, so that comparisons made within a block are less affected by nuisance variability.
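As a rough illustration of how the three principles combine, the sketch below builds a randomized complete block layout: every treatment appears once in each block (replication across blocks), and its order within the block is drawn at random (randomization within blocks). Treating arrays as blocks and cell lines as treatments is an assumption made only for this example, as are the names `randomized_block_layout`, `treatments`, and `blocks`.

```python
import random


def randomized_block_layout(treatments, blocks, seed=None):
    """Assign every treatment once per block, in an independent random order."""
    rng = random.Random(seed)  # a seed makes the layout reproducible
    layout = {}
    for block in blocks:
        order = list(treatments)  # copy so the caller's list is not shuffled
        rng.shuffle(order)        # randomize the order within this block
        layout[block] = order
    return layout


# Hypothetical labels: three cell lines, each replicated on four arrays.
treatments = ["line_A", "line_B", "line_C"]
blocks = ["array_1", "array_2", "array_3", "array_4"]
layout = randomized_block_layout(treatments, blocks, seed=0)
```

Each key of `layout` is a block and each value is the randomized order of treatments within it; comparing treatments within a block keeps block-to-block variability out of the comparison.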