Stats M254    Statistical Methods in Computational Biology

Course description: Introduction to statistical and computational methods in computational biology and bioinformatics. The emphasis is on understanding basic statistical concepts and on using statistical inference to solve biological problems. The course covers gene expression data, regulatory sequence analysis, ChIP-chip/seq data, and RNA-seq data, along with their applications in gene regulation analysis. Statistical methods include multivariate methods, statistical sequence analysis, machine learning, and Markov chain Monte Carlo. See syllabus for more details.

Syllabus

Link to the course Moodle site for lecture notes, homework assignments, etc.

More references:

[1] Ji, H.K. and Wong, W.H. (2006) Computational biology: Towards deciphering gene regulatory information in mammalian genomes. Biometrics, 62, 645-663. (A review of cis-regulatory analysis, from gene expression to motif detection)

[2] Hastie, T., Tibshirani, R. and Friedman, J. (2001) The Elements of Statistical Learning. Springer-Verlag, New York. (A good textbook on statistical learning, including all the classification methods we covered in this course)

[3] Freund, Y. and Schapire, R. (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci., 55, 119-139. (Boosting paper)

[4] Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F. and Wootton, J.C. (1993) Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science, 262, 208-214. (Gibbs motif sampler paper)

[5] Liu, J.S., Neuwald, A.F. and Lawrence, C.E. (1995) Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc., 90, 1156-1170. (A more statistical view of the Gibbs motif sampler)

[6] Krogh, A., Brown, M., Mian, I.S., Sjölander, K. and Haussler, D. (1994) Hidden Markov models in computational biology: Applications to protein modeling. J. Mol. Biol., 235, 1501-1531. (HMMs and their application in sequence modeling and alignment)

[7] Rabiner, L.R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257-286. (A general review of HMMs)

[8] Kumar, S. and Filipski, A. (2007) Multiple sequence alignment: In pursuit of homologous DNA positions. Genome Research, 17, 127-135. (A recent review of multiple alignment methods)

[9] Jensen, S.T., Liu, X.S., Zhou, Q. and Liu, J.S. (2004) Computational discovery of gene regulatory binding motifs: A Bayesian perspective. Statistical Science, 19, 188-204. (Review of Bayesian methods for motif discovery)

[10] Pepke, S., Wold, B. and Mortazavi, A. (2009) Computation for ChIP-seq and RNA-seq studies. Nature Methods, 6, S22-S32. (A recent review of computational methods for ChIP-seq and RNA-seq)