Stats M254/Biomath M271 Statistical Methods in Computational Biology
Course description: Introduction to statistical models and methods that have been recently developed and are widely applied in several branches of computational biology, such as sequence alignment, motif discovery, gene expression, protein structure prediction, and comparative genomics. The emphasis is on understanding basic statistical concepts and on the ability to use statistical modeling and inference to solve biological problems. Statistical topics include MLE, Bayesian estimation, posterior sampling, Markov chains, hidden Markov models, dynamic programming, EM algorithms, Markov chain Monte Carlo, model selection, multiple testing, clustering, and classification.
Instructor: Qing Zhou, Department of Statistics, zhou@stat.ucla.edu
Lectures: MW 1pm–2:20pm, MS 3915H/3915A.
Office Hours: Wed. 4pm–5:30pm, MS 8979.
Reference books:
1) Durbin, R. et al. (1999) Biological sequence analysis: Probabilistic models of proteins and nucleic acids.
2) Ewens, W.J. and Grant, G.R. (2005) Statistical methods in bioinformatics: An introduction.
Lecture notes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
Homework:
Hw1 (Due Wed. Feb 13) Dataset 1, Dataset 2, Dataset 3
Hw2 (Due Mon. Mar 10) Dataset 4
Final (Due Tue. Mar 18) Dataset 5, Dataset 6
More references:
[1] Ji, H.K. and Wong, W.H. (2006) Computational biology: Towards deciphering gene regulatory information in mammalian genomes. Biometrics, 62, 645-663. (A review of cis-regulatory analysis, from gene expression to motif detection)
[2] Hastie, T., Tibshirani, R., and Friedman, J. (2001) The Elements of Statistical Learning. Springer-Verlag, New York. (A good textbook on statistical learning, including all the classification methods covered in this course)
[3] Freund, Y. and Schapire, R. (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comp. Syst. Sci., 55, 119-139. (Boosting paper)
[4] Lawrence, C.E., Altschul, S.F., Boguski, M.S., Liu, J.S., Neuwald, A.F., and Wootton, J.C. (1993) Detecting subtle sequence signals: A Gibbs sampling strategy for multiple alignment. Science, 262, 208-214. (Gibbs motif sampler paper)
[5] Liu, J.S., Neuwald, A.F., and Lawrence, C.E. (1995) Bayesian models for multiple local sequence alignment and Gibbs sampling strategies. J. Am. Stat. Assoc., 90, 1156-1170. (A more statistical view of the Gibbs motif sampler)
[6] Krogh, A., Brown, M., Mian, I.S., Sjolander, K., and Haussler, D. (1994) Hidden Markov models in computational biology: Applications to protein modeling. J. Mol. Biol., 235, 1501-1531. (HMMs and their application to sequence modeling, alignment, etc.)
[7] Rabiner, L.R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77, 257-286. (A general review of HMMs)
[8] Kumar, S. and Filipski, A. (2007) Multiple sequence alignment: In pursuit of homologous DNA positions. Genome Research, 17, 127-135. (A recent review of multiple alignment methods)