My name is Jacky. I am currently a Ph.D. candidate in Statistics at UCLA.

My thesis focuses on statistical modeling for bioinformatics under Prof. Yingnian Wu, with a versatile committee of Prof. Rick Schoenberg, Prof. Hongquan Xu, and Prof. Yi Xing.

My work addresses two fundamental biological problems: detecting differential alternative splicing patterns in RNA-Seq data and deconvolving GCMS metabolomic data. Both projects succeed by tightly integrating statistics, machine learning, and biology. The R packages grMATS and gcmsDecon will be available soon.

I also have a wide variety of other modeling interests in both machine learning and statistics. My initial thesis topic was an unsupervised learning problem in computer vision. I have also done modeling in environmental science and pharmaceutics. In addition, I am intrigued by, and would love to learn more about, modeling for search relevance and personalized recommendation of music, movies, online products, etc.


My research is application-driven statistical modeling. I need to tailor statistical methods to problems from a specific field and implement them myself. As a result, I have a wide range of interests within statistics itself, both traditional and modern. In classical statistics, I'm interested in composite hypothesis testing, maximum likelihood estimation, and regression. In modern statistics (machine learning), I'm fond of penalized regression and would love to master popular classification techniques (boosting, SVM, deep learning, etc.). Here are the main projects that I am working on or have worked on. I'm still building the project pages.

Differential RNA-Seq Alternative Splicing in Multiple Isoforms

Case vs. control:

A single gene can code for multiple proteins. If the balance among these protein proportions is disturbed, a disease might occur. Working with limited biological samples, we identify the genes with significant changes among thousands of candidates, using a multinomial logistic model with random effects and rigorous composite hypothesis tests.
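To make the testing idea concrete, here is a minimal sketch of a likelihood-ratio test for one gene. It is a deliberately simplified stand-in, not the model described above: it pools read counts within each group and fits a plain multinomial, so it ignores the random effects that account for variation between samples. The function name `isoform_lrt` and the example counts are hypothetical.

```python
import math


def isoform_lrt(case_counts, control_counts):
    """Likelihood-ratio (G) statistic for equal isoform proportions.

    Simplifying assumption: counts are pooled per group and treated as
    one multinomial draw each (no random effects, no over-dispersion).
    Returns (statistic, degrees_of_freedom).
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("both groups need one count per isoform")
    n_case = sum(case_counts)
    n_ctrl = sum(control_counts)
    total = n_case + n_ctrl

    stat = 0.0
    for a, b in zip(case_counts, control_counts):
        pooled = (a + b) / total  # isoform proportion under the null
        for obs, n in ((a, n_case), (b, n_ctrl)):
            if obs > 0:  # zero counts contribute nothing to the sum
                stat += 2.0 * obs * math.log(obs / (n * pooled))

    df = len(case_counts) - 1
    return stat, df


# Hypothetical read counts for three isoforms of one gene.
stat, df = isoform_lrt([30, 10, 10], [10, 30, 10])
print(f"G = {stat:.2f} on {df} degrees of freedom")
```

Under the null hypothesis of equal proportions, the statistic is approximately chi-square distributed with the returned degrees of freedom, so a p-value could be obtained with, for example, `scipy.stats.chi2.sf(stat, df)`.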

GCMS Metabolomic Deconvolution

Separate blue from red:

A GCMS data matrix can be viewed as an image: a snapshot of all metabolites in a biochemical sample. Every metabolite has its own signature spectrum, and these spectra overlap with one another, which poses a deconvolution problem. We proposed an automatic matrix factorization model to accomplish this challenging deconvolution task (1k × 400k) and learn these signals.
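The factorization idea can be sketched with plain non-negative matrix factorization using the standard multiplicative updates. This is a generic illustration, not the localized and simultaneous model developed in this project. The layout (rows as mass channels, columns as scan times), the function name `nmf`, and the tiny synthetic data are all assumptions made for the example.

```python
import numpy as np


def nmf(X, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix X (channels x scans) as W @ H.

    Columns of W play the role of component spectra and rows of H the
    role of their intensity profiles over time (assumed layout).
    Uses multiplicative updates for the squared Frobenius error.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Each update keeps the factors non-negative and does not
        # increase the reconstruction error.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Synthetic example: two overlapping "metabolites" mixed together.
rng = np.random.default_rng(1)
true_spectra = rng.random((20, 2))
true_profiles = rng.random((2, 30))
X = true_spectra @ true_profiles

W, H = nmf(X, rank=2)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.4f}")
```

A matrix of the size mentioned above would not be factored in one dense pass like this; the sketch only shows the basic update rule on a small example.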

Unsupervised Learning on Sparse Coding of Images

Sparse representation of horses:

We believe that images share parts, and that the brain visualizes and understands images through a sparse coding of these parts. We want to train a sparse dictionary from a group of similar images in an unsupervised way. Using K-SVD and the group lasso, we have obtained some interesting preliminary results.


Some interesting moneyball-like websites. I'm still exploring.



  • Ph.D. in Statistics, University of California, Los Angeles, CA, 2016.
  • M.S. in Statistics, University of California, Los Angeles, CA, 2012.
  • B.S. in Applied Mathematics, Shanghai University, China, 2010.

Professional Experience

  • Data Analyst, Experian Interactive Media, Los Angeles, CA, 2012.
  • Statistician, Skinwest, La Verne, CA, 2011.


  • Y. Yi, S. Shen, Y. Wu, Y. Xing. Statistical Modeling and Testing for Detection of Differential Alternative Splicing in Multiple Isoforms Using RNA-Seq Data. (to be submitted soon)
  • Y. Yi, F. Fazlollahi, A. Quach, K. Faull, Y. Wu. Localized and Simultaneous Non-Negative Matrix Factorization for Deconvolution of Multiple GCMS Signals. (to be submitted soon)
  • D. DeYoung, K. Heinzerling, A. Swanson, J. Tsuang, B. Furst, Y. Yi, Y. Wu, D. Moody, D. Andrenyak, S. Shoptaw. Safety of Ibudilast Treatment during Intravenous Methamphetamine Administration. To appear in Journal of Clinical Psychopharmacology.
  • K. Heinzerling, A. Swanson, T. Hall, Y. Yi, Y. Wu, S. Shoptaw. Randomized, placebo-controlled trial of bupropion in methamphetamine-dependent participants with less than daily methamphetamine use. Addiction, 109(11):1878-1886, 2014.


  • Y. Yi, B. Ritz. Metabolomic Analysis of Parkinson’s Disease. The Burroughs Wellcome Fund Inter-school Training Program in Chronic Diseases, the Genomic Analysis & Interpretation Training Program, and the Systems and Integrative Biology Training Grant Joint Research Symposium, 2015. (Oral presentation)
  • Y. Yi, F. Fazlollahi, J. Gornbein, K. Faull and Y. Wu. Template-Based Aligner: A Toolbox for Metabolomic Data Analysis. 62nd American Society for Mass Spectrometry Conference. Metabolomics: Identification of Unknown Metabolites, MP 626, 2014. (Poster)
  • Y. Yi, K. Chan, E. Sobel, S. Liu, Y. Wu. Identification of Genetic Signals from Genome-Wide Association Study (GWAS) of Diabetes Using Penalized Regression. Burroughs Wellcome Fund Programs Unifying Population and Laboratory Based Sciences (PUP) Symposium. May, 2013. (Poster)

Honors and Awards

  • University Dissertation Year Fellowship, UCLA, Los Angeles, CA, 2015.
  • Burroughs Wellcome Fund-Chronic Diseases Inter-school Training Program (BWF-CHIP) Fellowship (formerly: Burroughs Wellcome Fund Inter-school Training Program in Metabolic Diseases, BWF-IT-MD), UCLA, Los Angeles, CA, 2013, 2014.


  • R, MATLAB, Python.
  • C++, C.
  • MySQL, Unix, LaTeX.


  • STATS 200C: Large Sample Theory. Spring 2014, Spring 2015.
  • STATS 202A: Statistical Programming. Fall 2014.
  • STATS 202B: Matrix Algebra and Optimization. Winter 2014.
  • STATS 200A: Applied Probability. Fall 2013.
  • STATS 100B: Introduction to Mathematical Statistics. Winter 2015.
  • STATS 100A: Introduction to Probability. Fall 2012.
  • STATS 100C: Linear Models. Summer 2012.
  • STATS 10: Introduction to Statistical Reasoning. Spring 2012.
  • STATS 13: Introduction to Statistical Methods for Life and Health Sciences. Winter 2012.


Please contact me by email:

yi (dot) yi (at) stat (dot) ucla (dot) edu