Selected Publications & Manuscripts under Review
Manuscripts patiently awaiting publication :-)
Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility (with Li, Y. and Dai, X.)
CTSyn: A Foundational Model for Cross Tabular Data Generation (with Lin, X., Xu, C. and Yang, M.)
TimeAutoDiff: Combining Autoencoder and Diffusion model for time series tabular data synthesizing (with Suh, N. et al)
Watermarking Generative Tabular Data (with He, H., Yu, P., Ren, J. and Wu, Y.N.)
BadGD: A unified data-centric framework to identify gradient descent vulnerabilities (with Wang, C.-H.)
Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space (with Yu, P. et al)
Discriminative Estimation of Total Variation Distance: A Fidelity Auditor for Generative Data (with Tao, L., Xu, S., Wang, C.-H. and Suh, N.)
Minimax Optimal Fair Classification with Bounded Demographic Disparity (with Zeng, X. and Dobriban, E.)
Approximation of RKHS Functionals by Neural Networks (with Zhou, T., Suh, N. and Huo, X.)
Rate-Optimal Rank Aggregation with Private Pairwise Rankings (with Xu, S. and Sun, W.W.)
Bayes-Optimal Fair Classification with Linear Disparity Constraints via Pre-, In-, and Post-processing (with Zeng, X. and Dobriban, E.)
Benefits of Transformer: In-Context Learning in Linear Regression Tasks with Unstructured Data (with Xing, Y., Lin, X., Suh, N. and Song, Q.)
MissDiff: Training Diffusion Models on Tabular Data with Missing Values (with Ouyang, Y., Xie, L. and Li, C.)
Utility Theory of Synthetic Data Generation (with Xu, S. and Sun, W.-W.)
Ranking Differential Privacy (with Xu, S. and Sun, W.W.)
Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies (with Wang, Z. and Awan, J.)
Dynamic Matching Bandit For Two-Sided Online Markets (with Li, Y., Wang, C.-H., and Sun, W.)
Attention Enables Zero Approximation Error (with Fang, Z., Ouyang, Y. and Zhou, D.)
High-Dimensional Inference over Networks: Linear Convergence and Statistical Guarantees (with Sun, Y., Maros, M. and Scutari, G.)
On Deep Instrumental Variables Estimate (with Liu, R. and Shang, Z.)
Selected Publications
*: former/current PhD Student; **: former/current Postdoc
Ouyang*, Xie, Zha and Cheng (2024) Transfer Learning for Diffusion Models, NeurIPS
Suh** and Cheng (2024) A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models, Annual Review of Statistics and Its Application
Xia, Wang**, Mabry and Cheng (2024) Advancing Retail Data Science: Comprehensive Evaluation of Synthetic Data, KDD Workshop on GenAI Evaluation
Ward*, Wang** and Cheng (2024) Data Plagiarism Index: Characterizing the Privacy Risk of Data-Copying in Tabular Generative Models, KDD Workshop on GenAI Evaluation
Li*, Cheng and Dai (2024) Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints, ICML
Ward*, Zeng** and Cheng (2024) FairRR: Pre-Processing for Group Fairness through Randomized Response, AISTATS
Xing*, Lin*, Song, Xu, Zeng and Cheng (2024) Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective, AISTATS
Hsieh, Wang** and Cheng (2023) Improve Fidelity and Utility of Synthetic Credit Card Transaction Time Series from Data-centric Perspective, ACM International Conference on AI in Finance -- Workshop
Wang*, Wang*, Sun and Cheng (2023) Online Regularization towards Always-Valid High-Dimensional Dynamic Pricing, Journal of the American Statistical Association -- T&M
Xu**, Wang**, Sun and Cheng (2023) Binary Classification under Local Label Differential Privacy Using Randomized Response Mechanisms, Transactions on Machine Learning Research
Suh**, Lin*, Hsieh, Honarkhah and Cheng (2023) AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing, NeurIPS Workshop on SyntheticData4ML
Li*, Wang**, Cheng and Song (2023) Optimum-statistical Collaboration Towards General and Efficient Black-box Optimization, Transactions on Machine Learning Research
Ouyang, Xie and Cheng (2023) Improving Adversarial Robustness by Contrastive Guided Diffusion Process, ICML
Li, Wang** and Cheng (2023) Statistical Theory of Differentially Private Marginal-based Data Synthesis Algorithms, ICLR
Ning and Cheng (2023) Sparse Confidence Sets for Normal Mean Models, Information and Inference: A Journal of the IMA
Fang** and Cheng (2023) Optimal Learning Rates of Deep Convolutional Neural Networks: Additive Ridge Functions, Transactions on Machine Learning Research
Zeng**, Dobriban and Cheng (2022) Fair Bayes-Optimal Classifiers under Predictive Parity, NeurIPS
Cheng, Wang**, Potluru, Balch and Cheng (2022) Downstream Task-Oriented Generative Model Selections on Synthetic Data Training for Fraud Detection Models, ACM International Conference on AI in Finance -- Workshop
Xing*, Song and Cheng (2022) Phase Transition from Clean Training to Adversarial Training, NeurIPS
Xing*, Song and Cheng (2022) Why Do Artificially Generated Data Help Adversarial Robustness?, NeurIPS
Ramprasad*, Li*, Yang, Wang, Sun, and Cheng (2022) Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning, JASA -- T&M
Li*, Wang*, Zhang and Cheng (2022) Variance Reduction on General Adaptive Stochastic Mirror Descent, Machine Learning Journal (short version accepted in NeurIPS OPT Workshop)
Song and Cheng (2022) Optimal False Discovery Control of Minimax Estimator, Bernoulli
Yu*, Chao**, and Cheng (2022) Distributed Bootstrap for Simultaneous Inference under High Dimensionality,
Wu*, Wang*, Li* and Cheng (2022) Residual Bootstrap Exploration for Stochastic Linear Bandit,
Xing*, Song and Cheng (2022) Benefit of Interpolation in Nearest Neighbor Algorithms, SIAM Journal on Mathematics of Data Science
Huang, Huang, Yang** and Cheng (2022) Power Iteration for Tensor PCA, JMLR
Xing*, Song and Cheng (2022) Unlabelled Data Help: Minimax Analysis and Adversarial Robustness, AISTATS
Xing*, Song and Cheng (2021) On the Algorithmic Stability of Adversarial Training, NeurIPS
Liu*, Yang, Shang** and Cheng (2021) Nonparametric Testing under Random Projection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Talk Slides
Li*, Wang* and Cheng (2021) Online Forgetting Process for Linear Regression Models, AISTATS
Xing*, Song and Cheng (2021) On the Generalization Properties of Adversarial Training, AISTATS
Hu*, Wang, Lin and Cheng (2021) Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network, AISTATS
Xing*, Zhang and Cheng (2021) Adversarially Robust Estimate and Risk Analysis in Linear Regression, AISTATS
Xing*, Song and Cheng (2021) Predictive Power of Nearest Neighbors Algorithm under Random Perturbation, AISTATS
Chen, Wan, Cai and Cheng (2020) Machine Learning in/for Blockchain: Future and Challenges, Canadian Journal of Statistics
Chao**, Wang*, Xing* and Cheng (2020) Directional Pruning of Deep Neural Networks, NeurIPS [code can be found in Github; theory is based on this work]
Bai*, Song and Cheng (2020) Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee, NeurIPS
Duan*, Qiao and Cheng (2020) Statistical Guarantees of Distributed Nearest Neighbor Classification, NeurIPS, Talk Slides
Guo and Cheng (2020) Moderate-Dimensional Inferences on Quadratic Functionals in Ordinary Least Squares, JASA-T&M. R Package: MDOLS
Yu*, Chao** and Cheng (2020) Simultaneous Inference for Massive Data: Distributed Bootstrap, ICML
Cheng*, Qiao** and Cheng (2020) Mutual Transfer Learning for Massive Data, ICML
Yang, Shang** and Cheng (2020) Non-asymptotic Theory for Nonparametric Testing, COLT, Talk Slides
Zheng** and Cheng (2020) Finite Time Analysis of Vector Autoregressive Models under Linear Restrictions, Biometrika, Talk Slides
Hao*, Zhang and Cheng (2020) Sparse and Low-rank Tensor Estimation via Cubic Sketchings, IEEE-Information Theory, a short version published in AISTATS.
Wang* and Cheng (2020) Online Batch Decision-Making with High-Dimensional Covariates, AISTATS
Liu*, Shang** and Cheng (2020) Nonparametric Distributed Learning under General Designs, Electronic Journal of Statistics
Hao*, Abbasi-Yadkori, Wen and Cheng (2019) Bootstrapping Upper Confidence Bound, NeurIPS
Shang**, Hao* and Cheng (2019) Nonparametric Bayesian Aggregation for Massive Data, Journal of Machine Learning Research, Talk Slides
Qiao, Duan* and Cheng (2019) Rates of Convergence for Large-scale Nearest Neighbor Classification, NeurIPS
Liu*, Shang** and Cheng (2019) Sharp Theoretical Analysis for Nonparametric Testing under Random Projection, COLT
Zhu, Yu* and Cheng (2019) High Dimensional Inference in Partially Linear Models, AISTATS
Lyu, Sun*, Wang, Liu, Yang and Cheng (2019) Tensor Graphical Model: Non-convex Optimization and Statistical Inference, IEEE-Transactions on Pattern Analysis and Machine Intelligence.
Liu* and Cheng (2018) Early Stopping for Nonparametric Testing, NIPS Poster
Xu, Shang** and Cheng (2018) Optimal Tuning for Divide-and-Conquer Kernel Ridge Regression with Massive Data, ICML (oral), 80:5479-5487. An extended version published in Journal of Computational and Graphical Statistics.
Volgushev, Chao** and Cheng (2018) Distributed Inference for Quantile Regression Processes, Annals of Statistics, To Appear. Talk Slides
Yu*, Levine and Cheng (2018) Minimax Optimal Estimation in Partially Linear Additive Models under High Dimension, Bernoulli, To Appear.
Li, Cheng, Fan and Wang (2018) Embracing Blessing of Dimensionality in Factor Models, Journal of the American Statistical Association - Theory & Methods, 113, 380-389
Hao*, Sun*, Liu and Cheng (2018) Simultaneous Clustering and Estimation of Heterogeneous Graphical Models, Journal of Machine Learning Research, 18(217):1−58.
Shang** and Cheng (2017) Gaussian Approximation of General Nonparametric Posterior Distributions, Information and Inference, To Appear. In memory of Prof. Jayanta Ghosh
Shang** and Cheng (2017) Computational Limits of a Distributed Algorithm for Smoothing Spline, Journal of Machine Learning Research, 18(108):1−37. Poster
Zhang and Cheng (2017) Gaussian Approximation for High Dimensional Vector under Physical Dependence, Bernoulli, To Appear
Chao**, Volgushev and Cheng (2017) Quantile Processes for Semi and Nonparametric Regression, Electronic Journal of Statistics, 11, 3272 - 3331
Zhang and Cheng (2017) Simultaneous Inference for High-Dimensional Linear Models, Journal of the American Statistical Association - Theory & Methods, 112, 757-768. R Package: SILM
Sun*, Lu, Liu and Cheng (2017) Provable Sparse Tensor Decomposition, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 899–916
Sun*, Qiao and Cheng (2016) Stabilized Nearest Neighbor Classifier and Its Statistical Properties, Journal of the American Statistical Association - Theory & Methods, 111, 1254-1265
Minsker, Zhao and Cheng (2016) Active Clinical Trials for Personalized Medicine, Journal of the American Statistical Association - Theory & Methods, 111, 875-887
Zhao, Cheng and Liu (2016) A Partially Linear Framework for Massive Heterogeneous Data. Annals of Statistics, 44, 1400-1437. See Talk Slides, Full Manuscript
Pati, Bhattacharya and Cheng (2015) Optimal Bayesian estimation in random covariate design with a rescaled Gaussian process prior, Journal of Machine Learning Research, 16, 2837−2851
Sun*, Wang, Liu and Cheng (2015) Non-Convex Statistical Optimization for Sparse Tensor Graphical Model, NIPS (Acceptance Rate: 21.9%).
Shang** and Cheng (2015) Nonparametric Inference in Generalized Functional Linear Models, Annals of Statistics, 43, 1742-1773 (See Talk Slides)
Cheng and Shang** (2015) Joint Asymptotics for Semi-Nonparametric Regression Models under Partially Linear Structure, Annals of Statistics, 43, 1351-1390 (See Talk Slides)
Cheng, Zhang and Shang** (2015) Sparse and Efficient Estimation for Partial Spline Models with Increasing Dimension, Annals of Institute of Statistical Mathematics, 67, 93-127
Cheng (2015) Moment Consistency of the Exchangeably Weighted Bootstrap for Semiparametric M-Estimation, Scandinavian Journal of Statistics, 42, 665-684
Cheng, Zhou and Huang (2014) Efficient Semiparametric Estimation in Generalized Partially Linear Additive Models for Longitudinal/Clustered Data, Bernoulli, 20, 141-163
Shang** and Cheng (2013) Local and Global Asymptotic Inference in Smoothing Spline Models, Annals of Statistics, 41, 2608-2638.
In the suppl file, [27] is Kosorok, M. R. (2008), and [38] is Pinelis, I. (1994, AoP).
Cheng (2013). How Many Iterations are Sufficient for Efficient Semiparametric Estimation?, Scandinavian Journal of Statistics, 40, 592-618 (See Talk Slides)
Zhang, Cheng and Liu (2011) Linear or Nonlinear? Automatic Discovery for Partially Linear Models, Journal of the American Statistical Association - Theory & Methods, 106, 1099-1112
Cheng and Wang (2011), Semiparametric Additive Transformation Models under Current Status Data, Electronic Journal of Statistics, 5, 1735-1764
Cheng and Huang (2010) Bootstrap Consistency for General Semiparametric M-estimation, Annals of Statistics, 38, 2884-2915 (See Talk Slides)
Cheng (2009), Semiparametric Additive Isotonic Regression, Journal of Statistical Planning and Inference, 139, 1980-1991
Cheng and Kosorok (2008), General Frequentist Properties of the Posterior Profile Distribution, Annals of Statistics, 36, 1819-1853
Cheng and Kosorok (2008), Higher Order Semiparametric Frequentist Inference with the Profile Sampler, Annals of Statistics, 36, 1786-1818
Old Manuscripts not Intended for Publication :(
Bayes-optimal Classifiers under Group Fairness (with Zeng, X. and Dobriban, E.)
A Generalization of Regularized Dual Averaging and Its Dynamics (with Chao, S.-K.)
Enhanced Nearest Neighbor Classification for
(with Duan, J. and Qiao, X.)
Residual Bootstrap Exploration for Bandit Algorithms (with Wang, C., Yu, Y. and Hao, B.)
Sharp Rate of Convergence for Deep Neural Network Classifiers under the Teacher-Student Setting (with Hu, T. and Shang, Z.)
Enhancing Multi-model Inference with Natural Selection (with Cheng, C.W.)
Stein Neural Sampler (with Hu et al) Github
Quadratic Discriminant Analysis under Moderate Dimension (with Yang, Q.)
Nonparametric Heterogeneity Testing For Massive Data (with Lu, J. and Liu, H.)
Bootstrapping High Dimensional Time Series (with Zhang, X.) See Talk
Semiparametric Bernstein-von Mises Theorem: Second Order Studies (with Yang, Y. and Dunson, D.)
Non-Refereed Discussions
Chao** and Cheng (2016) Discussion on "Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings" by Werner Ehm, Tilmann Gneiting, Alexander Jordan and Fabian Krüger. Journal of the Royal Statistical Society: Series B (Statistical Methodology), To Appear
Leng and Cheng (2012) Discussion on “Probabilistic Index Models” by Thas, Neve, Clement and Ottoy, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74, 661-662
Interdisciplinary Work
Liang, Cheng, Wixon, and Balser (2011) An Absorbing Markov Chain Approach to Understanding the Microbial Role in Soil Carbon Stabilization. Biogeochemistry, 106, 303-309
Talk Slides
T1. Bootstrapping High Dimensional Vector: Interplay between Dependence and Dimensionality. Link (presented by my co-author Zhang at SAMSI)
T2. Nonparametric Inference in Functional Data. Link (presented by my co-author Shang at SAMSI)
T3. Nearest Neighbor Classifier with Optimal Stability. Link (presented by my PhD student Sun at ISBIS 2014 and SLDM Meeting)
T4. A Long March towards Joint Asymptotics: My 1st Steps…. Link
T5. Semiparametric Model Based Bootstrap. Link
T6. Bootstrap Consistency for General Semiparametric M-Estimate. Link
T7. How Many Iterations are Sufficient for Semiparametric Estimation? Link
T8. Inverse Problems in Semiparametric Statistical Models. Link