Selected Publications & Manuscripts under Review
Manuscripts patiently awaiting publication :-)
-
Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility (with Li, Y. and Dai, X.)
-
CTSyn: A Foundational Model for Cross Tabular Data Generation (with Lin, X., Xu, C. and Yang, M.)
-
TimeAutoDiff: Combining Autoencoder and Diffusion model for time series tabular data synthesizing (with Suh, N. et al.)
-
Watermarking Generative Tabular Data (with He, H., Yu, P., Ren, J. and Wu, Y.N.)
-
BadGD: A unified data-centric framework to identify gradient descent vulnerabilities (with Wang, C.-H.)
-
Latent Energy-Based Odyssey: Black-Box Optimization via Expanded Exploration in the Energy-Based Latent Space (with Yu, P. et al.)
-
Discriminative Estimation of Total Variation Distance: A Fidelity Auditor for Generative Data (with Tao, L., Xu, S., Wang, C.-H. and Suh, N.)
-
Minimax Optimal Fair Classification with Bounded Demographic Disparity (with Zeng, X. and Dobriban, E.)
-
Approximation of RKHS Functionals by Neural Networks (with Zhou, T., Suh, N. and Huo, X.)
-
Rate-Optimal Rank Aggregation with Private Pairwise Rankings (with Xu, S. and Sun, W.W.)
-
Bayes-Optimal Fair Classification with Linear Disparity Constraints via Pre-, In-, and Post-processing (with Zeng, X. and Dobriban, E.)
-
Benefits of Transformer: In-Context Learning in Linear Regression Tasks with Unstructured Data (with Xing, Y., Lin, X., Suh, N. and Song, Q.)
-
MissDiff: Training Diffusion Models on Tabular Data with Missing Values (with Ouyang, Y., Xie, L. and Li, C.)
-
Utility Theory of Synthetic Data Generation (with Xu, S. and Sun, W.W.)
-
Ranking Differential Privacy (with Xu, S. and Sun, W.W.)
-
Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies (with Wang, Z. and Awan, J.)
-
Dynamic Matching Bandit For Two-Sided Online Markets (with Li, Y., Wang, C.-H., and Sun, W.)
-
Attention Enables Zero Approximation Error (with Fang, Z., Ouyang, Y. and Zhou, D.)
-
High-Dimensional Inference over Networks: Linear Convergence and Statistical Guarantees (with Sun, Y., Maros, M. and Scutari, G.)
-
On Deep Instrumental Variables Estimate (with Liu, R. and Shang, Z.)
Selected Publications
*: former/current PhD student; **: former/current postdoc
-
Ouyang*, Xie, Zha and Cheng (2024)
Transfer Learning for Diffusion Models, NeurIPS
-
Suh** and Cheng (2024)
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models, Annual Review of Statistics and Its Application
-
Xia, Wang**, Mabry and Cheng (2024)
Advancing Retail Data Science: Comprehensive Evaluation of Synthetic Data, KDD Workshop on GenAI Evaluation
-
Ward*, Wang** and Cheng (2024)
Data Plagiarism Index: Characterizing the Privacy Risk of Data-Copying in Tabular Generative Models, KDD Workshop on GenAI Evaluation
-
Li*, Cheng and Dai (2024)
Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints, ICML
-
Ward*, Zeng** and Cheng (2024)
FairRR: Pre-Processing for Group Fairness through Randomized Response, AISTATS
-
Xing*, Lin*, Song, Xu, Zeng and Cheng (2024)
Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective, AISTATS
-
Hsieh, Wang** and Cheng (2023) Improve Fidelity and Utility of Synthetic Credit Card Transaction Time Series from Data-centric Perspective, ACM International Conference on AI in Finance -- Workshop
-
Wang*, Wang*, Sun and Cheng (2023)
Online Regularization towards Always-Valid High-Dimensional Dynamic Pricing, Journal of the American Statistical Association -- T&M
-
Xu**, Wang**, Sun and Cheng (2023) Binary Classification under Local Label Differential Privacy Using Randomized Response Mechanisms, Transactions on Machine Learning Research
-
Suh**, Lin*, Hsieh, Honarkhah and Cheng (2023)
AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing, NeurIPS Workshop on SyntheticData4ML
-
Li*, Wang**, Cheng and Song (2023) Optimum-statistical Collaboration Towards General and Efficient Black-box Optimization, Transactions on Machine Learning Research
-
Ouyang, Xie and Cheng (2023) Improving Adversarial Robustness by Contrastive Guided Diffusion Process, ICML
-
Li, Wang** and Cheng (2023) Statistical Theory of Differentially Private Marginal-based Data Synthesis Algorithms, ICLR
-
Ning and Cheng (2023)
Sparse Confidence Sets for Normal Mean Models, Information and Inference: A Journal of the IMA
-
Fang** and Cheng (2023) Optimal Learning Rates of Deep Convolutional Neural Networks: Additive Ridge Functions, Transactions on Machine Learning Research
-
Zeng**, Dobriban and Cheng (2022) Fair Bayes-Optimal Classifiers under Predictive Parity, NeurIPS
-
Cheng, Wang**, Potluru, Balch and Cheng (2022) Downstream Task-Oriented Generative Model Selections on Synthetic Data Training for Fraud Detection Models, ACM International Conference on AI in Finance -- Workshop
-
Xing*, Song and Cheng (2022) Phase Transition from Clean Training to Adversarial Training, NeurIPS
-
Xing*, Song and Cheng (2022) Why Do Artificially Generated Data Help Adversarial Robustness?, NeurIPS
-
Ramprasad*, Li*, Yang, Wang, Sun, and Cheng (2022) Online Bootstrap Inference For Policy Evaluation In Reinforcement Learning, JASA -- T&M
-
Li*, Wang*, Zhang and Cheng (2022) Variance Reduction on General Adaptive Stochastic Mirror Descent, Machine Learning Journal (short version accepted in NeurIPS OPT Workshop)
-
Song and Cheng (2022) Optimal False Discovery Control of Minimax Estimator, Bernoulli
-
Yu*, Chao**, and Cheng (2022)
Distributed Bootstrap for Simultaneous Inference under High Dimensionality, JMLR
-
Wu*, Wang*, Li* and Cheng (2022)
Residual Bootstrap Exploration for Stochastic Linear Bandit, UAI
-
Xing*, Song and Cheng (2022) Benefit of Interpolation in Nearest Neighbor Algorithms, SIAM Journal on Mathematics of Data Science
-
Huang, Huang, Yang** and Cheng (2022) Power Iteration for Tensor PCA, JMLR
-
Xing*, Song and Cheng (2022) Unlabelled Data Help: Minimax Analysis and Adversarial Robustness, AISTATS
-
Xing*, Song and Cheng (2021) On the Algorithmic Stability of Adversarial Training, NeurIPS
-
Liu*, Yang, Shang** and Cheng (2021) Nonparametric Testing under Random Projection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Talk Slides
-
Li*, Wang* and Cheng (2021) Online Forgetting Process for Linear Regression Models, AISTATS
-
Xing*, Song and Cheng (2021) On the Generalization Properties of Adversarial Training, AISTATS
-
Hu*, Wang, Lin and Cheng (2021) Regularization Matters: A Nonparametric Perspective on Overparametrized Neural Network, AISTATS
-
Xing*, Zhang and Cheng (2021) Adversarially Robust Estimate and Risk Analysis in Linear Regression, AISTATS
-
Xing*, Song and Cheng (2021) Predictive Power of Nearest Neighbors Algorithm under Random Perturbation, AISTATS
-
Chen, Wan, Cai and Cheng (2020) Machine Learning in/for Blockchain: Future and Challenges, Canadian Journal of Statistics
-
Chao**, Wang*, Xing* and Cheng (2020) Directional Pruning of Deep Neural Networks, NeurIPS [code can be found on GitHub; theory is based on this work]
-
Bai*, Song and Cheng (2020) Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee, NeurIPS
-
Duan*, Qiao and Cheng (2020) Statistical Guarantees of Distributed Nearest Neighbor Classification, NeurIPS, Talk Slides
-
Guo and Cheng (2020) Moderate-Dimensional Inferences on Quadratic Functionals in Ordinary Least Squares, JASA -- T&M, R Package: MDOLS
-
Yu*, Chao** and Cheng (2020) Simultaneous Inference for Massive Data: Distributed Bootstrap, ICML
-
Cheng*, Qiao** and Cheng (2020) Mutual Transfer Learning for Massive Data, ICML
-
Yang, Shang** and Cheng (2020)
Non-asymptotic Theory for Nonparametric Testing, COLT, Talk Slides
-
Zheng** and Cheng (2020) Finite Time Analysis of Vector Autoregressive Models under Linear Restrictions, Biometrika, Talk Slides
-
Hao*, Zhang and Cheng (2020) Sparse and Low-rank Tensor Estimation via Cubic Sketchings, IEEE Transactions on Information Theory (a short version was published in AISTATS)
-
Wang* and Cheng (2020) Online Batch Decision-Making with High-Dimensional Covariates, AISTATS
-
Liu*, Shang** and Cheng (2020) Nonparametric Distributed Learning under General Designs, Electronic Journal of Statistics
-
Hao*, Abbasi-Yadkori, Wen and Cheng (2019) Bootstrapping Upper Confidence Bound, NeurIPS
-
Shang**, Hao* and Cheng (2019) Nonparametric Bayesian Aggregation for Massive Data, Journal of Machine Learning Research, Talk Slides
-
Qiao, Duan* and Cheng (2019) Rates of Convergence for Large-scale Nearest Neighbor Classification, NeurIPS
-
Liu*, Shang** and Cheng (2019) Sharp Theoretical Analysis for Nonparametric Testing under Random Projection, COLT
-
Zhu, Yu* and Cheng (2019) High Dimensional Inference in Partially Linear Models, AISTATS
-
Lyu, Sun*, Wang, Liu, Yang and Cheng (2019) Tensor Graphical Model: Non-convex Optimization and Statistical Inference, IEEE Transactions on Pattern Analysis and Machine Intelligence
-
Liu* and Cheng (2018) Early Stopping for Nonparametric Testing, NIPS, Poster
-
Xu, Shang** and Cheng (2018) Optimal Tuning for Divide-and-Conquer Kernel Ridge Regression with Massive Data, ICML (oral), 80:5479-5487. An extended version was published in the Journal of Computational and Graphical Statistics.
-
Volgushev, Chao** and Cheng (2018) Distributed Inference for Quantile Regression Processes, Annals of Statistics, To Appear. Talk Slides
-
Yu*, Levine and Cheng (2018) Minimax Optimal Estimation in Partially Linear Additive Models under High Dimension, Bernoulli, To Appear.
-
Li, Cheng, Fan and Wang (2018) Embracing Blessing of Dimensionality in Factor Models, Journal of the American Statistical Association - Theory & Methods, 113, 380-389
-
Hao*, Sun*, Liu and Cheng (2018) Simultaneous Clustering and Estimation of Heterogeneous Graphical Models, Journal of Machine Learning Research, 18(217):1−58.
-
Shang** and Cheng (2017) Gaussian Approximation of General Nonparametric Posterior Distributions, Information and Inference, To Appear. In memory of Prof. Jayanta Ghosh
-
Shang** and Cheng (2017) Computational Limits of a Distributed Algorithm for Smoothing Spline, Journal of Machine Learning Research, 18(108):1−37. Poster
-
Zhang and Cheng (2017) Gaussian Approximation for High Dimensional Vector under Physical Dependence, Bernoulli, To Appear
-
Chao**, Volgushev and Cheng (2017) Quantile Processes for Semi and Nonparametric Regression, Electronic Journal of Statistics, 11, 3272 - 3331
-
Zhang and Cheng (2017) Simultaneous Inference for High-Dimensional Linear Models, Journal of the American Statistical Association - Theory & Methods, 112, 757-768. R Package: SILM
-
Sun*, Lu, Liu and Cheng (2017) Provable Sparse Tensor Decomposition, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 899–916
-
Sun*, Qiao and Cheng (2016) Stabilized Nearest Neighbor Classifier and Its Statistical Properties, Journal of the American Statistical Association - Theory & Methods, 111, 1254-1265
-
Minsker, Zhao and Cheng (2016) Active Clinical Trials for Personalized Medicine, Journal of the American Statistical Association - Theory & Methods, 111, 875-887
-
Zhao, Cheng and Liu (2016) A Partially Linear Framework for Massive Heterogeneous Data, Annals of Statistics, 44, 1400-1437. See Talk Slides, Full Manuscript
-
Pati, Bhattacharya and Cheng (2015) Optimal Bayesian estimation in random covariate design with a rescaled Gaussian process prior, Journal of Machine Learning Research, 16, 2837−2851
-
Sun*, Wang, Liu and Cheng (2015) Non-Convex Statistical Optimization for Sparse Tensor Graphical Model, NIPS (Acceptance Rate: 21.9%).
-
Shang** and Cheng (2015) Nonparametric Inference in Generalized Functional Linear Models, Annals of Statistics, 43, 1742-1773 (See Talk Slides)
-
Cheng and Shang** (2015) Joint Asymptotics for Semi-Nonparametric Regression Models under Partially Linear Structure, Annals of Statistics, 43, 1351-1390 (See Talk Slides)
-
Cheng, Zhang and Shang** (2015) Sparse and Efficient Estimation for Partial Spline Models with Increasing Dimension, Annals of the Institute of Statistical Mathematics, 67, 93-127
-
Cheng (2015) Moment Consistency of the Exchangeably Weighted Bootstrap for Semiparametric M-Estimation, Scandinavian Journal of Statistics, 42, 665-684
-
Cheng, Zhou and Huang (2014) Efficient Semiparametric Estimation in Generalized Partially Linear Additive Models for Longitudinal/Clustered Data, Bernoulli, 20, 141-163
-
Shang** and Cheng (2013) Local and Global Asymptotic Inference in Smoothing Spline Models, Annals of Statistics, 41, 2608-2638.
In the supplementary file, [27] is Kosorok, M. R. (2008), and [38] is Pinelis, I. (1994, AoP).
-
Cheng (2013) How Many Iterations are Sufficient for Efficient Semiparametric Estimation?, Scandinavian Journal of Statistics, 40, 592-618 (See Talk Slides)
-
Zhang, Cheng and Liu (2011) Linear or Nonlinear? Automatic Discovery for Partially Linear Models, Journal of the American Statistical Association - Theory & Methods, 106, 1099-1112
-
Cheng and Wang (2011) Semiparametric Additive Transformation Models under Current Status Data, Electronic Journal of Statistics, 5, 1735-1764
-
Cheng and Huang (2010) Bootstrap Consistency for General Semiparametric M-estimation, Annals of Statistics, 38, 2884-2915 (See Talk Slides)
-
Cheng (2009) Semiparametric Additive Isotonic Regression, Journal of Statistical Planning and Inference, 139, 1980-1991
-
Cheng and Kosorok (2008) General Frequentist Properties of the Posterior Profile Distribution, Annals of Statistics, 36, 1819-1853
-
Cheng and Kosorok (2008) Higher Order Semiparametric Frequentist Inference with the Profile Sampler, Annals of Statistics, 36, 1786-1818
Old Manuscripts not Intended for Publication :(
-
Bayes-optimal Classifiers under Group Fairness (with Zeng, X. and Dobriban, E.)
-
A Generalization of Regularized Dual Averaging and Its Dynamics (with Chao, S.-K.)
-
Enhanced Nearest Neighbor Classification for Crowdsourcing (with Duan, J. and Qiao, X.)
-
Residual Bootstrap Exploration for Bandit Algorithms (with Wang, C., Yu, Y. and Hao, B.)
-
Sharp Rate of Convergence for Deep Neural Network Classifiers under the Teacher-Student Setting (with Hu, T. and Shang, Z.)
-
Enhancing Multi-model Inference with Natural Selection (with Cheng, C.W.)
-
Stein Neural Sampler (with Hu et al.) GitHub
-
Quadratic Discriminant Analysis under Moderate Dimension (with Yang, Q.)
-
Nonparametric Heterogeneity Testing For Massive Data (with Lu, J. and Liu, H.)
-
Bootstrapping High Dimensional Time Series (with Zhang, X.) See Talk Slides
-
Semiparametric Bernstein-von Mises Theorem: Second Order Studies (with Yang, Y. and Dunson, D.)
Non-Refereed Discussions
Chao** and Cheng (2016) Discussion on "Of quantiles and expectiles: consistent scoring functions, Choquet representations and forecast rankings" by Werner Ehm, Tilmann Gneiting, Alexander Jordan and Fabian Krüger, Journal of the Royal Statistical Society: Series B (Statistical Methodology), To Appear
Leng and Cheng (2012) Discussion on “Probabilistic Index Models” by Thas, Neve, Clement and Ottoy, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74, 661-662
Interdisciplinary Work
Liang, Cheng, Wilxon, and Balser (2011) An Absorbing Markov Chain Approach to Understanding the Microbial Role in Soil Carbon Stabilization, Biogeochemistry, 106, 303-309
Talk Slides
-
T1. Bootstrapping High Dimensional Vector: Interplay between Dependence and Dimensionality. Link (presented by my co-author Zhang at a SAMSI workshop)
-
T2. Nonparametric Inference in Functional Data. Link (presented by my co-author Shang at a SAMSI workshop)
-
T3. Nearest Neighbor Classifier with Optimal Stability. Link (presented by my PhD student Sun at ISBIS 2014 and the SLDM Meeting)
-
T4. A Long March towards Joint Asymptotics: My 1st Steps… Link
-
T5. Semiparametric Model Based Bootstrap. Link
-
T6. Bootstrap Consistency for General Semiparametric M-Estimate. Link
-
T7. How Many Iterations are Sufficient for Semiparametric Estimation? Link
-
T8. Inverse Problems in Semiparametric Statistical Models. Link