Statistics Seminar Series

Schedule for Spring 2023

Seminars are on Mondays
Time: 4:00pm - 5:00pm

Location: Room 903 SSW, 1255 Amsterdam Avenue



Krishna Balasubramanian (UC Davis)

Title: Recent Advances in Non-log-concave and Heavy-tailed Sampling

Abstract: This talk will be about recent advances in the complexity of sampling from non-log-concave and heavy-tailed densities. Taking motivation from the theory of non-convex optimization, I will first discuss a framework for establishing the iteration complexity of Langevin Monte Carlo (LMC) when the non-log-concave target density satisfies only the relatively milder Hölder-smoothness assumption. In particular, this approach yields a new state-of-the-art guarantee for sampling with LMC from distributions which satisfy a Poincaré inequality. Next, I will discuss the complexity of sampling from a class of heavy-tailed distributions by discretizing a natural class of Itô diffusions associated with weighted Poincaré inequalities. Based on a mean-square analysis, we obtain the iteration complexity in the Wasserstein-2 metric for sampling from a class of heavy-tailed target distributions. Our approach takes the mean-square analysis to its limits: we require only that the target density has finite variance, the minimal requirement for a mean-square analysis.
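For background, the LMC iteration at the heart of the first part has a simple form: starting from x_0, repeat x_{k+1} = x_k - η ∇U(x_k) + sqrt(2η) ξ_k, where U = -log(target density) and ξ_k is standard Gaussian noise. A minimal pure-Python sketch for an illustrative one-dimensional standard Gaussian target (U(x) = x²/2, so ∇U(x) = x); the target, step size, and iteration count are illustrative choices, not from the talk:

```python
import math
import random

def lmc_sample(grad_u, x0, step, n_iters, rng):
    """Unadjusted Langevin Monte Carlo:
    x_{k+1} = x_k - step * grad_u(x_k) + sqrt(2 * step) * xi_k, xi_k ~ N(0, 1)."""
    x = x0
    samples = []
    for _ in range(n_iters):
        x = x - step * grad_u(x) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

# Illustrative target: standard Gaussian, U(x) = x^2 / 2, so grad_u(x) = x.
rng = random.Random(0)
xs = lmc_sample(grad_u=lambda x: x, x0=3.0, step=0.05, n_iters=20000, rng=rng)
burned = xs[2000:]                      # discard burn-in
mean = sum(burned) / len(burned)        # near 0
var = sum((v - mean) ** 2 for v in burned) / len(burned)  # near 1, up to O(step) bias
```

The same iteration applies to the non-log-concave and heavy-tailed targets discussed in the talk; what changes there are the assumptions (Hölder smoothness, weighted Poincaré inequalities) under which complexity guarantees can be proved.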

Bio: Krishna Balasubramanian is an assistant professor in the Department of Statistics, University of California, Davis. His research interests include stochastic optimization and sampling, geometric and topological statistics, and theoretical machine learning. His research has been supported by a Facebook PhD fellowship and by CeDAR and NSF grants.


Johannes Wiesel (Columbia)

Title: The out-of-sample prediction error of the square-root lasso and related estimators
Abstract: We study the classical problem of predicting an outcome variable, Y, using a linear combination of a d-dimensional covariate vector, X. We are interested in linear predictors whose coefficients solve: inf_β (E[(Y - < β, X >)^r])^(1/r) + λ || β ||, where r > 1 and λ > 0 is a regularization parameter. We provide conditions under which linear predictors based on these estimators minimize the worst-case prediction error over a ball of distributions determined by a type of max-sliced Wasserstein metric. A detailed analysis of the statistical properties of this metric yields a simple recommendation for the choice of regularization parameter. The suggested order of λ, after a suitable normalization of the covariates, is typically d/n, up to logarithmic factors. Our recommendation is computationally straightforward to implement, pivotal, has provable out-of-sample performance guarantees, and does not rely on sparsity assumptions about the true data-generating process.
This is joint work with Jose Montiel Olea, Amilcar Velez and Cindy Rush.
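For concreteness, with r = 2 and taking || · || to be the ℓ1 norm, the criterion above reduces to the well-known square-root lasso: the square root of the mean squared residual plus λ times the ℓ1 norm of β. A small evaluator of that objective (the norm choice and the toy numbers below are illustrative assumptions; the abstract leaves the norm generic):

```python
def sqrt_lasso_objective(X, y, beta, lam, r=2):
    """Criterion (E[|Y - <beta, X>|^r])^(1/r) + lam * ||beta||_1, with E the
    empirical mean over the sample; r = 2 gives the square-root lasso."""
    n = len(y)
    moment = sum(abs(y[i] - sum(b * xj for b, xj in zip(beta, X[i]))) ** r
                 for i in range(n)) / n
    return moment ** (1.0 / r) + lam * sum(abs(b) for b in beta)

# Tiny illustrative data (hypothetical numbers); this beta fits exactly, so the
# objective equals the penalty alone: 0 + 0.1 * (|1| + |2|) = 0.3.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 2.0, 3.0, 4.0]
val = sqrt_lasso_objective(X, y, beta=[1.0, 2.0], lam=0.1)
```

A useful property of the r = 2 criterion, reflected in the word "pivotal" above, is that the residual term is on the same scale as the noise standard deviation, so the recommended λ does not depend on an estimate of the noise level.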
Bio: Johannes Wiesel is an Assistant Professor in the Department of Statistics at Columbia University. In summer 2020 he received a PhD from Oxford University under the supervision of Jan Obloj. His research focuses on mathematical statistics with a special emphasis on statistical optimal transport. He is also interested in the robust approach to mathematical finance, which does not start with an a priori model but rather with the information available in the markets. In this context he has established new connections to the theory of optimal transport on the one hand and to robust statistics and machine learning on the other, with the ultimate goal of developing a universal toolbox for the implementation of robust and time-consistent trading strategies and risk assessment.


Yun Yang (UIUC)

Title: Implicit estimation of high-dimensional distributions using generative models 
Abstract: The estimation of distributions of complex objects from high-dimensional data with low-dimensional structures is an important topic in statistics and machine learning. Deep generative models achieve this by encoding and decoding data to generate synthetic realistic images and texts. A key aspect of these models is the extraction of low-dimensional latent features, assuming data lies on a low-dimensional manifold. We study this by developing a minimax framework for distribution estimation on unknown submanifolds with smoothness assumptions on the target distribution and the manifold. The framework highlights how problem characteristics, such as intrinsic dimensionality and smoothness, impact the limits of high-dimensional distribution estimation. Our estimator, which is a mixture of locally fitted generative models, is motivated by differential geometry techniques and covers cases where the data manifold lacks a global parametrization.
Bio: Yun Yang is an associate professor in the Department of Statistics, University of Illinois Urbana-Champaign. His research interests include Bayesian inference, high-dimensional statistics, optimal transport, and statistical learning theory. His research has been supported by NSF grants.


Edward Kennedy (CMU)

Title: Optimal estimation of heterogeneous causal effects
Abstract: Estimation of heterogeneous causal effects – i.e., how effects of policies and treatments vary across units – is fundamental to medical, social, and other sciences, and plays a crucial role in optimal treatment allocation, generalizability, subgroup effects, and more. Many methods for estimating conditional average treatment effects (CATEs) have been proposed in recent years, but there have remained important theoretical gaps in understanding if and when such methods make optimally efficient use of the data at hand. This is especially true when the CATE has nontrivial structure (e.g., smoothness or sparsity). This talk surveys work across two recent papers in this context. First, we study a two-stage doubly robust estimator and give a generic model-free error bound, which, despite its generality, yields sharper results than those in the current literature. The second contribution is aimed at understanding the fundamental statistical limits of CATE estimation. We resolve this long-standing problem by deriving a minimax lower bound, with matching upper bound obtained via a new estimator based on higher order influence functions. Applications in medicine and political science are considered.
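A common instance of a two-stage doubly robust CATE estimator proceeds by (1) estimating the nuisance functions — the propensity score π(x) and the outcome regressions μ0(x), μ1(x) — and (2) regressing the doubly robust pseudo-outcome on the covariates. A sketch on simulated data with the nuisances plugged in as known functions (the simulation, function names, and crude second stage below are illustrative, not the talk's construction):

```python
import random

def dr_pseudo_outcome(y, a, x, pi, mu0, mu1):
    """Doubly robust pseudo-outcome; its conditional mean given X = x is the CATE
    when either the propensity pi or the regressions (mu0, mu1) are correct."""
    mu_a = mu1(x) if a == 1 else mu0(x)
    return (a - pi(x)) / (pi(x) * (1.0 - pi(x))) * (y - mu_a) + mu1(x) - mu0(x)

# Simulated data where pi(x) = 0.5, mu0(x) = x, mu1(x) = 3x, so the true CATE is 2x.
rng = random.Random(1)
data = []
for _ in range(5000):
    x = rng.random()
    a = 1 if rng.random() < 0.5 else 0
    y = x + 2.0 * x * a + rng.gauss(0.0, 0.1)
    data.append((y, a, x))

phi = [dr_pseudo_outcome(y, a, x, lambda t: 0.5, lambda t: t, lambda t: 3.0 * t)
       for (y, a, x) in data]
# Second stage: regress phi on x.  Here just a crude local average near x = 0.5,
# where the true CATE is 2 * 0.5 = 1; in practice any regression method can be used.
near = [p for p, (_, _, x) in zip(phi, data) if abs(x - 0.5) < 0.05]
cate_at_half = sum(near) / len(near)
```

The error-bound results surveyed in the talk concern what happens when the plugged-in nuisances are themselves estimated from a separate sample rather than known, as they are in this toy sketch.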
Bio: Edward Kennedy is an associate professor of Statistics & Data Science at Carnegie Mellon University. He joined the department after graduating with a PhD in biostatistics from the University of Pennsylvania. Edward's methodological interests lie at the intersection of causal inference, machine learning, and nonparametric theory, especially in settings involving high-dimensional and otherwise complex data. His applied research focuses on problems in criminal justice, health services, medicine, and public policy. Edward is a recipient of the NSF CAREER award, the David P. Byar Young Investigator award, and the Thomas Ten Have Award for exceptional research in causal inference.


Jiayang Sun (George Mason University)

Title: Semi-parametric Learning for Explainable Models - Triathlon

Abstract: Feature selection is critical for developing drug targets or understanding reproductive success at high altitudes. However, the selected features depend on the model assumption used for feature selection. Determining variable transformations to make the model more realistic or interpretable is not trivial when there are many features or variables. This talk presents our advance toward a semi-parametric learning pipeline that studies feature, transformation, and model selection in a “triathlon.” We introduce the concept of a transformation necessity-sufficiency guarantee, open up dialogues for paradigm changes, provide our learning procedure for explainable models, illustrate its performance, and demonstrate its application in understanding social, physiological, and genetic contributions to the reproductive success of Tibetan women. This is joint work with Shenghao Ye, Cynthia Beall, and Mary Meyer.


Bio: Jiayang Sun currently holds the positions of Professor, Chair, and Bernard Dunn Eminent Scholar in the Department of Statistics at George Mason University. Before joining GMU, she was Professor in the Department of Population and Quantitative Health Sciences and Director of the Center for Statistical Research, Computing and Collaboration (SR2c) at Case Western Reserve University. She has published in top statistical and computational journals, including AOS, JASA, AOP, Biometrika, Statistica Sinica, Biometrical Journal, Statistics in Medicine, JCGS, and SIAM J. Sci. & Stat. Comp., as well as other statistical and scientific journals. Her statistical research has included simultaneous confidence bounds, multiple comparisons, biased sampling, measurement errors, mixtures, machine learning, causal inference, crowdsourcing, EHR, text mining, bioinformatics, network analysis, imaging, and high-dimensional and big data. Her interdisciplinary work has included cancer, environmental science, neuroscience, wound care, dentistry, and biomaterials, in addition to astronomy, computer science, energy, law, and agriculture. She is an elected Fellow of the ASA and IMS, an elected member of the ISI, and an active member of the profession, having served as the 2016 President of CWS and on various committees of the ASA, IMS, CWS, ICSA, and ISI. Her work has been supported by awards from the NSF, NIH, NSA, DOD, DOE, VA, ASA, and INOVA.


David Banks (Duke University)
Title: Statistics and the Industries of Today
Abstract: Dr. W. Edwards Deming was one of the foundational leaders in industrial statistics, with contributions to experimental design, sampling, and process control.  More importantly, he changed the culture of business leadership in two nations and, implicitly, around the world.  But the industries of his day focused on manufacturing, while today’s industries reflect the knowledge economy.  This talk asks the industrial statistics community to consider how to update and apply Dr. Deming’s ideas in the Big Data era.  There are some very direct correspondences.
Bio: David Banks is a professor in the Department of Statistical Science at Duke University.  He obtained his PhD in statistics from Virginia Tech in 1984, then did a two-year postdoctoral fellowship at the University of California at Berkeley.  He taught at Carnegie Mellon University and the University of Cambridge, was chief statistician at the U.S. Department of Transportation, and also worked at the National Institute of Standards and Technology and at the Food and Drug Administration.  He is a former editor of the Journal of the American Statistical Association and a founding editor of Statistics and Public Policy.  He is the former director of the Statistical and Applied Mathematical Sciences Institute.  He works in risk analysis, dynamic models for text networks, human rights statistics, and some aspects of machine learning.


Steve Hanneke (Purdue University)

Title: A Theory of Universal Learning

Abstract: How quickly can functions in a given function class be learned from data? It is common to measure the performance of a supervised machine learning algorithm by plotting its "learning curve", that is, the decay of the classification risk (or "error rate") as a function of the number of training samples. However, the classical theoretical framework for understanding optimal rates of convergence of the risk in statistical learning, rooted in the works of Vapnik-Chervonenkis and Valiant (known as the PAC model), does not explain the behavior of learning curves: rather, it focuses on minimax analysis, which can only provide an upper envelope of the learning curves over a family of data distributions, and the "optimal" rate is the smallest such upper envelope achievable. This does not match the practice of machine learning, where in any given scenario, the data source is typically fixed, while the number of training samples may be chosen as needed.

In this talk, I will describe an alternative framework that better captures such practical aspects of machine learning, but still gives rise to a complete theory of optimal learning rates in the spirit of the PAC model. Namely, we consider the problem of universal learning, which aims to understand the convergence rates achievable on every data distribution, without requiring uniformity of the guarantee over distributions for each sample size. In regard to supervised learning, the main result of this work is a remarkable trichotomy: there are only three possible optimal rates of universal learning. More precisely, we show that the learning curves of any given function class decay either at exponential, linear, or arbitrarily slow rates, under the realizability assumption. Moreover, each of these cases is completely characterized by appropriate combinatorial dimensions, and we exhibit optimal learning algorithms that achieve the best possible rate in each case. Allowing for non-realizable (so-called "agnostic") distributions, essentially the same trichotomy remains, with the linear rate replaced by sub-square-root rates.

In recent extensions, we have also characterized the optimal universal rates for multiclass learning, general interactive learning, active learning with label queries, semi-supervised learning, and several other variations. In the course of these works, some general principles have emerged regarding the design of optimal learning algorithms based on winning strategies for certain infinite sequential games (Gale-Stewart games), which are used to define data-dependent partial function classes whose minimax rates match the optimal universal rate for the original function class. The corresponding combinatorial dimensions determine the existence of such winning strategies, and reflect a fascinating blending of familiar dimensions from the classical theories of statistical learning and adversarial online learning.

Based on joint work with Olivier Bousquet, Shay Moran, Ramon van Handel, and Amir Yehudayoff, which appeared at STOC 2021, and various follow-up works (in preparation) with the aforementioned authors, as well as Idan Attias, Ariel Avital, Klim Efremenko, Alkis Kalavasis, Amin Karbasi, Amirreza Shaeiri, Jonathan Shafer, Ilya Tolstikhin, Grigoris Velegkas, and Qian Zhang.


Spring Break


Mouli Banerjee (University of Michigan)

Title: Tackling Posterior Drift via Linear Adjustments and Exponential Tilts

Abstract: I will speak on some of our recent work on transfer learning from a source to a target population in the presence of 'posterior drift', i.e., the regression function/Bayes classifier in the target population differs from that in the source. In the situation where labeled samples from the target domain are available, by modeling the posterior drift through a linear adjustment (on an appropriately transformed scale), we are able to learn the nature of the posterior drift using relatively few samples from the target population as compared to the source population, which provides an abundance of samples. The other (semi-supervised) case, where labels from the target are unavailable, is addressed by connecting the probability distribution in the target domain to that in the source domain via an exponential family formulation, and learning the corresponding parameters. Both approaches are motivated by ideas originating in classical statistics.

I will present theoretical guarantees for these procedures as well as applications to real data from the UK Biobank study (mortality prediction) and the Waterbirds dataset (image classification).
This is joint work primarily with Subha Maity and Yuekai Sun.
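The exponential-family connection in the semi-supervised case can be made concrete: the target density q is modeled as q(x) ∝ p(x) exp(θ·T(x)) for the source density p and a sufficient statistic T, so transfer reduces to learning θ. A toy discrete sketch (the distribution, statistic, and θ below are hypothetical choices, not from the talk):

```python
import math

def exponential_tilt(probs, t_vals, theta):
    """q(x) proportional to p(x) * exp(theta * T(x)), renormalized to sum to 1."""
    w = [p * math.exp(theta * t) for p, t in zip(probs, t_vals)]
    z = sum(w)
    return [wi / z for wi in w]

# Hypothetical source distribution on {0, 1} with T(x) = x and theta = log 3:
# the unnormalized weights (0.5, 1.5) renormalize to (0.25, 0.75), shifting
# mass toward larger values of T(x).
p = [0.5, 0.5]
q = exponential_tilt(p, t_vals=[0.0, 1.0], theta=math.log(3.0))
```

In the transfer-learning setting θ is unknown and must be estimated from unlabeled target samples; the sketch only illustrates how a fixed tilt reshapes the source distribution.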

Bio: Moulinath Banerjee was born and raised in India where he completed both his Bachelors and Masters in Statistics at the Indian Statistical Institute, Kolkata. He obtained his Ph.D. from the Statistics department at University of Washington, Seattle, in December 2000, served as lecturer there for Winter and Spring quarters, 2001, and joined University of Michigan in Fall 2001. Mouli's research interests are in the fields of non-standard problems, empirical process theory, threshold and boundary estimation, and more recently, distributed estimation and inference, transfer learning and distributional shift, and problems at the Stat-ML interface. He is currently the editor of Statistical Science. He also has a broad range of interests outside of statistics including classical music, literature, history, philosophy, physics and ancestral genetics, and is also, most emphatically, a gourmet and believes that a life without good food and fine beverages is a life less lived.


Qingyuan Zhao (University of Cambridge)

Title: Simultaneous hypothesis testing using negative controls

Abstract: Negative control is a common technique in scientific investigations and broadly refers to the situation where a null effect (“negative result”) is expected. Motivated by a real proteomic dataset and an ad hoc procedure shared with us by collaborators, I will present three promising and closely connected ways of using negative controls to assist simultaneous hypothesis testing. The first perspective uses negative controls to construct a permutation p-value for every hypothesis under investigation, and we give several sufficient conditions for such p-values to be valid and positive regression dependent on the set (PRDS) of true nulls. The second perspective uses negative controls to construct an estimate of the false discovery rate (FDR). We give a sufficient condition under which the step-up procedure based on this estimate controls the FDR. The third perspective, derived from the original ad hoc procedure given by our collaborators, uses negative controls to construct a nonparametric estimator of the local false discovery rate. I will conclude the talk with a twist. This talk is based on joint work with Zijun Gao.
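One natural version of the first construction (a sketch in the spirit of the abstract, not necessarily the exact procedure from the talk): give each hypothesis a permutation-style p-value equal to the shifted rank of its statistic among the m negative-control statistics, then feed the p-values to a step-up procedure such as Benjamini-Hochberg. All statistics below are hypothetical numbers:

```python
def nc_pvalues(test_stats, control_stats):
    """p-value for each hypothesis: rank of its statistic among negative controls,
    p_i = (1 + #{j : control_j >= t_i}) / (m + 1)."""
    m = len(control_stats)
    return [(1 + sum(1 for c in control_stats if c >= t)) / (m + 1)
            for t in test_stats]

def bh_stepup(pvals, alpha):
    """Benjamini-Hochberg step-up: reject the k smallest p-values, where
    k = max{i : p_(i) <= alpha * i / n}."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / n:
            k = rank
    return sorted(order[:k])

# Hypothetical data: two strong signals and three null-like statistics,
# compared against ten negative controls.
controls = [0.1, 0.3, 0.2, 0.5, 0.4, 0.6, 0.15, 0.25, 0.35, 0.45]
stats = [3.0, 2.5, 0.3, 0.2, 0.4]
pvals = nc_pvalues(stats, controls)
rejected = bh_stepup(pvals, alpha=0.25)   # rejects the two large statistics
```

The PRDS conditions mentioned in the abstract are exactly what is needed for a step-up procedure like the one above to control the FDR with p-values of this dependent, shared-control form.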

Short bio: Qingyuan Zhao is an Assistant Professor at the Statistical Laboratory, University of Cambridge. He is interested in improving the general quality and appraisal of statistical research, including new methodology and a better understanding of causal inference, novel study designs, sensitivity analysis, multiple testing, and selective inference.


Yi Yu (University of Warwick)

5/1/23 Heather Battey (Imperial College London)