Schedule for Spring 2023
Seminars are on Mondays
Time: 4:00pm - 5:00pm
Location: Room 903 SSW, 1255 Amsterdam Avenue
1/23/23
Krishna Balasubramanian (UC Davis) Title: Recent Advances in Non-log-concave and Heavy-tailed Sampling
Abstract: This talk will be about recent advances in the complexity of sampling from non-log-concave and heavy-tailed densities. Taking motivation from the theory of non-convex optimization, first, a framework for establishing the iteration complexity of the Langevin Monte Carlo (LMC) algorithm when the non-log-concave target density satisfies only the relatively milder Hölder-smoothness assumption will be discussed. In particular, this approach yields a new state-of-the-art guarantee for sampling with LMC from distributions which satisfy a Poincaré inequality. Next, the complexity of sampling from a class of heavy-tailed distributions by discretizing a natural class of Itô diffusions associated with weighted Poincaré inequalities will be discussed. Based on a mean-square analysis, we obtain the iteration complexity in the Wasserstein-2 metric for sampling from a class of heavy-tailed target distributions. Our approach takes the mean-square analysis to its limits, i.e., we only require that the target density has finite variance, the minimal requirement for a mean-square analysis.
Bio: Krishna Balasubramanian is an assistant professor in the Department of Statistics, University of California, Davis. His research interests include stochastic optimization and sampling, geometric and topological statistics, and theoretical machine learning. His research has been supported by a Facebook PhD fellowship and by CeDAR and NSF grants.
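As a concrete companion to the abstract above, here is a minimal sketch of the Langevin Monte Carlo iteration for a generic non-log-concave target known up to normalization. The step size, iteration count, and the Gaussian-mixture example target are illustrative choices, not parameters from the talk.

```python
import numpy as np

def lmc_sample(grad_log_density, x0, step_size=1e-2, n_steps=10_000, rng=None):
    """Langevin Monte Carlo: Euler-Maruyama discretization of the Langevin
    diffusion dX_t = grad log pi(X_t) dt + sqrt(2) dB_t."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for k in range(n_steps):
        noise = rng.standard_normal(x.size)
        x = x + step_size * grad_log_density(x) + np.sqrt(2 * step_size) * noise
        samples[k] = x
    return samples

def grad_log_mixture(x, mu=3.0):
    """Score of a simple non-log-concave target: a two-component Gaussian
    mixture, pi(x) proportional to exp(-(x - mu)^2 / 2) + exp(-(x + mu)^2 / 2)."""
    a = np.exp(-(x - mu) ** 2 / 2)
    b = np.exp(-(x + mu) ** 2 / 2)
    return (-(x - mu) * a - (x + mu) * b) / (a + b)

chain = lmc_sample(grad_log_mixture, x0=np.zeros(1))
```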
1/30/23
Johannes Wiesel (Columbia) Title: The out-of-sample prediction error of the square-root lasso and related estimators
Abstract: We study the classical problem of predicting an outcome variable, Y, using a linear combination of a d-dimensional covariate vector, X. We are interested in linear predictors whose coefficients solve: inf_β (E[(Y − <β, X>)^r])^(1/r) + λ||β||, where r > 1 and λ > 0 is a regularization parameter. We provide conditions under which linear predictors based on these estimators minimize the worst-case prediction error over a ball of distributions determined by a type of max-sliced Wasserstein metric. A detailed analysis of the statistical properties of this metric yields a simple recommendation for the choice of regularization parameter. The suggested order of λ, after a suitable normalization of the covariates, is typically d/n, up to logarithmic factors. Our recommendation is computationally straightforward to implement, pivotal, has provable out-of-sample performance guarantees, and does not rely on sparsity assumptions about the true data generating process.
This is joint work with Jose Montiel Olea, Amilcar Velez and Cindy Rush.
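A minimal sketch of the penalized criterion above, written with cvxpy. The choice r = 2 (the square-root lasso case), the l1 norm for the penalty, and the symbol lambda are illustrative assumptions; the abstract leaves the norm and notation unspecified.

```python
import numpy as np
import cvxpy as cp

def fit_penalized_predictor(X, y, lam, r=2):
    """Minimize (mean(|y - X beta|^r))^(1/r) + lam * ||beta||_1 over beta.
    r = 2 gives the square-root lasso; the l1 penalty is an assumed choice."""
    n, d = X.shape
    beta = cp.Variable(d)
    fit = cp.norm(y - X @ beta, r) / n ** (1 / r)  # equals (mean |residual|^r)^(1/r)
    cp.Problem(cp.Minimize(fit + lam * cp.norm1(beta))).solve()
    return beta.value

# Illustrative use, with regularization on the d/n scale mentioned in the abstract.
rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))
y = X[:, 0] - 2 * X[:, 1] + rng.standard_normal(n)
beta_hat = fit_penalized_predictor(X, y, lam=d / n)
```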
Bio: Johannes Wiesel is an Assistant Professor in the Department of Statistics at Columbia University. In summer 2020 he received a PhD from Oxford University under the supervision of Jan Obloj. His research focuses on mathematical statistics with a special emphasis on statistical optimal transport. He is also interested in the robust approach to mathematical finance, which does not start with an a priori model but rather with the information available in the markets. In this context he has established new connections to the theory of optimal transport on the one hand and robust statistics as well as machine learning on the other, with the ultimate goal to develop a universal toolbox for the implementation of robust and time-consistent trading strategies and risk assessment.
2/6/23
Yun Yang (UIUC) Title: Implicit estimation of high-dimensional distributions using generative models
Abstract: The estimation of distributions of complex objects from high-dimensional data with low-dimensional structures is an important topic in statistics and machine learning. Deep generative models achieve this by encoding and decoding data to generate synthetic realistic images and texts. A key aspect of these models is the extraction of low-dimensional latent features, assuming data lies on a low-dimensional manifold. We study this by developing a minimax framework for distribution estimation on unknown submanifolds with smoothness assumptions on the target distribution and the manifold. The framework highlights how problem characteristics, such as intrinsic dimensionality and smoothness, impact the limits of high-dimensional distribution estimation. Our estimator, which is a mixture of locally fitted generative models, is motivated by differential geometry techniques and covers cases where the data manifold lacks a global parametrization.
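As a toy illustration of the "mixture of locally fitted generative models" idea, the sketch below partitions the data into local charts with k-means and fits a simple PCA-based Gaussian generator in each chart. These local models are placeholder assumptions, not the estimator constructed in the talk, and each chart is assumed to contain more than latent_dim points.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def fit_local_generators(data, n_charts=10, latent_dim=2):
    """Split the data into local charts and fit a PCA-based Gaussian generator
    in each; the mixture of these local models is the overall generator."""
    charts = KMeans(n_clusters=n_charts, n_init=10).fit(data)
    weights = np.bincount(charts.labels_, minlength=n_charts) / len(data)
    models = [PCA(n_components=latent_dim).fit(data[charts.labels_ == c])
              for c in range(n_charts)]
    return models, weights

def sample_mixture(models, weights, n_samples, rng=None):
    """Draw from the mixture: pick a chart, draw a Gaussian latent variable with
    that chart's PCA variances, and decode it through the chart's PCA map."""
    rng = np.random.default_rng() if rng is None else rng
    draws = []
    for c in rng.choice(len(models), size=n_samples, p=weights):
        z = rng.standard_normal(models[c].n_components_) * np.sqrt(models[c].explained_variance_)
        draws.append(models[c].inverse_transform(z.reshape(1, -1))[0])
    return np.vstack(draws)
```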
Bio: Yun Yang is an associate professor in the Department of Statistics, University of Illinois Urbana-Champaign. His research interests include Bayesian inference, high dimensional statistics, optimal transport and statistical learning theory. His research has been supported by NSF grants.
2/13/23
Edward Kennedy (CMU) Title: Optimal estimation of heterogeneous causal effects
Abstract: Estimation of hetero
Bio: Edward Kennedy is an associate professor of Statistics & Data Science at Carnegie Mellon University. He joined the department after graduating with a PhD in biostatistics from the University of Pennsylvania. Edward's methodological interests lie at the intersection of causal inference, machine learning, and nonparametric theory, especially in settings involving high-dimensional and otherwise complex data. His applied research focuses on problems in criminal justice, health services, medicine, and public policy. Edward is a recipient of the NSF CAREER award, the David P. Byar Young Investigator award, and the Thomas Ten Have Award for exceptional research in causal inference.
2/20/23
Jiayang Sun (George Mason University) Title: Semi-parametric Learning for Explainable Models - Triathlon
Abstract: Feature selection is critical for developing drug targets or understanding reproductive success at high altitudes. However, the selected features depend on the model assumptions used for feature selection. Determining variable transformations to make the model more realistic or interpretable is not trivial when there are many features or variables. This talk presents our advance toward a semi-parametric learning pipeline to study feature, transformation, and model selection in a “triathlon.” We introduce a concept of transformation necessity-sufficiency guarantee, open up dialogues for paradigm changes, provide our learning procedure for explainable models, illustrate its performance, and demonstrate its application in understanding social, physiological, and genetic contributions to the reproductive success of Tibetan women. This is joint work with Shenghao Ye, Cynthia Beall, and Mary Meyer.
Bio: Jiayang Sun currently holds the positions of Professor, Chair, and Bernard Dunn Eminent Scholar in the Department of Statistics at George Mason University. Before joining GMU, she was a Professor in the Department of Population and Quantitative Health Sciences and the Director of the Center for Statistical Research, Computing and Collaboration (SR2c) at Case Western Reserve University. She has published in top statistical and computational journals, including AOS, JASA, AOP, Biometrika, Statistica Sinica, Biometrical Journal, Statistics in Medicine, JCGS, and SIAM J. Sci. & Stat. Comp., as well as other statistical and scientific journals. Her statistical research has included simultaneous confidence bounds, multiple comparisons, biased sampling, measurement errors, mixtures, machine learning, causal inference, crowdsourcing, EHR, text mining, bioinformatics, network analysis, imaging, and high-dimensional and big data. Her interdisciplinary work has included cancer, environmental science, neuroscience, wound care, dental care, and biomaterials, in addition to astronomy, computer science, energy, law, and agriculture. She is an elected Fellow of the ASA and IMS, an elected member of the ISI, and an active member of the profession, having served as the 2016 President of the CWS and on various committees of the ASA, IMS, CWS, ICSA, and ISI. Her work has been supported by awards from the NSF, NIH, NSA, DOD, DOE, VA, ASA, and INOVA.
2/27/23
Speaker: David Banks (Duke University)
Title: Statistics and the Industries of Today
Abstract: Dr. Deming was one of the foundational leaders in industrial statistics, with contributions to experimental design, sampling, and process control. More importantly, he changed the culture of business leadership in two nations, and implicitly, around the world. But the industries of his day focused on manufacturing, while today’s industries reflect the knowledge economy. This talk asks the industrial statistics community to consider how to update and apply Dr. Deming’s ideas in the Big Data era. There are some very direct correspondences.
Bio: David Banks is a professor in the Department of Statistical Science at Duke University. He obtained his PhD in statistics from Virginia Tech in 1984, then did a two-year postdoctoral fellowship at the University of California at Berkeley. He taught at Carnegie Mellon University and the University of Cambridge, was chief statistician at the U.S. Department of Transportation, and also worked at the National Institute of Standards and Technology and at the Food and Drug Administration. He is a former editor of the Journal of the American Statistical Association and a founding editor of Statistics and Public Policy. He is the former director of the Statistical and Applied Mathematical Sciences Institute. He works in risk analysis, dynamic models for text networks, human rights statistics, and some aspects of machine learning.
3/6/23
Speaker: Steve Hanneke (Purdue University)
Title: A Theory of Universal Learning
Abstract: How quickly can functions in a given function class be learned from data? It is common to measure the performance of a supervised machine learning algorithm by plotting its "learning curve", that is, the decay of the classification risk (or "error rate") as a function of the number of training samples. However, the classical theoretical framework for understanding optimal rates of convergence of the risk in statistical learning, rooted in the works of Vapnik-Chervonenkis and Valiant (known as the PAC model), does not explain the behavior of learning curves: rather, it focuses on minimax analysis, which can only provide an upper envelope of the learning curves over a family of data distributions, and the "optimal" rate is the smallest such upper envelope achievable. This does not match the practice of machine learning, where in any given scenario, the data source is typically fixed, while the number of training samples may be chosen as needed.
In this talk, I will describe an alternative framework that better captures such practical aspects of machine learning, but still gives rise to a complete theory of optimal learning rates in the spirit of the PAC model. Namely, we consider the problem of universal learning, which aims to understand the convergence rates achievable on every data distribution, without requiring uniformity of the guarantee over distributions for each sample size. In regard to supervised learning, the main result of this work is a remarkable trichotomy: there are only three possible optimal rates of universal learning. More precisely, we show that the learning curves of any given function class decay either at exponential, linear, or arbitrarily slow rates, under the realizability assumption. Moreover, each of these cases is completely characterized by appropriate combinatorial dimensions, and we exhibit optimal learning algorithms that achieve the best possible rate in each case. Allowing for non-realizable (so-called "agnostic") distributions, essentially the same trichotomy remains, with the linear rate replaced by sub-square-root rates.
In recent extensions, we have also characterized the optimal universal rates for multiclass learning, general interactive learning, active learning with label queries, semi-supervised learning, and several other variations. In the course of these works, some general principles have emerged regarding the design of optimal learning algorithms based on winning strategies for certain infinite sequential games (Gale-Stewart games), which are used to define data-dependent partial function classes whose minimax rates match the optimal universal rate for the original function class. The corresponding combinatorial dimensions determine the existence of such winning strategies, and reflect a fascinating blending of familiar dimensions from the classical theories of statistical learning and adversarial online learning.
Based on joint work with Olivier Bousquet, Shay Moran, Ramon van Handel, and Amir Yehudayoff, which appeared at STOC 2021, and various follow-up works (in preparation) with the aforementioned authors, as well as Idan Attias, Ariel Avital, Klim Efremenko, Alkis Kalavasis, Amin Karbasi, Amirreza Shaeiri, Jonathan Shafer, Ilya Tolstikhin, Grigoris Velegkas, and Qian Zhang.
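For orientation, here is an informal restatement of the trichotomy described in the abstract above, in the realizable case; the constants and the precise formalization of "arbitrarily slow" are paraphrased from the abstract, not quoted from the paper.

```latex
% Informal paraphrase: for a class H, exactly one of the following holds for an
% optimal learner \hat h_n. The constants C_P, c_P may depend on the (fixed)
% data distribution P, which is what distinguishes universal from minimax rates.
\[
\text{(exponential)}\quad \mathbb{E}\,\mathrm{er}_P(\hat h_n) \le C_P\, e^{-c_P n}
\quad\text{for every realizable } P,
\]
\[
\text{(linear)}\quad \mathbb{E}\,\mathrm{er}_P(\hat h_n) \le C_P / n
\quad\text{for every realizable } P,\ \text{with exponential rates unattainable},
\]
\[
\text{(arbitrarily slow)}\quad \text{for every rate } R(n)\to 0 \text{ there is a realizable } P
\text{ with } \mathbb{E}\,\mathrm{er}_P(\hat h_n) \ge R(n) \text{ for infinitely many } n.
\]
```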
3/13/23
Spring Break
3/20/23
Mouli Banerjee (University of Michigan) Title: Tackling Posterior Drift via Linear Adjustments and Exponential Tilts
Abstract: I will speak on some of our recent work on transfer learning from a source to a target population in the presence of 'posterior drift': i.e., the regression function/Bayes classifier in the target population is different from that in the source. In the situation where labeled samples from the target domain are available, by modeling the posterior drift through a linear adjustment (on an appropriately transformed scale), we are able to learn the nature of the posterior drift using relatively few samples from the target population as compared to the source population, which provides an abundance of samples. The other (semi-supervised) case, where labels from the target are unavailable, is addressed by connecting the probability distribution in the target domain to that in the source domain via an exponential family formulation, and learning the corresponding parameters. Both approaches are motivated by ideas originating in classical statistics. I will present theoretical guarantees for these procedures as well as applications to real data from the UK Biobank study (mortality prediction) and the Waterbirds dataset (image classification).
This is joint work primarily with Subha Maity and Yuekai Sun.
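A minimal sketch of the supervised (labeled-target) setting described above: a source model is fit on abundant source data, and a two-parameter linear adjustment is then learned on a transformed scale from the few labeled target samples. The logistic source model and the logit transform are illustrative assumptions, not the talk's exact specification.

```python
import numpy as np
from scipy.special import logit
from sklearn.linear_model import LogisticRegression

def fit_posterior_drift(X_src, y_src, X_tgt, y_tgt, eps=1e-6):
    """Transfer learning under posterior drift: learn eta_src from the source,
    then fit a linear adjustment (intercept and slope) on the logit scale
    using the small labeled target sample."""
    source_model = LogisticRegression(max_iter=1000).fit(X_src, y_src)

    def source_score(X):
        # Transformed source regression function, clipped away from 0 and 1.
        return logit(np.clip(source_model.predict_proba(X)[:, 1], eps, 1 - eps))

    # One-dimensional logistic fit of target labels on the source score:
    # its intercept and slope are the learned linear adjustment.
    adjustment = LogisticRegression(max_iter=1000).fit(
        source_score(X_tgt).reshape(-1, 1), y_tgt)

    def predict_proba_target(X_new):
        return adjustment.predict_proba(source_score(X_new).reshape(-1, 1))[:, 1]

    return predict_proba_target
```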
Bio: Moulinath Banerjee was born and raised in India where he completed both his Bachelors and Masters in Statistics at the Indian Statistical Institute, Kolkata. He obtained his Ph.D. from the Statistics department at the University of Washington, Seattle, in December 2000, served as a lecturer there for Winter and Spring quarters, 2001, and joined the University of Michigan in Fall 2001. Mouli's research interests are in the fields of non-standard problems, empirical process theory, threshold and boundary estimation, and more recently, distributed estimation and inference, transfer learning and distributional shift, and problems at the Stat-ML interface. He is currently the editor of Statistical Science. He also has a broad range of interests outside of statistics including classical music, literature, history, philosophy, physics and ancestral genetics, and is also, most emphatically, a gourmet and believes that a life without good food and fine beverages is a life less lived.
3/27/23
Qingyuan Zhao (University of Cambridge) Title: Simultaneous hypothesis testing using negative controls
4/3/23
Yi Yu (University of Warwick)
4/10/23
4/17/23
4/24/23
5/1/23
Heather Battey (Imperial College London)