Schedule for Spring 2023
Seminars are on Mondays
Time: 4:00pm – 5:00pm
Location: Room 903 SSW, 1255 Amsterdam Avenue
1/23/23
Krishna Balasubramanian (UC Davis) Title: Recent Advances in Non-log-concave and Heavy-tailed Sampling
Abstract: This talk will be about recent advances in the complexity of sampling from non-log-concave and heavy-tailed densities. Taking motivation from the theory of non-convex optimization, first, a framework for establishing the iteration complexity of Langevin Monte Carlo (LMC) when the non-log-concave target density satisfies only the relatively milder Hölder-smoothness assumption will be discussed. In particular, this approach yields a new state-of-the-art guarantee for sampling with LMC from distributions which satisfy a Poincaré inequality. Next, the complexity of sampling from a class of heavy-tailed distributions by discretizing a natural class of Itô diffusions associated with weighted Poincaré inequalities will be discussed. Based on a mean-square analysis, we obtain the iteration complexity in the Wasserstein-2 metric for sampling from a class of heavy-tailed target distributions. Our approach takes the mean-square analysis to its limits, i.e., we require only that the target density has finite variance, the minimal requirement for a mean-square analysis.
Bio: Krishna Balasubramanian is an assistant professor in the Department of Statistics, University of California, Davis. His research interests include stochastic optimization and sampling, geometric and topological statistics, and theoretical machine learning. His research was/is supported by a Facebook PhD fellowship, and CeDAR and NSF grants.
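For readers unfamiliar with the algorithm named in the abstract, here is a minimal sketch of plain Langevin Monte Carlo for a generic target density proportional to exp(-U), assuming access to the gradient of U; the step size, iteration count, and example potential are illustrative choices of mine, and the talk's heavy-tailed samplers discretize different Itô diffusions.

```python
import numpy as np

def lmc_sample(grad_U, x0, step=1e-2, n_iters=10_000, rng=None):
    """Plain Langevin Monte Carlo for a target density proportional to exp(-U):
    x_{k+1} = x_k - step * grad_U(x_k) + sqrt(2 * step) * N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        x = x - step * grad_U(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.size)
        samples[k] = x
    return samples

# Illustrative 1-D run with a simple non-log-concave potential U(x) = log(1 + x^2)
draws = lmc_sample(lambda x: 2.0 * x / (1.0 + x**2), x0=np.zeros(1))
```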
1/30/23
Johannes Wiesel (Columbia) Title: The out-of-sample prediction error of the square-root lasso and related estimators
Abstract: We study the classical problem of predicting an outcome variable, Y, using a linear combination of a d-dimensional covariate vector, X. We are interested in linear predictors whose coefficients solve $\inf_{\beta} \left(\mathbb{E}\left[(Y - \langle \beta, X \rangle)^r\right]\right)^{1/r} + \lambda \lVert \beta \rVert$, where $r > 1$ and $\lambda > 0$ is a regularization parameter. We provide conditions under which linear predictors based on these estimators minimize the worst-case prediction error over a ball of distributions determined by a type of max-sliced Wasserstein metric. A detailed analysis of the statistical properties of this metric yields a simple recommendation for the choice of regularization parameter. The suggested order of $\lambda$, after a suitable normalization of the covariates, is typically d/n, up to logarithmic factors. Our recommendation is computationally straightforward to implement, pivotal, has provable out-of-sample performance guarantees, and does not rely on sparsity assumptions about the true data generating process.
This is joint work with Jose Montiel Olea, Amilcar Velez and Cindy Rush.
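As a concrete reference point (my own sketch, not the authors' code), the empirical analogue of the objective above with r = 2 and an l1 penalty is the square-root lasso; the cvxpy solver and the value of the regularization parameter below are illustrative assumptions.

```python
import cvxpy as cp
import numpy as np

def sqrt_lasso(X, y, lam):
    """Empirical objective with r = 2: minimize ||y - X b||_2 / sqrt(n) + lam * ||b||_1."""
    n, d = X.shape
    b = cp.Variable(d)
    objective = cp.norm2(y - X @ b) / np.sqrt(n) + lam * cp.norm1(b)
    cp.Problem(cp.Minimize(objective)).solve()
    return b.value

# Illustrative use on synthetic data; lam here is a placeholder, not the talk's recommendation.
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
beta_true = np.zeros(d)
beta_true[:3] = 1.0
y = X @ beta_true + rng.standard_normal(n)
beta_hat = sqrt_lasso(X, y, lam=0.1)
```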
Bio: Johannes Wiesel is an Assistant Professor in the Department of Statistics at Columbia University. In summer 2020 he received a PhD from Oxford University under the supervision of Jan Obloj. His research focuses on mathematical statistics with a special emphasis on statistical optimal transport. He is also interested in the robust approach to mathematical finance, which does not start with an a priori model but rather with the information available in the markets. In this context he has established new connections to the theory of optimal transport on the one hand and robust statistics as well as machine learning on the other, with the ultimate goal of developing a universal toolbox for the implementation of robust and time-consistent trading strategies and risk assessment.
2/6/23
Yun Yang (UIUC) Title: Implicit estimation of high-dimensional distributions using generative models
Abstract: The estimation of distributions of complex objects from high-dimensional data with low-dimensional structures is an important topic in statistics and machine learning. Deep generative models achieve this by encoding and decoding data to generate synthetic realistic images and texts. A key aspect of these models is the extraction of low-dimensional latent features, assuming data lies on a low-dimensional manifold. We study this by developing a minimax framework for distribution estimation on unknown submanifolds with smoothness assumptions on the target distribution and the manifold. The framework highlights how problem characteristics, such as intrinsic dimensionality and smoothness, impact the limits of high-dimensional distribution estimation. Our estimator, which is a mixture of locally fitted generative models, is motivated by differential geometry techniques and covers cases where the data manifold lacks a global parametrization.
Bio: Yun Yang is an associate professor in the Department of Statistics, University of Illinois Urbana-Champaign. His research interests include Bayesian inference, high-dimensional statistics, optimal transport and statistical learning theory. His research was/is supported by NSF grants.
2/13/23
Edward Kennedy (CMU) Title: Optimal estimation of heterogeneous causal effects
Abstract: Estimation of hetero
Bio: Edward Kennedy is an associate professor of Statistics & Data Science at Carnegie Mellon University. He joined the department after graduating with a PhD in biostatistics from the University of Pennsylvania. Edward’s methodological interests lie at the intersection of causal inference, machine learning, and nonparametric theory, especially in settings involving high-dimensional and otherwise complex data. His applied research focuses on problems in criminal justice, health services, medicine, and public policy. Edward is a recipient of the NSF CAREER award, the David P. Byar Young Investigator award, and the Thomas Ten Have Award for exceptional research in causal inference.
2/20/23
Jiayang Sun (George Mason University) Title: Semi-parametric Learning for Explainable Models - Triathlon
Abstract: Feature selection is critical for developing drug targets or understanding reproductive success at high altitudes. However, selected features depend on the model assumption used for feature selection. Determining variable transformations to make the model more realistic or interpretable is not trivial in the case of many features or variables. This talk presents our advance toward a semi-parametric learning pipeline to study feature, transformation, and model selection in a “triathlon.” We introduce a concept of transformation necessity-sufficiency guarantee, open up dialogues for paradigm changes, provide our learning procedure for explainable models, illustrate its performance, and demonstrate its application in understanding social, physiological, and genetic contributions to the reproductive success of Tibetan women. This is joint work with Shenghao Ye, Cynthia Beall, and Mary Meyer.
Bio: Jiayang Sun holds the positions of Professor, Chair, and Bernard Dunn Eminent Scholar in the Department of Statistics at George Mason University. Before joining GMU, she was Professor in the Department of Population and Quantitative Health Sciences and Director of the Center for Statistical Research, Computing and Collaboration (SR2c) at Case Western Reserve University. She has published in top statistical and computational journals, including AOS, JASA, AOP, Biometrika, Statistica Sinica, Biometrical Journal, Statistics in Medicine, JCGS and SIAM J. Sci. & Stat. Comp., as well as other statistical and scientific journals. Her statistical research has included simultaneous confidence bounds, multiple comparisons, biased sampling, measurement errors, mixtures, machine learning, causal inference, crowdsourcing, EHR, text mining, bioinformatics, network analysis, imaging, and high-dimensional and big data. Her interdisciplinary work has included cancer, environmental science, neuroscience, wound care, dental research, and biomaterials, in addition to astronomy, computer science, energy, law, and agriculture. She is an elected Fellow of the ASA and IMS, an elected member of the ISI, and an active member of the profession, having served as the 2016 President of CWS and on various committees of the ASA, IMS, CWS, ICSA and ISI. Her work has been supported by awards from the NSF, NIH, NSA, DOD, DOE, VA, ASA, and INOVA.
2/27/23
Speaker: David Banks (Duke University)
Title: Statistics and the Industries of Today
Abstract: Dr. Deming was one of the foundational leaders in industrial statistics, with contributions to experimental design, sampling, and process control. More importantly, he changed the culture of business leadership in two nations, and implicitly, around the world. But the industries of his day focused on manufacturing, while today’s industries reflect the knowledge economy. This talk asks the industrial statistics community to consider how to update and apply Dr. Deming’s ideas in the Big Data era. There are some very direct correspondences.
Bio: David Banks is a professor in the Department of Statistical Science at Duke University. He obtained his PhD in statistics from Virginia Tech in 1984, then did a two-year postdoctoral fellowship at the University of California at Berkeley. He taught at Carnegie Mellon University and the University of Cambridge, was chief statistician at the U.S. Department of Transportation, and also worked at the National Institute of Standards and Technology and at the Food and Drug Administration. He is a former editor of the Journal of the American Statistical Association and a founding editor of Statistics and Public Policy. He is the former director of the Statistical and Applied Mathematical Sciences Institute. He works in risk analysis, dynamic models for text networks, human rights statistics, and some aspects of machine learning.
3/6/23
Speaker: Steve Hanneke (Purdue University) Title: A Theory of Universal Learning
Abstract: How quickly can functions in a given function class be learned from data? It is common to measure the performance of a supervised machine learning algorithm by plotting its “learning curve”, that is, the decay of the classification risk (or “error rate”) as a function of the number of training samples. However, the classical theoretical framework for understanding optimal rates of convergence of the risk in statistical learning, rooted in the works of Vapnik-Chervonenkis and Valiant (known as the PAC model), does not explain the behavior of learning curves: rather, it focuses on minimax analysis, which can only provide an upper envelope of the learning curves over a family of data distributions, and the “optimal” rate is the smallest such upper envelope achievable. This does not match the practice of machine learning, where in any given scenario, the data source is typically fixed, while the number of training samples may be chosen as needed.
In this talk, I will describe an alternative framework that better captures such practical aspects of machine learning, but still gives rise to a complete theory of optimal learning rates in the spirit of the PAC model. Namely, we consider the problem of universal learning, which aims to understand the convergence rates achievable on every data distribution, without requiring uniformity of the guarantee over distributions for each sample size. In regard to supervised learning, the main result of this work is a remarkable trichotomy: there are only three possible optimal rates of universal learning. More precisely, we show that the learning curves of any given function class decay either at exponential, linear, or arbitrarily slow rates, under the realizability assumption. Moreover, each of these cases is completely characterized by appropriate combinatorial dimensions, and we exhibit optimal learning algorithms that achieve the best possible rate in each case. Allowing for non-realizable (so-called “agnostic”) distributions, essentially the same trichotomy remains, with the linear rate replaced by sub-square-root rates.
In recent extensions, we have also characterized the optimal universal rates for multiclass learning, general interactive learning, active learning with label queries, semi-supervised learning, and several other variations. In the course of these works, some general principles have emerged regarding the design of optimal learning algorithms based on winning strategies for certain infinite sequential games (Gale-Stewart games), which are used to define data-dependent partial function classes whose minimax rates match the optimal universal rate for the original function class. The corresponding combinatorial dimensions determine the existence of such winning strategies, and reflect a fascinating blending of familiar dimensions from the classical theories of statistical learning and adversarial online learning.
Based on joint work with Olivier Bousquet, Shay Moran, Ramon van Handel, and Amir Yehudayoff, which appeared at STOC 2021, and various follow-up works (in preparation) with the aforementioned authors, as well as Idan Attias, Ariel Avital, Klim Efremenko, Alkis Kalavasis, Amin Karbasi, Amirreza Shaeiri, Jonathan Shafer, Ilya Tolstikhin, Grigoris Velegkas, and Qian Zhang.
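Purely for orientation on the notion of a learning curve discussed above (and unrelated to the universal-learning framework itself), here is a small sketch that estimates an empirical learning curve by refitting a classifier on growing training subsets; the use of scikit-learn and logistic regression is an illustrative assumption of mine.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def empirical_learning_curve(X, y, sizes, n_repeats=20, seed=0):
    """Estimate the held-out error rate as a function of the number of training samples."""
    rng = np.random.default_rng(seed)
    X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.3, random_state=seed)
    curve = []
    for n in sizes:
        errors = []
        for _ in range(n_repeats):
            idx = rng.choice(len(X_pool), size=n, replace=False)
            clf = LogisticRegression(max_iter=1000).fit(X_pool[idx], y_pool[idx])
            errors.append(np.mean(clf.predict(X_test) != y_test))
        curve.append(np.mean(errors))
    return np.array(curve)  # one estimated risk value per training-set size
```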
3/13/23
Spring Break
3/20/23
Mouli Banerjee (University of Michigan) Title: Tackling Posterior Drift via Linear Adjustments and Exponential Tilts
Abstract: I will speak on some of our recent work on transfer learning from a source to a target population in the presence of ‘posterior drift’, i.e. the regression function/Bayes classifier in the target population is different from that in the source. In the situation where labeled samples from the target domain are available, by modeling the posterior drift through a linear adjustment (on an appropriately transformed scale), we are able to learn the nature of the posterior drift using relatively few samples from the target population as compared to the source population, which provides an abundance of samples. The other (semi-supervised) case, where labels from the target are unavailable, is addressed by connecting the probability distribution in the target domain to that in the source domain via an exponential family formulation, and learning the corresponding parameters. Both approaches are motivated by ideas originating in classical statistics. I will present theoretical guarantees for these procedures as well as applications to real data from the UK Biobank study (mortality prediction) and the Waterbirds dataset (image classification).
This is joint work primarily with Subha Maity and Yuekai Sun.
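To make the two devices concrete under simplifying assumptions of my own (logistic models, scikit-learn, a user-chosen statistic T), here is a sketch of (i) a linear adjustment of a source model's logits fitted on a few labelled target samples and (ii) an exponential tilt of the source sample fitted by moment matching; this is an illustration of the general idea, not the authors' procedure.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.linear_model import LogisticRegression

def linear_adjustment(X_src, y_src, X_tgt_small, y_tgt_small):
    """Supervised case: fit a source classifier, then learn an affine correction a + b * logit
    of its decision scores from a small labelled target sample."""
    src = LogisticRegression(max_iter=1000).fit(X_src, y_src)
    logits = src.decision_function(X_tgt_small).reshape(-1, 1)
    adj = LogisticRegression(max_iter=1000).fit(logits, y_tgt_small)
    return lambda X: adj.predict(src.decision_function(X).reshape(-1, 1))

def exponential_tilt_weights(T_src, T_tgt_mean):
    """Semi-supervised case: reweight source points with weights proportional to exp(theta . T(x)),
    choosing theta so the tilted source mean of T matches the target mean of T."""
    def moment_gap(theta):
        w = np.exp(T_src @ theta)
        w = w / w.sum()
        return np.sum((w @ T_src - T_tgt_mean) ** 2)
    theta = minimize(moment_gap, x0=np.zeros(T_src.shape[1])).x
    w = np.exp(T_src @ theta)
    return w / w.sum()
```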
Bio: Moulinath Banerjee was born and raised in India where he completed both his Bachelor's and Master's in Statistics at the Indian Statistical Institute, Kolkata. He obtained his Ph.D. from the Statistics department at the University of Washington, Seattle, in December 2000, served as lecturer there for the Winter and Spring quarters of 2001, and joined the University of Michigan in Fall 2001. Mouli’s research interests are in the fields of non-standard problems, empirical process theory, threshold and boundary estimation, and more recently, distributed estimation and inference, transfer learning and distributional shift, and problems at the Stat-ML interface. He is currently the editor of Statistical Science. He also has a broad range of interests outside of statistics including classical music, literature, history, philosophy, physics and ancestral genetics, and is also, most emphatically, a gourmet and believes that a life without good food and fine beverages is a life less lived.
3/27/23
Qingyuan Zhao (University of Cambridge) Title: Simultaneous hypothesis testing using negative controls
Abstract: Negative control is a common technique in scientific investigations and broadly refers to the situation where a null effect (“negative result”) is expected. Motivated by a real proteomic dataset and an ad hoc procedure shared with us by collaborators, I will present three promising and closely connected ways of using negative controls to assist simultaneous hypothesis testing. The first perspective uses negative controls to construct a permutation p-value for every hypothesis under investigation, and we give several sufficient conditions for such p-values to be valid and positively regression dependent on the set (PRDS) of true nulls. The second perspective uses negative controls to construct an estimate of the false discovery rate (FDR). We give a sufficient condition under which the step-up procedure based on this estimate controls the FDR. The third perspective, derived from the original ad hoc procedure given by our collaborators, uses negative controls to construct a nonparametric estimator of the local false discovery rate. I will conclude the talk with a twist. This talk is based on joint work with Zijun Gao.
Short bio: Qingyuan Zhao is an Assistant Professor at the Statistical Laboratory, University of Cambridge. He is interested in improving the general quality and appraisal of statistical research, including new methodology and a better understanding of causal inference, novel study designs, sensitivity analysis, multiple testing, and selective inference.
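To illustrate the flavour of the first two perspectives under my own simplifying assumptions (larger statistics count as stronger evidence against the null; the speaker's exact constructions and validity conditions are not reproduced here): permutation-style p-values formed by ranking each test statistic against the negative-control statistics, followed by the standard Benjamini-Hochberg step-up.

```python
import numpy as np

def negative_control_pvalues(stats, control_stats):
    """p_i = (1 + #{control statistics >= stat_i}) / (1 + #controls)."""
    controls = np.sort(np.asarray(control_stats, dtype=float))
    m = len(controls)
    exceed = m - np.searchsorted(controls, np.asarray(stats, dtype=float), side="left")
    return (1 + exceed) / (1 + m)

def benjamini_hochberg(pvals, alpha=0.1):
    """Standard BH step-up; returns a boolean rejection indicator for each hypothesis."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, len(p) + 1) / len(p)
    below = p[order] <= thresholds
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(len(p), dtype=bool)
    reject[order[:k]] = True
    return reject
```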
4/3/23
Yi Yu (University of Warwick) Title: Change point inference in high-dimensional regression models under temporal dependence
Abstract: This paper concerns the limiting distributions of change point estimators in a high-dimensional linear regression time series context, where a regression object $(y_t, X_t) \in \mathbb{R} \times \mathbb{R}^p$ is observed at every time point $t \in \{1, \ldots, n\}$. At unknown time points, called change points, the regression coefficients change, with the jump sizes measured in $\ell_2$-norm. We provide limiting distributions of the change point estimators in the regimes where the minimal jump size vanishes and where it remains a constant. We allow for both the covariate and noise sequences to be temporally dependent, in the functional dependence framework, which is new to the change point inference literature. We show that a block-type long-run variance estimator is consistent under functional dependence, which facilitates the practical implementation of our derived limiting distributions. We also present a few important byproducts of independent interest, including a novel variant of the dynamic programming algorithm to boost computational efficiency, consistent change point localisation rates under functional dependence, and a new Bernstein inequality for data possessing functional dependence. The paper is available at http://arxiv.org/abs/2207.12453
Bio: I am a Reader in the Department of Statistics, University of Warwick, and a Turing Fellow at the Alan Turing Institute. I was previously an Associate Professor at the University of Warwick, a Lecturer at the University of Bristol, a postdoc of Professor Richard Samworth and a graduate student of Professor Zhiliang Ying. I obtained my academic degrees from Fudan University (B.Sc. in Mathematics, June 2009 and Ph.D. in Mathematical Statistics, June 2013).
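As a small, generic illustration of the kind of "block-type long-run variance estimator" mentioned in the abstract, here is a batch-means sketch for a univariate stationary sequence; it is my own simplification, not the paper's estimator for the regression setting, and the block length is an illustrative input.

```python
import numpy as np

def block_long_run_variance(x, block_len):
    """Batch-means estimate of the long-run variance of a stationary sequence:
    average of squared block sums of centred values, divided by the block length."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n_blocks = len(x) // block_len
    blocks = x[: n_blocks * block_len].reshape(n_blocks, block_len)
    return np.mean(blocks.sum(axis=1) ** 2) / block_len

# Example: an AR(1) sequence, whose long-run variance exceeds its marginal variance.
rng = np.random.default_rng(0)
eps, phi, x = rng.standard_normal(20_000), 0.5, np.zeros(20_000)
for t in range(1, len(x)):
    x[t] = phi * x[t - 1] + eps[t]
print(block_long_run_variance(x, block_len=100))  # roughly 1 / (1 - phi)^2 = 4
```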
4/10/23
Bryon Aragam (University of Chicago) Title: A modern approach to nonparametric latent variable models and representation learning: Identifiability, consistency, and a nonstandard minimax rate
Abstract: One of the key paradigm shifts in statistical machine learning over the past decade has been the transition from handcrafted features to automated, data-driven representation learning, typically via deep neural networks. As these methods are being used in high-stakes settings such as medicine, health care, law, and finance, where accountability and transparency are not just desirable but often legally required, it has become necessary to place representation learning on a rigorous scientific footing. In this talk we will revisit the statistical foundations of nonparametric latent variable models, and discuss how even basic statistical properties such as identifiability and consistency are surprisingly subtle. We will also discuss new results characterizing the optimal sample complexity for learning simple nonparametric mixtures, which turns out to have a nonstandard super-polynomial bound. Time permitting, we will end with applications to deep generative models that are widely used in practice. This talk is based on joint work with Wai Ming Tai and Ruiyi Yang.
Bio: Bryon Aragam studies statistical machine learning, nonparametric statistics, and unsupervised learning. His recent work focuses on applications to artificial intelligence and deep generative models, and attempts to understand the statistical foundations of these models and how to improve them from both practical and theoretical perspectives. He is also involved with developing open-source software and solving problems in interpretability, ethics, and fairness in artificial intelligence. Prior to joining the University of Chicago, he was a project scientist and postdoctoral researcher in the Machine Learning Department at Carnegie Mellon University. He completed his PhD in Statistics and a Master's in Applied Mathematics at UCLA, where he was an NSF graduate research fellow. Bryon has also served as a data science consultant for technology and marketing firms, where he has worked on problems in survey design and methodology, ranking, customer retention, and logistics.
4/17/23
Jason Altschuler (NYU) Title: Shifted divergences for sampling, privacy, and beyond
Abstract: Shifted divergences provide a principled way of making information-theoretic divergences (e.g. KL) geometrically aware via optimal transport smoothing. In this talk, I will argue that shifted divergences provide a powerful approach towards unifying optimization, sampling, differential privacy, and beyond. For concreteness, I will demonstrate these connections via three recent highlights. (1) The fastest high-accuracy algorithm for sampling from log-concave distributions. (2) Resolving the mixing time of the Langevin Algorithm to its stationary distribution for log-concave sampling. (3) Resolving the differential privacy of Noisy-SGD, the standard algorithm for private convex optimization in both theory and practice. A recurring theme is a certain notion of algorithmic stability, and the central technique for establishing this is shifted divergences. Based on joint work with Kunal Talwar, and with Sinho Chewi.
Bio: Jason Altschuler is a CDS Faculty Fellow at NYU in 2022-2023, and an assistant professor at the UPenn Department of Statistics and Data Science starting July 2023. Previously, he received his PhD from MIT, and before that he received his undergraduate degree from Princeton. His research interests are broadly at the intersection of optimization, probability, and machine learning, with a recent focus on computational aspects of problems related to optimal transport.
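For context on item (3), here is a minimal sketch of the Noisy-SGD template (per-example gradient clipping plus Gaussian noise); the clipping norm, noise multiplier, step size, and example loss are illustrative assumptions of mine, and no privacy accounting is attempted.

```python
import numpy as np

def noisy_sgd(grad_fn, data, theta0, step=0.1, clip=1.0, noise_mult=1.0,
              batch_size=32, n_steps=500, rng=None):
    """Noisy-SGD: clip each per-example gradient to norm `clip`, average over the batch,
    and add Gaussian noise with standard deviation noise_mult * clip / batch_size."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        batch = data[rng.choice(len(data), size=batch_size, replace=False)]
        grads = np.stack([grad_fn(theta, z) for z in batch])
        scale = np.maximum(np.linalg.norm(grads, axis=1, keepdims=True) / clip, 1.0)
        noise = rng.standard_normal(theta.shape) * noise_mult * clip / batch_size
        theta = theta - step * ((grads / scale).mean(axis=0) + noise)
    return theta

# Illustrative use: mean estimation via squared loss, per-example gradient = theta - z.
data = np.random.default_rng(0).standard_normal((1000, 5)) + 2.0
theta_hat = noisy_sgd(lambda theta, z: theta - z, data, theta0=np.zeros(5))
```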
4/24/23
Sabyasachi Chatterjee (University of Illinois at Urbana Champaign)
Title: Theory for Cross Validation in Nonparametric Regression
Abstract: We formulate a general cross-validation framework for signal denoising. The general framework is then applied to nonparametric regression methods such as Trend Filtering and Dyadic CART. The resulting cross-validated versions are then shown to attain nearly the same rates of convergence as are known for the optimally tuned analogues. There were no previous theoretical analyses of cross-validated versions of Trend Filtering or Dyadic CART. Our general framework is inspired by the ideas in Chatterjee and Jafarov (2015) and is potentially applicable to a wide range of estimation methods which use tuning parameters.
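As a generic illustration of cross validation for signal denoising (my own simplification, not the framework in the talk): split the indices into even and odd folds, denoise the retained fold for each tuning parameter, predict the held-out points by interpolation, and keep the parameter with the smallest held-out error. The moving-average denoiser below is a placeholder, not Trend Filtering or Dyadic CART.

```python
import numpy as np

def moving_average(y, window):
    """Placeholder denoiser: centred moving average; `window` plays the role of the tuning parameter."""
    return np.convolve(y, np.ones(window) / window, mode="same")

def cv_tune(y, windows):
    """Two-fold CV on even/odd indices: denoise one fold, predict the other by linear interpolation."""
    idx = np.arange(len(y))
    best_window, best_err = None, np.inf
    for w in windows:
        err = 0.0
        for held in (idx % 2 == 0, idx % 2 == 1):
            keep = ~held
            fitted = moving_average(y[keep], w)              # denoise the retained observations
            pred = np.interp(idx[held], idx[keep], fitted)   # interpolate to held-out positions
            err += np.mean((y[held] - pred) ** 2)
        if err < best_err:
            best_window, best_err = w, err
    return best_window

# Example: noisy piecewise-constant signal
rng = np.random.default_rng(0)
signal = np.repeat([0.0, 2.0, -1.0], 100)
y = signal + 0.5 * rng.standard_normal(signal.size)
print(cv_tune(y, windows=[3, 5, 9, 15, 25]))
```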
Bio: I am an Assistant Professor (from 2017 onwards) in the Statistics Department at the University of Illinois at Urbana-Champaign. Most of my research has been in nonparametric function estimation and statistical signal processing. I am also interested in probability and all theoretical aspects of machine learning. I obtained my PhD in 2014 at Yale University and then was a Kruskal Instructor at the University of Chicago until 2017.
5/1/23
Heather Battey (Imperial College London) Title: Inducement of population-level sparsity
Abstract: The work on parameter orthogonalization by Cox and Reid (1987) is presented as inducement of population-level sparsity. The latter is taken as a unifying theme for the talk, in which sparsity-inducing parameterizations or data transformations are sought. Three recent examples are framed in this light: sparse parameterizations of covariance models; construction of factorizable transformations for the elimination of nuisance parameters; and inference in high-dimensional regression. The solution strategy for the problem of exact or approximate sparsity inducement appears to be context specific and may entail, for instance, solving one or more partial differential equations, or specifying a parameterized path through transformation or parameterization space.