Statistics Seminar Series

Schedule for Spring 2025

Seminars are on Mondays
Time: 4:10 pm - 5:00 pm

Location: Room 903 SSW, 1255 Amsterdam Avenue

 

1/27/2025

Speaker:  Professor Ilias Zadik

Title: Characterizing the power of MCMC methods for sparse estimation

Abstract: Markov Chain Monte Carlo (MCMC) and local-search methods have been extensively used in the practice of statistics for many decades now. However, their exact theoretical performance has remained strikingly elusive even for simple parametric estimation tasks. This is in stark contrast to other classes of estimators such as low-degree polynomials or spectral methods where a much deeper theoretical understanding has been achieved over the recent years.

In this talk we are going to discuss several recent results that characterize the performance of a large class of (low-temperature) MCMC methods for a series of canonical estimation models, such as sparse tensor PCA and sparse linear regression. This characterization reveals that in some models (e.g., in sparse regression) MCMC methods achieve the performance of the conjecturally optimal polynomial-time estimators, but in some other cases (e.g., sparse PCA) they significantly underperform. If time permits, we are going to discuss how one can boost MCMC methods for sparse PCA to achieve polynomial-time optimality by appropriately extending the parameter space.

Bio:  Ilias Zadik is an Assistant Professor of Statistics and Data Science at Yale University. His research mainly focuses on the mathematical theory of statistics and its many connections with other fields such as computer science, probability theory, and statistical physics. His primary area of interest is the study of “computational-statistical trade-offs,” where the goal is to understand whether computational bottlenecks are unavoidable in modern statistical models or a limitation of currently used techniques. Prior to Yale, he held postdoctoral positions at MIT and NYU. He received his PhD from MIT in 2019. 

2/3/2025

Speaker:  Professor Johan Ugander (Stanford University)

Title: Counterfactual Evaluation of Peer-Review Assignment Policies

Abstract: Peer review assignment algorithms aim to match research papers to suitable expert reviewers, working to maximize the quality of the resulting reviews. A key challenge in designing effective assignment policies is evaluating how changes to the assignment algorithm map to changes in review quality. In this work, we leverage recently proposed policies that introduce randomness in peer-review assignment--in order to mitigate fraud--as a valuable opportunity to evaluate counterfactual assignment policies. To address challenges in applying standard off-policy evaluation methods, such as violations of positivity, we introduce methods for partial identification based on monotonicity and smoothness assumptions. We apply our methods to peer-review data from two computer science venues, including a major conference with over 8000 submissions and over 3000 reviewers. Joint work with Martin Saveski, Steven Jecmen, Samir Khan, and Nihar Shah.

Bio: Johan Ugander is an Associate Professor at Stanford University in the Department of Management Science & Engineering, within the School of Engineering. His research develops algorithmic and statistical frameworks for analyzing social networks, social systems, and other large-scale social and behavioral data. Prior to joining the Stanford faculty he was a postdoctoral researcher at Microsoft Research Redmond 2014-2015 and held an affiliation with the Facebook Data Science team 2010-2014. He obtained his Ph.D. in Applied Mathematics from Cornell University in 2014. His awards include an NSF CAREER Award, a Young Investigator Award from the Army Research Office (ARO), three Best Paper Awards (2012 ACM WebSci Best Paper, 2013 ACM WSDM Best Student Paper, 2020 AAAI ICWSM Best Paper), and the 2016 Eugene L. Grant Undergraduate Teaching Award from the Department of Management Science & Engineering.

2/10/2025

Speaker: Dylan Foster (Microsoft Research)

Title: Is Behavior Cloning All You Need? Revisiting the Role of Horizon and Interaction in Imitation Learning

Abstract: Imitation learning (IL) aims to mimic the behavior of an expert in a sequential decision making task by learning from demonstrations, and has been widely applied to robotics, autonomous driving, and autoregressive language generation. The simplest approach to IL, behavior cloning (BC), is thought to incur sample complexity with unfavorable quadratic dependence on the problem horizon, motivating a variety of different online algorithms that attain improved linear horizon dependence under stronger assumptions on the data and the learner’s access to the expert.

In this talk, we revisit the apparent gap between offline and online IL from a learning-theoretic perspective, with a focus on general policy classes up to and including deep neural networks. Through a new analysis of behavior cloning with the logarithmic loss, we will show that it is possible to achieve horizon-independent sample complexity in offline IL whenever (i) the range of the cumulative payoffs is controlled, and (ii) an appropriate notion of supervised learning complexity for the policy class is controlled. When specialized to stationary policies, this implies that the gap between offline and online IL is smaller than previously thought. We will then discuss implications of this result and investigate the extent to which it bears out empirically.

Bio: Dylan Foster is a principal researcher at Microsoft Research, New York. Previously, he was a postdoctoral fellow at MIT and received his PhD in computer science from Cornell University, advised by Karthik Sridharan. His research focuses on problems at the intersection of machine learning and decision making. He has received several awards for his work, including the best paper award at COLT (2019) and best student paper award at COLT (2018, 2019).

 

2/17/2025

Speaker: Professor Linda Valeri (Columbia University)

Title:  Causal inference in non-stationary time series from mHealth studies in Psychiatry

Abstract:  The adoption of digital technologies in Psychiatry holds promise for the evaluation of personalized causal effects to better inform behavioral treatment decisions in a patient population that displays substantial diversity in symptomatology even within the same diagnostic category.
In this presentation I will discuss challenges in estimating the individual causal effect of mobile communication social network size on negative mood of bipolar and schizophrenia patients enrolled in a cohort study that is part of the Intensive Longitudinal Health Behavior Network. The first challenge is missing data, potentially dependent on participant health status, and the second challenge is non-stationarity of the time series, when the treatment effect may change over time. To address these challenges, we propose a Monte Carlo EM (MCEM) algorithm of the state space model to properly address missing data in non-stationary multivariate time series. We also propose a set of novel causal estimands for (potentially non-stationary) multivariate time series in N-of-1 studies to systematically summarize how time-varying exposures affect outcomes in the short and long term and derive their identification via the g-formula in the presence of exposure- and outcome-covariate feedback.

Bio:  Linda Valeri is an assistant professor of Biostatistics at the Columbia University Mailman School of Public Health and adjunct assistant professor of Epidemiology at the Harvard T.H. Chan School of Public Health. Linda Valeri is an expert biostatistician specializing in causal inference, with a focus on biostatistical methodology and statistical learning. She received her doctoral degree in Biostatistics from Harvard University. Her research encompasses causal mediation analysis, measurement error, missing data, and the integration of data from multiple sources, such as smartphone and wearable devices, life-course cohort studies, and electronic medical records, in diverse populations. Dr. Valeri has developed widely utilized open-access computational tools for causal inference, benefiting scientists across biomedical and social sciences. As PI of a career development award from the National Institute of Mental Health, of an R01 research grant from the National Institute on Aging, and of an R21 from the National Institute of Environmental Health Sciences, she collaborates with interdisciplinary teams to advance our understanding of mental health across the life-course, environmental determinants of health, and health disparities, contributing to informed policy-making.

She is active as associate editor for the International Journal of Biostatistics and as Statistical Editor for JAMA Psychiatry and JAMA Network Open. Dr. Valeri is also enthusiastic in her service to her local community, serving as the Biostatistics faculty representative in the Columbia Mailman School of Public Health Faculty Steering Committee, as well as the regional and national community of statisticians and biostatisticians, serving as a member of the Regional Advisory Board of the Eastern North American Region of the International Biometric Society and as elected Council of Sections Representative for the ASA Mental Health Statistics Section.

2/24/2025

Speaker:  Professor Stephen Bates (MIT)

Title:  Hypothesis testing with information asymmetry

Abstract:  Contemporary scientific research is a distributed, collaborative endeavor, carried out by teams of researchers, regulatory institutions, funding agencies, commercial partners, and scientific bodies, all interacting with each other and facing different incentives. To maintain scientific rigor, statistical methods should acknowledge this state of affairs. To this end, we study hypothesis testing when there is an agent (e.g., a researcher or a pharmaceutical company) with a private prior about an unknown parameter and a principal (e.g., a policymaker or regulator) who wishes to make decisions based on the parameter value. The agent chooses whether to run a statistical trial based on their private prior and then the result of the trial is used by the principal to reach a decision. We show how the principal can conduct statistical inference that leverages the information that is revealed by an agent's strategic behavior -- their choice to run a trial or not. In particular, we show how the principal can design a policy to elicit partial information about the agent's private prior beliefs and use this to control the posterior probability of the null. One implication is a simple guideline for the choice of significance threshold in clinical trials: the type-I error level should be set to be strictly less than the cost of the trial divided by the firm's profit if the trial is successful.

This talk is based on the work "Incentive-Theoretic Bayesian Inference for Collaborative Science" with Michael I. Jordan, Michael Sklar, and Jake Soloff and “Sharp Results for Hypothesis Testing with Risk-Sensitive Agents” with Flora Shi and Martin Wainwright.

Bio: Stephen Bates is an Assistant Professor in the MIT EECS department, where he is part of the Laboratory for Information and Decision Systems and holds the X-Window Career Development Chair. He works on statistical inference, uncertainty, and reliable decision-making with data. His recent work is about tools for statistical inference with AI models, data impacted by strategic behavior, and settings with distribution shift. Prior to joining the faculty at MIT, he was a postdoctoral researcher at UC Berkeley hosted by Michael I. Jordan, and he earned his PhD from the Stanford Statistics Department under the supervision of Emmanuel Candès where his thesis work was awarded the Theodore Anderson Dissertation Award and was featured on the cover of Proceedings of the National Academy of Sciences (USA).

3/3/2025

Speaker:  Professor Reese Pathak (UC Berkeley)

Title:  Generalizing beyond the training data: new theory and algorithms for optimal transfer learning

Abstract:  Traditional machine learning often assumes that training (source) data closely resembles the testing (target) data. However, in many contemporary applications this is unrealistic: in e-commerce, consumer behavior is time-varying; in medicine, patient populations can exhibit more or less heterogeneity; in autonomous driving, models are rolled out to new environments. Ignoring these “distribution shifts” can lead to costly, harmful, and even dangerous outcomes. My research tackles these challenges by developing an algorithmic and statistical toolkit for addressing distribution shifts.

This talk focuses on covariate shift, a form of distribution shift where the source and target distributions have different covariate laws. In the first part of the talk, I demonstrate that for a large class of problems, transfer learning is possible, even when the source and target data have non-overlapping support. We introduce the “defect” of a covariate shift, which quantifies the severity of a distribution shift. We demonstrate how the defect can be leveraged algorithmically, leading to methods with optimal learning guarantees. In the second part of the talk, we refine the notion of defect to provide even stronger learning guarantees. We introduce a new method: penalized risk minimization with a non-traditional choice of regularization which is chosen via semidefinite programming. We show that our method has performance which is optimal with respect to the particular covariate shift instance. To our knowledge, these are the first instance-optimal guarantees for transfer learning. Moreover, our results are assumption-light: we impose essentially no restrictions on the underlying covariate laws, thereby broadening the applicability of our theory.

Bio:  Reese Pathak is a Ph.D. candidate in the Department of Electrical Engineering and Computer Sciences (EECS) at the University of California, Berkeley, where he is advised by Martin Wainwright and Michael Jordan. Prior to Berkeley, Reese was an undergraduate at Stanford University. His research interests span high-dimensional and nonparametric statistics as well as continuous optimization, particularly as inspired by modern data science and machine learning applications. Most recently, his work has focused on transfer learning, with the aim of developing new methods and theory adapted to tackling distribution shift problems.

3/10/2025

Speaker:  Professor Zhou Fan (Yale University)

Title:  Dynamical mean-field analysis of adaptive Langevin diffusions

Abstract:  Estimation via sampling is a common paradigm in statistical learning, in which one may wish to sample from a high-dimensional target distribution that is adaptively evolving to past samples. This talk will study an example of such dynamics, given by a Langevin diffusion that performs posterior sampling in a linear model whose prior is being simultaneously learned via a maximum marginal likelihood scheme. Using techniques of dynamical mean-field theory, in a high-dimensional asymptotic framework, we provide a precise characterization of a deterministic joint evolution of the prior parameter and law of the Langevin sample over dimension-independent time horizons, and we formalize a "single-particle" mean-field approximation for individual coordinates of the Langevin trajectory. Combining these results with Markov semigroup techniques, we show under an assumption of a uniform log-Sobolev inequality that the Langevin sample converges (in time) to an equilibrium state which is predicted by a system of replica-symmetric fixed-point equations, and that the prior parameter converges to a critical point of a replica-symmetric limit for the marginal log-likelihood. We explore the nature of the marginal log-likelihood landscape and its critical points in a few simple examples, where such critical points may or may not be unique.

Bio:  Zhou Fan is an Assistant Professor of Statistics and Data Science at Yale University. He received his Ph.D. in Statistics in 2018 from Stanford University. His research lies broadly at the intersection of mathematical statistics, probability theory, and computational algorithms. Recent focuses include random matrix theory for statistical and learning applications; probabilistic inference using techniques from statistical physics; mean-field phenomena and their universality in the dynamics of learning; empirical Bayes methods for regression and dimensionality reduction; and inferential problems arising in quantitative genetics and computational biology.

3/17/2025

HOLIDAY

3/24/2025

Speaker:  Professor Victor Panaretos (EPFL)

Title:  Positive-Definite Extensions and Continuum Graphical Models

Abstract:  We discuss the problem of positive-definite continuation: extending a partially specified covariance kernel from a subdomain Ω of the unit square [0,1]^2 to a covariance kernel on the entire unit square [0,1]^2. For a broad class of subdomains Ω, we obtain a complete picture. Namely, we demonstrate that a canonical extension always exists and can be explicitly constructed. We characterise all possible extensions as suitable perturbations of the canonical extension, and determine necessary and sufficient conditions for a unique extension to exist. We then re-interpret the canonical extension as a graphical model on the associated Gaussian process. We show that this leads to a valid and operational definition for arbitrarily (e.g. uncountably infinitely) indexed Gaussian processes, based directly on the covariance kernel, and describe how this allows for nonparametric estimation of the underlying Markov structure. Based on joint work in collaboration with K.G. Waghmare (ETH Zürich).

Bio:  Victor M. Panaretos is Professor of Mathematical Statistics at the EPFL. He received his PhD in 2007 from UC Berkeley, advised by David Brillinger. Upon graduation he was appointed Assistant Professor at EPFL's Mathematics Institute, where he rose through the ranks to Full Professor, also serving as Institute Director. His work studies the interplay between geometrical, functional, and nonparametric statistics. He received the Erich Lehmann Award and an ERC Starting Grant Award. He is an Elected Member of the ISI, a Fellow of the IMS, was the Bernoulli Society Forum Lecturer in 2019, and an IMS Medallion Lecturer in 2025. He has served on the Editorial Boards of the Annals of Statistics, the Annals of Applied Statistics, Biometrika, JASA (Theory & Methods), and EJS. He is currently serving as President of the Bernoulli Society for Mathematical Statistics and Probability.

3/31/2025

Speaker:  Professor Stephane Guerrier (University of Geneva)

Title:  Accurate Inference for Penalized Logistic Regression

Abstract:  Inference for high-dimensional logistic regression models using penalized methods has been a challenging research problem. As an illustration, a major difficulty is the significant bias of the Lasso estimator, which limits its direct application in inference. Although various bias-corrected Lasso estimators have been proposed, they often still exhibit substantial biases in finite samples, undermining their inference performance. These finite sample biases become particularly problematic in one-sided inference problems, such as one-sided hypothesis testing. This paper proposes a novel two-step procedure for accurate inference in high-dimensional logistic regression models. In the first step, we propose a Lasso-based variable selection method to select a suitable submodel of moderate size for subsequent inference. In the second step, we introduce a bias-corrected estimator to fit the selected submodel. We demonstrate that the resulting estimator from this two-step procedure has a small bias order and enables accurate inference. Numerical studies and an analysis of alcohol consumption data are included, where our proposed method is compared to alternative approaches. Our results indicate that the proposed method exhibits significantly smaller biases than alternative methods in finite samples, thereby leading to improved inference performance. This is joint work with Yuming Zhang and Runze Li.

Bio:  Stéphane Guerrier is an Associate Professor of Statistics and Data Science at the University of Geneva. He earned his Ph.D. in Statistics from the University of Geneva in 2013. From 2013 to 2018, he held faculty positions at the Pennsylvania State University and the University of Illinois at Urbana-Champaign. In 2019, he was awarded an SNSF Professorship by the Swiss National Science Foundation and moved to the University of Geneva. His research interests include computational statistics, signal processing and biostatistics, with a recent focus on simulation-based inferential methods and bioequivalence assessment.

 

4/7/2025

 

Speaker:  Professor Oliver Feng (University of Bath)

Title:  Optimal convex M-estimation via score matching

Abstract:  In the context of linear regression, we construct a data-driven convex loss function with respect to which empirical risk minimisation yields optimal asymptotic variance in the downstream estimation of the regression coefficients. Our semiparametric approach targets the best decreasing approximation of the derivative of the log-density of the noise distribution. At the population level, this fitting process is a nonparametric extension of score matching, corresponding to a log-concave projection of the noise distribution with respect to the Fisher divergence. The procedure is computationally efficient, and we prove that it attains the minimal asymptotic covariance among all convex M-estimators. As an example of a non-log-concave setting, for Cauchy errors, the optimal convex loss function is Huber-like, and our procedure yields an asymptotic efficiency greater than 0.87 relative to the oracle maximum likelihood estimator of the regression coefficients that uses knowledge of this error distribution; in this sense, we obtain robustness without sacrificing much efficiency. Numerical experiments using our accompanying R package asm confirm the practical merits of our proposal.

Bio:  I’ve been a Lecturer (Assistant Professor) at the University of Bath, UK since 2023. Previously, I was a PhD student and postdoc at the University of Cambridge. My research interests include nonparametric and shape-restricted inference, and approximate message passing.

4/14/2025

Speaker:  Professor Ali Shojaie (University of Washington)

Title: A Unified Framework for Semiparametrically Efficient Semi-Supervised Learning

Abstract:  We consider statistical inference under a semi-supervised setting with access to both labeled and unlabeled datasets and ask the question: under what circumstances, and by how much, can incorporating the unlabeled dataset improve upon inference using the labeled data? To answer this question, we investigate semi-supervised learning through the lens of semiparametric efficiency theory. We characterize the efficiency lower bound under the semi-supervised setting for an arbitrary inferential problem, and show that incorporating unlabeled data can potentially improve efficiency if the parameter is not well-specified. We then propose two types of semi-supervised estimators: a safe estimator that imposes minimal assumptions, is simple to compute, and is guaranteed to be at least as efficient as the initial supervised estimator; and an efficient estimator, which (under stronger assumptions) achieves the semiparametric efficiency bound. Our findings unify existing semiparametric efficiency results for particular special cases, and extend these results to a much more general class of problems. Moreover, we show that our estimators can flexibly incorporate predicted outcomes arising from “black-box” machine learning models, and thereby achieve the same goal as prediction-powered inference (PPI), but with superior theoretical guarantees. We also provide a complete understanding of the theoretical basis for the existing set of PPI methods. Finally, we apply the theoretical framework developed to derive and analyze efficient semi-supervised estimators in a number of settings, including M-estimation, U-statistics, and average treatment effect estimation, and demonstrate the performance of the proposed estimators in simulation.

Bio:  Ali Shojaie is Norm Breslow Endowed Professor of Biostatistics and Statistics at the University of Washington (UW). He is Associate Chair of the Department of Biostatistics, Founding Director of the Summer Institute for Statistics in Big Data (SISBID) at the University of Washington and Lead of the Data Management and Statistics (DMS) Core of the UW Alzheimer’s Disease Research Center (ADRC). Dr. Shojaie’s research lies at the intersection of statistical machine learning, statistical network analysis and applications in biology and social sciences. He is an elected Fellow of the American Statistical Association (ASA) and the Institute of Mathematical Statistics (IMS) and recipient of the 2022 Leo Breiman Award from the ASA Section on Statistical Learning and Data Science (SLDS).

 

4/21/2025

Speaker: Spencer Frei (Google)

Title:  Generalization and benign overfitting in-context in trained transformer classifiers

Abstract:  Transformers have the capacity to act as supervised learning algorithms: by properly encoding a set of labeled training ("in-context") examples and an unlabeled test example into an input sequence of vectors of the same dimension, the forward pass of the transformer can produce predictions for that unlabeled test example. A line of recent work has shown that when linear transformers are pre-trained on random instances for linear regression tasks, these trained transformers make predictions using an algorithm similar to that of ordinary least squares. In this work, we investigate the behavior of linear transformers trained on random linear classification tasks. Via an analysis of the implicit regularization of gradient descent, we characterize how many pre-training tasks and in-context examples are needed for the trained transformer to generalize well at test-time. We further show that in some settings, these trained transformers can exhibit "benign overfitting in-context": when in-context examples are corrupted by label flipping noise, the transformer memorizes all of its in-context examples (including those with noisy labels) yet still generalizes near-optimally for clean test examples.

Bio:  Spencer Frei is a research scientist at Google DeepMind, where he works on foundational research in artificial intelligence. Prior to joining GDM, he was an assistant professor in the statistics department at UC Davis and before that a postdoctoral fellow at the Simons Institute for the Theory of Computing at UC Berkeley, hosted by Peter Bartlett and Bin Yu. He received his Ph.D. at UCLA, supervised by Quanquan Gu and Ying Nian Wu. He was a co-organizer of a tutorial at NeurIPS 2023 on benign overfitting and was named a Rising Star in Machine Learning by the University of Maryland.

4/28/2025

Speaker: Professor Tengyuan Liang (University of Chicago)

Title:  Denoising Diffusions with Optimal Transport

Abstract:  Adding noise is easy; what about denoising? Diffusion is easy; what about reverting a diffusion? Diffusion-based generative models aim to denoise a Langevin diffusion chain, moving from a log-concave equilibrium measure ν, say isotropic Gaussian, back to a complex, possibly non-log-concave initial measure µ. The score function performs denoising, going backward in time, predicting the conditional mean of the past location given the current. We show that score denoising is the optimal backward map in transportation cost. What is its localization uncertainty? We show that the curvature function determines this localization uncertainty, measured as the conditional variance of the past location given the current. We study in this paper the effectiveness of the diffuse-then-denoise process: the contraction of the forward diffusion chain, offset by the possible expansion of the backward denoising chain, governs the denoising difficulty. For any initial measure µ, we prove that this offset net contraction at time t is characterized by the curvature complexity of a smoothed µ at a specific signal-to-noise ratio (SNR) scale r(t). We discover that the multi-scale curvature complexity collectively determines the difficulty of the denoising chain. Our multi-scale complexity quantifies a fine-grained notion of average-case curvature instead of the worst-case. Curiously, it depends on an integrated tail function, measuring the relative mass of locations with positive curvature versus those with negative curvature; denoising at a specific SNR scale is easy if such an integrated tail is light. We conclude with several non-log-concave examples to demonstrate how the multi-scale complexity probes the bottleneck SNR for the diffuse-then-denoise process.

Bio:  Tengyuan Liang is a Professor of Econometrics and Statistics in the Wallman Society of Fellows at the University of Chicago, Booth School of Business. His research focuses on the statistical and computational foundations of AI and its reliable applications in business and economics. He has published in leading journals in applied mathematics, economics, machine learning, and statistics. He was awarded a National Science Foundation CAREER Grant for his work on modern statistical learning paradigms. He served as an Associate Editor for the Journal of the American Statistical Association and Operations Research.

5/5/2025

Speaker:  Professor Philippe Rigollet (MIT)

Title: The Emergence of Clusters in Self-Attention Dynamics

Abstract:  Since their introduction in 2017, Transformers have revolutionized large language models and the broader field of deep learning. Central to this success is the groundbreaking self-attention mechanism. In this presentation, I’ll introduce a mathematical framework that casts this mechanism as a mean-field interacting particle system, revealing a desirable long-time clustering behavior. This perspective leads to a trove of fascinating questions with unexpected connections to Kuramoto oscillators, sphere packing, Wasserstein gradient flows, and slow dynamics.

Bio: Philippe Rigollet is a Distinguished Professor of Mathematics at MIT, where he serves as Chair of the Applied Math Committee and Director of the Statistics and Data Science Center. His research spans multiple dimensions of mathematical data science, including statistics, machine learning, and optimization, with recent emphasis on optimal transport and its applications. See https://math.mit.edu/~rigollet/ for more information.