Statistics Seminar – Fall 2020

Schedule for Fall 2020

The Statistics Seminar has migrated to Zoom for the Fall 2020 semester.

Seminars are on Mondays
Time: 1:00pm – 2:00pm
Zoom Link:


For an archive of past seminars, please click here.



Liam Paninski (Columbia)

Title: Some open directions in neural data science

Abstract: Neuroscience has rapidly moved into the realm of science fiction in the last few years (lasers! genetic engineering! glowing brains! mind-reading! memory writing!), and with these advances has come an array of challenging and interesting data science problems. This talk will be an informal tour of a few of these problems, with an emphasis on open research directions.



Ashwin Pananjady (UC Berkeley)

Title: Flexible models for learning from people: Statistics meets computation
Abstract: A plethora of latent variable models are used to learn from data generated by people. Specific examples include the Bradley–Terry–Luce and multinomial logit models for comparison and choice data, the Dawid–Skene model for crowdsourced question answering, and the Rasch model for categorical data that arises in psychometric analysis. In this talk, I will present a class of “permutation-based” models that borrows from the literature on sociology and economics and significantly generalizes classical approaches in these contexts, thereby improving their robustness to mis-specification. The talk will focus on the mathematical statistics of fitting these models, and describe a methodological toolbox that is inspired by considerations of adaptation as well as computation. These considerations highlight connections between the theory of adaptation in nonparametric statistics and conjectures in average-case computational complexity. The talk will present vignettes from two papers, one jointly with Cheng Mao and Martin Wainwright, and another jointly with Richard Samworth.



Jose Luis Montiel Olea (Columbia)
Title: Dropout Training is Distributionally Robust Optimal
Abstract: Dropout training is an increasingly popular estimation method in machine learning that minimizes some given loss function (e.g., the negative expected log-likelihood), but averaged over nested submodels chosen at random.  
This paper shows that dropout training in Generalized Linear Models is the minimax solution of a two-player, zero-sum game where an adversarial nature corrupts a statistician’s covariates using a multiplicative nonparametric errors-in-variables model. In this game, known as a Distributionally Robust Optimization problem, nature’s least favorable distribution is dropout noise, where nature independently deletes entries of the covariate vector with some fixed probability $\delta$. Our decision-theoretic analysis shows that dropout training, the statistician’s minimax strategy in the game, indeed provides out-of-sample expected loss guarantees for distributions that arise from multiplicative perturbations of in-sample data.

This paper also provides a novel, parallelizable, Unbiased Multi-Level Monte Carlo algorithm to speed up the implementation of dropout training. Our algorithm has a much smaller computational cost than the naive implementation of dropout, provided the number of data points is much smaller than the dimension of the covariate vector.

This is joint work with José Blanchet, Yang Kang, Viet Nguyen, and Xuhui Zhang. 
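As a small illustration of the dropout noise described in the abstract above, the sketch below computes the exact expected squared loss for a single observation when each covariate is independently deleted with probability delta. It is not the paper's algorithm (in particular, it is not the multi-level Monte Carlo method). It assumes squared loss, i.e. the Gaussian linear special case of a GLM, and it assumes that surviving covariates are rescaled by 1/(1 - delta), a common convention that the abstract does not state. Under those assumptions the expected loss equals the ordinary squared loss plus a ridge-like penalty, which the code checks by enumerating all deletion patterns. All function names are hypothetical.

```python
from itertools import product


def dropout_expected_sq_loss(x, y, beta, delta):
    """Exact expected squared loss under dropout noise for one observation.

    Assumption (not from the abstract): each covariate is deleted
    independently with probability delta, and kept covariates are
    rescaled by 1 / (1 - delta) so the noise has mean one.
    """
    d = len(x)
    total = 0.0
    # Enumerate every keep/delete pattern (1 = kept, 0 = deleted).
    for mask in product([0, 1], repeat=d):
        kept = sum(mask)
        prob = (1 - delta) ** kept * delta ** (d - kept)
        pred = sum(m * xj * bj / (1 - delta)
                   for m, xj, bj in zip(mask, x, beta))
        total += prob * (y - pred) ** 2
    return total


def ridge_like_form(x, y, beta, delta):
    """Squared loss plus the penalty delta/(1-delta) * sum_j (x_j * beta_j)^2."""
    pred = sum(xj * bj for xj, bj in zip(x, beta))
    penalty = delta / (1 - delta) * sum((xj * bj) ** 2
                                        for xj, bj in zip(x, beta))
    return (y - pred) ** 2 + penalty


if __name__ == "__main__":
    x, y, beta, delta = [1.0, -2.0, 0.5], 0.7, [0.3, 0.1, -1.2], 0.25
    print(dropout_expected_sq_loss(x, y, beta, delta))
    print(ridge_like_form(x, y, beta, delta))
```

The two printed values agree up to floating-point error, because the rescaled noise has mean one and variance delta / (1 - delta) in each coordinate.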


Bhaswar Bhattacharya (U Penn)

Title: Detection Thresholds for Non-Parametric Tests Based on Geometric Graphs: The Curious Case of Dimension 8

Abstract: Two of the fundamental problems in non-parametric statistical inference are goodness-of-fit and two-sample testing. These two problems have been extensively studied and several multivariate tests have been proposed over the last thirty years, many of which are based on geometric graphs. These include, among several others, the celebrated Friedman-Rafsky two-sample test based on the minimal spanning tree and the K-nearest neighbor graphs, and the Bickel-Breiman spacings tests for goodness-of-fit. These tests are asymptotically distribution-free, universally consistent, and computationally efficient (both in sample size and in dimension), making them particularly attractive for modern statistical applications.

In this talk, we will derive the detection thresholds and limiting local power of these tests, thus providing a way to compare and justify the performance of these tests in various applications. Several interesting properties emerge, such as a curious phase transition in dimension 8, and a remarkable blessing of dimensionality in detecting scale changes.


Stefan Wager (Stanford)

Title: Synthetic Difference in Differences

Abstract: We present a new estimator for causal effects with panel data that builds on insights behind the widely used difference in differences and synthetic control methods. We find, both theoretically and in empirical studies, that this “synthetic difference in differences” estimator has desirable robustness properties relative to both difference in differences and synthetic controls, and that it performs well in settings where either of these conventional estimators is commonly used in practice. We also study the asymptotic behavior of the proposed estimator in a low-rank confounding model, and articulate conditions for consistency and asymptotic normality.

This is joint work with Dmitry Arkhangelsky, Susan Athey, David Hirshberg, and Guido Imbens.



Yuting Wei (CMU)

Title: Reliable hypothesis testing paradigms in high dimensions


Abstract: Modern scientific discovery and decision making require the development of trustworthy and informative inferential procedures, which are particularly challenging when coping with high-dimensional data. This talk presents two vignettes on the theme of reliable high-dimensional inference.

The first vignette considers performing inference based on the Lasso estimator when the number of covariates is of the same order as or larger than the number of observations. Classical asymptotic statistics theory fails for two fundamental reasons: (1) the regularized risk is non-smooth; (2) the discrepancy between the estimator and the true parameter vector cannot be neglected. We pin down the distribution of the Lasso, as well as its debiased version, under a broad class of Gaussian correlated designs with non-singular covariance structure. Our findings suggest that a careful degree-of-freedom correction is crucial for computing valid confidence intervals in this challenging regime.

The second vignette investigates the Model-X knockoffs framework, a general procedure that can leverage any feature importance measure to produce a variable selection algorithm. Model-X knockoffs rely on the construction of synthetic random variables and are, therefore, random. We propose a method for derandomizing, and hence stabilizing, Model-X knockoffs. By aggregating the selection results across multiple runs of the knockoffs algorithm, our method provides stable decisions without compromising statistical power. Our approach, when applied to the multi-stage GWAS of prostate cancer, reports locations on the genome that have been replicated in other studies.

The first vignette is based on joint work with Michael Celentano and Andrea Montanari, whereas the second one is based on joint work with Zhimei Ren and Emmanuel Candes.

Bio: Yuting Wei is currently an assistant professor in the Statistics and Data Science department at Carnegie Mellon University. Prior to that, she was a Stein Fellow at Stanford University, and she received her Ph.D. in statistics at University of California, Berkeley working with Martin Wainwright and Aditya Guntuboyina. She was the recipient of the 2018 Erich L. Lehmann Citation from the Berkeley statistics department for her Ph.D. dissertation in theoretical statistics. Her research interests include high-dimensional and non-parametric statistics, statistical machine learning, and reinforcement learning.


Alexander Aue (UC Davis)

Title: Random matrix theory aids statistical inference in high dimensions


Abstract: The first part of the talk is on bootstrapping spectral statistics in high dimensions. Spectral statistics play a central role in many multivariate testing problems. It is therefore of interest to approximate the distribution of functions of the eigenvalues of sample covariance matrices. Although bootstrap methods are an established approach to approximating the laws of spectral statistics in low-dimensional problems, these methods are relatively unexplored in the high-dimensional setting. The aim of this talk is to focus on linear spectral statistics (LSS) as a class of “prototype statistics” for developing a new bootstrap method in the high-dimensional setting. In essence, the method originates from the parametric bootstrap, and is motivated by the notion that, in high dimensions, it is difficult to obtain a non-parametric approximation to the full data-generating distribution. From a practical standpoint, the method is easy to use, and allows the user to circumvent the difficulties of complex asymptotic formulas for LSS. In addition to proving the consistency of the proposed method, I will discuss encouraging empirical results in a variety of settings. Lastly, and perhaps most interestingly, simulations indicate that the method can be applied successfully to statistics outside the class of LSS, such as the largest sample eigenvalue and others.

The second part of the talk briefly highlights two-sample tests in high dimensions by discussing a ridge-regularized generalization of Hotelling’s $T^2$. The main novelty of this work is in devising a method for selecting the regularization parameter based on the idea of maximizing power within a class of local alternatives. The performance of the proposed test procedures will be illustrated through an application to a breast cancer data set where the goal is to detect the pathways with different DNA copy number alterations across breast cancer subtypes.


Academic Holiday – No Seminar



Sebastian Engelke (University of Geneva)

Title: Gradient boosting for extreme quantile regression

Abstract: Quantile regression relies on minimizing the conditional quantile loss, which is based on the quantile check function. This has been extended to flexible regression functions such as the quantile regression forest (Meinshausen, 2006) and the gradient forest (Athey et al., 2019). These methods break down if the quantile of interest lies outside of the range of the data. Extreme value theory provides the mathematical foundation for estimation of such extreme quantiles. A common approach is to approximate the exceedances over a high threshold by the generalized Pareto distribution. For conditional extreme quantiles, one may model the parameters of this distribution as functions of the predictors. Up to now, the existing methods are either not flexible enough (e.g., linear methods) or do not generalize well in higher dimensions (e.g., kernel-based methods). We develop a new approach based on gradient boosting for extreme quantile regression that estimates the parameters of the generalized Pareto distribution in a flexible way even in higher dimensions. We discuss cross-validation of the tuning parameters and show how the importance of the different predictors can be measured. Our estimator outperforms classical quantile regression methods and methods from extreme value theory in simulation studies. We study an application to forecasting of extreme precipitation in statistical post-processing.

This is joint work with Jasper Velthoen, Clement Dombry and Juan-Juan Cai.
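For readers unfamiliar with the quantile check function mentioned in the abstract above, here is a minimal sketch of it in the simplest setting: no predictors, just a constant fitted to a sample. It is not the speakers' gradient boosting method, and the function names are hypothetical. It illustrates two points from the abstract: minimizing the average check loss recovers an empirical quantile, and for a quantile level beyond what the sample can resolve, the minimizer is stuck at the sample maximum and cannot extrapolate.

```python
def check_loss(u, tau):
    """Quantile check function: u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_risk(q, ys, tau):
    """Average check loss of the constant prediction q on the sample ys."""
    return sum(check_loss(y - q, tau) for y in ys) / len(ys)


def best_constant(ys, tau):
    """Constant minimizing the empirical check loss.

    The risk is convex and piecewise linear with kinks at the data
    points, so it suffices to search over the observed values.
    """
    return min(sorted(ys), key=lambda q: empirical_risk(q, ys, tau))


if __name__ == "__main__":
    ys = list(range(1, 11))          # sample 1, 2, ..., 10
    print(best_constant(ys, 0.85))   # an interior quantile level
    print(best_constant(ys, 0.999))  # an extreme level: capped at max(ys)
```

With ten observations, the fitted value for level 0.999 is the largest observation, which is why the abstract turns to extreme value theory for quantiles outside the range of the data.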


Mathias Drton (Technical University of Munich, Germany)

Title: Score matching for graphical models

Abstract: A common challenge in estimation of parameters of multivariate probability density functions is the intractability of the normalizing constant.  For continuous data, the score matching method of Hyvärinen (2005) provides a way to circumvent this issue and is particularly convenient for graphical modeling.  In this talk I will present regularized score matching methods for high-dimensional and possibly non-Gaussian graphical models.  In particular, I will discuss generalizations of score matching for observations that are non-negative or otherwise constrained in their support.


Maya Gupta (Didero, CEO/Founder)

Title: Shape Constraints Make ML Smarter, Fairer, and More Robust

Abstract: A classic shape constraint is monotonicity, which forces a model’s output to only increase if a specific input increases, and can be imposed on one-dimensional functions by isotonic regression. Newer shape constraints are diminishing returns and unimodality, and most recently, multi-dimensional shape constraints, such as requiring that two inputs are complements or that one input dominates another input. We will show how shape constraints are important for capturing prior knowledge, imposing deontological ethics, and making models robust to distribution shift. We’ll show how to fit arbitrarily flexible functions with shape constraints using lattice models, which are linear splines over regular grids, and that lattices can be composed as layers in deep lattice networks. Google’s open-source TensorFlow library TF Lattice makes it easy to design and build deep lattice networks and hybrid neural networks that satisfy these shape constraints.

Short Bio: Maya Gupta is a researcher and entrepreneur. From 2013 to 2020, she led the Glassbox Machine Learning R&D team at Google Research, developing and deploying new ideas in constrained machine learning to make products more accurate, interpretable, safe, and fair. Gupta was an Associate Professor of Electrical Engineering at the University of Washington from 2003 to 2012, where she received the PECASE (Presidential Early Career Award for Scientists and Engineers) and the Office of Naval Research Young Investigator Award for her work in sonar statistical signal processing. Gupta received her PhD in EE from Stanford in 2003, and a BS in EE and a BA in Economics from Rice University in 1997. Gupta is founder and CEO of 6 companies; her current focus is on building AI-powered distributed libraries (Hoefnagel Puzzle Club and Carpe Noctem Books), and developing new tools to increase knowledge (Didero).
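The abstract above notes that monotonicity can be imposed on one-dimensional functions by isotonic regression. As a minimal sketch of that classical idea only (it is unrelated to the TF Lattice API, and the function name is hypothetical), the pool-adjacent-violators algorithm below computes the least-squares nondecreasing fit to a sequence of observations ordered by their input value.

```python
def isotonic_regression(ys):
    """Least-squares nondecreasing fit via pool-adjacent-violators.

    ys: observations already ordered by their (one-dimensional) input.
    Returns a list of fitted values of the same length.
    """
    blocks = []  # each block stores [sum of values, number of values]
    for y in ys:
        blocks.append([float(y), 1])
        # Merge the last two blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # every point in a block gets the block mean
    return fit


if __name__ == "__main__":
    print(isotonic_regression([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```

In the example, the violating pair (3, 2) is pooled into its average 2.5, which is the smallest change that makes the fit nondecreasing.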


Alex Volfovsky (Duke)

Title: Machine learning methods for causal inference from complex observational data

Abstract: A classical problem in causal inference is that of matching treatment units to control units in an observational dataset. This problem is distinct from simple estimation of treatment effects as it provides additional practical interpretability of the underlying causal mechanisms that is not available without matching. Some of the main challenges in developing matching methods arise from the tension among (i) inclusion of as many relevant covariates as possible in defining the matched groups, (ii) having matched groups with enough treated and control units for a valid estimate of average treatment effect in each group, (iii) computing the matched groups efficiently for large datasets, and (iv) dealing with complicating factors such as non-independence among units. Many matching methods require expert input into the choice of distance metric that guides which covariates to match on and how to match on them. This task becomes impractical for modern electronic health record and large online social network data simply because humans are not naturally adept at constructing high dimensional functions manually. We propose the Almost Matching Exactly (AME) framework to tackle these problems for categorical covariates. At its core this framework proposes an optimization objective for match quality that captures covariates that are integral for making causal statements while encouraging as many matches as possible. We demonstrate that this framework is able to construct good matched groups on relevant covariates and leverage these high quality matches to estimate conditional average treatment effects (CATEs) in the study of the effects of a mother’s smoking status on pregnancy outcomes. We further extend the methodology to incorporate continuous and other complex covariates.


Michael Daniels (U Florida)

Title: Bayesian nonparametrics for causal inference with multiple mediators

Abstract: We introduce an approach for causal mediation with multiple mediators.  We model the observed data distribution using a new Bayesian nonparametric approach that allows for flexible default specifications for the distribution of the outcome and the mediators conditional on mediator/outcome confounders. We briefly explore the properties of this specification and introduce assumptions that allow for the identification of direct and both joint and individual indirect effects. We use this approach to examine the effect of antibiotics as mediators of the relationship between bacterial community dominance and ventilator associated pneumonia.  We then outline remaining work and extensions of this approach.

Joint work with Samrat Roy (UF), Jason Roy (Rutgers), and Brendan Kelly (UPENN).


Rina Foygel Barber (U Chicago)

Title: Is distribution-free inference possible for binary regression?

Abstract: For a regression problem with a binary label response, we examine the problem of constructing confidence intervals for the label probability conditional on the features. In a setting where we do not have any information about the underlying distribution, we would ideally like to provide confidence intervals that are distribution-free—that is, valid with no assumptions on the distribution of the data. Our results establish an explicit lower bound on the length of any distribution-free confidence interval, and construct a procedure that can approximately achieve this length. In particular, this lower bound is independent of the sample size and holds for all distributions with no point masses, meaning that it is not possible for any distribution-free procedure to be adaptive with respect to any type of special structure in the distribution.