Schedule for Fall 2020
The Statistics Seminar has migrated to Zoom for the Fall 2020 semester.
Seminars are on Mondays
Time: 1:00pm – 2:00pm
Zoom Link: https://columbiauniversity.zoom.us/j/99271904414
For an archive of past seminars, please click here.
9/14/20

Liam Paninski (Columbia)
Title: Some open directions in neural data science
Abstract: Neuroscience has rapidly moved into the realm of science fiction in the last few years (lasers! genetic engineering! glowing brains! mind-reading! memory writing!), and with these advances have come an array of challenging and interesting data science problems. This talk will be an informal tour of a few of these problems, with an emphasis on open research directions.

9/21/20 
Ashwin Pananjady (UC Berkeley)
Title: Flexible models for learning from people: Statistics meets computation
Abstract:
A plethora of latent variable models are used to learn from data generated by people. Specific examples include the Bradley–Terry–Luce and multinomial logit models for comparison and choice data, the Dawid–Skene model for crowdsourced question answering, and the Rasch model for categorical data that arises in psychometric analysis. In this talk, I will present a class of “permutation-based” models that borrows from the literature on sociology and economics and significantly generalizes classical approaches in these contexts, thereby improving their robustness to misspecification. The talk will focus on the mathematical statistics of fitting these models, and describe a methodological toolbox that is inspired by considerations of adaptation as well as computation. These considerations highlight connections between the theory of adaptation in nonparametric statistics and conjectures in average-case computational complexity. The talk will present vignettes from two papers, one jointly with Cheng Mao and Martin Wainwright, and another jointly with Richard Samworth.
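As a small concrete instance of the comparison models named above, here is a minimal sketch (my own illustration, not code from the talk) of the Bradley–Terry–Luce likelihood for pairwise comparison data; the function name and data layout are assumptions for the example.

```python
import math

def btl_loglik(scores, comparisons):
    """Log-likelihood of pairwise comparisons under the
    Bradley-Terry-Luce model: item i beats item j with
    probability scores[i] / (scores[i] + scores[j])."""
    ll = 0.0
    for winner, loser in comparisons:
        ll += math.log(scores[winner] / (scores[winner] + scores[loser]))
    return ll
```

With equal scores, every comparison contributes log(1/2); fitting the model amounts to choosing the score vector that maximizes this quantity.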

9/28/20 
Jose Luis Montiel Olea (Columbia)
Title: Dropout Training is Distributionally Robust Optimal
Abstract: Dropout training is an increasingly popular estimation method in machine learning that minimizes some given loss function (e.g., the negative expected log-likelihood), but averaged over nested submodels chosen at random.
This paper shows that dropout training in Generalized Linear Models is the minimax solution of a two-player, zero-sum game where an adversarial nature corrupts a statistician’s covariates using a multiplicative nonparametric errors-in-variables model. In this game, known as a Distributionally Robust Optimization problem, nature’s least favorable distribution is dropout noise, where nature independently deletes entries of the covariate vector with some fixed probability $\delta$. Our decision-theoretic analysis shows that dropout training, the statistician’s minimax strategy in the game, indeed provides out-of-sample expected loss guarantees for distributions that arise from multiplicative perturbations of in-sample data.
This paper also provides a novel, parallelizable, Unbiased Multi-Level Monte Carlo algorithm to speed up the implementation of dropout training. Our algorithm has a much smaller computational cost compared to the naive implementation of dropout, provided the number of data points is much smaller than the dimension of the covariate vector. This is joint work with José Blanchet, Yang Kang, Viet Nguyen, and Xuhui Zhang.
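To make "averaged over nested submodels chosen at random" concrete, here is a minimal pure-Python sketch (my own illustration, not the paper's code) of the exact dropout objective for a linear model with squared loss, computed by enumerating all 2^d deletion patterns; each covariate survives with probability 1 - delta, matching the multiplicative deletion noise described in the abstract.

```python
import itertools

def dropout_loss(beta, X, y, delta):
    """Exact dropout objective for squared loss: average the submodel
    loss over every deletion pattern, weighted by its probability."""
    d = len(beta)
    total = 0.0
    for mask in itertools.product([0, 1], repeat=d):
        # probability nature keeps exactly this pattern of covariates
        p = 1.0
        for m in mask:
            p *= (1 - delta) if m else delta
        # mean squared loss of the corresponding nested submodel
        loss = 0.0
        for xi, yi in zip(X, y):
            pred = sum(b * m * x for b, m, x in zip(beta, mask, xi))
            loss += (yi - pred) ** 2
        total += p * loss / len(y)
    return total
```

At delta = 0 only the full model gets weight, recovering the ordinary loss; at delta = 1 every covariate is deleted and the objective is the loss of the empty model. The brute-force enumeration is purely expository; practical implementations sample masks instead.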

10/5/20 
Bhaswar Bhattacharya (U Penn)
Title: Detection Thresholds for Non-Parametric Tests Based on Geometric Graphs: The Curious Case of Dimension 8
Abstract: Two of the fundamental problems in nonparametric statistical inference are goodness-of-fit and two-sample testing. These two problems have been extensively studied and several multivariate tests have been proposed over the last thirty years, many of which are based on geometric graphs. These include, among several others, the celebrated Friedman-Rafsky two-sample test based on the minimal spanning tree and the K-nearest neighbor graphs, and the Bickel-Breiman spacings tests for goodness-of-fit. These tests are asymptotically distribution-free, universally consistent, and computationally efficient (both in sample size and in dimension), making them particularly attractive for modern statistical applications. In this talk, we will derive the detection thresholds and limiting local power of these tests, thus providing a way to compare and justify their performance in various applications. Several interesting properties emerge, such as a curious phase transition in dimension 8, and a remarkable blessing of dimensionality in detecting scale changes.
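As an illustration of the graph-based tests above, here is a hedged pure-Python sketch of the Friedman-Rafsky two-sample statistic: build the Euclidean minimal spanning tree of the pooled sample and count the edges joining points from different samples (few cross edges suggest the two distributions differ). The implementation details are my own, not from the talk.

```python
def friedman_rafsky_stat(sample1, sample2):
    """Number of minimal-spanning-tree edges connecting the two samples.
    Points are tuples of coordinates; Prim's algorithm builds the MST."""
    pts = list(sample1) + list(sample2)
    labels = [0] * len(sample1) + [1] * len(sample2)
    n = len(pts)
    in_tree = [False] * n
    dist = [float('inf')] * n
    parent = [-1] * n
    dist[0] = 0.0
    edges = []
    for _ in range(n):
        # pull the closest point not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: dist[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # relax distances to the remaining points
        for v in range(n):
            if not in_tree[v]:
                d = sum((a - b) ** 2 for a, b in zip(pts[u], pts[v])) ** 0.5
                if d < dist[v]:
                    dist[v] = d
                    parent[v] = u
    return sum(1 for u, v in edges if labels[u] != labels[v])
```

For two well-separated clusters the MST crosses between samples only once, while thoroughly interleaved samples produce many cross edges; the null distribution of this count is what makes the test asymptotically distribution-free.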
10/12/20 
Stefan Wager (Stanford)
Title: Synthetic Difference in Differences
Abstract: We present a new estimator for causal effects with panel data that builds on insights behind the widely used difference in differences and synthetic control methods. We find, both theoretically and in empirical studies, that this “synthetic difference in differences” estimator has desirable robustness properties relative to both difference in differences and synthetic controls, and that it performs well in settings where either of these conventional estimators is commonly used in practice. We also study the asymptotic behavior of the proposed estimator in a low-rank confounding model, and articulate conditions for consistency and asymptotic normality. Joint work with Dmitry Arkhangelsky, Susan Athey, David Hirshberg, and Guido Imbens.
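For readers unfamiliar with the baseline, here is a minimal sketch of the classical difference in differences estimator that the synthetic version builds on. This is illustrative only and is not the proposed estimator, which additionally learns data-driven unit and time weights.

```python
def did_estimate(y_treat_pre, y_treat_post, y_ctrl_pre, y_ctrl_post):
    """Classical DiD: change over time for treated units minus change
    for controls, which differences out additive unit and time effects."""
    mean = lambda v: sum(v) / len(v)
    return (mean(y_treat_post) - mean(y_treat_pre)) - (
        mean(y_ctrl_post) - mean(y_ctrl_pre))
```

If treated outcomes rise by 3 while controls rise by 1 over the same period, the estimate is 2: the common time trend is removed by the second difference.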
10/19/20 
Yuting Wei (CMU)
Title: Reliable hypothesis testing paradigms in high dimensions
Abstract: Modern scientific discovery and decision making require the development of trustworthy and informative inferential procedures, which are particularly challenging when coping with high-dimensional data. This talk presents two vignettes on the theme of reliable high-dimensional inference. The first vignette considers performing inference based on the Lasso estimator when the number of covariates is of the same order as or larger than the number of observations. Classical asymptotic statistics theory fails for two fundamental reasons: (1) the regularized risk is non-smooth; (2) the discrepancy between the estimator and the true parameter vector cannot be neglected. We pin down the distribution of the Lasso, as well as its debiased version, under a broad class of Gaussian correlated designs with non-singular covariance structure. Our findings suggest that a careful degrees-of-freedom correction is crucial for computing valid confidence intervals in this challenging regime. The second vignette investigates the Model-X knockoffs framework, a general procedure that can leverage any feature importance measure to produce a variable selection algorithm. Model-X knockoffs rely on the construction of synthetic random variables and are, therefore, random. We propose a method for derandomizing, and hence stabilizing, Model-X knockoffs. By aggregating the selection results across multiple runs of the knockoffs algorithm, our method provides stable decisions without compromising statistical power. Our approach, when applied to the multi-stage GWAS of prostate cancer, reports locations on the genome that have been replicated in other studies. The first vignette is based on joint work with Michael Celentano and Andrea Montanari, whereas the second one is based on joint work with Zhimei Ren and Emmanuel Candès.
Bio: Yuting Wei is currently an assistant professor in the Statistics and Data Science department at Carnegie Mellon University. Prior to that, she was a Stein Fellow at Stanford University, and she received her Ph.D. in statistics from the University of California, Berkeley, working with Martin Wainwright and Aditya Guntuboyina. She was the recipient of the 2018 Erich L. Lehmann Citation from the Berkeley statistics department for her Ph.D. dissertation in theoretical statistics. Her research interests include high-dimensional and nonparametric statistics, statistical machine learning, and reinforcement learning.
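The aggregation step behind de-randomized knockoffs can be illustrated with a small sketch (my own simplification, not the authors' procedure): run the knockoffs filter several times and keep only the features selected in at least a given fraction of runs.

```python
from collections import Counter

def derandomized_selection(runs, threshold):
    """runs: one set of selected feature indices per knockoffs run.
    Keep features whose selection frequency meets the threshold."""
    counts = Counter(f for run in runs for f in run)
    return sorted(f for f, c in counts.items() if c / len(runs) >= threshold)
```

Features that appear only sporadically across runs (an artifact of the random knockoff construction) are filtered out, stabilizing the final selection.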
10/26/20 
Alexander Aue (UC Davis)
Title: Random matrix theory aids statistical inference in high dimensions
Abstract: The first part of the talk is on bootstrapping spectral statistics in high dimensions. Spectral statistics play a central role in many multivariate testing problems. It is therefore of interest to approximate the distribution of functions of the eigenvalues of sample covariance matrices. Although bootstrap methods are an established approach to approximating the laws of spectral statistics in low-dimensional problems, these methods are relatively unexplored in the high-dimensional setting. The aim of this talk is to focus on linear spectral statistics (LSS) as a class of “prototype statistics” for developing a new bootstrap method in the high-dimensional setting. In essence, the method originates from the parametric bootstrap, and is motivated by the notion that, in high dimensions, it is difficult to obtain a nonparametric approximation to the full data-generating distribution. From a practical standpoint, the method is easy to use, and allows the user to circumvent the difficulties of complex asymptotic formulas for LSS. In addition to proving the consistency of the proposed method, I will discuss encouraging empirical results in a variety of settings. Lastly, and perhaps most interestingly, simulations indicate that the method can be applied successfully to statistics outside the class of LSS, such as the largest sample eigenvalue. The second part of the talk briefly highlights two-sample tests in high dimensions by discussing a ridge-regularized generalization of Hotelling’s T^2. The main novelty of this work is in devising a method for selecting the regularization parameter based on the idea of maximizing power within a class of local alternatives.
The performance of the proposed test procedures will be illustrated through an application to a breast cancer data set, where the goal is to detect the pathways with different DNA copy number alterations across breast cancer subtypes.
11/2/20 
Academic Holiday – No Seminar 
11/9/20 
Sebastian Engelke (University of Geneva)
Title: Gradient boosting for extreme quantile regression
Abstract: This is joint work with Jasper Velthoen, Clément Dombry, and Juan-Juan Cai.
11/16/20 
Mathias Drton (Technical University of Munich, Germany)
Title: Score matching for graphical models
Abstract: A common challenge in estimating the parameters of multivariate probability density functions is the intractability of the normalizing constant. For continuous data, the score matching method of Hyvärinen (2005) provides a way to circumvent this issue and is particularly convenient for graphical modeling. In this talk I will present regularized score matching methods for high-dimensional and possibly non-Gaussian graphical models. In particular, I will discuss generalizations of score matching for observations that are nonnegative or otherwise constrained in their support.
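To see how score matching sidesteps the normalizing constant, consider a toy example of mine (not from the talk): a zero-mean Gaussian with unknown precision tau. Hyvärinen's objective J(tau) = E[0.5 * psi(x)^2 + psi'(x)], with score psi(x) = d/dx log p(x) = -tau * x, reduces to mean(0.5 * tau^2 * x^2 - tau), which has a closed-form minimizer:

```python
def score_matching_precision(xs):
    """Minimize J(tau) = mean(0.5 * tau**2 * x**2 - tau) over tau.
    Setting dJ/dtau = tau * mean(x**2) - 1 = 0 gives tau = 1 / mean(x**2);
    the Gaussian normalizing constant never appears in the objective."""
    m2 = sum(x * x for x in xs) / len(xs)
    return 1.0 / m2
```

The same device, applied coordinate-wise with regularization, is what makes score matching attractive for graphical models whose joint densities have intractable normalizers.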
11/23/20 
Maya Gupta (Didero, CEO/Founder)
Title: Shape Constraints Make ML Smarter, Fairer, and More Robust
Abstract: A classic shape constraint is monotonicity, which forces a model’s output to only increase if a specific input increases, and can be imposed on one-dimensional functions by isotonic regression. Newer shape constraints are diminishing returns and unimodality, and most recently, multi-dimensional shape constraints such as requiring that two inputs be complements, or that one input dominate another. We will show how shape constraints are important for capturing prior knowledge, imposing deontological ethics, and making models robust to distribution shift. We’ll show how to fit arbitrarily flexible functions with shape constraints using lattice models, which are linear splines over regular grids, and how lattices can be composed as layers in deep lattice networks. Google’s open-source TensorFlow library TF Lattice makes it easy to design and build deep lattice networks and hybrid neural networks that satisfy these shape constraints.
Short Bio: Maya Gupta is a researcher and entrepreneur. From 2013 to 2020, she led the Glassbox Machine Learning R&D team at Google Research, developing and deploying new ideas in constrained machine learning to make products more accurate, interpretable, safe, and fair. Gupta was an Associate Professor of Electrical Engineering at the University of Washington from 2003 to 2012, where she received the PECASE (Presidential Early Career Award for Scientists and Engineers) and the Office of Naval Research Young Investigator Award for her work in sonar statistical signal processing. Gupta received her PhD in EE from Stanford in 2003, and a BS in EE and a BA in Economics from Rice University in 1997. Gupta is founder and CEO of six companies; her current focus is on building AI-powered distributed libraries (Hoefnagel Puzzle Club and Carpe Noctem Books) and developing new tools to increase knowledge (Didero).
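The abstract notes that one-dimensional monotonicity can be imposed by isotonic regression. Here is a hedged pure-Python sketch of the standard pool-adjacent-violators algorithm for that task (implementation details my own, not from the talk):

```python
def isotonic_regression(y):
    """Least-squares fit of a non-decreasing sequence to y via
    Pool Adjacent Violators.  Each block stores [sum, count]."""
    blocks = []
    for v in y:
        blocks.append([float(v), 1])
        # merge while the newest block's mean falls below its predecessor's
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            t, c = blocks.pop()
            blocks[-1][0] += t
            blocks[-1][1] += c
    fit = []
    for t, c in blocks:
        fit.extend([t / c] * c)
    return fit
```

Violating stretches are pooled to their average, so [1, 3, 2] becomes [1, 2.5, 2.5]: the fitted values never decrease, which is exactly the monotonic shape constraint.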
11/30/20 
Alex Volfovsky (Duke)
Title: Machine learning methods for causal inference from complex observational data
Abstract: A classical problem in causal inference is that of matching treatment units to control units in an observational dataset. This problem is distinct from simple estimation of treatment effects, as it provides additional practical interpretability of the underlying causal mechanisms that is not available without matching. Some of the main challenges in developing matching methods arise from the tension among (i) inclusion of as many relevant covariates as possible in defining the matched groups, (ii) having matched groups with enough treated and control units for a valid estimate of the average treatment effect in each group, (iii) computing the matched groups efficiently for large datasets, and (iv) dealing with complicating factors such as non-independence among units. Many matching methods require expert input into the choice of distance metric that guides which covariates to match on and how to match on them. This task becomes impractical for modern electronic health record and large online social network data, simply because humans are not naturally adept at constructing high-dimensional functions manually. We propose the Almost Matching Exactly (AME) framework to tackle these problems for categorical covariates. At its core, this framework proposes an optimization objective for match quality that captures covariates that are integral for making causal statements while encouraging as many matches as possible. We demonstrate that this framework is able to construct good matched groups on relevant covariates and leverage these high-quality matches to estimate conditional average treatment effects (CATEs) in the study of the effects of a mother’s smoking status on pregnancy outcomes. We further extend the methodology to incorporate continuous and other complex covariates.
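In its simplest special case, "almost matching exactly" reduces to exact matching on all categorical covariates, with a CATE estimated within each matched group. A minimal sketch of that special case (my own illustration; the AME framework itself learns which covariates matter and relaxes exactness):

```python
from collections import defaultdict

def exact_match_cates(units):
    """units: (covariate_tuple, treated_flag, outcome) records.
    Returns the treated-minus-control mean outcome per covariate group."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for cov, treated, y in units:
        groups[cov][treated].append(y)
    cates = {}
    for cov, g in groups.items():
        if g[0] and g[1]:  # a valid group needs treated AND control units
            cates[cov] = sum(g[1]) / len(g[1]) - sum(g[0]) / len(g[0])
    return cates
```

Groups lacking either treated or control units are dropped, which is exactly tension (ii) in the abstract: matching on more covariates makes such empty groups more common.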
12/7/20 
Michael Daniels (U Florida)
Title: Bayesian nonparametrics for causal inference with multiple mediators
Abstract: We introduce an approach for causal mediation with multiple mediators. We model the observed data distribution using a new Bayesian nonparametric approach that allows for flexible default specifications for the distribution of the outcome and the mediators conditional on mediator/outcome confounders. We briefly explore the properties of this specification and introduce assumptions that allow for the identification of direct and both joint and individual indirect effects. We use this approach to examine the effect of antibiotics as mediators of the relationship between bacterial community dominance and ventilator-associated pneumonia. We then outline remaining work and extensions of this approach.
Joint work with Samrat Roy (UF), Jason Roy (Rutgers), and Brendan Kelly (UPenn).

12/14/20 
Rina Foygel Barber (U Chicago)
Title: Is distribution-free inference possible for binary regression?
Abstract: For a regression problem with a binary label response, we examine the problem of constructing confidence intervals for the label probability conditional on the features. In a setting where we do not have any information about the underlying distribution, we would ideally like to provide confidence intervals that are distribution-free, that is, valid with no assumptions on the distribution of the data. Our results establish an explicit lower bound on the length of any distribution-free confidence interval, and construct a procedure that can approximately achieve this length. In particular, this lower bound is independent of the sample size and holds for all distributions with no point masses, meaning that it is not possible for any distribution-free procedure to be adaptive with respect to any type of special structure in the distribution.
