Statistics Seminar Series – Spring 2016

Schedule for Spring 2016

Seminars are on Mondays
Time: 4:10pm – 5:00pm
Location: Room 903, 1255 Amsterdam Avenue

Tea and Coffee will be served before the seminar at 3:30 PM, Room 1025

Cheese and Wine will be served after the seminar at 5:10 PM, Room 1025

For an archive of past seminars, please click here.



*Time: 12noon – 1:00pm

Dr. Lizhen Lin (University of Texas at Austin)

Title: Robust and scalable inference using median posteriors

Abstract. While theoretically justified and computationally efficient point estimators were developed in robust estimation for many problems, robust Bayesian analogues are not sufficiently well understood. We propose a novel approach to Bayesian analysis that is provably robust to the presence of outliers in the data, and often has noticeable computational advantages over standard methods. Our approach is based on the idea of splitting the data into several non-overlapping subsets, evaluating the posterior distribution given each subset of the data, and then combining the resulting subset posterior measures by taking their geometric median. The resulting final measure is called the median posterior, which is the ultimate object used for inference. We show several strong theoretical results for the median posterior, including concentration rates and provable robustness. We illustrate and validate the method through experiments on simulated and real data.
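
As a rough illustration of the combining step described in the abstract, the sketch below computes a geometric median with Weiszfeld's iteration. It is a simplification and not the speaker's method: the median posterior is a geometric median of probability measures, whereas this example takes the geometric median of point summaries, here hypothetical subset posterior means in the plane. The function name `geometric_median` and the data are assumptions made for illustration only.

```python
import math

def geometric_median(points, iters=200, eps=1e-12):
    """Weiszfeld iteration for the geometric median of points in R^d."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    current = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(iters):
        weights = []
        for p in points:
            dist = math.dist(p, current)
            # Guard against division by zero if the iterate lands on a data point.
            weights.append(1.0 / max(dist, eps))
        total = sum(weights)
        current = [
            sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)
        ]
    return current

# Hypothetical subset posterior means; the last subset is corrupted by outliers.
subset_means = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.05), (50.0, 50.0)]
median = geometric_median(subset_means)
mean = [sum(p[k] for p in subset_means) / len(subset_means) for k in range(2)]
```

In this toy example the plain average is dragged far toward the corrupted subset, while the geometric median stays with the four subsets that agree, which is the kind of robustness the abstract refers to.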



*Time: 12:00 – 1:00pm

Jennifer Hill (NYU)

“Nonparametric Causal Inference and Its Contribution to Applied Statistics”

This talk will explore the potential for contributions from nonparametric statistics (and machine learning) to causal inference. Two specific causal inference scenarios will be examined in which researchers can capitalize on advances in our ability to flexibly model response surfaces using approaches at the intersection of Statistics and Machine Learning. The first scenario explores sensitivity to unobserved confounding and proposes a specific solution that relies on a Bayesian nonparametric fitting algorithm. The second scenario proposes the idealistic notion of a Causal Inference Machine that would permit estimation of point-in-time causal effects in the absence of randomization via one strategy that would work in a wide variety of circumstances. Here we consider tradeoffs between several methods (notably BART and Gaussian processes) with regard to performance, computational efficiency, flexibility, and ease of use. The talk will be embedded in a discussion of the role statisticians can play in advancing science, both through consulting on specific projects and through developing user-friendly software that can be used without the aid of a PhD statistician by researchers “in the trenches”.



*Time: 1:30 – 2:30pm

Weijie Su (Stanford University)

“Multiple Testing and Adaptive Estimation via the Sorted L-One Norm”

Abstract: In many real-world statistical problems, we observe a large number of potentially explanatory variables of which a majority may be irrelevant. For this type of problem, controlling the false discovery rate (FDR) guarantees that most of the discoveries are truly explanatory and thus replicable. In this talk, we propose a new method named SLOPE to control the FDR in sparse high-dimensional linear regression. This computationally efficient procedure works by regularizing the fitted coefficients according to their ranks: the higher the rank, the larger the penalty. This is analogous to the Benjamini-Hochberg procedure, which compares more significant p-values with more stringent thresholds. Whenever the columns of the design matrix are not strongly correlated, we show empirically that SLOPE obtains FDR control at a reasonable level while offering substantial power.

Although SLOPE is developed from a multiple testing viewpoint, we show the surprising result that it achieves optimal squared errors under Gaussian random designs over a wide range of sparsity classes. An appealing feature is that SLOPE does not require any knowledge of the degree of sparsity. This adaptivity to unknown sparsity has to do with the FDR control, which strikes the right balance between bias and variance. The proof of this result presents several elements not found in the high-dimensional statistics literature.
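
To make the rank-dependent penalty concrete, here is a minimal sketch of the sorted L-one norm, which is added to the least-squares loss in place of the usual lasso penalty. The sequence used below, lambda_i = Phi^{-1}(1 - i*q/(2p)), is one Benjamini-Hochberg-style choice; the function names, the target level q = 0.1, and the coefficient vector are assumptions for illustration, and the sketch evaluates the penalty only, without solving the optimization problem.

```python
from statistics import NormalDist

def bh_lambdas(p, q):
    """BH-style sequence lambda_i = Phi^{-1}(1 - i*q/(2p)) for i = 1..p (decreasing)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(1.0 - i * q / (2.0 * p)) for i in range(1, p + 1)]

def sorted_l1_norm(beta, lambdas):
    """Pair the largest |beta| with the largest lambda, and so on down the ranks."""
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(lam * m for lam, m in zip(lambdas, magnitudes))

lambdas = bh_lambdas(p=4, q=0.1)
penalty = sorted_l1_norm([0.5, -3.0, 0.0, 1.5], lambdas)
```

When all the lambdas are equal, the penalty reduces to a multiple of the ordinary L-one norm, so the lasso is a special case of this construction.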


Dr. Veronika Rockova (University of Pennsylvania)

“Fast Bayesian Factor Analysis via Automatic Rotations to Sparsity”

Rotational post-hoc transformations have traditionally played a key role in enhancing the interpretability of factor analysis. Regularization methods also serve to achieve this goal by prioritizing sparse loading matrices. In this work, we bridge these two paradigms with a unifying Bayesian framework. Our approach deploys intermediate factor rotations throughout the learning process, greatly enhancing the effectiveness of sparsity-inducing priors. These automatic rotations to sparsity are embedded within a PXL-EM algorithm, a Bayesian variant of parameter-expanded EM for posterior mode detection. By iterating between soft-thresholding of small factor loadings and transformations of the factor basis, we obtain (a) dramatic accelerations, (b) robustness against poor initializations and (c) better oriented sparse solutions. To avoid the pre-specification of the factor cardinality, we extend the loading matrix to have infinitely many columns with the Indian Buffet Process (IBP) prior. The factor dimensionality is learned from the posterior, which is shown to concentrate on sparse matrices. Our deployment of PXL-EM performs a dynamic posterior exploration, outputting a solution path indexed by a sequence of spike-and-slab priors. For accurate recovery of the factor loadings, we deploy the Spike-and-Slab LASSO prior, a two-component refinement of the Laplace prior (Rockova 2015). A companion criterion, motivated as an integral lower bound, is provided to effectively select the best recovery. The potential of the proposed procedure is demonstrated on both simulated and real high-dimensional gene expression data, which would render posterior simulation impractical.



*Time: 12:00 – 1:00pm

Dr. Stefan Wager (Stanford University)

“Statistical Estimation with Random Forests”

Random forests, introduced by Breiman (2001), are among the most widely used machine learning algorithms today, with applications in fields as varied as ecology, genetics, and remote sensing. Random forests have been found empirically to fit complex interactions in high dimensions, all while remaining strikingly resilient to overfitting. In principle, these qualities ought to also make random forests good statistical estimators. However, our current understanding of the statistics of random forest predictions is not good enough to make random forests usable as a part of a standard applied statistics pipeline: in particular, we lack robust consistency guarantees and asymptotic inferential tools. In this talk, I will present some recent results that seek to overcome these limitations.

The first half of the talk develops a Gaussian theory for random forests in low dimensions that allows for valid asymptotic inference, and applies the resulting methodology to the problem of heterogeneous treatment effect estimation. The second half of the talk then considers high-dimensional properties of regression trees and forests in a setting motivated by the work of Berk et al. (2013) on valid post-selection inference; at a high level, we find that the amount by which a random forest can overfit to training data scales only logarithmically in the ambient dimension of the problem.

This talk is based on joint work with Susan Athey, Brad Efron, Trevor Hastie, and Guenther Walther.


Lester Mackey (Stanford University)

“Measuring Sample Quality with Stein’s Method”

To improve the efficiency of Monte Carlo estimation, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. The reasoning is sound: a reduction in variance due to more rapid sampling can outweigh the bias introduced. However, the inexactness creates new challenges for sampler and parameter selection, since standard measures of sample quality like effective sample size do not account for asymptotic bias. To address these challenges, we introduce a new computable quality measure based on Stein’s method that quantifies the maximum discrepancy between sample and target expectations over a large class of test functions. We use our tool to compare exact, biased, and deterministic sample sequences and illustrate applications to hyperparameter selection, convergence rate assessment, and quantifying bias-variance tradeoffs in posterior inference.


Philippe Rigollet (MIT)

TITLE: The statistical price for computational efficiency.

ABSTRACT: With the explosion of the size of data, computation has become an integral part of statistics. Convex relaxations for example have been successfully employed to derive efficient procedures with provably optimal statistical guarantees. Unfortunately, computational efficiency sometimes comes at an inevitable statistical cost. Therefore, one needs to redefine optimality among computationally efficient procedures.
Using tools from information theory and computational complexity, we quantify this cost in the context of two models:
(i) the multi-armed bandit problem, and
(ii) sparse principal component analysis
[Based on joint work with Q. Berthet, S. Chassang, V. Perchet and E. Snowberg]


Yen-Chi Chen (Carnegie Mellon)

Statistical Inference using Geometric Features

In many scientific studies, researchers are interested in geometric structure in the underlying density function. Common examples are local modes, ridges, and level sets. In this talk, I will focus on two geometric structures: density ridges and modal regression. Density ridges are curve-like structures characterizing high density regions. I will first describe statistical models for ridges and then discuss their asymptotic theory and methods for constructing confidence sets. I will also show applications to astronomy. Modal regression is an alternative way to study the conditional structure of the response variable given covariates. Instead of estimating the conditional expectation, modal regression focuses on conditional local modes. I will present several useful statistical properties for modal regression, including asymptotic theory, confidence sets, prediction sets, and clustering.


Nozer Singpurwalla (City University of Hong Kong)

“Feynman’s Foibles on the Concept of Probability in Quantum Mechanics.”

In the 1951 Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, the Nobel Prize-winning physicist Richard Feynman made the striking claim that the Bayes/Laplace laws of probability are unable to address problems of uncertainty in quantum physics. This claim, made in the presence of many a distinguished member of our profession, appears to have been unchallenged and continues to be upheld by the few physicists that I interact with.

This is an expository/introductory talk, with time devoted to the famous double-slit experiment, which gave birth to the view that probability was at the heart of quantum mechanics. We will articulate Feynman’s arguments, which led him to question classical probability and to propose Born’s Rule as an alternative. We will then present criticisms of Feynman’s arguments by Koopman (of Columbia) and Suppes (of Stanford). In the sequel, we will propose an approach which entails taking random mixtures of probability distributions to produce results similar to those given by Born’s Rule. The caveat here is that now one needs to engage with negative probabilities, a concept anticipated by Feynman, and endorsed by Bartlett and Czekeley. The aim of this talk is to raise awareness of these and related topics and to generate a dialogue on their merits.


Dimitris Politis (University of California, San Diego)

“Model-free prediction for stationary and nonstationary time series”

The Model-free Prediction Principle of Politis (2013, 2015) gives an alternative approach to inference as a whole, including point and interval prediction and estimation. Its application to time series analysis gives new insights to old problems, e.g., a novel approximation to the optimal linear one-step-ahead predictor of a stationary time series, but also helps address emerging issues involving nonstationary data. Examples of the latter include point predictors and prediction intervals for locally stationary time series, and volatility prediction for time-varying ARCH/GARCH processes.

Keywords. Optimal prediction; Prediction intervals.


Radu Craiu (University of Toronto)

“Adaptive Strategies for Component-Wise Metropolis-Hastings”

Adaptive ideas within the MCMC universe are ubiquitous. However, proving the validity of an adaptive MCMC sampler can be a complicated affair.

Recent theoretical developments that will be briefly discussed have simplified this latter task. We propose a novel adaptive MCMC for component-wise Metropolis-Hastings using a simple and intuitive design that relies on the Multiple-try Metropolis transition kernel. This Adaptive Component-Wise Multiple-try Metropolis is compared with other existing adaptive CMH samplers using numerical experiments.

This is joint work with Jinyoung Yang and Jeffrey Rosenthal.


Marina Meila (University of Washington, Seattle)

“(Bayesian) Statistics with Rankings”

How do we do “statistics as usual” when data comes in the form of permutations, partial rankings, or other objects with rich combinatorial structure? I will start with the Mallows model, an exponential family model based on counting inversions, and I will describe how this flexible model can be adapted to data of various kinds (partial rankings, infinite rankings, signed permutations).

Some highlights will be results enabling practical Bayesian nonparametric modeling over sets of top-t rankings, and algorithms for discovering the structure of preferences in a population. With these, we were able to model the Sushi preferences in a sample of 5000 Japanese respondents, and study the population of all college applicants in Ireland in 2000, in which over 40,000 individuals expressed preferences over more than 500 degree programs.

Joint work with: Chris Meek, Harr Chen, Raman Arora, Alnur Ali, Bhushan Mandhani, Le Bao, Kapil Phadnis, Arthur Patterson, and Jeff Bilmes
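
For readers unfamiliar with the Mallows model mentioned in the abstract, the sketch below shows its simplest form: the probability of a permutation decays exponentially in its number of inversions relative to a central ranking. The central ranking is taken to be the identity, and the function names and the dispersion value theta = 0.7 are assumptions for illustration. The extensions described in the talk (partial rankings, infinite rankings, signed permutations) are not covered here.

```python
import math
from itertools import permutations

def count_inversions(perm):
    """Number of pairs (i, j) with i < j that appear out of order in perm."""
    return sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )

def mallows_probability(perm, theta):
    """P(perm) = exp(-theta * inversions) / Z(theta), for theta > 0."""
    n = len(perm)
    # Closed-form normalizing constant for the inversion (Kendall tau) distance.
    z = 1.0
    for j in range(1, n + 1):
        z *= (1.0 - math.exp(-j * theta)) / (1.0 - math.exp(-theta))
    return math.exp(-theta * count_inversions(perm)) / z

# Sanity check: the probabilities over all permutations of 4 items sum to one.
total = sum(mallows_probability(p, theta=0.7) for p in permutations(range(4)))
```

The closed-form normalizing constant is what keeps the model tractable: no sum over all n! permutations is needed to evaluate a probability.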


ACC (Ton) Coolen (King’s College, London)

“Towards a theory of overfitting in proportional hazards regression for survival data.”


Overfitting is a growing problem in survival analysis. While modern medicine presents us with epidemiological data of unprecedented dimensionality, even for Cox’s proportional hazards method, still the main workhorse of medical statisticians, one finds in the literature only rules of thumb on the minimum ratio of samples to covariates that is required to prevent overfitting from invalidating regression outcomes. The standard error quantifiers in Cox regression (p-values, z-scores, etc.) are blind to overfitting. As a consequence, clinical outcome prediction from high-dimensional (e.g. genomic) covariates continues to rely mostly on the use of so-called ‘signatures’, which are poor man’s substitutes for regression. In this seminar I present a new line of research that aims to develop a quantitative theory of overfitting in Cox-type regression models. It is based on replica analysis, a mathematical methodology that has been used successfully for several decades to model complex many-variable problems in physics, biology, and computer science.


Yazhen Wang (University of Wisconsin)

“Quantum Computation and Statistics”

Quantum computation and quantum information are the product of the marriage of quantum physics and information theory. They will likely lead to a new wave of technological innovations in communication, computation and cryptography. Quantum computation performs calculations by using quantum devices instead of the electronic devices, governed by classical physics, that classical computers use. As the theory of quantum physics is fundamentally stochastic, randomness and uncertainty are deeply rooted in quantum computation and quantum information. Thus statistics can play an important role in quantum computation, which in turn may offer great potential to revolutionize statistical computing and inference. This talk will first give a brief introduction to quantum computation and quantum information and then present my recent work on (i) quantum tomography and its connection with matrix completion and compressed sensing, (ii) annealing-based quantum computing and its relationship with Markov chain Monte Carlo simulations, and (iii) statistical analysis of quantum annealing for large-scale quantum computing data.


*CANCELLED – Joseph Verducci (Ohio State University)

“Discovery of Subpopulations that Support Association”

Since copulas distill the essence of association between continuous random variables, it is natural to start with these when searching for hidden subpopulations that support association. Recently Mukherjee (2016) has shown how exponential families of ranking models asymptote to copulas, with distance metrics determining the ultimate connection. Components of these metrics display some remarkable properties for data sampled from Gaussian and Frank copulas, and these prove to provide an elegant method for discovering association-supporting subpopulations. In particular, the components of Kendall’s tau metric lead to a tau-path method for discovery that has already had successful applications in drug discovery, marketing and finance. This talk examines the tau-path method for testing and screening, including reduction of its computational complexity for large samples and performance measures for screening.