Statistics Seminar – Spring 2020

Schedule for Spring 2020

Seminars are on Mondays
Time: 4:10pm – 5:00pm
Location: Room 903, 1255 Amsterdam Avenue

Tea and Coffee will be served before the seminar at 3:30 PM in the 10th Floor Lounge SSW

Cheese and Wine reception will follow the seminar at 5:10 PM in the 10th Floor Lounge SSW




Anne van Delft (Ruhr University Bochum, Germany)

“Spectral domain-based inference for (nonstationary) function-valued time series”

“My research focuses on the development of theory and methodology for
(nonstationary) function-valued time series. In the analysis of
function-valued time series, the objects of interest are ordered
collections of random variables $(X_t \colon t\in \mathbb{Z})$ where
each $X_t$ takes values in some function space. This setting allows us
to model and extract information from modern data sets whose
measurements are taken almost continuously on their domain of
definition. Examples are omnipresent and can be found, for instance, in
brain imaging analysis, climatology and economics. The structure of this
type of data is, however, complex and therefore poses new challenges.
Moreover, many random phenomena that we encounter in practice exhibit
changes in their probabilistic structure over time, which should be taken
into account. In order to capture the (time-dependent) characteristics
and dominant modes of variation, the spectral domain plays a key role.
In this talk, I will provide an overview of my research in which I
highlight an approach to draw inferences on the temporal dependence
structure and illustrate it with an application to real data.”



Time: 11:40am – 12:40pm

Location: Room 903 SSW

Yuqi Gu (University of Michigan)

“Uncover Hidden Fine-Grained Scientific Information: Structured Latent Attribute Models.”


In modern psychological and biomedical research with diagnostic purposes, scientists often formulate the key task as inferring the fine-grained latent information under structural constraints. These structural constraints usually come from the domain experts’ prior knowledge or insight. The emerging family of Structured Latent Attribute Models (SLAMs) accommodates these modeling needs and has received substantial attention in psychology, education, and epidemiology. SLAMs bring exciting opportunities and unique challenges. In particular, with high-dimensional discrete latent attributes and structural constraints encoded by a design matrix, one needs to balance the gain in the model’s explanatory power and interpretability against the difficulty of understanding and handling the complex model structure.

In the first part of this talk, I present identifiability results that advance the theoretical knowledge of how the design matrix influences the estimability of SLAMs. The new identifiability conditions guide real-world practices of designing diagnostic tests and also lay the foundation for drawing valid statistical conclusions. In the second part, I introduce a statistically consistent penalized likelihood approach to selecting significant latent patterns in the population. I also propose a scalable computational method. These developments explore an exponentially large model space involving many discrete latent variables, and they address the estimation and computation challenges of high-dimensional SLAMs arising from large-scale scientific measurements. The application of the proposed methodology to the data from an international educational assessment reveals meaningful knowledge structure of the student population.


Dongming Huang (Harvard)

“Controlled Variable Selection with More Flexibility: Relaxing the Assumptions of Model-X Knockoffs”


The recent model-X knockoffs method selects variables with provable, non-asymptotic error control and with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely known distribution. In this talk, I will show that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as (np) parameters, where p is the dimension and n is the number of covariate samples (including unlabeled samples if available). The key is to treat the covariates as if they are drawn conditionally on their observed value for a sufficient statistic of the model. Although this idea is simple, even in Gaussian models, conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. I will demonstrate how to do this for medium-dimensional Gaussian models, high-dimensional Gaussian graphical models, and discrete graphical models. Simulations show the new approach remains powerful under the weaker assumptions.



Time: 4:10pm – 5:00pm

Location: Room 903 SSW

Song Mei (Stanford University)

Title: Generalization error of linearized neural networks: staircase and double-descent

Abstract: Deep learning methods operate in regimes that defy the traditional statistical mindset. Despite the non-convexity of empirical risks and the huge complexity of neural network architectures, stochastic gradient algorithms can often find the global minimizer of the training loss and achieve small generalization error on test data. As one possible explanation for the training efficiency of neural networks, tangent kernel theory shows that a multi-layer neural network, in a proper large-width limit, can be well approximated by its linearization. As a consequence, the gradient flow of the empirical risk turns into linear dynamics and converges to a global minimizer. Since last year, linearization has become a popular approach in analyzing training dynamics of neural networks. However, this naturally raises the question of whether the linearization perspective can also explain the observed generalization efficacy. In this talk, I will discuss the generalization error of linearized neural networks, which reveals two interesting phenomena: the staircase decay and the double-descent curve. Through the lens of these phenomena, I will also address the benefits and limitations of the linearization approach for neural networks.



Time: 11:40am – 12:55pm

Location: Room 903 SSW

Sean Jewell (University of Washington)

“Estimation and inference for changepoint models.”


This talk is motivated by statistical challenges that arise in the analysis of calcium imaging data, a new technology in neuroscience that makes it possible to record from huge numbers of neurons at single-neuron resolution. In the first part of this talk, I will consider the problem of estimating a neuron’s spike times from calcium imaging data. A simple and natural model suggests a non-convex optimization problem for this task. I will show that by recasting the non-convex problem as a changepoint detection problem, we can efficiently solve it for the global optimum using a clever dynamic programming strategy.

In the second part of this talk, I will consider quantifying the uncertainty in the estimated spike times. This is a surprisingly difficult task, since the spike times were estimated on the same data that we wish to use for inference. To simplify the discussion, I will focus specifically on the change-in-mean problem, and will consider the null hypothesis that there is no change in mean associated with an estimated changepoint. My proposed approach for this task can be efficiently instantiated for changepoints estimated using binary segmentation and its variants, L0 segmentation, or the fused lasso. Moreover, this framework allows us to condition on much less information than existing approaches, thereby yielding higher-powered tests. These ideas can be easily generalized to the spike estimation problem.

This talk will feature joint work with Toby Hocking, Paul Fearnhead, and Daniela Witten.



Aki Nishimura (UCLA)

“Bayesian sparse regression for large-scale observational healthcare analytics.”

The growing availability of large healthcare databases presents opportunities to investigate how patients’ responses to treatments vary across subgroups. Even with the large cohort sizes found in these databases, however, low incidence rates make it difficult to identify causes of treatment effect heterogeneity among a large number of clinical covariates. Sparse regression provides a potential solution. The Bayesian approach is particularly attractive in our setting, where the signals are weak and heterogeneity across databases is substantial. Applications of Bayesian sparse regression to large-scale data sets, however, have been hampered by the lack of scalable computational techniques. We adapt ideas from numerical linear algebra and computational physics to tackle the critical bottleneck in computing posteriors under Bayesian sparse regression. For linear and logistic models, we develop the conjugate gradient sampler for high-dimensional Gaussians along with the theory of prior-preconditioning. For more general regression and survival models, we develop the curvature-adaptive Hamiltonian Monte Carlo to efficiently sample from high-dimensional log-concave distributions. We demonstrate the scalability of our method on an observational study involving n = 1,065,745 patients and p = 15,779 clinical covariates, designed to compare effectiveness of the most common first-line hypertension treatments. The large cohort size allows us to detect evidence of treatment effect heterogeneity previously unreported by clinical trials.




Clayton Scott (University of Michigan)

“Calibrated Surrogate Losses for Adversarially Robust Classification.”

Adversarially robust classification seeks a classifier that is insensitive to adversarial perturbations
of test patterns. This problem is often formulated via a minimax objective, where the target loss
is the worst-case value of the 0-1 loss subject to a bound on the size of perturbation. Recent work
has proposed convex surrogates for the adversarial 0-1 loss, in an effort to make optimization more
tractable. In this work, we consider the question of which surrogate losses are calibrated with
respect to the adversarial 0-1 loss, meaning that minimization of the former implies minimization
of the latter. We show that no convex surrogate loss is calibrated with respect to the adversarial 0-1
loss. We further introduce a novel class of nonconvex losses and offer necessary and sufficient conditions
for losses in this class to be calibrated.



Joseph Williams (Toronto)

“Statistical Challenges in Reinforcement Learning for Dynamic Field Experimentation.”

Abstract (Statistics)

With the goal of surfacing statistical and machine learning challenges, this talk presents applications of reinforcement learning algorithms to conduct dynamically randomized A/B experiments, automatically using data to enhance digital environments for education and health. One study used an algorithm for dynamic experimentation (Thompson Sampling for multi-armed bandit problems) to enable instructors to randomize alternative explanations to learners, and automatically use the data to reweight randomization so higher rated explanations were presented more frequently to future learners. Another study continually added new arms/conditions/explanations over time, using dynamic experimentation to continually optimize. We also present results on how using dynamic experiments can complicate inference and hypothesis testing (by impacting power and type I error), and our first steps towards adapting inference techniques to dynamic experiments.

This talk raises the opportunities and challenges in using reinforcement learning for dynamic experiments. How do we adapt algorithms for dynamic experiments to appropriately trade off enhancements to user interfaces against scientific discovery through valid inference? How do we explore which inference techniques are best suited to dynamic experiments?

Using dynamic experiments has the potential to perpetually enhance and personalize the technology people receive, while changing how social and behavioral sciences collect data in the field. But this requires solving problems and integrating techniques from a range of different areas, including both Bayesian and frequentist statistics, machine learning, a range of social & behavioral sciences, causal inference, and adaptive clinical trials.

Bio: Joseph Jay Williams is an Assistant Professor in Computer Science (and Psychology, by courtesy) at the University of Toronto, leading the Intelligent Adaptive Interventions research group. He was previously an Assistant Professor at the National University of Singapore’s School of Computing in the department of Information Systems & Analytics, a Research Fellow at Harvard’s Office of the Vice Provost for Advances in Learning, and a member of the Intelligent Interactive Systems Group in Computer Science. He completed a postdoc at Stanford University in Summer 2014, working with the Office of the Vice Provost for Online Learning and the Open Learning Initiative. He received his PhD from UC Berkeley in Computational Cognitive Science, where he applied Bayesian statistics and machine learning to model how people learn and reason. He received his B.Sc. from the University of Toronto in Cognitive Science, Artificial Intelligence and Mathematics, and is originally from Trinidad and Tobago. More information about his research and papers is at


Shahin Tavakoli (Warwick)

“High-Dimensional Functional Factor Models”

Abstract: We set up theoretical foundations for high-dimensional approximate factor models for panels of functional time series (FTS). We first establish a representation result stating that if the first r eigenvalues of the covariance operator of a cross-section of N FTS are unbounded as N diverges and if the (r + 1)th one is bounded, then we can represent each FTS as a sum of a common component driven by r factors, common to (almost) all the series, and a weakly cross-correlated idiosyncratic component (all the eigenvalues of the idiosyncratic covariance operator are bounded as N diverges). Our model and theory are developed in a general Hilbert space setting that allows for panels mixing functional and scalar time series. We then turn to the estimation of the factors, their loadings, and the common components. We derive consistency results in the asymptotic regime where the number N of series and the number T of time observations diverge, thus exemplifying the “blessing of dimensionality” that explains the success of factor models in the context of high-dimensional (scalar) time series. Our results encompass the scalar case, for which they reproduce and extend, under weaker conditions, well-established results (Bai & Ng 2002). We provide numerical illustrations and an empirical illustration on a dataset of intraday S&P100 and Eurostoxx 50 stock returns, along with their scalar overnight returns.

This is joint work with Gilles Nisol and Marc Hallin.



Spring Break


Murali Haran (Penn State) – Canceled


Alexander Volfovsky (Duke) – Canceled

Guido Imbens (Stanford) – Canceled

Heather Battey (Imperial College) – Canceled


Joseph Verducci (Ohio) – Canceled


Yuting Wei (CMU) – Canceled


Avi Feller (Berkeley) – Canceled