Statistics Seminar – Spring 2020

Schedule for Spring 2020

Seminars are on Mondays
Time: 4:10pm – 5:00pm
Location: Room 903, 1255 Amsterdam Avenue

Tea and coffee will be served before the seminar at 3:30pm in the 10th Floor Lounge, SSW

A cheese and wine reception will follow the seminar at 5:10pm in the 10th Floor Lounge, SSW

An archive of past seminars is also available.



Anne van Delft (Ruhr University Bochum, Germany)

“Spectral domain-based inference for (nonstationary) function-valued time series”

“My research focuses on the development of theory and methodology for (nonstationary) function-valued time series. In the analysis of function-valued time series, the objects of interest are ordered collections of random variables $(X_t \colon t\in \mathbb{Z})$ where each $X_t$ takes values in some function space. This setting allows us to model and extract information from modern data sets in which measurements are taken almost continuously on their domain of definition. Examples are omnipresent and can be found in brain imaging, climatology, and economics. The structure of this type of data is, however, complex and therefore poses new challenges. Moreover, many random phenomena that we encounter in practice exhibit changes in their probabilistic structure over time, which should be taken into account. In order to capture the (time-dependent) characteristics and dominant modes of variation, the spectral domain plays a key role. In this talk, I will provide an overview of my research in which I highlight an approach to drawing inferences on the temporal dependence structure and illustrate it with an application to real data.”



Time: 11:40am – 12:40pm

Location: Room 903 SSW

Yuqi Gu (University of Michigan)

“Uncovering Hidden Fine-Grained Scientific Information: Structured Latent Attribute Models.”


In modern psychological and biomedical research with diagnostic purposes, scientists often formulate the key task as inferring the fine-grained latent information under structural constraints. These structural constraints usually come from the domain experts’ prior knowledge or insight. The emerging family of Structured Latent Attribute Models (SLAMs) accommodates these modeling needs and has received substantial attention in psychology, education, and epidemiology. SLAMs bring exciting opportunities and unique challenges. In particular, with high-dimensional discrete latent attributes and structural constraints encoded by a design matrix, one needs to balance the gain in the model’s explanatory power and interpretability against the difficulty of understanding and handling the complex model structure.

In the first part of this talk, I present identifiability results that advance the theoretical knowledge of how the design matrix influences the estimability of SLAMs. The new identifiability conditions guide real-world practices of designing diagnostic tests and also lay the foundation for drawing valid statistical conclusions. In the second part, I introduce a statistically consistent penalized likelihood approach to selecting significant latent patterns in the population. I also propose a scalable computational method. These developments explore an exponentially large model space involving many discrete latent variables, and they address the estimation and computation challenges of high-dimensional SLAMs arising from large-scale scientific measurements. The application of the proposed methodology to data from an international educational assessment reveals a meaningful knowledge structure of the student population.


Dongming Huang (Harvard)

“Controlled Variable Selection with More Flexibility: Relaxing the Assumptions of Model-X Knockoffs”


The recent model-X knockoffs method selects variables with provable and non-asymptotic error control and with no restrictions or assumptions on the dimensionality of the data or the conditional distribution of the response given the covariates. The one requirement for the procedure is that the covariate samples are drawn independently and identically from a precisely known distribution. In this talk, I will show that the exact same guarantees can be made without knowing the covariate distribution fully, but instead knowing it only up to a parametric model with as many as Ω(np) parameters, where p is the dimension and n is the number of covariate samples (including unlabeled samples if available). The key is to treat the covariates as if they are drawn conditionally on their observed value for a sufficient statistic of the model. Although this idea is simple, even in Gaussian models conditioning on a sufficient statistic leads to a distribution supported on a set of zero Lebesgue measure, requiring techniques from topological measure theory to establish valid algorithms. I will demonstrate how to do this for medium-dimensional Gaussian models, high-dimensional Gaussian graphical models, and discrete graphical models. Simulations show the new approach remains powerful under the weaker assumptions.



Time: 4:10pm – 5:00pm

Location: Room 903 SSW

Song Mei (Stanford University)

Title: Generalization error of linearized neural networks: staircase and double-descent

Abstract: Deep learning methods operate in regimes that defy the traditional statistical mindset. Despite the non-convexity of empirical risks and the huge complexity of neural network architectures, stochastic gradient algorithms can often find the global minimizer of the training loss and achieve small generalization error on test data. As one possible explanation for the training efficiency of neural networks, tangent kernel theory shows that a multi-layer neural network, in a proper large-width limit, can be well approximated by its linearization. As a consequence, the gradient flow of the empirical risk becomes a linear dynamical system and converges to a global minimizer. Over the past year, linearization has become a popular approach for analyzing the training dynamics of neural networks. However, this naturally raises the question of whether the linearization perspective can also explain the observed generalization efficacy. In this talk, I will discuss the generalization error of linearized neural networks, which reveals two interesting phenomena: staircase decay and the double-descent curve. Through the lens of these phenomena, I will also address the benefits and limitations of the linearization approach for neural networks.
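The linearization at the heart of tangent kernel theory can be illustrated numerically. For a wide two-layer network, a small parameter step changes the network output almost exactly as the first-order Taylor expansion predicts. The following sketch is an illustration of that idea only (a hypothetical width-2000 tanh network, not code from the talk):

```python
import numpy as np

def net(params, x):
    """Two-layer network f(x) = a . tanh(W x) / sqrt(m), width m."""
    W, a = params
    return a @ np.tanh(W @ x) / np.sqrt(len(a))

def grad(params, x):
    """Gradient of f(x) with respect to (W, a)."""
    W, a = params
    m = len(a)
    h = np.tanh(W @ x)
    dW = np.outer(a * (1.0 - h**2), x) / np.sqrt(m)
    da = h / np.sqrt(m)
    return dW, da

rng = np.random.default_rng(0)
m, d = 2000, 5
W0, a0 = rng.standard_normal((m, d)), rng.standard_normal(m)
x = rng.standard_normal(d)

# move the parameters a small step in the gradient direction and compare
# the true change in f(x) with the tangent (first-order) prediction
dW, da = grad((W0, a0), x)
eps = 1e-3
f0 = net((W0, a0), x)
f1 = net((W0 + eps * dW, a0 + eps * da), x)
tangent = f0 + eps * (np.sum(dW * dW) + da @ da)  # f0 + eps * ||grad||^2
lin_error = abs(f1 - tangent) / abs(f1 - f0)      # relative linearization error
```

The relative error is tiny compared with the actual change in the output; at large width, this is the sense in which the training dynamics of the network stay close to those of its linearization.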



Time: 11:40am – 12:55pm

Location: Room 903 SSW

Sean Jewell (University of Washington)

“Estimation and inference for changepoint models.”


This talk is motivated by statistical challenges that arise in the analysis of calcium imaging data, a new technology in neuroscience that makes it possible to record from huge numbers of neurons at single-neuron resolution.  In the first part of this talk,  I will consider the problem of estimating a neuron’s spike times from calcium imaging data. A simple and natural model suggests a non-convex optimization problem for this task. I will show that by recasting the non-convex problem as a changepoint detection problem, we can efficiently solve it for the global optimum using a clever dynamic programming strategy.

In the second part of this talk, I will consider quantifying the uncertainty in the estimated spike times. This is a surprisingly difficult task, since the spike times were estimated on the same data that we wish to use for inference. To simplify the discussion, I will focus specifically on the change-in-mean problem, and will consider the null hypothesis that there is no change in mean associated with an estimated changepoint. My proposed approach for this task can be efficiently instantiated for changepoints estimated using binary segmentation and its variants, L0 segmentation, or the fused lasso. Moreover, this framework allows us to condition on much less information than existing approaches, thereby yielding higher-powered tests. These ideas can be easily generalized to the spike estimation problem.

This talk will feature joint work with Toby Hocking, Paul Fearnhead, and Daniela Witten.
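To make the changepoint recasting concrete, here is a minimal sketch (my illustration, not the speaker's code) of the classical optimal-partitioning dynamic program for the change-in-mean problem with an L0 penalty lam per changepoint; the spike-estimation algorithm discussed in the talk solves a recursion of this form efficiently:

```python
import numpy as np

def changepoints_l0(y, lam):
    """Optimal partitioning for change in mean: minimize the within-segment
    sum of squared errors plus lam per changepoint, by dynamic programming
    over the last changepoint location. Returns the changepoint indices."""
    n = len(y)
    # prefix sums give O(1) evaluation of each segment's squared error
    S = np.concatenate(([0.0], np.cumsum(y)))
    S2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def seg_cost(s, t):  # squared error of y[s:t] around its own mean
        return (S2[t] - S2[s]) - (S[t] - S[s]) ** 2 / (t - s)

    F = np.zeros(n + 1)           # F[t]: optimal cost of the first t points
    F[0] = -lam                   # convention: the first segment is not penalized
    back = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        cands = [F[s] + seg_cost(s, t) + lam for s in range(t)]
        s_star = int(np.argmin(cands))
        F[t], back[t] = cands[s_star], s_star
    # trace back through the optimal last-changepoint pointers
    cps, t = [], n
    while t > 0:
        if back[t] > 0:
            cps.append(back[t])
        t = back[t]
    return sorted(cps)
```

This naive recursion costs O(n^2); the point of the talk's formulation is that such recursions can be solved for the global optimum despite the non-convexity of the original problem.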



Aki Nishimura (UCLA)

“Bayesian sparse regression for large-scale observational healthcare analytics.”

Growing availability of large healthcare databases presents opportunities to investigate how patients’ responses to treatments vary across subgroups. Even with the large cohort sizes found in these databases, however, low incidence rates make it difficult to identify causes of treatment effect heterogeneity among a large number of clinical covariates. Sparse regression provides a potential solution. The Bayesian approach is particularly attractive in our setting, where the signals are weak and heterogeneity across databases is substantial. Applications of Bayesian sparse regression to large-scale data sets, however, have been hampered by the lack of scalable computational techniques. We adapt ideas from numerical linear algebra and computational physics to tackle the critical bottleneck in computing posteriors under Bayesian sparse regression. For linear and logistic models, we develop the conjugate gradient sampler for high-dimensional Gaussians along with the theory of prior-preconditioning. For more general regression and survival models, we develop curvature-adaptive Hamiltonian Monte Carlo to efficiently sample from high-dimensional log-concave distributions. We demonstrate the scalability of our method on an observational study involving n = 1,065,745 patients and p = 15,779 clinical covariates, designed to compare the effectiveness of the most common first-line hypertension treatments. The large cohort size allows us to detect evidence of treatment effect heterogeneity previously unreported by clinical trials.
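The conjugate gradient sampler can be sketched in its simplest setting, a Gaussian linear model with a fixed-scale Gaussian prior (an assumption for illustration; the version in the talk adds prior-preconditioning and sparsity-inducing priors). One draws a perturbed right-hand side whose covariance equals the posterior precision Λ and solves Λβ = η with conjugate gradients, so Λ is only ever touched through matrix-vector products:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def cg_posterior_sample(X, y, sigma2, tau2, rng):
    """One exact draw from the posterior N(Lam^{-1} b, Lam^{-1}) of
    y ~ N(X beta, sigma2 I), beta ~ N(0, tau2 I), where
    Lam = X'X / sigma2 + I / tau2 and b = X'y / sigma2.
    Perturb-then-solve: eta = b + z with z ~ N(0, Lam), so that
    Lam^{-1} eta ~ N(Lam^{-1} b, Lam^{-1})."""
    n, p = X.shape
    b = X.T @ y / sigma2
    z = X.T @ rng.standard_normal(n) / np.sqrt(sigma2) \
        + rng.standard_normal(p) / np.sqrt(tau2)
    Lam = LinearOperator((p, p),
                         matvec=lambda v: X.T @ (X @ v) / sigma2 + v / tau2)
    beta, info = cg(Lam, b + z)  # never forms or factorizes Lam itself
    return beta
```

Because the precision matrix is accessed only through matvecs, the same pattern scales to the sparse, very high-dimensional design matrices that arise in observational healthcare data.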




Clayton Scott (University of Michigan)


Joseph Williams (Toronto)


Shahin Tavakoli (Warwick)

“High-Dimensional Functional Factor Models”

Abstract: We set up theoretical foundations for high-dimensional approximate factor models for panels of functional time series (FTS). We first establish a representation result stating that if the first r eigenvalues of the covariance operator of a cross-section of N FTS are unbounded as N diverges and if the (r + 1)th one is bounded, then we can represent each FTS as a sum of a common component driven by r factors, common to (almost) all the series, and a weakly cross-correlated idiosyncratic component (all the eigenvalues of the idiosyncratic covariance operator are bounded as N diverges). Our model and theory are developed in a general Hilbert space setting that allows for panels mixing functional and scalar time series. We then turn to the estimation of the factors, their loadings, and the common components. We derive consistency results in the asymptotic regime where the number N of series and the number T of time observations diverge, thus exemplifying the “blessing of dimensionality” that explains the success of factor models in the context of high-dimensional (scalar) time series. Our results encompass the scalar case, for which they reproduce and extend, under weaker conditions, well-established results (Bai & Ng 2002). We provide numerical illustrations and an empirical application to a dataset of intraday S&P100 and Eurostoxx 50 stock returns, along with their scalar overnight returns.

This is joint work with Gilles Nisol and Marc Hallin.
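In the scalar special case, the estimation step reduces to principal components, as in Bai & Ng (2002). A small simulation sketch (an illustration under assumed notation, not the authors' code) recovers the common component of a panel by projecting onto the top-r eigenvectors of the cross-sectional sample covariance:

```python
import numpy as np

# simulate a scalar approximate factor model: X = Lambda F' + E
# (N series observed at T time points, r common factors)
rng = np.random.default_rng(0)
N, T, r = 100, 500, 2
F = rng.standard_normal((T, r))            # latent factors
Lambda = rng.standard_normal((N, r))       # factor loadings
E = 0.5 * rng.standard_normal((N, T))      # idiosyncratic component
X = Lambda @ F.T + E

# estimate the loading space from the top-r eigenvectors of X X' / (N T);
# the r "factor" eigenvalues grow with N while the rest stay bounded
evals, evecs = np.linalg.eigh(X @ X.T / (N * T))   # ascending order
Lambda_hat = np.sqrt(N) * evecs[:, -r:]    # loadings (up to rotation)
F_hat = X.T @ Lambda_hat / N               # factors (up to rotation)
common_hat = Lambda_hat @ F_hat.T          # estimated common component

# relative recovery error of the common component
rel_err = (np.linalg.norm(common_hat - Lambda @ F.T)
           / np.linalg.norm(Lambda @ F.T))
```

The recovery error shrinks as both N and T grow, which is the "blessing of dimensionality" the abstract refers to; the functional version replaces eigenvectors by eigenfunctions of a covariance operator.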



Spring Break


Murali Haran (Penn State)


Alexander Volfovsky (Duke)

Guido Imbens (Stanford)

Heather Battey (Imperial College)


Joseph Verducci (Ohio)


Yuting Wei (CMU)


Avi Feller (Berkeley)