Statistics Seminar Series

Schedule for Fall 2021

All talks are available online, via Zoom. Select talks take place in hybrid mode. In-person participation is only available to Columbia affiliates with building access.

Seminars are on Mondays
Time: 4:00pm - 5:00pm
Zoom Link:



Serena Ng (Columbia)

Title: Factor-Based Imputation of Missing Values (joint work with Jushan Bai)

Abstract: This paper proposes an imputation procedure that uses the factors estimated from a tall block along with the re-rotated loadings estimated from a wide block to impute missing values in a panel of data. Assuming that a strong factor structure holds for the full panel of data and its sub-blocks, it is shown that the common component can be consistently estimated at four different rates of convergence without requiring regularization or iteration. An asymptotic analysis of the estimation error is obtained. An application of our analysis is estimation of counterfactuals when potential outcomes have a factor structure. We study the estimation of average and individual treatment effects on the treated and establish a normal distribution theory that can be useful for hypothesis testing.



Lester Mackey (Microsoft Research)

Title: Kernel Thinning and Stein Thinning

Abstract: This talk will introduce two new tools for summarizing a probability distribution more effectively than independent sampling or standard Markov chain Monte Carlo thinning:

  1. Given an initial n-point summary (for example, from independent sampling or a Markov chain), kernel thinning finds a subset of only square-root-n points with comparable worst-case integration error across a reproducing kernel Hilbert space.
  2. If the initial summary suffers from biases due to off-target sampling, tempering, or burn-in, Stein thinning simultaneously compresses the summary and improves the accuracy by correcting for these biases.

These tools are especially well-suited for tasks that incur substantial downstream computation costs per summary point, such as organ and tissue modeling, in which each simulation consumes thousands of CPU hours.
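As a rough illustration of the compression goal only (not the kernel-thinning or Stein-thinning algorithms from the talk), the sketch below greedily selects a square-root-n subset whose empirical kernel mean embedding stays close to that of the full sample, herding-style; the Gaussian kernel, bandwidth, and sample sizes are arbitrary choices.

```python
import numpy as np

def gaussian_kernel(A, B, bw=1.0):
    # Pairwise Gaussian kernel values between rows of A and rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bw ** 2))

def greedy_thin(X, m, bw=1.0):
    # Greedy, herding-style selection of m points whose kernel mean
    # embedding tracks that of the full sample -- a simple stand-in
    # for kernel thinning, not the algorithm itself.
    K = gaussian_kernel(X, X, bw)
    target = K.mean(axis=1)            # inner products with the full sample
    chosen, running = [], np.zeros(X.shape[0])
    for t in range(m):
        scores = target - running / max(t, 1)
        scores[chosen] = -np.inf       # select without replacement
        i = int(np.argmax(scores))
        chosen.append(i)
        running += K[:, i]
    return np.array(chosen)

def mmd2(K, idx):
    # Squared MMD between the full empirical measure and the subset
    return K.mean() - 2.0 * K[:, idx].mean() + K[np.ix_(idx, idx)].mean()

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2))
m = int(np.sqrt(len(X)))               # square-root-n summary points
idx = greedy_thin(X, m)
K = gaussian_kernel(X, X)
```

Comparing `mmd2(K, idx)` against the first m points of the sample (an i.i.d. thinning) shows how a carefully chosen subset can match the full sample far better than naive subsampling of the same size.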



Ricardo Masini (Princeton)


Abstract: Factor and sparse models are two widely used methods for imposing low-dimensional structure in high dimensions, and they are seemingly mutually exclusive. We propose a lifting method that combines the merits of these two models in a supervised learning methodology that efficiently exploits all the information in high-dimensional datasets. The method is based on a flexible model for high-dimensional panel data, the factor-augmented regression (FarmPredict) model, with observable or latent common factors as well as idiosyncratic components. This model not only includes both principal component (factor) regression and sparse regression as special cases but also significantly weakens the cross-sectional dependence and hence facilitates model selection and interpretability. The methodology consists of three steps. At each step, the remaining cross-sectional dependence can be inferred by a novel test for covariance structure in high dimensions. We develop asymptotic theory for the FarmPredict model and demonstrate the validity of the multiplier bootstrap for testing high-dimensional covariance structure. This is further extended to testing high-dimensional partial covariance structures. The theory is supported by a simulation study and by applications to the construction of a partial covariance network of financial returns and to a prediction exercise for a large panel of macroeconomic time series from the FRED-MD database.
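A stylized sketch of the factor-augmented idea follows: PCA factors are extracted first, then a sparse (lasso) regression is run on the estimated idiosyncratic components. This is an assumed, simplified pipeline with a hand-rolled ISTA solver; all dimensions and tuning values are invented, and none of the paper's covariance-testing machinery is included.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r = 300, 100, 2

# Simulated data: predictors with a factor structure, outcome driven by
# the factors plus a sparse signal in the idiosyncratic components
F = rng.normal(size=(n, r))
B = rng.normal(size=(p, r))
U = rng.normal(size=(n, p))            # idiosyncratic components
X = F @ B.T + U
beta = np.zeros(p); beta[:3] = 1.0     # sparse idiosyncratic signal
y = F @ np.array([2.0, -1.0]) + U @ beta + 0.1 * rng.normal(size=n)

# Step 1: estimate factors by PCA and strip the common component
Uo, S, Vt = np.linalg.svd(X, full_matrices=False)
F_hat = np.sqrt(n) * Uo[:, :r]         # normalized so F'F/n = I
B_hat = (X.T @ F_hat) / n
U_hat = X - F_hat @ B_hat.T            # estimated idiosyncratic part

# Step 2: regress y on the factors, keep the residual
alpha_hat = np.linalg.lstsq(F_hat, y, rcond=None)[0]
resid = y - F_hat @ alpha_hat

# Step 3: lasso regression of the residual on U_hat via plain ISTA,
# minimizing (1/2n)||b - A x||^2 + lam * ||x||_1
def lasso_ista(A, b, lam, iters=1000):
    n_ = A.shape[0]
    L = np.linalg.norm(A, 2) ** 2 / n_     # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - b) / n_
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)
    return x

beta_hat = lasso_ista(U_hat, resid, lam=0.1)
```

Because the common component has been removed, the columns of `U_hat` are only weakly cross-correlated, which is what makes the sparse step well-behaved; that decoupling is the "combine the merits" point of the abstract.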


Pierre Bellec (Rutgers)







Zongming Ma (UPenn)

Title: Community Detection in Multilayer Networks

Abstract: In this talk, we discuss community detection in a stylized yet informative inhomogeneous multilayer network model. In our model, layers are generated by different stochastic block models, the community structures of which are (random) perturbations of a common global structure, while the connecting probabilities in different layers are not related. Focusing on the symmetric two-block case, we establish minimax rates for both global estimation of the common structure and individualized estimation of layer-wise community structures. Both minimax rates have sharp exponents. In addition, we provide an efficient algorithm that is simultaneously asymptotically minimax optimal for both estimation tasks under mild conditions. The optimal rates depend on the parity of the number of most informative layers, a phenomenon that is caused by inhomogeneity across layers. This talk is based on joint work with Shuxiao Chen and Sifan Liu. Time permitting, we shall also discuss community detection in multilayer networks with covariates.
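A toy numpy illustration of the global-estimation task: an assumed setup in which all layers are assortative with equal-sized blocks and share one common partition, recovered by plain spectral clustering on the aggregated adjacency matrix. This is not the minimax-optimal algorithm from the talk, and it ignores the layer-wise perturbations and parity phenomena the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_layers = 120, 5
z = np.array([0] * 60 + [1] * 60)      # common two-block partition

# Each layer: its own (assortative) within/between connection probabilities
A_sum = np.zeros((n, n))
for _ in range(n_layers):
    p, q = rng.uniform(0.3, 0.5), rng.uniform(0.05, 0.15)
    P = np.where(z[:, None] == z[None, :], p, q)
    A = (rng.random((n, n)) < P).astype(float)
    A = np.triu(A, 1); A = A + A.T     # symmetric, no self-loops
    A_sum += A

# Spectral estimate of the common structure: the sign pattern of the
# second-largest eigenvector of the aggregated adjacency matrix
vals, vecs = np.linalg.eigh(A_sum)     # eigenvalues in ascending order
z_hat = (vecs[:, -2] > 0).astype(int)

# Misclassification rate up to a global label swap
err = min(np.mean(z_hat != z), np.mean(z_hat == z))
```

Summing layers is only sensible here because every simulated layer is assortative; with mixed assortative/disassortative layers the signals would cancel, which hints at why inhomogeneity across layers makes the real problem delicate.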





Professor Dylan Small (University of Pennsylvania)

Title: Testing an Elaborate Theory of a Causal Hypothesis

Abstract: When R.A. Fisher was asked what can be done in observational studies to clarify the step from association to causation, he replied, “Make your theories elaborate” -- when constructing a causal hypothesis, envisage as many different consequences of its truth as possible and plan observational studies to discover whether each of these consequences is found to hold. William Cochran called “this multi-phasic attack…one of the most potent weapons in observational studies.” Statistical tests for the various pieces of the elaborate theory help to clarify how much the causal hypothesis is corroborated. In practice, the degree of corroboration of the causal hypothesis has been assessed by a verbal description of which of the several tests provides evidence for which of the several predictions. This verbal approach can miss quantitative patterns. We develop a quantitative approach to making statistical inference about the amount of the elaborate theory that is supported by evidence.

This is joint work with Bikram Karmakar.