Statistics Seminar – Fall 2021

Schedule for Fall 2021

All talks are available online, via Zoom. Select talks take place in hybrid mode. In-person participation is only available to Columbia affiliates with building access.

Seminars are on Mondays
Time: 4:00pm – 5:00pm
Zoom Link: https://columbiauniversity.zoom.us/s/99255805560

9/13/21

 

Serena Ng (Columbia)

Title: Factor Based Imputation of Missing Values (joint work with Jushan Bai)

Abstract: This paper proposes an imputation procedure that uses the factors estimated from a tall block along with the re-rotated loadings estimated from a wide block to impute missing values in a panel of data. Assuming that a strong factor structure holds for the full panel of data and its sub-blocks, it is shown that the common component can be consistently estimated at four different rates of convergence without requiring regularization or iteration. An asymptotic analysis of the estimation error is obtained. An application of our analysis is estimation of counterfactuals when potential outcomes have a factor structure. We study the estimation of average and individual treatment effects on the treated and establish a normal distribution theory that can be useful for hypothesis testing.
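
For intuition, here is a stripped-down sketch of the tall-block idea in Python (an illustrative simplification, not the paper's exact estimator): estimate factors by principal components on the fully observed columns, then regress each remaining series on those factors to fill in its missing entries.

```python
# Simplified factor-based imputation in the spirit of the tall-block
# approach; illustrative only (toy sizes and normalizations assumed).
import numpy as np

def factor_impute(X, r):
    """Impute NaN entries of a T x N panel X using r factors."""
    T, N = X.shape
    tall = ~np.isnan(X).any(axis=0)            # fully observed columns
    U, S, Vt = np.linalg.svd(X[:, tall], full_matrices=False)
    F = np.sqrt(T) * U[:, :r]                  # T x r estimated factors
    C = X.copy()
    for j in range(N):
        obs = ~np.isnan(X[:, j])
        lam, *_ = np.linalg.lstsq(F[obs], X[obs, j], rcond=None)
        C[~obs, j] = F[~obs] @ lam             # impute common component
    return C

# Toy panel: rank-2 signal plus noise, 10% of entries removed at random,
# with the first five columns kept intact to serve as the tall block.
rng = np.random.default_rng(0)
T, N, r = 200, 50, 2
X0 = rng.normal(size=(T, r)) @ rng.normal(size=(r, N))
X = X0 + 0.1 * rng.normal(size=(T, N))
mask = rng.random((T, N)) < 0.1
mask[:, :5] = False
X[mask] = np.nan
print(np.abs(factor_impute(X, r)[mask] - X0[mask]).mean())
```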

 

9/20/21

Lester Mackey (Microsoft Research)

Title: Kernel Thinning and Stein Thinning

Abstract: This talk will introduce two new tools for summarizing a probability distribution more effectively than independent sampling or standard Markov chain Monte Carlo thinning:

  1. Given an initial n-point summary (for example, from independent sampling or a Markov chain), kernel thinning finds a subset of only square-root-n points with comparable worst-case integration error across a reproducing kernel Hilbert space.
  2. If the initial summary suffers from biases due to off-target sampling, tempering, or burn-in, Stein thinning simultaneously compresses the summary and improves the accuracy by correcting for these biases.

These tools are especially well-suited for tasks that incur substantial downstream computation costs per summary point, like organ and tissue modeling, in which each simulation consumes thousands of CPU hours.
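
As a rough illustration of the compression task (not the kernel thinning algorithm itself, which uses a more refined recursive halving scheme; see the authors' goodpoints package), the sketch below greedily herds m points whose empirical distribution tracks the full sample in RKHS norm; the kernel and bandwidth choices are assumptions.

```python
# Greedy kernel herding: a simple cousin of kernel thinning that selects
# m summary points approximately matching the full sample's kernel mean.
import numpy as np

def gauss_kernel(X, Y, bw=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

def herd(X, m, bw=1.0):
    K = gauss_kernel(X, X, bw)
    target = K.mean(axis=1)            # kernel mean embedding at each point
    chosen, running = [], np.zeros(len(X))
    for t in range(m):
        j = int(np.argmax(target - running / (t + 1)))
        chosen.append(j)
        running += K[:, j]             # kernel evaluations at chosen points
    return X[chosen]

rng = np.random.default_rng(1)
X = rng.normal(size=(1024, 2))
S = herd(X, 32)                        # square-root-n summary
print(X.mean(0).round(2), S.mean(0).round(2))
```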

 

9/27/21

Ricardo Masini (Princeton)

Title: Bridging Factor and Sparse Models

Abstract: Factor and sparse models are two widely used methods for imposing a low-dimensional structure in high dimensions, and they are seemingly mutually exclusive. We propose a lifting method that combines the merits of both in a supervised learning methodology that allows one to efficiently explore all the information in high-dimensional datasets. The method is based on a flexible model for high-dimensional panel data, the factor-augmented regression (FarmPredict) model, with observable or latent common factors as well as idiosyncratic components. This model not only includes principal component (factor) regression and sparse regression as special cases but also significantly weakens the cross-sectional dependence, and hence facilitates model selection and interpretability. The methodology consists of three steps. At each step, the remaining cross-sectional dependence can be inferred by a novel test for covariance structure in high dimensions. We develop asymptotic theory for the FarmPredict model and demonstrate the validity of the multiplier bootstrap for testing high-dimensional covariance structure. This is further extended to testing high-dimensional partial covariance structures. The theory is supported by a simulation study and by applications to the construction of a partial covariance network of financial returns and to a prediction exercise for a large panel of macroeconomic time series from the FRED-MD database.
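
A minimal sketch of the factor-plus-sparse idea (illustrative only; the actual FarmPredict steps and covariance tests are more involved): extract factors by PCA, form idiosyncratic residuals, and run a sparse regression on both sets of regressors.

```python
# Toy FarmPredict-style pipeline: PCA factors + Lasso on the residuals.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n, p, r = 400, 100, 3
F = rng.normal(size=(n, r))                 # latent factors
Lam = rng.normal(size=(p, r))               # loadings
U = rng.normal(size=(n, p))                 # idiosyncratic components
X = F @ Lam.T + U
y = F[:, 0] + U[:, 0] + 0.5 * rng.normal(size=n)

pca = PCA(n_components=r).fit(X)
Fhat = pca.transform(X)                     # step 1: estimate factors
Uhat = X - pca.inverse_transform(Fhat)      # step 2: idiosyncratic residuals
Z = np.hstack([Fhat, Uhat])                 # weakly dependent regressors
fit = LassoCV(cv=5).fit(Z, y)               # step 3: sparse regression
print(round(fit.score(Z, y), 2))
```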

10/4/21

Pierre Bellec (Rutgers)

Title: Confidence intervals in high-dimensional robust regression problems

Abstract: I will present some asymptotic normality and confidence interval results in linear regression problems where the dimension and sample size are comparable up to multiplicative constants (e.g., c < p/n < C for constants C > c > 0), and where errors are heavy-tailed or corrupted in Huber’s contamination model. Natural estimators in this context include regularized M-estimators, obtained as solutions of minimization problems involving a robust loss function (to handle heavy tails or contamination) and an additive regularizing penalty (to fight the curse of dimensionality).

A typical example covered by the assumptions is the Huber loss combined with an Elastic-Net penalty. Under these assumptions, asymptotic normality of a de-biased estimate and the corresponding (1-α)-confidence interval holds uniformly over all p covariates except a finite number. The few covariates for which asymptotic normality fails to hold exhibit a “variance spike” and require a larger variance estimate. This explains why, for some estimators including the Lasso, asymptotic normality of de-biased estimates cannot hold uniformly over all coordinates unless this larger variance estimate is used. Time permitting, we will see that for the square loss, minimizing the confidence interval width is equivalent to minimizing the out-of-sample error.
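
For concreteness, here is a small proximal-gradient sketch of a Huber-loss elastic-net M-estimator (an ad hoc illustration with assumed tuning constants, not code from the talk).

```python
# Huber loss + elastic net via proximal gradient (ISTA); illustrative.
import numpy as np

def psi(r, delta=1.345):                       # Huber score function
    return np.clip(r, -delta, delta)

def huber_enet(X, y, lam1=0.1, lam2=0.1, iters=500):
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n + lam2   # Lipschitz constant
    b = np.zeros(p)
    for _ in range(iters):
        grad = -X.T @ psi(y - X @ b) / n + lam2 * b
        z = b - grad / L
        b = np.sign(z) * np.maximum(np.abs(z) - lam1 / L, 0)  # soft-threshold
    return b

rng = np.random.default_rng(3)
n, p = 200, 400                                # p/n comparable, p > n
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:5] = 2.0
y = X @ beta + rng.standard_t(df=2, size=n)    # heavy-tailed errors
print(huber_enet(X, y)[:8].round(2))
```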

10/11/21

Xiaodong Li (UC Davis)

Title: Linear Polytree Structural Equation Models: Structural Learning and Inverse Correlation Estimation

Abstract:

We are interested in the problem of learning the directed acyclic graph (DAG) when data are generated from a linear structural equation model (SEM) and the causal structure can be characterized by a polytree. Specifically, under both Gaussian and sub-Gaussian models, we study the sample size conditions for the well-known Chow-Liu algorithm to exactly recover the equivalence class of the polytree, which is uniquely represented by a CPDAG. We also study the error rate for the estimation of the inverse correlation matrix under such models. Our theoretical findings are illustrated by comprehensive numerical simulations, and experiments on benchmark data also demonstrate the robustness of the method when the ground truth graphical structure can only be approximated by a polytree.
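
Since the Chow-Liu algorithm is central here, a short sketch may help: for (sub-)Gaussian linear SEMs the pairwise mutual information is monotone in the absolute correlation, so the Chow-Liu tree is a maximum-weight spanning tree of the |correlation| matrix (toy example assumed).

```python
# Chow-Liu skeleton recovery for a linear SEM generated by a polytree.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def chow_liu_skeleton(X):
    R = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(R, 0.0)
    # max-weight spanning tree == min spanning tree on negated weights
    T = minimum_spanning_tree(-R).toarray()
    return {(int(min(i, j)), int(max(i, j))) for i, j in zip(*np.nonzero(T))}

# Toy polytree 0 -> 1 <- 3, 1 -> 2 (node 1 is a collider).
rng = np.random.default_rng(4)
n = 5000
x0 = rng.normal(size=n); x3 = rng.normal(size=n)
x1 = 0.8 * x0 + 0.8 * x3 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
print(chow_liu_skeleton(np.column_stack([x0, x1, x2, x3])))
# expected skeleton: {(0, 1), (1, 2), (1, 3)}
```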

 

10/18/21

Peter Craigmile (OSU)

Title: Locally Stationary Processes and their Application to Climate Modeling

Abstract: In the analysis of climate data it is common to build non-stationary spatio-temporal processes, often based on assuming random walk behavior over time for the error process. Random walk models may be a poor description of the temporal dynamics, leading to inaccurate uncertainty quantification. Likewise, after detrending, assuming stationarity in time may not be reasonable either, especially under climate change. Based on ongoing research, we present a class of time-varying processes that are stationary in space but locally stationary in time. We demonstrate how to carefully parameterize the time-varying model parameters in terms of a transformation of basis functions, and we show how to extend this class of time series processes to spatio-temporal data, applying the methodology to estimating temperature trends.

This research is joint with Shreyan Ganguly.
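
To fix ideas, here is a toy version (my own illustration, with an assumed basis) of parameterizing a time-varying AR(1) coefficient through a transformation of basis functions; the tanh squashing keeps |phi(t)| < 1 so the process stays locally stationary.

```python
# Simulate an AR(1) process whose coefficient drifts through time via
# basis functions; the basis and the transformation are assumptions.
import numpy as np

def simulate_tvar1(T, beta, rng):
    u = np.linspace(0, 1, T)                   # rescaled time
    B = np.column_stack([np.ones(T), u, np.cos(2 * np.pi * u)])
    phi = np.tanh(B @ beta)                    # time-varying AR coefficient
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi[t] * x[t - 1] + rng.normal()
    return x, phi

x, phi = simulate_tvar1(1000, np.array([0.5, 1.0, -0.5]),
                        np.random.default_rng(5))
print(phi.min().round(2), phi.max().round(2))  # coefficient range over time
```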

10/25/21

Zongming Ma (UPenn)

Title: Community Detection in Multilayer Networks

Abstract: In this talk, we discuss community detection in a stylized yet informative inhomogeneous multilayer network model. In our model, layers are generated by different stochastic block models, the community structures of which are (random) perturbations of a common global structure while the connecting probabilities in different layers are not related. Focusing on the symmetric two-block case, we establish minimax rates for both global estimation of the common structure and individualized estimation of layer-wise community structures. Both minimax rates have sharp exponents. In addition, we provide an efficient algorithm that is simultaneously asymptotically minimax optimal for both estimation tasks under mild conditions. The optimal rates depend on the parity of the number of most informative layers, a phenomenon that is caused by inhomogeneity across layers. This talk is based on a joint work with Shuxiao Chen and Sifan Liu. Time permitting, we shall also discuss community detection in multilayer networks with covariates.
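
As a naive baseline for the symmetric two-block setting (not the talk's minimax-optimal algorithm), one can compute a spectral estimate in each layer, sign-align the layers, and take a majority vote; all parameters in the sketch below are assumed.

```python
# Per-layer spectral estimates aggregated by sign alignment + majority vote.
import numpy as np

def layer_labels(A):
    # second eigenvector (by magnitude) of the adjacency splits two blocks
    w, V = np.linalg.eigh(A)
    return np.sign(V[:, np.argsort(np.abs(w))[-2]])

def multilayer_labels(layers):
    ests = [layer_labels(A) for A in layers]
    ref = ests[0]
    aligned = [e if e @ ref >= 0 else -e for e in ests]  # fix sign flips
    return np.sign(np.sum(aligned, axis=0))

rng = np.random.default_rng(6)
n, L = 200, 5
z = np.repeat([1.0, -1.0], n // 2)                 # common global structure
layers = []
for _ in range(L):
    p, q = rng.uniform(0.2, 0.3), rng.uniform(0.02, 0.08)  # per-layer probs
    P = np.where(np.outer(z, z) > 0, p, q)
    A = np.triu(rng.random((n, n)) < P, 1)
    layers.append((A + A.T).astype(float))
print(abs(multilayer_labels(layers) @ z) / n)      # agreement with truth
```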

11/1/21

Academic Holiday – No Seminar

 

11/8/21

Dylan Small (University of Pennsylvania)

Title: Testing an Elaborate Theory of a Causal Hypothesis

When R.A. Fisher was asked what can be done in observational studies to clarify the step from association to causation, he replied, “Make your theories elaborate” — when constructing a causal hypothesis, envisage as many different consequences of its truth as possible and plan observational studies to discover whether each of these consequences is found to hold.  William Cochran called “this multi-phasic attack…one of the most potent weapons in observational studies.”  Statistical tests for the various pieces of the elaborate theory help to clarify how much the causal hypothesis is corroborated. In practice, the degree of corroboration of the causal hypothesis has been assessed by a verbal description of which of the several tests provides evidence for which of the several predictions. This verbal approach can miss quantitative patterns.  We develop a quantitative approach to making statistical inference about the amount of the elaborate theory that is supported by evidence.  

This is joint work with Bikram Karmakar. 
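
As a generic illustration of quantifying corroboration (explicitly not the Karmakar-Small procedure), one can count how many of K predicted consequences of a theory survive a multiplicity correction such as Holm's.

```python
# Count predictions still rejected (i.e., corroborated) under Holm's
# step-down correction; a generic device, not the authors' method.
import numpy as np

def corroborated_count(pvals, alpha=0.05):
    p = np.sort(np.asarray(pvals))
    K = len(p)
    passing = p <= alpha / (K - np.arange(K))   # Holm step-down thresholds
    return K if passing.all() else int(np.argmin(passing))

# four predictions of an elaborate theory, three with small p-values
print(corroborated_count([0.001, 0.003, 0.02, 0.40]))  # -> 3
```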

11/15/21

Miklos Z. Racz (Princeton University)

Title: Correlated stochastic block models: graph matching and community recovery

Abstract: 

I will discuss statistical inference problems on edge-correlated stochastic block models. We determine the information-theoretic threshold for exact recovery of the latent vertex correspondence between two correlated block models, a task known as graph matching. As an application, we show how one can exactly recover the latent communities using multiple correlated graphs in parameter regimes where it is information-theoretically impossible to do so using just a single graph.

This is based on joint work with Anirudh Sridhar. 
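
For readers unfamiliar with the model, a pair of edge-correlated stochastic block models is typically generated by independently subsampling the edges of a common parent SBM; a small sketch (parameter names are mine):

```python
# Two correlated SBMs via independent edge subsampling of a parent graph.
import numpy as np

def correlated_sbm_pair(n, p, q, s, rng):
    z = rng.permutation(np.repeat([0, 1], n // 2))     # latent communities
    P = np.where(z[:, None] == z[None, :], p, q)
    parent = np.triu(rng.random((n, n)) < P, 1)        # parent SBM
    keep1 = parent & (rng.random((n, n)) < s)          # subsample twice,
    keep2 = parent & (rng.random((n, n)) < s)          # independently
    sym = lambda A: (A | A.T).astype(int)
    return sym(keep1), sym(keep2), z

A, B, z = correlated_sbm_pair(100, 0.3, 0.1, 0.8, np.random.default_rng(7))
# marginally each graph is an SBM(ps, qs); edges co-occur with prob ~ s
print((A & B).sum() / max(A.sum(), 1))
```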

11/22/21

Daniel Malinsky (Columbia, Biostatistics)

Title: Semiparametric Inference for Non-monotone Missing-Not-at-Random Data: the No Self-Censoring Model
 
Abstract: We study the identification and estimation of statistical functionals of multivariate data missing non-monotonically and not-at-random, taking a semiparametric approach. Specifically, we assume that the missingness mechanism satisfies what has been previously called “no self-censoring” or “itemwise conditionally independent nonresponse,” which roughly corresponds to the assumption that no partially-observed variable directly determines its own missingness status. We show that this assumption, combined with an odds ratio parameterization of the joint density, enables identification of functionals of interest, and we establish the semiparametric efficiency bound for the nonparametric model satisfying this assumption. We propose a practical augmented inverse probability weighted estimator, and in the setting with a (possibly high-dimensional) always-observed subset of covariates, our proposed estimator enjoys a certain double-robustness property. We explore the performance of our estimator with simulation experiments and on a previously-studied data set of HIV-positive mothers in Botswana.
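
To illustrate the double robustness mentioned above in the simplest possible case, here is a standard AIPW estimator of an outcome mean under missing-at-random with fully observed covariates; this is a much simpler model than the talk's no-self-censoring setting.

```python
# AIPW mean estimate: outcome model + response model, doubly robust.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(8)
n = 5000
X = rng.normal(size=(n, 3))
Y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)
R = rng.random(n) < 1 / (1 + np.exp(-X[:, 0]))  # response indicator

pi = LogisticRegression().fit(X, R).predict_proba(X)[:, 1]  # P(R=1|X)
mu = LinearRegression().fit(X[R], Y[R]).predict(X)          # E[Y|X] model
# in real data Y is seen only when R = 1; the simulation peeks to check
aipw = mu + R * (np.where(R, Y, 0.0) - mu) / pi
print(Y.mean().round(3), aipw.mean().round(3))  # truth vs AIPW estimate
```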

11/29/21

Vasilis Syrgkanis (Microsoft Research)

Title: Adversarial machine learning and instrumental variables for flexible causal modeling

Abstract:

Machine learning models are increasingly being used to automate decision-making in a multitude of domains. Making good decisions requires uncovering causal relationships from data. Many causal estimation problems reduce to estimating a model that satisfies a set of conditional moment restrictions. We develop an approach for estimating flexible models defined via conditional moment restrictions, with a prototypical application being non-parametric instrumental variable regression. We introduce a min-max criterion function, under which the estimation problem can be thought of as solving a zero-sum game between a modeler who is optimizing over the hypothesis space of the target causal model and an adversary who identifies violating moments over a test function space. We analyze the statistical estimation rate of the resulting estimator for arbitrary hypothesis spaces, with respect to an appropriate analogue of the mean squared error metric, for ill-posed inverse problems. We show that when the minimax criterion is regularized with a second moment penalty on the test function and the test function space is sufficiently rich, then the estimation rate scales with the critical radius of the hypothesis and test function spaces, a quantity which typically gives tight fast rates. Our main result follows from a novel localized Rademacher analysis of statistical learning problems defined via minimax objectives. We provide applications of our main results for several hypothesis spaces used in practice, such as reproducing kernel Hilbert spaces, high-dimensional sparse linear functions, spaces defined via shape constraints, ensemble estimators such as random forests, and neural networks. For each of these applications we provide computationally efficient optimization methods for solving the corresponding minimax problem and stochastic first-order heuristics for neural networks.

Based on joint works with: Nishanth Dikkala, Greg Lewis and Lester Mackey
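
To make the minimax criterion concrete: with linear sieves h(x) = phi(x)'theta and f(z) = psi(z)'c, the inner maximization of the second-moment-penalized objective has a closed form, and the outer problem reduces to a projected least squares (2SLS-style) fit. A sketch under these simplifying choices (mine, not the paper's general estimator):

```python
# min_theta max_c E_n[(y - Phi theta) Psi c] - lam E_n[(Psi c)^2]:
# the inner max equals (1/4 lam) * r' Pi r with Pi the projection onto
# the columns of Psi, so the minimizer over theta is free of lam.
import numpy as np

def minimax_iv(Phi, Psi, y):
    Pi = Psi @ np.linalg.pinv(Psi.T @ Psi) @ Psi.T     # projection matrix
    A = Phi.T @ Pi @ Phi
    return np.linalg.solve(A, Phi.T @ Pi @ y)

rng = np.random.default_rng(9)
n = 2000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # unobserved confounder
x = z + u + 0.3 * rng.normal(size=n)         # endogenous regressor
y = 2.0 * x + u + 0.3 * rng.normal(size=n)   # structural coefficient = 2
Phi = x[:, None]
Psi = np.column_stack([np.ones(n), z])
print(minimax_iv(Phi, Psi, y).round(2))      # near 2; plain OLS is biased
```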

Bio:

Vasilis Syrgkanis is a Principal Researcher at Microsoft Research, New England, where he is co-leading the project on Automated Learning and Intelligence for Causation and Economics (ALICE). He received his Ph.D. in Computer Science from Cornell University in 2014, under the supervision of Prof. Eva Tardos and spent two years at Microsoft Research, New York as a postdoctoral researcher. His research addresses problems at the intersection of machine learning, economics and theoretical computer science. His work has received best paper awards at the 2015 ACM Conference on Economics and Computation (EC’15), the 2015 Annual Conference on Neural Information Processing Systems (NeurIPS’15) and the Conference on Learning Theory (COLT’19).

12/6/21

Gourab Mukherjee (University of Southern California)

Title: Understanding Early Adoption of Hybrid Cars via a New Multinomial Probit Model with Multiple Network Weights

Abstract: Modeling demand for durable products such as cars is challenging, as we do not observe repeated purchases for most customers. Under such data scarcity, information pooling across similar customers is needed to estimate a consumer’s preferences and price sensitivity more accurately. We propose a new multinomial probit model that can simultaneously accommodate various types of similarity structures by incorporating multiple weighted networks among customers. Unlike the traditional multinomial spatial probit, our model allows consumer connectedness to impact their preference and marketing mix coefficients flexibly, such that different subsets of the parameter vector can be correlated in their own unique ways. We propose a novel Monte Carlo Expectation-Maximization (MCEM) approach for parameter estimation that significantly enhances scalability, greatly increasing the number of consumers and choice alternatives that can be analyzed. Our method replaces the computationally expensive E-step of the classical EM algorithm with a fast Gibbs-sampling-based evaluation. Further, it implements the M-step using a fast back-fitting method that iteratively fits weighted regressions based on the associated similarity matrices for each relevant subset of the coefficients. We establish the convergence properties of the proposed MCEM algorithm. We present computational perspectives on the scalability of the proposed method and provide a distributed computing-based implementation. We show that a multinomial probit model based on two different similarity structures can significantly improve the prediction of customer choices. Specifically, the best-fitting model includes spatially contiguous weight structures on the intercepts, based on the geographical distance between consumers, while cross-customer correlated coefficients are based on the similarity between the consumers’ previously owned vehicles. We demonstrate how an automobile manufacturer can leverage the revealed heterogeneous spatial contiguity effects from the estimated model to develop more effective targeted promotions that accelerate consumer adoption of a hybrid car.
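
The E-step/M-step structure is easiest to see in a stripped-down MCEM for a plain binary probit, with no networks and no multinomial choice sets; everything below illustrates that structure and is not the paper's algorithm.

```python
# MCEM for binary probit: Monte Carlo E-step samples latent utilities,
# M-step reduces to a least-squares regression.
import numpy as np
from scipy.stats import truncnorm

def mcem_probit(X, y, iters=30, M=50, rng=None):
    rng = rng or np.random.default_rng()
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        m = X @ b
        lo = np.where(y == 1, -m, -np.inf)   # z > 0 when y = 1
        hi = np.where(y == 1, np.inf, -m)    # z <= 0 when y = 0
        # E-step: average M draws of the truncated-normal latent utility
        z = truncnorm.rvs(lo, hi, loc=m, scale=1.0,
                          size=(M, len(y)), random_state=rng).mean(axis=0)
        # M-step: OLS of averaged latent utilities on X
        b, *_ = np.linalg.lstsq(X, z, rcond=None)
    return b

rng = np.random.default_rng(10)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (X @ np.array([-0.5, 1.0]) + rng.normal(size=n) > 0).astype(int)
print(mcem_probit(X, y, rng=rng).round(2))   # approx (-0.5, 1.0)
```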

12/13/21

Rajesh Ranganath (New York University)

Title: Interpretability and Spurious Correlations in Deep Predictive Models

Abstract: Interpretability enriches what can be gleaned from a good predictive model. Techniques that learn-to-explain have arisen because they require only a single evaluation of a model to provide an interpretation. In the first part of this talk, I will discuss a flaw with several methods that learn-to-explain: the optimal explainer makes the prediction rather than highlighting the inputs that are useful for prediction. I will also describe an evaluation technique that can detect when the explainer makes the prediction along with a new method that learns-to-explain without this issue.

Interpretability methods have been used to reveal that predictive models often make use of nuisance variables, leading to unstable performance. In the second part of my talk, I will discuss our work on representation learning for building predictive models that generalize under changing nuisance-induced spurious correlations, with applications to images and chest X-rays.