Statistics Seminar – Fall 2019

Schedule for Fall 2019

Seminars are on Mondays
Time: 4:10pm – 5:00pm
Location: Room 903, 1255 Amsterdam Avenue

Tea and coffee will be served before the seminar at 3:30 PM in the 10th Floor Lounge, SSW

A cheese and wine reception will follow the seminar at 5:10 PM in the 10th Floor Lounge, SSW

An archive of past seminars is available.



Alberto Abadie (MIT, Department of Economics)

Title: Statistical Non-Significance in Empirical Economics

Abstract: Statistical significance is often interpreted as providing greater information than non-significance. In this article we show, however, that rejection of a point null often carries very little information, while failure to reject may be highly informative. This is particularly true in empirical contexts that are common in economics, where data sets are large and there are rarely reasons to put substantial prior probability on a point null. Our results challenge the usual practice of conferring on point null rejections a higher level of scientific significance than non-rejections. We therefore advocate visible reporting and discussion of non-significant results.


Jason Klusowski (Rutgers)

Title: “Path-Based Compression and Generalization for Deep ReLU Networks”

Abstract: The ability of modern neural networks to generalize well despite having many more parameters than training samples has been a widely studied topic in the deep learning community. A recently proposed approach for improving generalization guarantees involves showing that a given network can be “compressed” to a sparser network with fewer, discrete parameters. We study a path-based approach in which the compressed network is formed from empirical counts of paths drawn at random from a Markov distribution induced by the weights of the original network. This method leads to a generalization bound depending on the complexity of the path structure in the network. In addition, by exploiting certain invariance properties of neural networks, the bound does not depend explicitly on the intermediate layer dimensions, allowing for very large networks. Finally, we study empirically the relationship between compression and generalization, and find that networks that generalize well can indeed be compressed more effectively than those that do not.
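As a toy illustration of the path-counting idea in the abstract, the sketch below reconstructs a one-hidden-layer linear network from empirical counts of randomly drawn paths. This is a caricature under our own assumptions: i.i.d. multinomial sampling stands in for the paper's Markov scheme, there is no ReLU, and all names and constants are ours.

```python
import numpy as np

# toy network: one hidden layer; the path (input i -> hidden j -> output)
# carries weight W2[j] * W1[j, i]
rng = np.random.default_rng(0)
d, h = 5, 4
W1 = rng.standard_normal((h, d))
W2 = rng.standard_normal(h)

path_w = W2[:, None] * W1              # (h, d) matrix of path weights
mass = np.abs(path_w).sum()
probs = (np.abs(path_w) / mass).ravel()

# draw N paths with probability proportional to |path weight|
N = 200_000
counts = rng.multinomial(N, probs).reshape(h, d)

# "compressed" network: empirical counts rescaled, with the original signs
approx = np.sign(path_w) * counts * (mass / N)
print(np.abs(approx - path_w).max())   # reconstruction error shrinks as N grows
```

The discrete counts play the role of the compressed parameters: only the signs and a single total mass are kept from the original real-valued weights.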



Subhasis Ghosal (North Carolina State University)

“Posterior Contraction and Credible Sets for Filaments of Regression Functions”

The filament of a smooth function f consists of the local maximizers of f when moving in a certain direction. The filament is an important geometrical feature of the surface of the graph of a function, and is also considered an important lower-dimensional summary in analyzing multivariate data. There have been some recent theoretical studies on estimating filaments of a density function using a nonparametric kernel density estimator. In this talk, we consider a Bayesian approach and concentrate on the nonparametric regression problem. We study the posterior contraction rates for filaments using a finite random series of tensor products of B-splines as a prior on the regression function. Compared with the kernel method, this has the advantage that the bias can be better controlled when the function is smoother, which allows us to obtain better rates. Under an isotropic Hölder smoothness condition, we obtain the posterior contraction rate for the filament under two different metrics: a distance of separation along an integral curve, and the Hausdorff distance between sets. Moreover, we construct credible sets for the filament having an optimal size with sufficient frequentist coverage. We study the performance of the proposed method through a simulation study and apply it to a dataset of California earthquakes to assess the fault line of maximum local earthquake intensity.

Based on joint work with my former graduate student Dr. Wei Li, now Assistant Professor at Syracuse University, New York.


Ruobin Gong (Rutgers)

“Private Data + Approximate Computation = Exact Inference”

From data collection to model building to computation, statistical inference at every stage must reconcile with imperfections. I discuss a serendipitous result in which two apparently imperfect components mingle to produce “perfect” inference. Differentially private data protect individuals’ confidential information by subjecting the data to carefully designed noise mechanisms, trading off statistical efficiency for privacy. Approximate Bayesian computation (ABC) allows for sampling from approximate posteriors of complex models with intractable likelihoods, trading off exactness for computational efficiency. Finding the right alignment between the two tradeoffs liberates one from the other, and salvages the exactness of inference in the process. A parallel result for maximum likelihood inference on private data using Monte Carlo Expectation-Maximization is also discussed.


Jeff Leek (JHU)

“Data science education as an economic and public health intervention – how statisticians can lead change in the world”

The data science revolution has led to massive new opportunities in technology, medicine, and business for people with data skills. Most people who have been able to take advantage of this revolution are already well-educated, white-collar workers. In this talk I will describe our effort to expand access to data science jobs to individuals from under-served populations in East Baltimore. I will show how we are combining cloud-based data science technologies, high-throughput educational data, and deep, low-throughput collaboration with local non-profits to use data science education as an economic and public health intervention. I will use this project to illustrate how statisticians have a unique opportunity in this data moment to lead change in the world.


Joshua Loftus (NYU)

“Statistical aspects of algorithmic fairness”

Abstract: The social impact of technology has recently generated a lot of work in the machine learning research community, but relatively little from statistics. Fundamental issues such as fairness, privacy, and even legal rights such as the right to an “explanation” of an automated decision cannot be reduced to properties of a given dataset and learning algorithm, but must account for statistical aspects of the data-generating process. In this talk I will survey some recent literature on algorithmic fairness, with a focus on methods based on causal inference. One such approach, counterfactual fairness, requires that predictions or decisions be the same both in the actual world and in a counterfactual world where an individual had a different value of a sensitive attribute, such as race or gender. This approach defines fairness in the context of a causal model for the data, which usually relies on untestable assumptions. The causal modeling approach is useful for thinking about the implicit assumptions and possible consequences of other definitions, and for identifying key points for intervention.


Richard Nickl (Cambridge)

“Statistical guarantees for the Bayesian approach to inverse problems”

Abstract: Bayes methods for inverse problems have become very popular in applied mathematics in the last decade after seminal work by Andrew Stuart. They provide reconstruction algorithms as well as in-built “uncertainty quantification” via Bayesian credible sets, and, particularly for Gaussian priors, can be efficiently implemented by MCMC methodology. For linear inverse problems, they are closely related to classical penalised least squares methods and thus not fundamentally new, but for non-linear and non-convex problems, they give genuinely distinct and computable algorithmic alternatives that cannot be studied by variational analysis or convex optimisation techniques. In this talk we will discuss recent progress in Bayesian non-parametric statistics that allows us to give rigorous statistical guarantees for posterior consistency in such models, and illustrate the theory in a variety of concrete non-linear inverse problems arising from partial differential equations.


Yihong Wu (Yale)

“Spectral graph matching and regularized quadratic relaxations”

Graph matching aims at finding the vertex correspondence that maximally aligns the edge sets of two given graphs. This task amounts to solving a computationally intractable quadratic assignment problem (QAP). We propose a new spectral method, which computes the eigendecomposition of the two adjacency matrices and returns a matching based on the pairwise alignments between all eigenvectors of the first graph and all eigenvectors of the second. Each alignment is inversely weighted by the gap between the corresponding eigenvalues. This spectral method can be equivalently viewed as solving a regularized quadratic programming relaxation of the QAP. We show that for a correlated Erdős–Rényi model, this method finds the exact matching with high probability if the two graphs differ by at most a 1/polylog(n) fraction of edges, both for dense graphs and for sparse graphs with at least polylog(n) average degree. The proposed algorithm matches the state of the art among polynomial-time algorithms based on combinatorial ideas, and exponentially improves the performance of existing spectral methods that only compare top eigenvectors or eigenvectors of the same order. The analysis exploits local laws for the resolvents of sparse Wigner matrices.
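A rough sketch of the pairwise-alignment idea in the abstract: weight every eigenvector pair inversely by its eigenvalue gap, combine the weighted outer products into a similarity matrix, and round to a permutation. The regularization parameter eta, the all-ones weighting, and rounding by linear assignment are our assumptions, not necessarily the authors' exact construction.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def spectral_match(A, B, eta=0.2):
    """Match vertices of two symmetric adjacency matrices A and B by
    comparing all pairs of eigenvectors, each pair weighted inversely
    by the (regularized) gap between its eigenvalues."""
    lam, U = np.linalg.eigh(A)
    mu, V = np.linalg.eigh(B)
    n = A.shape[0]
    ones = np.ones(n)
    # inverse-gap weights for every eigenvector pair (i, j)
    W = 1.0 / ((lam[:, None] - mu[None, :]) ** 2 + eta ** 2)
    # alignment of each eigenvector with the all-ones direction
    C = (U.T @ ones)[:, None] * (V.T @ ones)[None, :]
    X = U @ (W * C) @ V.T                     # vertex-similarity matrix
    rows, cols = linear_sum_assignment(-X)    # round to a permutation
    return cols                               # cols[a] = match of vertex a

# demo: match a random graph to a relabeled copy of itself
rng = np.random.default_rng(0)
n = 30
A = np.triu((rng.random((n, n)) < 0.4).astype(float), 1)
A = A + A.T
perm = rng.permutation(n)
B = A[perm][:, perm]                 # B's vertex i is A's vertex perm[i]
match = spectral_match(A, B)
print((match == np.argsort(perm)).mean())  # fraction of correctly matched vertices
```

With identical (merely relabeled) graphs, recovery should be easy; the abstract's result concerns the much harder correlated case where a fraction of edges differ.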


Academic Holiday – No seminar


Boaz Nadler (Weizmann)

“A new method for the best subset selection problem.”


In this talk we consider the following sparse approximation, or best subset selection, problem: given a response vector y and a matrix A, find a k-sparse vector x that minimizes the residual ||Ax-y||. This sparse linear regression problem, and related variants, play a key role in high-dimensional statistics, machine learning, compressed sensing, signal and image processing, and more. This NP-hard problem is typically solved by minimizing a relaxed objective consisting of a data-fit term and a penalty term, for example the popular Lasso.
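For concreteness, the exact problem can be solved by brute force over all k-subsets of columns; the sketch below (our own toy data and names) does exactly that, and its exponential cost in the number of predictors is precisely why relaxed objectives are used instead.

```python
import itertools
import numpy as np

def best_subset(A, y, k):
    """Exact best subset selection by exhaustive search: least squares
    on every k-subset of columns of A; exponential in A.shape[1]."""
    n, p = A.shape
    best_res, best_x = np.inf, None
    for S in itertools.combinations(range(p), k):
        cols = list(S)
        coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
        res = np.linalg.norm(A[:, cols] @ coef - y)
        if res < best_res:
            best_res = res
            best_x = np.zeros(p)
            best_x[cols] = coef
    return best_x

# toy problem: true support {1, 4} with low noise
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 8))
x_true = np.zeros(8)
x_true[[1, 4]] = [2.0, -3.0]
y = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = best_subset(A, y, 2)
print(np.nonzero(x_hat)[0])  # recovered support
```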

In this talk we focus on the non-separable trimmed lasso penalty, defined as the L_1 norm of x minus the L_1 norm of its top k entries in absolute value. We show that this penalty has several appealing theoretical properties. However, being non-smooth, it is difficult to optimize. We suggest the generalized soft-min penalty, a smooth surrogate that takes into account all possible k-sparse patterns. We derive a polynomial-time algorithm to compute it, which in turn yields a novel method for the best subset selection problem. Numerical simulations illustrate its competitive performance compared to the current state of the art.
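The trimmed lasso penalty itself is simple to state in code (a sketch, not the authors' implementation; the function name is ours): it measures the L_1 mass left outside the best k-term approximation, so it vanishes exactly when the vector is k-sparse.

```python
import numpy as np

def trimmed_lasso(x, k):
    """Trimmed lasso penalty: L1 norm of x minus the L1 norm of its
    k largest-magnitude entries; zero if and only if x is k-sparse."""
    a = np.sort(np.abs(x))                 # magnitudes in ascending order
    return a.sum() if k == 0 else a[:-k].sum()

x = np.array([5.0, -3.0, 0.5, 0.1])
print(trimmed_lasso(x, 2))                                 # 0.6: mass outside the top 2
print(trimmed_lasso(np.array([5.0, -3.0, 0.0, 0.0]), 2))   # 0.0: exactly 2-sparse
```

Unlike the plain L_1 norm, this penalty does not shrink the k retained entries, but the max over subsets makes it non-smooth, motivating the smooth generalized soft-min surrogate discussed in the talk.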

Joint work with Tal Amir and Ronen Basri.

Jun Liu (Harvard)

“Knockoffs or perturbations, that is a question.”

Simultaneously finding multiple influential variables and controlling the false discovery rate (FDR) for linear regression models is a fundamental problem with a long history. Researchers have recently proposed and examined a few innovative approaches surrounding the idea of creating “knockoff” variables (like spike-ins in biological experiments) to control the FDR. As opposed to creating knockoffs, a classical statistical idea is to introduce perturbations and examine their impacts. We introduce here a perturbation-based Gaussian Mirror (GM) method, which creates for each predictor variable a pair of perturbed “mirror variables” by adding and subtracting a randomly generated Gaussian random variable, and proceeds with a certain regression method, such as ordinary least squares or the Lasso. The mirror variables naturally lead to a test statistic that is highly effective for controlling the FDR. The proposed GM method does not require strong conditions on the covariates, nor any knowledge of the noise level or the relative magnitudes of the dimension p and sample size n. We observe that the GM method is more powerful than many existing methods in selecting important variables subject to FDR control, especially when high correlations among the covariates exist. Additionally, we extend the method to determining important variables in general neural network models and potentially other complex models. If time permits, I will also discuss a simpler bootstrap-type perturbation method for estimating FDRs, which is also more powerful than knockoff methods when the predictors are reasonably correlated. The presentation is based on joint work with Xin Xing, Zhigen Zhao, Chenguang Dai, Buyu Lin, and Gui Yu.
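A minimal sketch of the mirror construction described in the abstract, under our own simplifications: one common perturbation scale c for every variable and plain OLS (the talk's method tunes the scale per variable); function names and the toy data are ours. For each predictor, the pair (x_j + c z, x_j - c z) replaces x_j, and the signs of the two fitted coefficients reveal whether x_j carries signal.

```python
import numpy as np

def gaussian_mirror_stats(X, y, c=1.0, rng=None):
    """For each column x_j, refit OLS with x_j replaced by the mirror
    pair (x_j + c z, x_j - c z), z ~ N(0, I), and return the mirror
    statistic M_j = |b+ + b-| - |b+ - b-|: large for true signals,
    roughly symmetric about zero for nulls."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, p = X.shape
    M = np.empty(p)
    for j in range(p):
        z = rng.standard_normal(n)
        Xj = np.column_stack(
            [X[:, :j], X[:, j] + c * z, X[:, j] - c * z, X[:, j + 1:]])
        beta, *_ = np.linalg.lstsq(Xj, y, rcond=None)
        bp, bm = beta[j], beta[j + 1]
        M[j] = abs(bp + bm) - abs(bp - bm)
    return M

# toy example: only the first 3 of 20 variables carry signal
rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([3.0, 3.0, 3.0]) + rng.standard_normal(n)
M = gaussian_mirror_stats(X, y)
print(M[:3].min(), M[3:].max())  # signal statistics should dominate the nulls
```

The symmetry of the null statistics around zero is what allows the negative tail to estimate the number of false discoveries at any threshold, the same device used by knockoff filters.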


Jelena Bradic (University of California, San Diego)

“Stable predictions? Causality, robustness, and machine learning.”

Abstract: Machine learning has proven extremely useful as a predictive tool in a wide variety of applications (one could say potentially all). How do we know whether these learning algorithms are identifying the desired targets? How do we equip them to be stable in the presence of unobserved confounding? Can robustness be a tool for achieving inferential tasks? This talk introduces a variety of statistical ideas spanning robustness, causality, and machine learning that ensure the learned models can be equipped with inferential guarantees (such as p-values and confidence sets). We will illustrate a few new algorithms, including semi-supervised learning, dynamic treatment, quantile invariance for interventions, and censored orthogonality, and showcase their theoretical properties as well as their performance in applications.


Tianxi Cai (Harvard)


Julia Adela Palacios (Stanford)

Title: Tajima coalescent and statistical summaries of unlabeled genealogies


In this talk I will present the Tajima coalescent, a model of the ancestral relationships of molecular samples. This model is then used as a prior on unlabeled, ranked genealogies to infer evolutionary parameters from molecular sequence data. I will then show that, conditional on the observed data and a particular mutation model, the cardinality of the hidden state space of Tajima’s genealogies is exponentially smaller than the cardinality of the hidden state space of Kingman’s genealogies. We estimate the corresponding cardinalities with sequential importance sampling. Finally, I will propose a new distance on unlabeled, ranked genealogies that allows us to compare and summarize genealogical distributions.