Statistics Seminar Series – Spring 2017

Schedule for Spring 2017

Seminars are on Mondays
Time: 4:10pm – 5:00pm
Location: Room 903, 1255 Amsterdam Avenue

Tea and Coffee will be served before the seminar at 3:30 PM, 10th Floor Lounge SSW

Cheese and Wine reception will follow the seminar at 5:10 PM in the 10th Floor Lounge SSW

For an archive of past seminars, please click here.

*Wednesday 1/18/17

*Special Time: 12:00

Room 903 SSW

Edgar Dobriban (Stanford)

Title: ePCA: Exponential family PCA

Abstract: Many applications, such as photon-limited imaging and genomics, involve large datasets with entries from exponential family distributions. It is of interest to estimate the covariance structure and principal components of the noiseless distribution. Principal Component Analysis (PCA), the standard method for this setting, can be inefficient for non-Gaussian noise. In this talk we present ePCA, a methodology for PCA on exponential family distributions. ePCA involves the eigendecomposition of a new covariance matrix estimator, constructed in a deterministic non-iterative way using moment calculations, shrinkage, and random matrix theory.  We provide several theoretical justifications for our estimator, including the Marchenko-Pastur law in high dimensions. We illustrate ePCA by denoising single-molecule diffraction maps obtained using photon-limited X-ray free electron laser (XFEL) imaging. This is joint work with Lydia T. Liu and Amit Singer.
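As a rough illustration of the kind of moment calculation the abstract mentions, the sketch below assumes the simplest case: each observed count is Poisson given a latent nonnegative intensity. By the law of total covariance, Cov(Y) = Cov(X) + diag(E[X]), so subtracting the sample mean from the diagonal of the sample covariance gives a moment-based estimate of the noiseless covariance. The function name `debiased_covariance` is hypothetical, and the sketch omits the shrinkage and random matrix theory steps of the speakers' method.

```python
def debiased_covariance(rows):
    """Moment-based covariance estimate for Poisson-type count data.

    Assumption (not from the abstract): Y | X ~ Poisson(X) entrywise, so that
    Cov(Y) = Cov(X) + diag(E[X]). We therefore subtract the sample mean from
    the diagonal of the sample covariance of Y.
    """
    n = len(rows)
    p = len(rows[0])
    # Column means of the observed counts.
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    # Unbiased sample covariance of the observed counts.
    cov = [
        [
            sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]
    # Remove the Poisson noise variance, which equals the mean.
    for j in range(p):
        cov[j][j] -= mean[j]
    return mean, cov


# Tiny made-up example: 3 observations of 2 count variables.
mean, cov = debiased_covariance([[1, 2], [3, 2], [5, 8]])
```

An eigendecomposition of the resulting matrix would then play the role that the sample covariance plays in ordinary PCA.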


 Kristin Linn (University of Pennsylvania)

*Friday 2/3/17

*Special Time: 2:10 PM

Room 903 SSW

 Sam Pimentel (University of Pennsylvania)
2/6/17  Yang Chen (Harvard)

*Thursday 2/9/17

*Special Time: 1:10 PM

Room 903 SSW

Mauricio Sadinle (Duke)

*Friday 2/10/17

*Special Time: 2:10 PM

Room 903 SSW


Amy Willis (Cornell)

Jean Jacod (University of Paris VI)

“Modeling asset prices: small scale versus large scale”

A typical model for the price of a financial asset, allowing for explicit or numerical computation of option prices, hedging, calibration, etc., describes the price with a horizon of months or years. In contrast, a very active topic now concerns models for tick prices or order books. The structure of the price at the microscopic level is very different from the structure of the usual (often continuous) semimartingales used at a macroscopic level. In particular, the microscopic price evolves on the tick grid, usually going up or down by one tick only. Our aim is to see how it is possible to reconcile the two viewpoints, using a scaling limit of tick-level price models. We will see that this question (going back to the thesis of Bachelier, in a sense) raises a number of nontrivial questions if we want a reasonably simple microscopic model, together with a macroscopic model exhibiting stochastic volatility or jumps or a drift.

This is joint work with Yacine Aït-Sahalia.


Jeff Goldsmith (Columbia Biostat)


Carlos Fernandez (Courant Institute)


 Spring Break

Li Ma (Duke University) 

“Fisher exact scanning for dependency”

Abstract: We introduce a method—called Fisher exact scanning (FES)—for testing and identifying variable dependency that generalizes Fisher’s exact test on 2-by-2 contingency tables to R-by-C contingency tables and continuous sample spaces. FES proceeds through scanning over the sample space using windows in the form of 2-by-2 tables of various sizes, and on each window completing a Fisher’s exact test. Based on a factorization of Fisher’s multivariate hypergeometric (MHG) likelihood into the product of the univariate hypergeometric likelihoods, we show that there exists a coarse-to-fine, sequential generative representation for the MHG model in the form of a Bayesian network, which in turn implies the mutual independence (up to deviation due to discreteness) among the Fisher’s exact tests completed under FES. This allows an exact characterization of the joint null distribution of the p-values and gives rise to an effective inference recipe through simple multiple testing procedures such as Sidak and Bonferroni corrections, eliminating the need for resampling. In addition, FES can characterize dependency through reporting significant windows after multiple testing control. The computational complexity of FES scales linearly with the sample size, which along with the avoidance of resampling makes it ideal for analyzing massive data sets. We use extensive numerical studies to illustrate the work of FES and compare it to several state-of-the-art methods for testing dependency in both statistical and computational performance. Finally, we apply FES to analyzing a microbiome data set and further investigate its relationship with other popular dependency metrics in that context.
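The building blocks named in the abstract can be sketched in a few lines: a two-sided Fisher's exact test on a single 2-by-2 window, followed by Sidak or Bonferroni adjustment across windows. This is only a minimal illustration of those standard components, not the scanning procedure itself; the function names and the example table are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def weight(k):
        # Unnormalized hypergeometric probability of seeing k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    # Integer weights are compared exactly, so ties are handled without rounding.
    tail = sum(w for w in map(weight, range(lo, hi + 1)) if w <= observed)
    return tail / denom


def sidak(p, m):
    """Sidak-adjusted p-value for one of m independent tests."""
    return 1.0 - (1.0 - p) ** m


def bonferroni(p, m):
    """Bonferroni-adjusted p-value for one of m tests."""
    return min(1.0, m * p)


# Hypothetical window with counts [[3, 1], [1, 3]], scanned alongside 9 others.
p = fisher_exact_two_sided(3, 1, 1, 3)
p_sidak = sidak(p, 10)
p_bonf = bonferroni(p, 10)
```

The Sidak adjustment relies on independence between tests, which is the property the abstract states for the windows scanned under FES; Bonferroni needs no such assumption but is slightly more conservative.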


Runze Li (Penn State)


 Max G’Sell (CMU)

“Post-selection testing for the graphical lasso and other structured problems”


The graphical lasso is often used in practice to model the dependence structure between variables.  However, inferential questions about the resulting solution have traditionally been difficult to answer.  We discuss two inferential problems in this setting: testing the significance of selected edges and testing the goodness-of-fit of the selected model as a whole.  The first problem sheds some light on the reliability of the specific edges in the graphical lasso solution, while the second has connections to sequential multiple testing and model selection.  This talk will also provide a general introduction to post-selection testing for statistical problems with structured data that goes beyond the graphical lasso.  We will highlight some of the interesting problems that arise in the general case along the way.


Hongyu Zhao (Yale)

Title: “Statistical Challenges in Analyzing and Interpreting Genome Wide Association Study Data”

Abstract: Genome-wide association studies (GWAS) have been a great success in the past decade, with thousands of regions in the human genome implicated in hundreds of complex diseases. However, significant challenges remain in both identifying new risk loci and interpreting results, even for samples with tens of thousands of subjects. In this presentation, we describe our recent efforts to infer the genetic architecture of complex disease through random effects models, the development of functional annotations of the human genome, and the integrated analysis of these annotations with GWAS results. The effectiveness of our methods will be demonstrated through their applications to a large number of GWASs to identify tissues/cell types that are relevant to a specific disease, to infer shared genetic contributions to several diseases, and to improve genetic disease risk predictions. This is joint work with Jiming Jiang, Can Yang, Qiongshi Lu, Ryan Powles, Yiming Hu, Qian Wang, and others.



Carlos Carvalho (UT Austin)



This paper develops a semi-parametric Bayesian regression model for estimating heterogeneous treatment effects from observational data. Standard nonlinear regression models, which may work quite well for prediction, can yield badly biased estimates of treatment effects when fit to data with strong confounding. Our Bayesian causal forests model avoids this problem by directly incorporating an estimate of the propensity function in the specification of the response model, implicitly inducing a covariate-dependent prior on the regression function. This new parametrization also allows treatment heterogeneity to be regularized separately from the prognostic effect of control variables, making it possible to informatively “shrink to homogeneity”, in contrast to existing Bayesian non- and semi-parametric approaches.

Joint work with P. Richard Hahn and Jared Murray.


Tyler H. McCormick (University of Washington)


Rina Foygel Barber (UChicago)