Statistics Seminar Series – Spring 2017

Schedule for Spring 2017

Seminars are on Mondays
Time: 4:10pm – 5:00pm
Location: Room 903, 1255 Amsterdam Avenue

Tea and Coffee will be served before the seminar at 3:30 PM in the 10th Floor Lounge SSW

A Cheese and Wine reception will follow the seminar at 5:10 PM in the 10th Floor Lounge SSW

For an archive of past seminars, please click here.

*Wednesday 1/18/17

*Special Time: 12:00

Room 903 SSW

Edgar Dobriban (Stanford)

Title: ePCA: Exponential family PCA

Abstract: Many applications, such as photon-limited imaging and genomics, involve large datasets with entries from exponential family distributions. It is of interest to estimate the covariance structure and principal components of the noiseless distribution. Principal Component Analysis (PCA), the standard method for this setting, can be inefficient for non-Gaussian noise. In this talk we present ePCA, a methodology for PCA on exponential family distributions. ePCA involves the eigendecomposition of a new covariance matrix estimator, constructed in a deterministic non-iterative way using moment calculations, shrinkage, and random matrix theory.  We provide several theoretical justifications for our estimator, including the Marchenko-Pastur law in high dimensions. We illustrate ePCA by denoising single-molecule diffraction maps obtained using photon-limited X-ray free electron laser (XFEL) imaging. This is joint work with Lydia T. Liu and Amit Singer.


 Kristin Linn (University of Pennsylvania)

*Friday 2/3/17

*Special Time: 2:10 PM

Room 903 SSW

 Sam Pimentel (University of Pennsylvania)
2/6/17  Yang Chen (Harvard)

*Thursday 2/9/17

*Special Time: 1:10 PM

Room 903 SSW

Mauricio Sadinle (Duke)

*Friday 2/10/17

*Special Time: 2:10 PM

Room 903 SSW


 Amy Willis (Cornell)

Jean Jacod (University of Paris VI)

“Modeling asset prices: small scale versus large scale”

A typical model for the price of a financial asset, allowing for explicit or numerical computation of option prices, hedging, calibration, etc., describes the price with a horizon of months or years. In contrast, a very active topic now concerns models for tick prices or order books. The structure of the price at the microscopic level is very different from the structure of the usual (often continuous) semimartingales used at a macroscopic level. In particular, the microscopic price evolves on the tick grid, usually going up or down by one tick only. Our aim is to see how it is possible to reconcile the two viewpoints, using a scaling limit of tick-level price models. We will see that this question (going back to the thesis of Bachelier, in a sense) raises a number of nontrivial issues if we want a reasonably simple microscopic model, together with a macroscopic model exhibiting stochastic volatility or jumps or a drift.

This is joint work with Yacine Aït-Sahalia.


Jeff Goldsmith (Columbia Biostat)


Carlos Fernandez (Courant Institute)


 Spring Break

Li Ma (Duke University) 

“Fisher exact scanning for dependency”

Abstract: We introduce a method—called Fisher exact scanning (FES)—for testing and identifying variable dependency that generalizes Fisher’s exact test on 2-by-2 contingency tables to R-by-C contingency tables and continuous sample spaces. FES proceeds through scanning over the sample space using windows in the form of 2-by-2 tables of various sizes, and on each window completing a Fisher’s exact test. Based on a factorization of Fisher’s multivariate hypergeometric (MHG) likelihood into the product of the univariate hypergeometric likelihoods, we show that there exists a coarse-to-fine, sequential generative representation for the MHG model in the form of a Bayesian network, which in turn implies the mutual independence (up to deviation due to discreteness) among the Fisher’s exact tests completed under FES. This allows an exact characterization of the joint null distribution of the p-values and gives rise to an effective inference recipe through simple multiple testing procedures such as Sidak and Bonferroni corrections, eliminating the need for resampling. In addition, FES can characterize dependency through reporting significant windows after multiple testing control. The computational complexity of FES scales linearly with the sample size, which along with the avoidance of resampling makes it ideal for analyzing massive data sets. We use extensive numerical studies to illustrate the work of FES and compare it to several state-of-the-art methods for testing dependency in both statistical and computational performance. Finally, we apply FES to analyzing a microbiome data set and further investigate its relationship with other popular dependency metrics in that context.
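
For readers unfamiliar with the building blocks named in the abstract, the following is a minimal sketch of a two-sided Fisher's exact test on a single 2-by-2 table, together with the Bonferroni and Sidak thresholds for several such tests. It is not the FES procedure itself: the scanning windows, their construction, and the function names used here are assumptions made purely for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2-by-2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # conditional on the observed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def bonferroni_threshold(alpha, m):
    """Per-test level for m tests under the Bonferroni correction."""
    return alpha / m


def sidak_threshold(alpha, m):
    """Per-test level for m independent tests under the Sidak correction."""
    return 1 - (1 - alpha) ** (1 / m)


# Hypothetical windows (illustrative counts only, not from the talk).
windows = [(3, 1, 1, 3), (10, 0, 0, 10)]
p_values = [fisher_exact_two_sided(*w) for w in windows]
level = sidak_threshold(0.05, len(p_values))
significant = [p <= level for p in p_values]
print(p_values, significant)
```

The Sidak threshold is exact when the tests are mutually independent, which is the property the abstract attributes to the tests completed under FES; the Bonferroni threshold is slightly more conservative and requires no independence.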


Runze Li (Penn State)


 Max G’Sell (CMU)

Hongyu Zhao (Yale)



Carlos Carvalho (UT Austin)


Tyler H. McCormick (University of Washington)


Rina Foygel (UChicago)