# Schedule for Spring 2015

**Seminars are on Mondays**

**Time: 4:10pm – 5:00pm**

**Location: Room 903, 1255 Amsterdam Avenue**

Tea and Coffee will be served before the seminar at 3:30 PM, Room 1025

Cheese and Wine will be served after the seminar at 5:10 PM, Room 1025

For an archive of past seminars, please click here.

*Friday 1/23/2015* **Room 903 SSW** 11:00AM – 12:00 noon |
Caroline Uhler, IST Austria “Causal Inference and Gene Regulation” Although the genetic information in each cell within an organism is identical, gene expression varies widely between different cell types. The quest to understand this phenomenon has led to many interesting mathematics problems. At the heart of the matter is the question of how genes interact. This talk is about learning directed graphs based on conditional independence information. The most widely used approaches to this problem are variants of the PC algorithm. Using geometric and algebraic arguments I will show that the so-called “faithfulness assumption”, one of the main constraints of the PC algorithm, is in fact extremely restrictive, implying fundamental limitations for this algorithm. I will then propose an alternative method that overcomes these limitations and is based on finding the permutation of the variables that yields the sparsest graph. I will end by discussing implications for estimating time-varying and tissue-specific gene regulatory networks. |

1/26/2015 **Room 903 SSW** 2:30PM – 3:30PM |
Will Fithian, Stanford “Optimal Inference After Model Selection” To perform inference after model selection, we propose controlling the Based on joint work with Dennis Sun and Jonathan Taylor. |

*Thursday 1/29/2015* **Room 303 Mudd** 1:10PM – 2:10 PM |
Peng Ding, Harvard University “Treatment Effect Heterogeneity” Abstract: Applied researchers are increasingly interested in whether and how treatment effects vary in randomized evaluations, especially variation not explained by observed covariates. We propose a model-free approach for testing for the presence of such unexplained variation. To use this randomization-based approach, we must address the fact that the average treatment effect, generally the object of interest in randomized experiments, actually acts as a nuisance parameter in this setting. We explore potential solutions and advocate for a method that guarantees valid tests in finite samples despite this nuisance. We also show how this method readily extends to testing for heterogeneity beyond a given model, which can be useful for assessing the sufficiency of a given scientific theory. We finally apply our method to the National Head Start Impact Study, a large-scale randomized evaluation of a Federal preschool program, finding that there is indeed significant unexplained treatment effect variation. |

2/2/15 |
Tran Mai Ngoc, UT Austin “Random Subdivisions & Neural Coding” In the first part, I will talk about random subdivisions obtained from projections of polytopes. These are related to random polytopes and zeros of random tropical polynomials. In the second part, I will discuss results and open problems in neural coding, with emphasis on decoding grid cells. |

2/9/15 |
Sivaraman Balakrishnan, University of California, Berkeley “Statistical and Computational Guarantees for the EM Algorithm” The expectation-maximization (EM) algorithm is an iterative method for finding maximum-likelihood estimates of parameters in statistical models with unobserved latent variables. Along with Markov Chain Monte Carlo (MCMC) it is one of the two computational workhorses that provided much impetus for statistics in entering its modern “computation-intensive” phase. Much is known about the EM algorithm, its convergence properties, and its susceptibility to local optima. However, despite the existence of multiple fixed points, on a variety of statistical problems the EM algorithm has been observed empirically to perform well given either a reasonable initialization or with several random starts. In this talk, I will introduce novel techniques to theoretically characterize some of the empirically observed behavior of the EM algorithm and give conditions under which the EM algorithm converges to near globally optimal parameter estimates. In particular, I will show the surprising result that for several canonical latent variable models there are large regions of the parameter space over which every fixed point of the EM algorithm is close to a population global optimum. I will conclude with a discussion of some of my other research interests, presenting a few vignettes from my work on clustering and topological data analysis. |

2/16/15 |
Xi Chen, New York University “Statistical Estimation and Decision-making for Crowdsourcing” Crowdsourcing is a popular paradigm for effectively collecting labels at low cost. In this talk, we discuss two important statistical problems in crowdsourcing for categorical labeling tasks: (1) estimation of true labels and workers’ quality from the static noisy labels provided by non-expert crowdsourcing workers; (2) dynamic optimal budget allocation for collecting noisy labels. The MLE-based Dawid-Skene estimator has been widely used for the first estimation problem. However, it is hard to theoretically justify its performance due to the non-convexity of the log-likelihood function. We propose a two-stage algorithm where the first stage uses the spectral method to obtain an initial estimate of parameters and the second stage refines the estimation via the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor. For the second problem, dynamic budget allocation, we formulate it as a Bayesian Markov Decision Process (MDP). To solve this MDP, we propose an efficient approximate policy, called optimistic knowledge gradient policy, which is a consistent policy and leads to superior empirical performance. Based on joint work with Yuchen Zhang, Mike I Jordan, Dengyong Zhou and Qihang Lin. |

2/23/15 |
Guang Cheng, Purdue University “Semi-Nonparametric Inference for Massive Data” In this talk, we consider a partially linear framework for modelling (possibly heterogeneous) massive data. The major goal is to extract common features across all sub-populations while exploring heterogeneity of each sub-population. In particular, we propose an aggregation type estimator that possesses the (non-asymptotic) minimax optimal bound and asymptotic distribution as if there were no heterogeneity. Such an oracle result holds when the number of sub-populations does not grow too fast. A plug-in estimator for the heterogeneity parameter is further constructed, and shown to possess the asymptotic distribution as if the commonality information were available. Large-scale heterogeneity testing is also considered. Our general theory applies to the divide-and-conquer approach that is often used to deal with massive homogeneous data in a parallel computing environment. A technical by-product of this talk is statistical inference for general kernel ridge regression. |

3/2/15 |
Barbara Engelhardt, Princeton University “Bayesian latent factor models to recover differential gene co-expression networks” Abstract: Latent factor models have been the recent focus of much attention in ‘big data’ applications because of their ability to quickly allow the user to explore the underlying data in a controlled and interpretable way. In genomics, latent factor models are commonly used to identify population substructure, identify gene clusters, and control for noise in large data sets. In this talk I will present a general framework for Bayesian sparse latent factor models and motivate some of the structural extensions to these models that have been proposed by my group. I will illustrate the power and the promise of these models for a much broader class of problems in genomics through a specific application to the Genotype-Tissue Expression (GTEx) data set. In particular, I will show how this class of statistical model can be used to identify gene co-expression networks that co-vary uniquely in one tissue type, are differential across tissue types, or are ubiquitous across tissue types by exploiting their ability to estimate covariance matrices. This enables the recovery of differential interactions across large numbers of features via global regularization, as opposed to testing for differential interactions on a local, edge-by-edge basis. |

3/9/15 |
Ismael Castillo, Universities Paris VI & VII “On some properties of Polya tree posterior distributions” Abstract: In Bayesian nonparametrics, Polya tree distributions form a flexible class of priors on distributions or density functions. In the problem of density estimation, for certain choices of parameters, Polya trees have been shown to produce asymptotically consistent posterior distributions in a Hellinger sense. In this talk, after reviewing some general properties of Polya trees, I will show that the previous consistency result can be made much more precise in two directions: first, rates of convergence can be derived and, second, it is possible to characterise the limiting shape of the posterior distribution in a functional sense. We will discuss applications to Donsker-type results on the cumulative distribution function and to the study of functionals of the density. |

3/16/15 | no seminar |

3/23/15 |
Harry van Zanten, University of Amsterdam “Estimating a smooth function on a large graph by Bayesian regularization” Abstract: Various problems arising in modern statistics involve making inference about a “smooth” function on a large graph. Most of the proposed methods view such problems as high-dimensional or nonparametric estimation problems and employ some regularization or penalization technique that takes the geometry of the graph into account and that tries to produce an appropriate bias-variance trade-off. To get more insight into the fundamental performance of such methods we study the convergence rates of Bayesian approaches in this context. Specifically, we consider the estimation of a smooth function on a large graph in regression or classification problems. We derive results that show how asymptotically optimal Bayesian regularization can be achieved under an asymptotic shape assumption on the underlying graph and a smoothness condition on the target function, both formulated in terms of the graph Laplacian. The priors we employ are randomly scaled Gaussians, with precision operators involving the Laplacian of the graph. This is joint work in progress with Alice Kirichenko. |

3/30/15 |
Christian Robert, Université Paris-Dauphine “Testing hypotheses via a mixture estimation model” We consider a novel paradigm for Bayesian testing of hypotheses and Bayesian model comparison. Our alternative to the traditional construction of posterior probabilities that a given hypothesis is true or that the data originates from a specific model is to consider the models under comparison as components of a mixture model. We therefore replace the original testing problem with an estimation one that focuses on the probability weight of a given model within a mixture model. We analyze the sensitivity of the resulting posterior distribution on the weights to various prior modeling on the weights. We stress that a major appeal in using this novel perspective is that generic improper priors are acceptable, while not putting convergence in jeopardy. Among other features, this allows for a resolution of the Lindley-Jeffreys paradox. When using a reference Beta B(a,a) prior on the mixture weights, we note that the sensitivity of the posterior estimations of the weights to the choice of a vanishes as the sample size increases and advocate the default choice a=0.5, derived from Rousseau and Mengersen (2011). Another feature of this easily implemented alternative to the classical Bayesian solution is that the speeds of convergence of the posterior mean of the weight and of the corresponding posterior probability are quite similar. [Joint work with K. Kamary, J. Rousseau and K. Mengersen] |

4/6/15 |
David Banks, Duke University “Adversarial Risk Analysis” |

4/13/15 |
Alessandro Rinaldo, Carnegie Mellon University “DeBaCl: a density-based clustering algorithm and its properties.” Abstract: Density-based clustering provides a principled and flexible non-parametric framework for defining clustering problems and for evaluating the performance of clustering algorithms. In density-based clustering, clusters are defined to be the maximal connected components of the upper level sets of the density of the data generating distribution. The collection of all the clusters of a density ordered by inclusion defines a dendrogram, called the cluster tree. Such a tree offers a compact and highly interpretable summary of all the clustering properties of the corresponding distribution, and is the key object to estimate in density-based clustering. In this talk I will present DeBaCl, a very simple algorithm for density-based clustering based on the knn density estimator. DeBaCl scales well to high dimensional problems and can be applied to data originating from distributions with supports of mixed dimensions, and even functional data. I will describe some theoretical results about the performance of DeBaCl in Euclidean spaces, showing that under mild assumptions DeBaCl will produce a cluster tree over the sample points approaching the true cluster tree at a rate independent of the ambient dimension. |

4/20/15 |
Matthew Stephens, University of Chicago Title: False Discovery Rates – a new deal Abstract: False Discovery Rate (FDR) methodology, first put forward by Benjamini and Hochberg, and further developed by many authors – including Storey, Tibshirani, and Efron – is now one of the most widely used statistical methods in genomics, among other areas of application. A typical genomics workflow consists of i) estimating thousands of effects, and their associated p values; ii) feeding these p values to software (e.g. the widely used qvalue package) to estimate the FDR for any given significance threshold. In this talk we take a fresh look at this problem, and highlight two deficiencies of this standard pipeline that we believe could be improved. First, current methods, being based directly on p values (or z scores), fail to fully account for the fact that some measurements are more precise than others. Second, current methods assume that the least significant p values (those near 1) are all null – something that initially appears intuitive, but will not necessarily hold in practice. We suggest simple approaches to address both issues, and demonstrate the potential for these methods to increase the number of discoveries at a given FDR threshold. We also discuss the connection between this problem and shrinkage estimation, and problems involving sparsity more generally. |

4/27/15 |
Neil Shephard, Harvard University Title: Continuous time analysis of fleeting discrete price moves Abstract: This paper proposes a novel model of financial prices where: (i) prices are discrete; (ii) prices change in continuous time; (iii) a high proportion of price changes are reversed in a fraction of a second. Our model is analytically tractable and directly formulated in terms of the calendar time and price impact curve. The resulting cadlag price process is a piecewise constant semi-martingale with finite activity, finite variation and no Brownian motion component. We use moment-based estimations to fit four high frequency futures data sets and demonstrate the descriptive power of our proposed model. This model is able to describe the observed dynamics of price changes over three different orders of magnitude of time intervals. |

5/4/15 |