Statistics Seminar Series – Fall 2016

Schedule for Fall 2016

Seminars are on Mondays
Time: 4:10pm – 5:00pm
Location: Room 903, 1255 Amsterdam Avenue

Tea and coffee will be served before the seminar at 3:30 PM in the 10th Floor Lounge, SSW

A cheese and wine reception will follow the seminar at 5:10 PM in the 10th Floor Lounge, SSW

An archive of past seminars is available.


Xiaodong Li (UC Davis)


Xuming He (University of Michigan)

Do Bayesian model selection algorithms have strong selection consistency in high dimensions?

Bayesian model selection algorithms can be used as an alternative to optimization-based methods for model selection, and there is evidence that Bayesian methods approximate the L0-penalty better, but not much has been published about model selection consistency of Bayesian methods in the high dimensional setting.  In this talk, we will discuss the notion of strong selection consistency and show that some of the simple spike-and-slab priors, if allowed to be sample-size dependent, can be strongly consistent even when the number of features exceeds the sample size. The spike-and-slab variable selection algorithms however are not so scalable outside the linear model framework. A more scalable alternative, called Skinny Gibbs, is introduced to mitigate the computational burden without losing strong selection consistency. Logistic regression with high dimensional covariates is used as a primary example. The talk is based on joint work with Naveen Narisetty.
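As an illustration of the spike-and-slab idea (though not of the sample-size-dependent prior or the Skinny Gibbs algorithm discussed in the talk), here is a minimal sketch of a Gibbs sampler for linear regression with a point-mass spike and a Gaussian slab; the noise variance, slab variance, and prior inclusion probability are fixed constants chosen for the example.

```python
import numpy as np

def spike_slab_gibbs(X, y, sigma2=1.0, tau2=10.0, q=0.2,
                     n_iter=1000, burn=200, seed=0):
    """Gibbs sampler for y ~ N(X beta, sigma2 I) with a spike-and-slab prior:
    beta_j = z_j * b_j,  b_j ~ N(0, tau2),  z_j ~ Bernoulli(q)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    z = np.zeros(p, dtype=bool)
    incl = np.zeros(p)                   # running count of inclusions after burn-in
    for it in range(n_iter):
        for j in range(p):
            beta[j] = 0.0
            r = y - X @ beta             # residual excluding coordinate j
            xj = X[:, j]
            s = xj @ xj
            m = xj @ r
            # log Bayes factor for z_j = 1 vs z_j = 0, with b_j integrated out
            log_bf = (0.5 * np.log(sigma2 / (sigma2 + tau2 * s))
                      + 0.5 * tau2 * m**2 / (sigma2 * (sigma2 + tau2 * s)))
            logit = np.log(q / (1 - q)) + log_bf
            z[j] = rng.random() < 1.0 / (1.0 + np.exp(-logit))
            if z[j]:
                # conditional slab posterior for b_j given inclusion
                v = 1.0 / (s / sigma2 + 1.0 / tau2)
                beta[j] = rng.normal(v * m / sigma2, np.sqrt(v))
        if it >= burn:
            incl += z
    return incl / (n_iter - burn)        # posterior inclusion probabilities
```

Variables with posterior inclusion probability near 1 are flagged as active; the strong-consistency results of the talk concern when these probabilities concentrate on the true model as the dimension grows.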


David Banks (Duke University)

Statistical Issues with Agent-Based Models

Agent-based models have become an ubiquitous tool in many disciplines.  But too little is known about their statistical properties.  This talk reviews the work that has been done in this area, and describes two strategies for improving model fitting and inference.  It also attempts to place agent-based modeling within the span of modern Bayesian inference.


Jeff Wu (Georgia Tech)

“A fresh look at effect aliasing and interactions: some new wine in old bottles”
Interactions and effect aliasing are among the fundamental concepts in experimental design. Some new insights and approaches are offered on this time-honored subject. We start with the very simple two-level fractional factorial designs. Two interactions AB and CD are said to be aliased if both represent, and are used to estimate, the same effect. In the literature such aliased effects are deemed impossible to “de-alias”, i.e., to estimate separately. We argue that this “impossibility” can indeed be resolved by a new approach consisting of reparametrization using the notion of “conditional main effects” (cme’s) and model selection that exploits the relationships between the cme’s and the traditional factorial effects. In some sense this is a shocking result, since the impossibility has been taken for granted since the founding work of Finney (1945). There is a similar surprise for three-level fractional factorial designs. The standard approach uses ANOVA to decompose the interactions into orthogonal components, each with two degrees of freedom; the quandary of full aliasing between interaction components then remains. Again this can be resolved by using a non-orthogonal decomposition of the four degrees of freedom of the A×B interaction via the linear-quadratic parametrization. A model search strategy then allows the estimation of some interaction components even for designs of resolution III and IV. Moving from regular to nonregular designs such as the Plackett–Burman designs, most of the interactions are not orthogonal to the main effects. The partial aliasing of the effects and their complexity was traditionally viewed as a “hazard”. Hamada and Wu (1992) recognized that this could be turned into an advantage; their analysis strategy for effect de-aliasing is a precursor of what was described above. Underlying all three problems is the use of reparametrization and the exploitation of non-orthogonality among some effects.
The stated approach can be extended beyond designed experiments and potential applications in machine learning will be outlined. 
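The full aliasing of AB and CD in a two-level fractional factorial can be verified directly. The sketch below builds the 2^(4-1) design with defining relation I = ABCD and checks that the AB and CD interaction columns coincide; it illustrates the aliasing itself, not the cme-based de-aliasing strategy of the talk.

```python
from itertools import product

# Build the 2^(4-1) fractional factorial design with defining relation
# I = ABCD (i.e., generator D = ABC), coding levels as -1 / +1.
runs = []
for a, b, c in product([-1, 1], repeat=3):
    d = a * b * c                 # generator: D = ABC
    runs.append((a, b, c, d))

# The AB and CD interaction contrast columns are identical run by run,
# so the two interactions cannot be separated by this design alone.
ab = [a * b for a, b, c, d in runs]
cd = [c * d for a, b, c, d in runs]
print(ab == cd)                   # True: AB and CD are fully aliased
```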


*Time: Noon

Matt Wand (University of Technology, Sydney)

“Fast Approximate Inference for Arbitrarily Large Statistical Models via Message Passing”
Abstract: We explain how the notion of message passing can be used to streamline the algebra and computer coding for fast approximate inference in large Bayesian statistical models.  In particular, this approach is amenable to handling arbitrarily large models of particular types once a set of primitive operations is established.  The approach is founded upon a message passing formulation of mean field variational Bayes that utilizes factor graph representations of statistical models. The notion of factor graph fragments is introduced and is shown to facilitate compartmentalization of the required algebra and coding.
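As a toy instance of mean field variational Bayes (the starting point of the message passing formulation, though far simpler than the factor graph machinery of the talk), the sketch below runs coordinate-ascent updates for a normal model with unknown mean and precision; the hyperparameter values are arbitrary choices for the example.

```python
import numpy as np

def cavi_normal(y, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, n_iter=50):
    """Coordinate-ascent mean field VB for y_i ~ N(mu, 1/tau), with priors
    mu | tau ~ N(mu0, 1/(lam0 tau)) and tau ~ Gamma(a0, b0), under the
    factorized approximation q(mu, tau) = q(mu) q(tau)."""
    n, ybar = len(y), np.mean(y)
    sum_sq = np.sum(y**2)
    e_tau = a0 / b0                      # initial guess for E[tau]
    a_n = a0 + (n + 1) / 2.0             # shape update is fixed across iterations
    for _ in range(n_iter):
        # q(mu) = N(m, 1/lam): combines the likelihood and mu-prior factors
        m = (lam0 * mu0 + n * ybar) / (lam0 + n)
        lam = (lam0 + n) * e_tau
        e_mu, e_mu2 = m, m**2 + 1.0 / lam
        # q(tau) = Gamma(a_n, b_n): expected sums of squares under q(mu)
        b_n = b0 + 0.5 * (sum_sq - 2 * n * ybar * e_mu + n * e_mu2
                          + lam0 * (e_mu2 - 2 * mu0 * e_mu + mu0**2))
        e_tau = a_n / b_n
    return m, 1.0 / lam, a_n, b_n
```

Each update uses only expectations under the other factor, which is exactly the local "message" structure that factor graph fragments compartmentalize for larger models.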

Tamas Rudas

Title: Model based analysis of incomplete data with non-ignorable missing data mechanism

Abstract: All data arising from surveys or censuses are essentially incomplete. The analysis of such data usually relies on variants of the ignorable-missing-data-mechanism assumption. This assumption leads to convenient analyses but, unfortunately, cannot be tested. The approach put forward in this presentation drops this assumption by considering the respondents with different nonresponse patterns as samples from components of the population characterized by these patterns, and allows the joint distributions of the variables in these components to differ. The population distribution is a mixture of the distributions in the components, and the relative weights may be estimated from the observed data. In each of the components, only a marginal of the joint distribution is observed. The analysis proceeds by estimating the distributions in these components so that the mixture provides the best fit to a model of interest in terms of the so-called mixture index of fit. The mixture index of fit (Rudas, Clogg, Lindsay, J Roy Stat Soc, 1994) is the largest fraction of the population where the model may be true. The researcher may then evaluate the estimated distributions in the components on substantive grounds and assess overall model fit. The missing data models obtained may also be seen as log-affine marginal models (Bergsma, Rudas, Ann Statist, 2002) for the variables and the indicators of whether or not they were observed. This approach makes it possible to formulate the standard Missing At Random and Missing Completely At Random assumptions, and leads to various multivariate generalizations of these concepts, providing a flexible framework to assess the missing data situation.



Denis Talay (Inria)

“Sensitivity analysis of first hitting time Laplace transforms w.r.t. the Hurst parameter of the driving noise of stochastic differential equations”

The lecture is based on a joint work with Alexandre Richard (Inria).

We present an innovative sensitivity analysis for stochastic differential equations:

We study the sensitivity, when the Hurst parameter~$H$ of the driving fractional Brownian motion tends to the pure Brownian value, of probability distributions of smooth functionals of the trajectories of the solutions $\{X^H_t\}_{t\in \mathbb{R}_+}$ and of the Laplace transform of the first passage time of $X^H$ at a given threshold.

Our technique requires extending already known accurate Gaussian estimates on the density of $X^H_t$ to the case where $t$ ranges over an infinite time interval. We present and discuss our estimate at the end of the talk.


David Siegmund (Stanford University)

“Change-point Detection  and Estimation”

Several problems of genomic analysis involve detection of local genomic signals, represented by changes in the mean level of some measurement. Changes can occur continuously or discontinuously. A motivating example of discontinuous change is provided by copy number variation (CNV): in cancer cells the changes in copy number are often somatic, while in normal cells changes in copy number arise as germline mutations. Data can be based on comparative genomic hybridization (CGH), Single Nucleotide Polymorphisms (SNPs), or DNA resequencing. For the first two it is often plausible to assume that the data are normally distributed. In this talk I will focus on the simplest version of this problem, which involves segmentation of independent normal observations according to abrupt changes in the mean. Results will be illustrated by simulations and by applications to the BT474 cell line. Confidence regions for the change-points, and joint regions for the change-points and mean values, will also be discussed.

This is joint research with Fang Xiao and Jian Li.
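A minimal sketch of the segmentation problem described above: binary segmentation with a standardized CUSUM statistic for abrupt changes in the mean of independent normal observations with unit variance. The fixed threshold is an arbitrary choice for the example, not the calibrated detection procedure of the talk.

```python
import numpy as np

def cusum_stat(y):
    """Max standardized CUSUM statistic, and its argmax, for a single
    change in mean of independent unit-variance observations."""
    n = len(y)
    csum = np.cumsum(y)
    total = csum[-1]
    k = np.arange(1, n)                      # candidate split points
    stat = np.abs(csum[:-1] - k * total / n) / np.sqrt(k * (n - k) / n)
    return stat.max(), int(k[stat.argmax()])

def binary_segmentation(y, threshold, lo=0):
    """Recursively split wherever the CUSUM statistic exceeds the threshold;
    returns the detected change-point locations (global indices)."""
    if len(y) < 2:
        return []
    stat, k = cusum_stat(y)
    if stat < threshold:
        return []
    return (binary_segmentation(y[:k], threshold, lo)
            + [lo + k]
            + binary_segmentation(y[k:], threshold, lo + k))
```

On a mean level that steps from 0 up to 3 and back, the recursion recovers both break points to within a few observations.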


Barry Nussbaum (President-Elect, American Statistical Association)

“What Did They Just Say You Said?”
Abstract: Statisticians have long known that success in our profession frequently depends on our ability to succinctly explain our results so that decision makers may correctly integrate our efforts into their actions. However, this is no longer enough. While we must still make sure that we carefully present results and conclusions, the real difficulty is what the recipient thinks we just said. This presentation will discuss what to do, and what not to do. Examples, including those used in court cases, executive documents, and material prepared for the President of the United States, will illustrate the principles.

Barry D. Nussbaum was the Chief Statistician for the U.S. Environmental Protection Agency from 2007 until his retirement in March 2016. He started his EPA career in 1975 in mobile sources and was the branch chief for the team that phased lead out of gasoline. Dr. Nussbaum is the founder of the EPA Statistics Users Group. In recognition of his notable accomplishments, he was awarded the Environmental Protection Agency’s Distinguished Career Service Award.

Dr. Nussbaum has a bachelor’s degree from Rensselaer Polytechnic Institute, and both a master’s and a doctorate from the George Washington University. In May 2015, he was elected the 112th president of the American Statistical Association. He has been a fellow of the ASA since 2007. He has taught graduate statistics courses for George Washington University and Virginia Tech and has even survived two terms as the treasurer of the Ravensworth Elementary School PTA.

11/7/16 Elections (Academic Holiday)


Flori Bunea (Cornell University) – Cancelled

“Model Based Variable Clustering”

The problem of variable clustering is that of grouping similar components of a p-dimensional vector X = (X1, . . . , Xp), and estimating these groups from n independent copies of X. Traditionally, variable clustering has been treated in an algorithmic manner, making the estimated clusters difficult to interpret and analyze from a statistical perspective. We take a different approach in this talk and suggest model-based variable clustering.
For a partition G of the index set {1, . . . , p}, we consider the class of G-latent models, in which each group of the X-variables is assumed to have a common latent generator, and the latent generators are correlated. At first sight, the most natural way to estimate such clusters is via K-means. We explain why this strategy cannot lead to correct cluster recovery in G-latent models. We offer a correction, based on semi-definite programming, that can be viewed as a penalized convex relaxation of K-means (PECOK). We introduce a cluster separation measure tailored to G-latent models, which can be viewed as a measure of the signal in these models, and derive its minimax lower bound for perfect cluster recovery. The clusters estimated by PECOK are shown to recover G at a near minimax optimal cluster separation rate, a result that holds true even if K, the number of clusters, is estimated adaptively from the data. We also compare PECOK with appropriate corrections of spectral clustering-type procedures, and show that the former outperforms the latter for cluster recovery of minimally separated clusters.
We also introduce a more general class of models for clustering, that of G-block correlation matrix models. We explain when this class can offer more flexibility than the class of G-latent models. We identify the appropriate cluster separation metric in these models, different from the one above, and derive its minimax lower bound for cluster recovery. We derive a new clustering method, CORD, tailored to the class of G-block correlation models.
Clusters of variables are routinely employed in downstream scientific analyses. The standard practice is to average over each cluster and use those averages for further modeling. However, average variables are difficult to interpret scientifically. Moreover, their usage may result in misleading scientific conclusions. We give an example in the context of graphical modeling after clustering, and offer an alternative model, the block graphical model, that allows the study of conditional independencies after clustering without averaging.
Extensions to overlapping clustering will be discussed briefly, time permitting.


Christopher Fonnesbeck (Vanderbilt University)

“Bayesian Models for Florida Manatee Population Monitoring and Conservation”
The Florida manatee (Trichechus manatus) is an endangered coastal marine mammal, currently listed as “endangered” by both the US and Florida governments. For decades, management of the manatee population was conducted in the absence of reliable information regarding population size and dynamics. Though aerial surveys are regularly conducted to assess manatee numbers, such counts are biased by imperfect detection and incomplete coverage of their range. We present a Bayesian model for estimating the state-wide manatee population using data from a stratified random survey design, using auxiliary information to correct for observation bias and to account for variation in manatee occupancy, abundance, and availability across the state. This yields the first statistical estimate of the manatee population, which can be used to aid conservation decision-making, and may ultimately lead to the species’ removal from the endangered species list.



Aurelie Lozano (IBM)


Eitan Greenshtein (Central Bureau of Statistics, Israel)
“Non-parametric empirical Bayes improvement of common shrinkage estimators.”

Abstract: We consider the problem of estimating a vector (µ_1, …, µ_n) of normal means under squared loss, based on independent Y_i ∼ N(µ_i, 1), i = 1, …, n. We use ideas and techniques from non-parametric empirical Bayes to obtain asymptotic risk improvement over classical shrinkage estimators, such as Stein’s estimator, the Fay–Herriot estimator, and the Kalman filter. We consider both the sequential and the retrospective estimation problems, and elaborate on state-space models and the Kalman filter estimators. The performance of our improving method is demonstrated through both simulations and real data examples.

Joint work with Ariel Mansura and Ya’acov Ritov.
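One classical non-parametric empirical Bayes device in this normal-means setting (an illustration of the general idea only, not the authors’ improvement method) is Tweedie’s formula, which estimates E[µ_i | Y_i] = Y_i + f′(Y_i)/f(Y_i) by plugging in a kernel density estimate of the marginal density f of the Y_i; the bandwidth rule below is an arbitrary choice.

```python
import numpy as np

def tweedie(y, bandwidth=None):
    """Non-parametric empirical Bayes estimate of normal means via Tweedie's
    formula for unit noise variance: mu_hat_i = y_i + f'(y_i)/f(y_i),
    with f estimated by a Gaussian kernel density estimate."""
    n = len(y)
    h = bandwidth or 1.06 * np.std(y) * n ** (-0.2)    # Silverman's rule
    diff = y[:, None] - y[None, :]                     # pairwise differences
    kern = np.exp(-0.5 * (diff / h) ** 2)
    f = kern.sum(axis=1)                               # proportional to the KDE
    fprime = (-diff / h**2 * kern).sum(axis=1)         # proportional derivative
    return y + fprime / f                              # normalizing constants cancel
```

On a sparse mean vector, this data-driven shrinkage beats the unbiased estimate Y_i itself in total squared error, which is the kind of risk improvement the abstract pursues for more structured shrinkage targets.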


David Stoffer (University of Pittsburgh)

“Almost everything you always wanted to know about NONLINEAR STATE SPACE MODELS (but were afraid to ask)”


Ever wonder why, when you fly to LAX, you don’t wind up in San Diego? The tracking devices use a nonlinear state space model. While inference for the linear Gaussian model is fairly simple, inference for nonlinear models can be difficult and often relies on derivative-free numerical optimization techniques. A promising method that I will discuss is based on particle approximations of the conditional distribution of the hidden process given the data. This distribution is needed for both classical inference (e.g., Monte Carlo EM-type algorithms) and Bayesian inference (e.g., the Gibbs sampler).

Particle methods are an extension of sequential importance sampling (SIS). Although the SIS algorithm has been known since the early 1970s, its use in nonlinear problems remained largely unnoticed until the early 1990s. The available computational power was then too limited for convincing applications of these methods, but other difficulties also plagued the technique: time series data are typically long, and particles have a tendency to die young. Consequently, the approach is cursed by dimensionality. But as Shakespeare noted, if dimensionality curseth, a better algorithm useth.
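A minimal bootstrap particle filter, checked against the exact Kalman filter on a linear Gaussian state space model where both apply; the model parameters and particle count are arbitrary choices for this sketch, and the resampling step is what keeps particles from "dying young" as in plain SIS.

```python
import numpy as np

def kalman_filter(y, phi, q, r, m0=0.0, p0=1.0):
    """Exact filter for x_t = phi x_{t-1} + N(0,q), y_t = x_t + N(0,r)."""
    means, m, p = [], m0, p0
    for obs in y:
        m, p = phi * m, phi**2 * p + q             # predict
        k = p / (p + r)                            # Kalman gain
        m, p = m + k * (obs - m), (1 - k) * p      # update
        means.append(m)
    return np.array(means)

def bootstrap_pf(y, phi, q, r, n_part=5000, seed=1):
    """Bootstrap particle filter: propagate through the state equation,
    weight by the observation likelihood, then resample."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, n_part)               # particles from the prior
    means = []
    for obs in y:
        x = phi * x + rng.normal(0.0, np.sqrt(q), n_part)   # propagate
        logw = -0.5 * (obs - x) ** 2 / r                     # log-likelihood weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                          # filtered mean
        x = rng.choice(x, size=n_part, p=w)                  # multinomial resampling
    return np.array(means)
```

With enough particles the two filtered-mean trajectories agree closely on the linear model; the particle filter, unlike the Kalman recursion, carries over unchanged to nonlinear state and observation equations.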


David Matteson (Cornell University)

“High Dimensional Forecasting via Interpretable Vector Autoregression”

Vector autoregression (VAR) is a fundamental tool for modeling multivariate time series. However, as the number of component series increases, the VAR model becomes overparameterized. Several authors have addressed this issue by incorporating regularized approaches, such as the lasso, into VAR estimation. Traditional approaches address overparameterization by selecting a low lag order, based on the assumption of short-range dependence and on the assumption that a universal lag order applies to all components. Such an approach constrains the relationship between the components and impedes forecast performance. The lasso-based approaches work much better in high-dimensional situations but do not incorporate the notion of lag order selection. We propose a new class of regularized VAR models, called hierarchical vector autoregression (HVAR), that embeds the notion of lag selection into a convex regularizer. The key modeling tool is a group lasso with nested groups, which guarantees that the sparsity pattern of the lag coefficients honors the VAR’s ordered structure. The HVAR framework offers three structures, which allow for varying levels of flexibility. A simulation study demonstrates improved performance in forecasting and lag order selection over previous approaches, and two macroeconomic applications further highlight forecasting improvements as well as HVAR’s convenient, interpretable output.
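To make the regularized-VAR setting concrete, here is a sketch of sparse VAR estimation with a plain lasso penalty fitted by proximal gradient descent (ISTA). It deliberately omits HVAR’s nested-group structure, so it illustrates only the lasso-based baseline the abstract contrasts with; `p_lag`, `lam`, and the iteration count are arbitrary choices.

```python
import numpy as np

def lasso_var(Y, p_lag=2, lam=0.02, n_iter=500):
    """Sparse VAR(p) estimation by proximal gradient (ISTA) on
    0.5 * ||Z - B X||_F^2 / n + lam * ||B||_1, a plain-lasso
    simplification without HVAR's hierarchical lag penalty."""
    T, k = Y.shape
    # Stack lagged predictors: column t of X is (y_{t-1}, ..., y_{t-p}).
    X = np.vstack([Y[p_lag - l - 1: T - l - 1].T for l in range(p_lag)])
    Z = Y[p_lag:].T                                   # responses
    n = Z.shape[1]
    B = np.zeros((k, k * p_lag))
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n)      # 1 / Lipschitz constant
    for _ in range(n_iter):
        G = (B @ X - Z) @ X.T / n                     # gradient of the fit term
        B = B - step * G
        B = np.sign(B) * np.maximum(np.abs(B) - step * lam, 0.0)  # soft-threshold
    return B.reshape(k, p_lag, k)                     # coefficient matrices by lag
```

Note that the soft-threshold zeroes individual coefficients anywhere in the lag structure; HVAR’s nested group lasso would instead force whole higher-order lag matrices to zero before lower-order ones, which is the lag-selection behavior the talk advocates.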