Statistics Seminar Series – Fall 2017

Schedule for Fall 2017

Seminars are on Mondays
Time: 4:10pm – 5:00pm
Location: Room 903, 1255 Amsterdam Avenue

Tea and Coffee will be served before the seminar at 3:30 PM, 10th Floor Lounge SSW

Cheese and Wine reception will follow the seminar at 5:10 PM in the 10th Floor Lounge SSW

An archive of past seminars is available on the department website.


Yoav Benjamini (Tel Aviv University)

“The replicability problems in science: it’s not the p-value’s fault”

Abstract — Significance testing, and the p-value as its symbol, have become the statistical scapegoat for the replicability problems in science.  Instead, I shall argue that the two main statistical obstacles to replicability are (i) unattended inference on the selected, and (ii) ignoring the relevant variability. I shall review current approaches to selective inference, and give but one example of the second obstacle in mouse phenotyping. 
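Selective inference is Professor Benjamini's own terrain; the canonical procedure there is the Benjamini–Hochberg step-up rule for false discovery rate control. As an illustration only (the talk's scope is broader, and the function and variable names below are ours, not the speaker's), a minimal sketch:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: return a boolean mask of
    rejected hypotheses, controlling the false discovery rate at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Compare the i-th smallest p-value against i * q / m
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True   # reject the k smallest p-values
    return reject
```

For instance, with p-values `[0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.2, 0.5]` and `q = 0.05`, the step-up rule rejects the two smallest.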


Aurelie Lozano (IBM)

“Sparse + Group-Sparse Dirty Models: Statistical Guarantees without Unreasonable Conditions and a Case for Non-Convexity.”

Abstract — Imposing sparse + group-sparse superposition structures in high-dimensional parameter estimation is known to provide flexible regularization that is more realistic for many real-world problems. For example, such a superposition enables partially-shared support sets in multi-task learning, thereby striking the right balance between parameter overlap across tasks and task specificity. Existing theoretical results on estimation consistency, however, are problematic as they require too stringent an assumption: the incoherence between sparse and group-sparse superposed components. In this talk, we fill the gap between the practical success and suboptimal analysis of sparse + group-sparse models, by providing the first consistency results that do not require unrealistic assumptions. We also study non-convex counterparts of sparse + group-sparse models. Interestingly, we show that these are guaranteed to recover the true support set under much milder conditions and with smaller sample size than convex models, which might be critical in practical applications as illustrated by our experiments.

9/25/17 Florentina Bunea (Cornell)

“Structured Sparse Latent Variable Models for Overlapping Clustering with LOVE”

Abstract available as a PDF.



Time: 4:00 – 5:00

Room: 717 Hamilton

Jianqing Fan (Princeton)

“Uniform perturbation analysis of eigenspaces and its applications to Community Detection, Ranking and Beyond.”


Spectral methods have been widely used for a large class of challenging problems, including top-K ranking via pairwise comparisons, community detection, and factor analysis, among others.

Analyses of these spectral methods require sup-norm perturbation analysis of the top eigenvectors. This allows us to uniformly approximate the elements of the eigenvectors by linear functions of the observed random matrix, which can be analyzed further. We first establish such an infinity-norm perturbation bound for top eigenvectors and apply the idea to several challenging problems such as top-K ranking, community detection, Z_2-synchronization, and matrix completion. We show that the spectral methods are indeed optimal for these problems. We illustrate these methods via simulations.
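To make the community-detection application concrete: in a two-block stochastic block model, the sign pattern of the second eigenvector of the adjacency matrix recovers the communities, and entrywise (sup-norm) control of that eigenvector is exactly what justifies classifying each node by its sign. A minimal numpy sketch (parameters and names are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 100, 0.9, 0.05                 # two balanced communities, assortative
labels = np.repeat([0, 1], n // 2)
P = np.where(labels[:, None] == labels[None, :], p, q)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T           # symmetric adjacency, no self-loops

# Vanilla spectral method: the second-largest eigenvector splits the blocks
vals, vecs = np.linalg.eigh(A)           # eigenvalues in ascending order
u2 = vecs[:, -2]
pred = (u2 > 0).astype(int)
acc = max(np.mean(pred == labels), 1 - np.mean(pred == labels))  # sign ambiguity
```

With this much separation between within- and between-block edge probabilities, the sign-based classifier recovers essentially all labels.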


Long Nguyen (University of Michigan)

“Robust estimation of parameters in finite mixture models”

Abstract: In finite mixture models, apart from the underlying mixing measure, the true kernel density function of each subpopulation in the data is, in many scenarios, unknown. Perhaps the most popular approach is to choose some kernel function that we empirically believe our data are generated from and use it to fit our models. Nevertheless, as long as the chosen kernel and the true kernel differ, statistical inference on the mixing measure under this setting will be highly unstable. To overcome this challenge, we propose flexible and efficient robust estimators of the mixing measure in these models, inspired by the minimum Hellinger distance estimator, model selection criteria, and the superefficiency phenomenon. We demonstrate that our methods consistently recover the number of components and achieve the optimal convergence rates of parameter estimation, evaluated via the Wasserstein metric, under both well- and mis-specified kernel settings for any fixed bandwidth. These desirable properties are illustrated via simulation studies with both synthetic and real data. This work is joint with Nhat Ho and Ya’acov Ritov.
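The minimum Hellinger distance idea the abstract builds on can be shown in miniature: discretize the data into a histogram and pick the parameter whose model cell probabilities are closest in Hellinger distance. This toy (a single Gaussian location family, not the proposed mixture estimator, and with hypothetical names throughout) is only meant to make the criterion concrete:

```python
import numpy as np
from math import erf

def hellinger(p, q):
    """Hellinger distance between two discrete distributions on a common grid."""
    p, q = np.asarray(p), np.asarray(q)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def normal_cell_probs(edges, mu, sigma=1.0):
    # Probability that a N(mu, sigma^2) draw falls into each histogram cell
    cdf = np.array([0.5 * (1 + erf((e - mu) / (sigma * np.sqrt(2)))) for e in edges])
    return np.diff(cdf)

rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=20000)        # true location is 2
edges = np.linspace(-3, 7, 101)
emp, _ = np.histogram(data, bins=edges)
emp = emp / emp.sum()                          # empirical cell frequencies

grid = np.linspace(0, 4, 81)                   # candidate locations
mu_hat = grid[np.argmin([hellinger(emp, normal_cell_probs(edges, m))
                         for m in grid])]
```

The minimizer `mu_hat` lands near the true location; the robustness properties discussed in the talk come from the Hellinger criterion's insensitivity to small model misspecification.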


Qiaozhu Mei (University of Michigan)

“Learning Representations of Large-scale Networks”

Abstract: Large-scale networks such as social networks, biological networks, and the World Wide Web have attracted increasing attention in both academia and industry. While recent developments in representation learning, especially deep learning approaches, have demonstrated their great power on image, text, and speech data, how to learn useful representations for discrete, networked data remains a major challenge. In this talk, I will introduce the recent progress of my research group in learning representations for large-scale network data. I will introduce efficient algorithms that embed nodes into a continuous vector space so that the local and global structural information is preserved. By further projecting the representation to a 2D or 3D space, we are able to visualize millions of high-dimensional data points meaningfully on a single slide. By learning the representation of a network as a whole, we demonstrate that the topological structure of a network has predictive power for the growth of both the network itself and information cascades on the network.
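The embedding algorithms discussed in the talk are considerably more elaborate than this, but the simplest instance of the idea, assigning each node coordinates that preserve structure, is a truncated spectral decomposition of the adjacency matrix. A sketch under that simplification (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80
labels = np.repeat([0, 1], n // 2)               # two latent groups
P = np.where(labels[:, None] == labels[None, :], 0.4, 0.05)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T                   # symmetric adjacency

# Rank-2 embedding: each node's coordinates come from the top-2 singular pairs
U, s, _ = np.linalg.svd(A)
emb = U[:, :2] * s[:2]

# Structure preservation: nodes in the same group sit closer together
same = labels[:, None] == labels[None, :]
d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
within = d[same & ~np.eye(n, dtype=bool)].mean()
between = d[~same].mean()
```

Such low-dimensional coordinates are exactly what a 2D/3D projection then visualizes.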

Bio: Qiaozhu Mei is an associate professor at the School of Information and the Department of EECS, the University of Michigan. He is widely interested in data mining, machine learning, information retrieval and their applications to the Web, natural language, social networks, and health informatics. He is a recipient of the NSF CAREER Award and multiple best paper awards at ICML, KDD, WSDM, and other related venues. He is serving on the editorial boards of multiple top journals and is the general co-Chair of SIGIR 2018.


Yuxin Chen (Princeton)

“Implicit Regularization in Nonconvex Statistical Optimization”


Recent years have seen astounding progress in both the theory and practice of nonconvex optimization. Carefully designed nonconvex procedures simultaneously achieve optimal statistical accuracy and computational efficiency for many problems. Due to the highly nonconvex landscape, the state-of-the-art results often require proper regularization procedures (e.g. trimming, projection, or extra penalization) to guarantee fast convergence. For vanilla algorithms, however, the prior theory usually suggests conservative step sizes in order to avoid overshooting.

This talk uncovers a striking phenomenon: even in the absence of explicit regularization, nonconvex gradient descent enforces proper regularization automatically and implicitly under a large family of statistical models. In fact, the vanilla nonconvex procedure follows a trajectory that always falls within a region with nice geometry. This “implicit regularization” feature allows the algorithm to proceed in a far more aggressive fashion without overshooting, which in turn enables faster convergence.  We will discuss several concrete fundamental problems including phase retrieval, matrix completion, blind deconvolution, and recovering structured probability matrices, which might shed light on the effectiveness of nonconvex optimization for solving more general structured recovery problems.
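Phase retrieval gives the cleanest instance of the "vanilla procedure" in question: spectral initialization followed by plain gradient descent on the quartic least-squares loss, with no trimming or truncation. A minimal numpy sketch (dimensions, step size, and iteration count are our choices, not the talk's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 400
x = rng.normal(size=n); x /= np.linalg.norm(x)   # ground truth, unit norm
A = rng.normal(size=(m, n))
y = (A @ x) ** 2                                 # phaseless measurements

# Spectral initialization: top eigenvector of (1/m) * sum_i y_i a_i a_i^T
Y = (A.T * y) @ A / m
_, V = np.linalg.eigh(Y)
z = V[:, -1] * np.sqrt(np.mean(y))               # scale by estimated signal norm

# Vanilla gradient descent on f(z) = (1/4m) * sum_i ((a_i^T z)^2 - y_i)^2
eta = 0.1
for _ in range(2000):
    r = (A @ z) ** 2 - y
    z = z - eta * (A.T @ (r * (A @ z))) / m

dist = min(np.linalg.norm(z - x), np.linalg.norm(z + x))  # global sign ambiguity
```

With a sufficient number of Gaussian measurements the iterates converge to the signal (up to sign), despite the absence of any explicit regularization step.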


Lingzhou Xue (Penn State)


Alexandre Tsybakov (ENSAE)

“Optimal and adaptive variable selection”

Abstract: We consider the problem of variable selection based on $n$ observations from a high-dimensional linear regression model.

The unknown parameter of the model is assumed to belong to the class $S$ of all $s$-sparse vectors in $R^p$ whose non-zero components are greater than $a > 0$.

Variable selection in this context is an extensively studied problem and various methods of recovering sparsity pattern have been suggested.

However, in theory, not much is known beyond consistency of selection. For Gaussian design, which is of major importance in the context of compressed sensing, necessary and sufficient conditions for consistency are available for some configurations of $n,p,s,a$. They are known to be achieved by the exhaustive search selector, which is not realizable in polynomial time and requires knowledge of $s$.

This talk will focus on the issue of optimality in variable selection based on the Hamming risk criterion. We first consider a general setting of the variable selection problem and derive the explicit expression for the minimax Hamming risk on $S$. Then, we specify it for the Gaussian sequence model and for high-dimensional linear regression with Gaussian design. In the latter model, we propose an adaptive algorithm, independent of $s$, $a$, and of the noise level, that nearly attains the value of the minimax risk. This algorithm is the first method that is both realizable in polynomial time and consistent under almost the same (minimal) sufficient conditions as the exhaustive search selector. This talk is based on joint work with C. Butucea, M. Ndaoud and N. Stepanova.
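The Hamming risk criterion itself is easy to exhibit: count the coordinates on which the selected support disagrees with the true one. The toy below uses a simple universal-threshold selector in the Gaussian sequence model; it is not the adaptive algorithm of the talk (which tunes the threshold without knowing $s$, $a$, or the noise level), and all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
p, s, a, sigma = 1000, 10, 8.0, 1.0
theta = np.zeros(p); theta[:s] = a                  # s-sparse signal, amplitude a
y = theta + sigma * rng.normal(size=p)              # Gaussian sequence model

t = sigma * np.sqrt(2 * np.log(p))                  # universal threshold
selected = np.abs(y) > t
hamming = np.sum(selected != (theta != 0))          # Hamming loss of the selector
```

When the amplitude $a$ comfortably exceeds the threshold, the Hamming loss is near zero; the interesting regimes in the talk are exactly those where $a$ sits near the detection boundary.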


University Holiday


Aditya Guntuboyina (Berkeley)

“Adaptation via convex optimization in two nonparametric estimation problems”

Abstract: We study two convex optimization based procedures for nonparametric function estimation: trend filtering (or higher order total variation denoising) and the Kiefer-Wolfowitz MLE for Gaussian location mixtures. Trend filtering can be seen as a technique for fitting spline-like functions for nonparametric regression with adaptive knot selection. It can also be seen as a special case of LASSO for a specific design matrix with highly correlated columns. The Kiefer-Wolfowitz MLE is a technique for nonparametric density estimation with an adaptive selection of the number of Gaussian mixture components. We shall prove that these two procedures have natural adaptive risk behavior for prediction (i.e., they achieve prediction performance that is comparable to appropriate Oracle estimators as well as to non-convex combinatorial procedures) under sparsity-like assumptions. The results for trend filtering are based on joint work with Donovan Lieu, Sabyasachi Chatterjee and Bodhisattva Sen and the results for the Kiefer-Wolfowitz MLE are based on joint work with Sujayam Saha.
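The "special case of LASSO with a highly correlated design" remark can be made concrete for 0th-order trend filtering (total variation denoising): writing the fit as cumulative sums of jump coefficients turns the TV penalty into an $\ell_1$ penalty, solvable by plain proximal gradient descent. A small sketch (signal, penalty level, and iteration budget are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
truth = np.concatenate([np.zeros(20), 3 * np.ones(20), np.zeros(20)])
y = truth + 0.7 * rng.normal(size=n)

# Synthesis view: theta = X @ beta with X lower-triangular ones, so beta[1:]
# are the jumps of theta and the l1 penalty on beta[1:] is the TV penalty --
# a LASSO whose design matrix X has highly correlated columns.
X = np.tril(np.ones((n, n)))
lam = 2.0
L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the smooth gradient
beta = np.zeros(n)
for _ in range(20000):               # plain ISTA (proximal gradient)
    g = X.T @ (X @ beta - y)
    b = beta - g / L
    # soft-threshold the jump coefficients; the intercept beta[0] is free
    b[1:] = np.sign(b[1:]) * np.maximum(np.abs(b[1:]) - lam / L, 0.0)
    beta = b

theta_hat = X @ beta
mse_fit = np.mean((theta_hat - truth) ** 2)
mse_raw = np.mean((y - truth) ** 2)
```

The fitted curve is piecewise constant with adaptively chosen jump locations, and its error is well below that of the raw observations.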



Rm 1025 SSW


Anthony Davison (Ecole Polytechnique Fédérale de Lausanne – EPFL)


Heping Zhang (Yale)

“Surrogate Residuals for Generalized Linear Models”

Residual diagnostics is an important classroom topic in statistics. Nowadays, it only occasionally appears in a statistical journal or research publication, even when regression or association analyses of real data are critical to the findings and conclusions. Perhaps it is a topic we tend to gloss over; or perhaps it is a topic we know neither how to approach nor how to circumvent. The latter is probably closer to reality in the context of logistic regression, and even more so when there is an ordinal-scaled outcome such as whether we feel sad, good or great today. In this talk, I will attempt to draw your attention to this topic, which in my own experience remains very important yet insufficiently treated in real data applications. Using a combination of classic and contemporary statistical techniques, I will introduce the concept of surrogate residuals, which appears informative and useful in residual diagnosis of regression models involving categorical outcomes, including logistic regression. This will be supported by theory, simulation, and real data analysis. The work is a collaboration with Dr. Dungang Liu, Assistant Professor of Business Analytics, Carl H. Lindner College of Business, University of Cincinnati.


Rajarshi Mukherjee (Berkeley)

“On Estimation of Nonparametric Functionals”

Abstract: We consider asymptotically minimax as well as adaptive estimation of a class of “smooth” nonlinear functionals in nonparametric models. Particular examples of the general class of functionals under study include: (i) the effect of a treatment on an outcome in the presence of covariates, (ii) the mean functional in missing data problems, and (iii) integrated functionals of densities, or of mean and variance functions in nonparametric regression models. For non-adaptive minimax estimation, we describe a general strategy of estimation by extending first-order semiparametric theory to Higher Order Influence Functions (HOIFs). Since estimators based on HOIFs are U-statistics of suitable order, we will also consider general adaptive upper bounds for estimators based on second-order U-statistics which arise from finite-dimensional approximation of the infinite-dimensional models using projection-type kernels. An accompanying general adaptive lower bound tool is provided by deriving bounds on the chi-square divergence between mixtures of product measures. We then show that such tools provide rate-optimal adaptive estimation for the class of functionals under study.



Richard Olshen (Stanford)

“V(D)J Diversity and Statistical Inference”

Abstract: This talk will include an introduction to the topic of V(D)J rearrangements of T cells and B cells of the adaptive human immune system, in particular of IgG heavy chains. There are many statistical problems that arise in understanding particular types of these cells. This presentation will be my attempt to provide some mathematical and computational details that arise in trying to understand the data.


Sarah Heaps (Newcastle University)

“Identifying the effect of public holidays on daily demand for gas”

Gas distribution networks need to ensure the supply and demand for gas are balanced at all times. In practice, this is supported by a number of forecasting exercises which, if performed accurately, can substantially lower operational costs, for example through more informed preparation for severe winters. Amongst domestic and commercial customers, the demand for gas is strongly related to the weather and patterns of life and work. In regard to the latter, public holidays have a pronounced effect, which often extends into neighbouring days. In the literature, the days over which this protracted effect is felt are typically pre-specified as fixed windows around each public holiday. This approach fails to allow for any uncertainty surrounding the existence, duration and location of the protracted holiday effects. We introduce a novel model for daily gas demand which does not fix the days on which the proximity effect is felt. Our approach is based on a four-state, non-homogeneous hidden Markov model with cyclic dynamics. In this model the classification of days as public holidays is observed, but the assignment of days as “pre-holiday”, “post-holiday” or “normal” is unknown. Explanatory variables recording the number of days to the preceding and succeeding public holidays guide the evolution of the hidden states and allow smooth transitions between normal and holiday periods. To allow for temporal autocorrelation, we model the logarithm of gas demand at multiple locations, conditional on the states, using a first-order vector autoregression (VAR(1)). We take a Bayesian approach to inference and consider briefly the problem of specifying a prior distribution for the autoregressive coefficient matrix of a VAR(1) process which is constrained to lie in the stationary region. 
We summarise the results of an application to data from Northern Gas Networks (NGN), the regional network serving the North of England, a preliminary version of which is already being used by NGN in its annual medium-term forecasting exercise.
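One self-contained technical ingredient of the model above is the stationarity constraint on the VAR(1) coefficient matrix: the process $x_t = A x_{t-1} + \varepsilon_t$ is stationary exactly when the spectral radius of $A$ is below one, which is the region the prior must be supported on. A minimal check (hypothetical helper name):

```python
import numpy as np

def is_stationary(A):
    """A VAR(1) process x_t = A @ x_{t-1} + noise is (covariance) stationary
    iff every eigenvalue of the coefficient matrix A lies strictly inside
    the unit circle, i.e. the spectral radius of A is below one."""
    A = np.asarray(A, dtype=float)
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)
```

For example, a diagonal coefficient matrix with entries 0.5 and 0.3 is stationary, while one with an entry of 1.1 is not.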

*Tuesday 12/12/17

Room 903 SSW




Fabrizia Mealli (Harvard)

Title: Assessing causal effects on survival time in the presence of treatment switching

Abstract: In clinical trials focusing on survival outcomes for patients suffering from Acquired Immune Deficiency Syndrome (AIDS)-related illnesses and particularly painful cancers in advanced stages, patients in the control arm are often allowed to switch to the treatment arm if their physical conditions are worse than certain tolerance levels. The Intention-To-Treat analysis, comparing groups formed by randomization regardless of the treatment actually received, is often used; although it provides valid causal estimates of the effect of assignment, it does not give information about the effect of the actual receipt of the treatment and ignores the information of treatment switching in the control group. Other existing methods in the literature propose to reconstruct the outcome a unit would have had if s/he had not switched, but they are usually based on strong assumptions, such as the absence of any relation between a patient’s prognosis and switching behaviour. Clearly, the switching status of the units in the control group contains important post-treatment information and is useful for characterizing the heterogeneity of the treatment effect. We propose to re-define the problem of treatment switching using principal stratification and introduce new causal estimands, principal causal effects for patients belonging to subpopulations defined by the switching behavior under the control treatment, which appropriately adjust for the post-treatment information and characterize treatment effect heterogeneity. For inference, we use a Bayesian approach, which allows us to properly take into account that (i) switching happens in continuous time, generating a continuum of principal strata; (ii) switching time is not defined for units who never switch in a particular experiment; and (iii) both survival time, the outcome of primary interest, and switching time are subject to censoring.
We illustrate our framework using simulated data based on the Concorde study, a randomized controlled trial aimed to assess causal effects on time-to-disease progression or death of immediate versus deferred treatment with zidovudine among patients with asymptomatic HIV infection.

Joint work with Alessandra Mattei and Peng Ding