Statistics Seminar Series
Semester Schedule: Statistics – Spring 2014
Seminars are on Mondays

Feb 17  Rebecca C. Steorts, CMU
Title: Will the Real Steve Fienberg Please Stand Up: Getting to Know a
Abstract: We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation is to represent the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible new representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate k-way posterior probabilities of matches across records, and propagate the uncertainty of record linkage into later analyses.

Feb 24  Alexandra Chronopoulou, CUNY
Title: Statistical Inference for fractional SDEs and applications
Abstract: Stochastic differential equations driven by fractional Brownian motion have an increasing presence in a wide range of applications, as they can successfully model phenomena that are characterized by long memory and/or self-similarity. In this talk, we will review their basic theoretical properties, focus on the statistical inference of their parameters, and discuss particular applications in mathematical finance.

March 3  Noureddine El Karoui, UC Berkeley

March 10  Yi Yu, University of Cambridge
Title: Fused community detection
Abstract: Community detection is one of the most widely studied problems in
This is joint work with Dr. Yang Feng (Columbia University) and Prof. Richard J. Samworth.

March 17  Spring Recess

March 24  Zongming Ma, UPENN
Title: Estimating High-dimensional Matrices: Convex Geometry and Computational Barriers
Abstract: In this talk, we introduce a unified approach for studying estimation of high-dimensional matrices, which yields tight non-asymptotic minimax rates for a large collection of loss functions in a variety of problems. Based on the convex geometry of finite-dimensional Banach spaces, the minimax rate of the oracle (unconstrained) matrix denoising problem is determined for all unitarily invariant norms. This result is then extended to denoising with submatrix sparsity, where the excess risk depends on the sparsity constraints in a completely different manner. The approach is also applicable to matrix completion under a low-rank constraint and extends beyond the normal mean model. In

March 31  Jiashun Jin, CMU
Title: Fast Network Community Detection by SCORE
Abstract: Consider a network where the nodes split into K different communities. The

April 7  Yee Whye Teh

April 14  Grant Weller, CMU
Title: Inference for Hidden Regular Variation in Multivariate Extremes
Abstract: A fundamental deficiency of classical multivariate extreme value theory is the inability to distinguish between asymptotic independence and exact independence. In this work, we examine multivariate threshold exceedance modeling in the framework of regular variation. Under this framework, dependence in the tail of a distribution is described by a limiting measure, which in some cases is degenerate on joint tail regions despite possible dependence in such regions at finite levels. Hidden regular variation, a higher-order tail decay on these regions, offers a refinement of the classical theory. We develop a representation of random vectors possessing hidden regular variation as the sum of independent regular varying components. The representation is shown to be asymptotically valid via a multivariate tail equivalence result. We develop a likelihood-based estimation procedure from this representation via a Monte Carlo expectation-maximization algorithm which has been modified for tail estimation. The methodology is demonstrated on simulated data and applied to a bivariate series of air pollution measurements.

April 21  Mary Meyer, Colorado State University
Title: Variable and Shape Selection in the Generalized Additive Model
Abstract: The partial linear generalized additive model is considered, where the goal is to choose a subset of predictor variables and describe the component relationships with the response, in the

April 28  Liza Levina, University of Michigan
Title: Fast Community Detection in Large Sparse Networks
Abstract: Community detection is one of the fundamental problems in network analysis, with many diverse applications, and a lot of work has been done on models and algorithms that find communities. Perhaps the most commonly used probabilistic model for a network with communities is the stochastic block model, and many algorithms for fitting it have been proposed. Since finding communities involves optimizing over all possible assignments of discrete labels, most existing algorithms do not scale well to large networks, and many fail on sparse networks. In this talk, we propose a pseudolikelihood approach for fitting the stochastic block model to address these shortcomings. Pseudolikelihood is a general statistical principle that involves trading off some of the model complexity against computational efficiency. We also derive a variant that allows for arbitrary degree distributions in the network, making it suitable for fitting the more flexible degree-corrected stochastic block model. The pseudolikelihood algorithm scales easily to networks with millions of nodes, performs well empirically under a range of settings, including on very sparse networks, and is asymptotically consistent under reasonable conditions. If time allows, I will also discuss spectral clustering with perturbations, a new method of independent interest we use to initialize pseudolikelihood, which works well on sparse networks where regular spectral clustering fails.

May 5  Juerg Huesler, University of Bern
Title: On high exceedances and excursions