Statistics Seminar – Fall 2018

Schedule for Fall 2018

Seminars are on Mondays
Time: 4:10pm – 5:00pm
Location: Room 903, 1255 Amsterdam Avenue

Tea and coffee will be served before the seminar at 3:30 PM in the 10th Floor Lounge SSW

A cheese and wine reception will follow the seminar at 5:10 PM in the 10th Floor Lounge SSW

For an archive of past seminars, please click here.


Haipeng Xing (SUNY)

“Predictive effect of economic and market variations on structural breaks in the credit market”

The financial crisis of 2007-2008 caused severe economic and political consequences around the world. An interesting question arising from this crisis is whether, or to what extent, such sharp changes or structural breaks in the market can be explained by economic and market fundamentals. To address this issue, we consider a model that extracts information about market structural breaks from firms’ credit rating records, and connects the probabilities of market structural breaks to observed and latent economic variables. We also discuss the issue of selecting significant variables when the number of economic covariates is large. We then analyze market structural breaks using U.S. firms’ credit rating records and historical data on economic and market fundamentals from 1986 to 2015. We find that the probabilities of structural breaks are positively correlated with changes in S&P 500 returns and volatilities and changes in inflation, and negatively correlated with changes in corporate bond yield. The significance of other variables depends on whether latent variables are included in the study.


4:00-5:15 in Uris 142

Ed Kaplan (Yale)

Title: Approximating the First-Come, First-Served Stochastic Matching Model with Ohm’s Law

Abstract: The first-come, first-served (FCFS) stochastic matching model, where each server in an infinite sequence is matched to the first eligible customer from a second infinite sequence, developed from queueing problems addressed by Kaplan (1984) in the context of public housing assignments. The goal of this model is to determine the matching rates between eligible customer types and server types, that is, the fraction of all matches that occur between type-i customers and type-j servers. This model was solved in a beautiful paper by Adan and Weiss, but the resulting equation for the matching rates is quite complicated, involving the sum of permutation-specific terms over all permutations of the server types. Here, we develop an approximation for the matching rates based on Ohm’s Law that in some cases reduces to exact results, and via analytical, numerical, and simulation examples is shown to be highly accurate. As our approximation only requires solving a system of linear equations, it provides an accurate and tractable alternative to the exact solution.

(This is joint work with Mohammad M. Fazel-Zarandi, MIT Sloan School of Management)

There will be a refreshment reception afterward in Uris deli’s Hepburn Lounge on the 1st floor of the building.


Andrew Nobel (UNC)

“Variational Analysis of Empirical Risk Minimization”

This talk presents a variational framework for the asymptotic analysis of empirical risk minimization in general settings. In its most general form the framework concerns a two-stage inference procedure. In the first stage of the procedure, an average loss criterion is used to fit the trajectory of an observed dynamical system with a trajectory of a reference dynamical system. In the second stage of the procedure, a parameter estimate is obtained from the optimal trajectory of the reference system. I will show that the empirical risk of the best fit trajectory converges almost surely to a constant that can be expressed in variational form as the minimal expected loss over dynamically invariant couplings (joinings) of the observed and reference systems. Moreover, the family of joinings minimizing the expected loss fully characterizes the asymptotic behavior of the estimated parameters. I will illustrate the variational framework through an application to the well-studied problem of maximum likelihood estimation, and the analysis of system identification from quantized trajectories subject to noise, a problem in which the models themselves exhibit dynamical behavior across time. As time permits, I will give an overview of new results in a more Bayesian setting, specifically Gibbs posterior estimation of Gibbs distributions.


Cheng Yong Tang (Temple)

“Pre-processing with Orthogonal Decompositions for High-dimensional Explanatory Variables”

It is well known that a high level of correlation between explanatory variables is problematic for high-dimensional regularized regression methods aimed at estimating sparse linear models. Due to the violation of the irrepresentable condition, the popular LASSO method may suffer from false inclusions of non-contributing variables. In this paper, we propose pre-processing with orthogonal decompositions (PROD) for the explanatory variables in high-dimensional regressions. The PROD procedure is constructed based upon a generic orthogonal decomposition of the design matrix. We investigate in detail three specific cases of the PROD: one by the conventional principal component analysis, one by a novel optimization incorporating the impact from the response variable, and one by random projections. We show that the level of correlations can be effectively reduced with PROD, making it more realistic for the irrepresentable condition to be valid. We also recognize that the PROD is flexible and can be adapted to take multiple objectives into consideration, such as reducing the level of correlations between the explanatory variables without compromising the level of variation of the resulting estimator. Extensive numerical studies with simulations and data analysis show the promising performance of the PROD. Our theoretical analysis also confirms its effect and benefit for high-dimensional regularized regression methods. This is joint work with Xu Han and Ethan X. Fang.


Hongning Wang (University of Virginia)

“Learning Contextual Bandits in a Non-Stationary Environment”

Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in many important real-world problems, such as display advertisement and recommender systems. However, such algorithms usually assume a stationary reward distribution, which hardly holds in practice as users’ preferences are dynamic. In this talk, we consider a situation where the underlying distribution of reward remains unchanged over (possibly short) epochs and shifts at unknown time instants. Accordingly, we propose a contextual bandit algorithm that detects possible changes of environment based on its reward estimation confidence and updates its arm selection strategy in response. Rigorous upper regret bound analysis of the proposed algorithm demonstrates its learning effectiveness in such a non-trivial environment. Extensive empirical evaluations on both synthetic and real-world data sets also confirm its practical utility in a changing environment.


Dr. Hongning Wang is now an Assistant Professor in the Department of Computer Science at the University of Virginia. He received his PhD degree in computer science at the University of Illinois at Urbana-Champaign in 2014. His research generally lies at the intersection of machine learning, data mining and information retrieval, with a special focus on computational user behavior modeling. His work has generated over 50 research papers in top venues in data mining and information retrieval areas. He is a recipient of a 2016 National Science Foundation CAREER Award and a 2014 Yahoo Academic Career Enhancement Award.



Xin Tong (USC Marshall)

“Neyman-Pearson classification”


In many binary classification applications, such as disease diagnosis and spam detection, practitioners commonly face the need to limit type I error (that is, the conditional probability of misclassifying a class 0 observation as class 1) so that it remains below a desired threshold. To address this need, the Neyman-Pearson (NP) classification paradigm is a natural choice; it minimizes type II error (that is, the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, alpha, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized and implemented in classification schemes. Common practices that directly limit the empirical type I error to no more than alpha do not satisfy the type I error control objective because the resulting classifiers are still likely to have type I errors much larger than alpha.  This talk introduces the speaker and coauthors’ work on NP classification algorithms and their applications and raises current challenges under the NP paradigm.  
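The gap between empirical and true type I error control can be illustrated with a small order-statistic calculation. The sketch below is not taken from the talk. It assumes a scoring classifier that predicts class 1 when a score exceeds a threshold, with the threshold set to the k-th smallest of n i.i.d. continuous class 0 scores held out for this purpose. The function names and the tolerance `delta` are hypothetical.

```python
from math import comb

def violation_probability(k, n, alpha):
    """P(true type I error > alpha) when the threshold is the k-th smallest
    of n i.i.d. continuous class-0 scores (assumed setup, see lead-in)."""
    # The event occurs when at least k of the n scores fall below the
    # (1 - alpha) quantile of the class-0 score distribution.
    return sum(comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
               for j in range(k, n + 1))

def smallest_safe_rank(n, alpha, delta):
    """Smallest rank k whose violation probability is at most delta, or None."""
    for k in range(1, n + 1):
        if violation_probability(k, n, alpha) <= delta:
            return k
    return None

n, alpha, delta = 100, 0.05, 0.05
naive_k = 95  # leaves 5 of 100 class-0 scores above the threshold (empirical error 0.05)
print("violation probability at naive rank:", violation_probability(naive_k, n, alpha))
print("smallest rank with violation probability <= delta:", smallest_safe_rank(n, alpha, delta))
```

Under these assumptions, the rank that makes the empirical type I error exactly alpha exceeds alpha in truth more often than not, whereas a more conservative rank keeps the violation probability below `delta`.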

Bio:

Xin Tong is an assistant professor in the Department of Data Sciences and Operations at the University of Southern California. He attended the University of Toronto for undergraduate studies in mathematics and obtained a Ph.D. degree in Operations Research from Princeton University. Before joining the University of Southern California, he was an instructor in Statistics at the Department of Mathematics, Massachusetts Institute of Technology. His current research focuses on asymmetric statistical learning problems. His research is partially funded by the United States NSF and NIH.


Roy Han (Rutgers)

“Least squares estimation: beyond Gaussian regression models”

We study the convergence rate of the least squares estimator (LSE) in a regression model with possibly heavy-tailed errors. Despite its importance in practical applications, theoretical understanding of this problem has been limited. We first show that from a worst-case perspective, the convergence rate of the LSE in a general non-parametric regression model is given by the maximum of the Gaussian regression rate and the noise rate induced by the errors. In the more difficult statistical model where the errors only have a second moment, we further show that the sizes of the ‘localized envelopes’ of the model give a sharp interpolation for the convergence rate of the LSE between the worst-case rate and the (optimal) parametric rate. These results indicate both positive and negative aspects of the LSE as an estimation procedure in a heavy-tailed regression setting. The key technical innovation is a new multiplier inequality that sharply controls the size of the multiplier empirical process associated with the LSE, which also finds applications in shape-restricted and sparse linear regression problems.

Mark Brown (Columbia)

“Taylor’s Law via Ratios, for Some Distributions with Infinite Mean”

Taylor’s law (TL) originated as an empirical pattern in ecology. In many sets of samples of population density, the variance of each sample was approximately proportional to a power of the mean of that sample. In a family of nonnegative random variables, TL asserts that the population variance is proportional to a power of the population mean. TL, sometimes called fluctuation scaling, holds widely in physics, ecology, finance, demography, epidemiology, and other sciences, and characterizes many classical probability distributions and stochastic processes such as branching processes and birth-and-death processes. We demonstrate analytically for the first time that a version of TL holds for a class of distributions with infinite mean. These distributions and the associated TL differ qualitatively from those of light-tailed distributions. Our results employ and contribute to the methodology of Albrecher and Teugels (2006) and Albrecher, Ladoucette and Teugels (2010). This work opens a new domain of investigation for generalizations of TL. This work is joint with Professors Joel Cohen and Victor de la Pena.
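For readers new to Taylor’s law, the classical finite-mean version (variance = a × mean^b) can be checked numerically with a log-log regression. The sketch below only illustrates that classical case and does not reproduce the infinite-mean results of the talk. It assumes exponentially distributed samples, for which the variance equals the squared mean, so the fitted exponent should be close to 2. The function names, sample sizes, and seed are hypothetical.

```python
import math
import random

def sample_mean_var(xs):
    """Sample mean and unbiased sample variance of a list of numbers."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v

def fit_taylor_law(samples):
    """Least-squares fit of log(variance) = log(a) + b * log(mean); returns (a, b)."""
    stats = [sample_mean_var(s) for s in samples]
    xs = [math.log(m) for m, _ in stats]
    ys = [math.log(v) for _, v in stats]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = math.exp(y_bar - b * x_bar)
    return a, b

rng = random.Random(0)
means = [2 ** i for i in range(8)]  # population means 1, 2, 4, ..., 128
# expovariate takes the rate, so rate 1/m gives mean m and variance m**2
samples = [[rng.expovariate(1 / m) for _ in range(2000)] for m in means]
a, b = fit_taylor_law(samples)
print("fitted coefficient a:", a)
print("fitted exponent b:", b)
```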

This is a joint seminar with the Applied Probability and Risk Seminar

11/5/18 Academic Holiday – no seminar

Howell Tong (LSE)

“Jackknife approach to the estimation of mutual information”


Quantifying the dependence between two random variables is a fundamental issue in data analysis, and thus many measures have been proposed. Recent studies have focused on the renowned mutual information (MI) [Reshef DN, et al. (2011) Science 334:1518–1524]. However, “Unfortunately, reliably estimating mutual information from finite continuous data remains a significant and unresolved problem” [Kinney JB, Atwal GS (2014) Proc Natl Acad Sci USA 111:3354–3359]. In this paper, we examine the kernel estimation of MI and show that the bandwidths involved should be equalized. We consider a jackknife version of the kernel estimate with equalized bandwidth and allow the bandwidth to vary over an interval. We estimate the MI by the largest value among these kernel estimates and establish the associated theoretical underpinnings.


Farzad Sabzikar (ISU)

“Asymptotic theory for near integrated processes driven by ARTFIMA time series”

Abstract: In this talk, we discuss asymptotic theory for near-integrated random processes and associated regressions, including the score function, in more general settings where the errors are autoregressive tempered fractionally integrated moving average (ARTFIMA) time series. ARTFIMA time series are a special case of tempered linear processes. Tempered processes are stationary time series that have a semi-long memory property in the sense that the autocovariogram of the process resembles that of a long memory model for moderate lags but eventually diminishes exponentially fast according to the presence of a decay factor governed by a tempering parameter. When the tempering parameter is sample size dependent, the resulting class of processes admits a wide range of behavior that includes long memory, semi-long memory, and short memory processes. The limit results relate to tempered fractional processes that include tempered fractional Brownian motion and tempered fractional diffusion process of the second kind.

This is joint work with Peter Phillips and Qiying Wang.


David Blei (Columbia)

“The Blessings of Multiple Causes”

Causal inference from observational data is a vital problem, but it comes with strong assumptions. Most methods require that we observe all confounders, variables that correlate with both the causal variables (the treatment) and the effect of those variables (how well the treatment works). But whether we have observed all confounders is a famously untestable assumption. We describe the deconfounder, a way to do causal inference from observational data with weaker assumptions than the classical methods require.

How does the deconfounder work?  While traditional causal methods measure the effect of a single cause on an outcome, many modern scientific studies involve multiple causes, different variables whose effects are simultaneously of interest.  The deconfounder uses the multiple causes as a signal for unobserved confounders, combining unsupervised machine learning and predictive model checking to perform causal inference.

We describe the theoretical requirements for the deconfounder to provide unbiased causal estimates, and show that it requires weaker assumptions than classical causal inference.  We analyze the deconfounder’s performance in three types of studies: semi-simulated data around smoking and lung cancer, semi-simulated data around genomewide association studies, and a real dataset about actors and movie revenue.  The deconfounder provides a checkable approach to estimating close-to-truth causal effects.

This is joint work with Yixin Wang.


Simon Tavaré (Columbia)

“The combinatorics of spaghetti hoops”

Starting with n cooked spaghetti strands, tie randomly chosen ends together to produce a collection of spaghetti hoops. What is the expected number of hoops? What can be said about the distribution of the number of hoops of length 1, 2, …? What is the behaviour of the longest hoops when n is large? What is the probability that all the hoops have different lengths? Questions like this appear in many guises in many areas of mathematics, the connection being their relation to the Ewens Sampling Formula (ESF). I will describe a number of related examples, including prime factorisation, random mappings and random permutations, illustrating the central role played by the ESF. I will also discuss methods for simulating decomposable combinatorial structures by exploiting another wonder of the ESF world, namely the Feller Coupling. Analysis of a children’s playground game shows that apparently small departures from the Feller model can open up a number of unsolved problems.
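The first question in the abstract has a well-known closed-form answer that is easy to check by simulation. When m open strands remain, the second chosen end closes a hoop with probability 1/(2m − 1), so the expected number of hoops is 1 + 1/3 + … + 1/(2n − 1). The sketch below compares this sum with a direct simulation. It is an illustration only and not material from the talk, and the function names, trial count, and seed are hypothetical.

```python
import random
from fractions import Fraction

def expected_hoops(n):
    """Exact expected number of hoops from n strands: sum of 1/(2k - 1), k = 1..n."""
    return sum(Fraction(1, 2 * k - 1) for k in range(1, n + 1))

def simulate_hoops(n, rng):
    """Tie uniformly chosen pairs of free ends until none remain; return the hoop count."""
    # ends[i] is the label of the strand that free end i belongs to;
    # every open strand always has exactly two free ends.
    ends = [i // 2 for i in range(2 * n)]
    hoops = 0
    while ends:
        i, j = rng.sample(range(len(ends)), 2)
        a, b = ends[i], ends[j]
        for idx in sorted((i, j), reverse=True):  # pop the larger index first
            ends.pop(idx)
        if a == b:
            hoops += 1  # both ends of one strand were tied: a hoop closes
        else:
            ends[ends.index(b)] = a  # two strands merge into one longer strand
    return hoops

rng = random.Random(0)
n, trials = 5, 20000
average = sum(simulate_hoops(n, rng) for _ in range(trials)) / trials
print("exact expectation:", float(expected_hoops(n)))
print("simulated average:", average)
```

The expectation grows only logarithmically in n, so even a large plate of spaghetti is expected to produce few hoops.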