Student Seminar – Spring 2021

Schedule for Spring 2021

The Student Seminar has migrated to Zoom for the Spring 2021 semester.

Seminars are on Wednesdays
Time: 12:00 – 1:00pm

Contacts: Diane Lu, Leon Fernandes

Information for speakers: For information about schedule, direction, equipment, reimbursement and hotel, please click here.


Andrew Gelman (Columbia)

“Bayesian Workflow”


Zoom link:


Johannes Wiesel (Columbia)

“Continuity of the martingale optimal transport problem on the real line”
We show continuity of the martingale optimal transport optimization problem as a functional of its marginals. This is achieved via an estimate on the projection in the nested/causal Wasserstein distance of an arbitrary coupling onto the set of martingale couplings with the same marginals. As a corollary, we obtain an independent proof of sufficiency of the monotonicity principle established in [Beiglboeck, M., & Juillet, N. (2016). On a problem of optimal transport under marginal martingale constraints. Ann. Probab., 44(1), 42–106] for cost functions of polynomial growth.

Jingchen Liu (Columbia) 

“Process Data Analysis in Computer-based Assessment”
In classic tests, item responses are often expressed as univariate categorical variables. Computer-based tests allow us to track students’ entire problem-solving processes through log files. In this case, the response to each item is a time-stamped process containing students’ detailed behaviors and their interactions with the simulated environment. The key questions are whether, and how much, additional information is contained in the detailed response processes beyond the traditional responses (yes/no or partial credit). Furthermore, we need to develop methods to systematically extract such information from the process responses, which often contain a substantial amount of noise. In this talk, I present several exploratory analyses of process data.



Carsten Chong (Columbia)

“Mixed semimartingales: Detecting hidden fractional processes at high frequency”

Abstract: This talk revolves around the following statistical problem: Can we detect a fractional Brownian motion of Hurst index $H > 1/2$ that is hidden behind a Brownian sample path, based on high-frequency observations on a finite time interval? Such processes, introduced by Cheridito (2001) as mixed fractional Brownian motions, can be used, for example, to model long-memory behavior and/or market inefficiencies in financial time series. We generalize this class of processes to so-called mixed semimartingales, in order to allow for stochastic volatility, and use power variation functionals to construct consistent estimators and asymptotic confidence intervals for $H$ and other quantities of interest. The performance of our estimators is evaluated in a simulation study.

This talk is based on ongoing work with Thomas Delerue (TU Munich) and Mathias Trabs (Hamburg).

Zoom link:

Michael Sobel (Columbia)

“Association and Causation: Attributes and Effects of Judges in Equal Employment Opportunity Commission Litigation Outcomes, 1996-2006.”

Abstract: A longstanding question in the literature on judicial decision making is whether judges with different features of an attribute, e.g., race or sex, differentially handle cases brought to court. To address this, researchers typically predict case outcomes using judge attributes and case covariates, then interpret the partial associations between attributes and outcomes as effects. But attributes are not treatments. Further, judges with different features of an attribute may be assigned different types of cases. The associations then reflect feature differences and dissimilar case loads. Ideally, one wants to know how judges with different features would handle the same cases. We construct a general methodology for studying the role of attributes in judicial decision making that capitalizes on this idea, applying it to study the role of race in employment discrimination cases filed by the Equal Employment Opportunity Commission between October 1, 1996 and September 30, 2006 in the U.S. federal district courts. Each case is randomly assigned to an eligible judge. For each case with at least one eligible majority group judge and one eligible minority group judge, we define potential outcomes for every judge eligible to hear that case, then use the unit treatment effects comparing judges with different features to define a unit feature comparison (UFC). The UFCs are then used to define new population estimands. To estimate these, we impute missing potential outcomes from the posterior predictive distribution of a two-part Bayesian hierarchical model, using the imputed and observed outcomes to estimate the UFCs, which are then used to estimate population quantities. We analyze two outcomes: 1) whether the monetary relief awarded to the plaintiff is 0 or positive, and 2) the relief amount. A case assigned to a minority judge is more likely to result in a non-zero award than if that case were assigned to an eligible majority race judge. But the median difference in award amounts suggests minority judges would grant less relief than majority race judges.

Zoom Link:


Collin Andrew Cademartori (Columbia)

“Union Info Session”


Marcel Nutz (Columbia)

“Discussion on thesis and career planning”

Abstract: We will discuss thesis and career planning. This is aimed at PhD students only.

Zoom Link:

Spring Break – No Seminar

Zhiliang Ying (Columbia)

“Latent variable models and survival analysis”

Abstract: This talk covers two areas of my research work. The first is related to measurement theory, with applications to educational and psychological assessment. Various latent class and latent factor models will be presented along with their applications. The second area is survival analysis, which has broad applications in many disciplines. Counting process-based approaches to modeling right-censored data will be introduced, with emphasis on non- and semiparametric analysis. If time permits, other types of censoring/truncation will also be discussed.

Zoom Link:


Samory Kpotufe (Columbia)

“Discussion on research directions”

Zoom Link:

Ming Yuan (Columbia)

“Tensor Methods for High Dimensional Data Analysis”

Cynthia Rush (Columbia) 

“All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation”

Abstract: In this talk, we discuss the statistical and computational limits for estimation of a rank-one matrix (the spike) corrupted by an additive Gaussian noise matrix, in a sparse limit, where the underlying hidden vector (that constructs the rank-one matrix) has a number of non-zero components that scales sub-linearly with the total dimension of the vector, and the signal-to-noise ratio tends to infinity at an appropriate speed. We discuss the proof of these results, which provides explicit low-dimensional variational formulas for the asymptotic mutual information between the spike and the observed noisy matrix and analyzes an approximate message passing algorithm in the sparse regime. For Bernoulli and Bernoulli-Rademacher distributed vectors, and when the sparsity and signal strength satisfy an appropriate scaling relation, we find all-or-nothing phase transitions for the asymptotic minimum and algorithmic mean-square errors. These jump from their maximum possible value to zero at well-defined signal-to-noise thresholds whose asymptotic values we determine exactly. In the asymptotic regime, the statistical-to-algorithmic gap diverges, indicating that sparse recovery is hard for approximate message passing. This is joint work with Jean Barbier (ICTP) and Nicolas Macris (EPFL).

Zoom link:


Simon Tavaré (Columbia)

“On finding new species”

Abstract: Fisher, Corbet and Williams (1943) studied the relationship between the number of species and the number of specimens found in typical ecological samples, illustrating their analysis with data from a Microlepidoptera sample from England, and another from Malayan butterflies. It is convenient to use the notation $c = (c_0, c_1, c_2, \ldots)$ to denote species counts, $c_j$ denoting the number of species observed $j$ times in the sample; the total number of species observed is

$S = c_1 + c_2 + \cdots$

and the number of specimens sampled is

$N = c_1 + 2c_2 + 3c_3 + \cdots.$

Fisher introduced the log-series distribution in that paper, under which

$\mathbb{E}\, c_j = \alpha x^j / j, \quad j = 1, 2, \ldots$

and provided a method for estimating the parameters of the model based on values of N and S.
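
As a rough illustration of how such an estimate can be computed: under the log-series model, with E c_j = α x^j / j, summing over j gives E S = −α ln(1 − x) and E N = α x / (1 − x). Equating these expectations to the observed S and N yields x = N / (N + α) and S = α ln(1 + N / α). The Python sketch below solves the second equation for α by bisection. The function name, the tolerance, and the example counts are hypothetical choices made for illustration; this is one simple moment-matching approach, not necessarily the exact procedure used in the 1943 paper or in the talk.

```python
import math


def fit_log_series(n_specimens, n_species, tol=1e-10):
    """Match E[S] and E[N] under the log-series model to observed counts.

    Solves S = alpha * ln(1 + N / alpha) for alpha by bisection and
    returns (alpha, x) with x = N / (N + alpha).
    Hypothetical helper for illustration only.
    """
    if not 0 < n_species < n_specimens:
        raise ValueError("need 0 < S < N for a solution to exist")

    def gap(alpha):
        # Increasing in alpha: tends to -S as alpha -> 0 and to N - S as alpha -> infinity.
        return alpha * math.log1p(n_specimens / alpha) - n_species

    lo, hi = 1e-12, 1.0
    while gap(hi) < 0:  # grow the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    alpha = 0.5 * (lo + hi)
    x = n_specimens / (n_specimens + alpha)
    return alpha, x


# Hypothetical counts, not data from the paper.
alpha_hat, x_hat = fit_log_series(n_specimens=1000, n_species=100)
expected_species = -alpha_hat * math.log(1.0 - x_hat)
expected_specimens = alpha_hat * x_hat / (1.0 - x_hat)
print(alpha_hat, x_hat, expected_species, expected_specimens)
```

The last two printed values should reproduce the input S and N up to numerical tolerance, which is a quick check that the fitted pair (α, x) satisfies both moment equations.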

In this talk I will describe a model in which specimens are sampled sequentially, and the type (species) of each specimen is observed. Here $N = n$ records the number of specimens sampled to date, and $S = S_n$ the random number of species observed. I will describe results about the covariance between the number of species observed in the first $m$ trials and the next $n$, the behavior of multiple sequential samples, and prediction of the number of new species expected in the next $n$ specimens given information about the number of species in the first $m$. A number of limiting regimes as $n, m \to \infty$ will be discussed.

This is joint work with Arash Jamshidpey and Poly da Silva from the Columbia Statistics Department.