Student Seminar Series

Schedule for Fall 2022

Attention: The Student Seminar will be in hybrid mode this semester. Most talks and events will be held in person, and people can also join via Zoom. In-person participation is only available to Columbia affiliates with building access.

Seminars are on Wednesdays 
Time: 12:00 - 1:00pm

Location: Room 903, 1255 Amsterdam Avenue

Zoom Link:
Meeting ID: 991 5989 3951
Passcode: 284676

Contacts: Jaesung Son, Luhuan Wu

Information for speakers: For information about the schedule, directions, equipment, reimbursement, and hotels, please click here.

Director of Graduate Studies, Professor John Cunningham
Our new DGS, Prof. John Cunningham, will share a few words with us and then we can introduce ourselves.


Sharing various summer internship experiences from our fellow students: Nick Galbraith, Zhen Huang, Jialin Ouyang.
Nikolaos Ignatiadis (Post-Doc, Columbia Stats)
Covariate-Powered Empirical Bayes Estimation

Methods are studied for simultaneous analysis of many noisy experiments in the presence of rich covariate information. The goal of the analyst is to optimally estimate the true effect underlying each experiment. Both the noisy experimental results and the auxiliary covariates are useful for this purpose, but neither data source on its own captures all the information available to the analyst. We propose a flexible plug-in empirical Bayes estimator that synthesizes both sources of information and may leverage any black-box predictive model. We show that our approach is within a constant factor of minimax for a simple data-generating model. Furthermore, we establish an extension to the classic result of James-Stein, whereby our proposed estimator dominates the sample mean of the experimental results under quadratic risk, even if the auxiliary covariates contain no information about the true effects. Finally, we exhibit promising empirical performance of the method on both real and simulated data.
Professor Wayne Lee (Columbia Stats)
"Thoughts on the intersection between applied statistics and humanities"
Julia (Two Sigma)

Two Sigma is a financial sciences company. Our community of scientists, technologists, and academics look beyond traditional finance to understand the bigger picture and develop creative solutions to some of the world’s most challenging economic problems.

We rely on the scientific method, rooted in hypothesis, analysis, and experimentation, to drive data-driven decisions, to manage risk, and to expand into new areas of focus. In this way, we create systematic tools and technologies to forecast the future of global markets.

Our Quant Researchers presenting include:

Ding completed her PhD in Statistics at Columbia University in 2021, and she currently works as a quantitative researcher at Two Sigma. Her day-to-day work includes building technical alpha models for equities. She loves statistics, finance, Two Sigma, New York City, and life.

Yuting graduated from Columbia in 2016 with a PhD in Statistics and currently works as a modeler at Two Sigma. Her day-to-day work includes predictive modeling and machine learning research.


Long Zhao, Ari Blau, Jitong Qi (Columbia Stats)

"Summer Intern Workshop"



Professor Marco Avella (Columbia Stats)
Noisy convex optimization for differentially private inference with M-estimators
We propose a general optimization-based framework for computing differentially private M-estimators and a new method for the construction of differentially private confidence regions. First, we show that bounded-influence M-estimators can naturally be used in conjunction with noisy gradient descent and noisy Newton methods in order to obtain optimal private estimators with global linear or quadratic convergence, respectively. We establish finite sample global convergence guarantees, under both local strong convexity and self-concordance, showing that our private estimators converge with high probability to an optimal neighborhood of the non-private M-estimators. We then tackle the problem of parametric inference by constructing differentially private estimators of the asymptotic variance of our private M-estimators. Finally, we discuss ongoing work that explores the potential practical and theoretical benefits of a noisy sketched Newton algorithm.
Xuming He (University of Michigan)
Charles Margossian (Flatiron Institute)
Title: Markov chain Monte Carlo using modern hardware
Abstract: The current trend in hardware development is to support processors which can run a large number of operations in parallel. How well is MCMC positioned to take advantage of massive parallelization? Over the past two years, several algorithms have been developed to run many chains in parallel on GPUs. But our MCMC workflow -- i.e. length of the burn-in / warmup and sampling phases, convergence diagnostics, tuning parameters -- is still rooted in the tradition of running one, maybe a few, long Markov chains. The same can be said of our theory, where asymptotic analyses often focus on infinitely long -- and therefore stationary -- chains. The many-chains regime suggests taking limits in another direction: an infinite number of finite non-stationary chains. This perspective paves the way to developing a principled MCMC workflow, eliciting persistent questions: how many chains should we run? How long should the warmup and sampling phases be? How should we initialize the chains?
This work is partly based on a preprint:

Arnab Auddy (Columbia Stats)

Title: Statistical Benefits and Computational Challenges of Tensor Spectral Learning

Abstract: As we observe progressively more complex data, it becomes necessary to model higher-order interactions among the observed variables. Orthogonally decomposable tensors provide a unified framework for many such problems, whereby tensor spectral estimators become a natural choice to learn the latent factors of the model. While this is a natural extension of matrix SVD, tensor-based estimators automatically provide much better identifiability and estimability properties. In addition to the attractive statistical properties, these methods present us with intriguing computational considerations. In the second part of the talk, I will illustrate these phenomena in the particular application to Independent Component Analysis (ICA). Interestingly, there is a gap between the information-theoretic and computationally tractable limits of the problem. Additionally, we provide noise-robust algorithms based on spectral truncation, which provide rate-optimal estimators for the mixing matrix of ICA. Our estimators are also asymptotically normal, thus allowing confidence interval construction.


No Seminar

Zoraida Rico (Post-Doc, Columbia Stats)
Title: On optimal covariance matrix estimation
Abstract: We present an estimator of the covariance matrix of a random d-dimensional vector from an i.i.d. finite sample. Our sole assumption is that this vector satisfies a bounded Lp-L2 moment assumption over its one-dimensional marginals, for p greater than or equal to 4. Given this, we show that the covariance can be estimated from the sample with the same high-probability error rates that the sample covariance matrix achieves in the case of Gaussian data. This holds even though we allow for very general distributions that may not have moments of order greater than p. Moreover, our estimator is optimally robust to adversarial contamination. This result improves on the recent works by Mendelson and Zhivotovskiy and by Catoni and Giulini, and matches parallel work by Abdalla and Zhivotovskiy. This talk is based on joint work with Roberto I. Oliveira (IMPA).
Faculty and student mixer
Professor Hongseok Namkoong (Columbia Business School)