Fall 2024 Semester PhD Courses
For the most up-to-date information on Statistics PhD courses, please go to Vergil.
Faculty Name  Course Number  Course Title  Course Description 
Andrew Gelman  GR6101  Applied Statistics I  We will go through most of the book Regression and Other Stories, by Andrew Gelman, Jennifer Hill, and Aki Vehtari, also connecting to important open questions in statistics research. Topics covered in the course include: applied regression (data collection, modeling and inference, linear regression, logistic regression, Bayesian inference, and poststratification); causal inference from experiments and observational studies using regression and other identification strategies; and simulation, model fitting, and programming in R. Key statistical problems include adjusting for differences between sample and population, adjusting for differences between treatment and control groups, extrapolating from past to future, and using observed data to learn about latent constructs of interest. Applied examples are mostly from social science and public health. 
John P Cunningham  GR6103  Applied Statistics III  Modern machine learning requires adaptation and experimentation over large, expensive, and/or mixed-type search spaces. Bayesian optimization, which uses a probability model to reason about and carry out experimental design, has in the last four years seen a major shift in its capabilities and performance, and is now widely used throughout industry and academia. This course will first cover the statistical roots of this literature, its connection to Bayesian decision theory, and the required mechanics with Gaussian processes, kernel methods, and optimization. Second, the course will study the fundamentals of adaptive experimentation and Bayesian optimization. The third part of the course will cover very recent advances in the literature, including trust region optimization, diverse optimization, latent space optimization, etc. Applications will include large-scale machine learning systems, molecular design, and more. The first two components of the course will center around the recent book Bayesian Optimization by Garnett, and papers will fill out the remainder. Software will focus on BoTorch and related projects, and while the course does not expect any experience in BoTorch, some PyTorch familiarity is required. Course requirements include attendance, short weekly reader reports, and a final course project. Students interested in Bayesian statistics, modern machine learning, and/or optimization will, I hope, find this content to be exciting, relevant, and challenging. 
Tian Zheng  GR6105  Statistical Consulting  Prerequisites: STAT GR6102 or instructor permission. The Department’s doctoral student consulting practicum: Students undertake pro bono consulting activities for Columbia community researchers under the tutelage of a faculty mentor. 
Cynthia Rush  GR6201  Theoretical Statistics I  Prerequisites: Students in a master’s program must seek permission from the director of the M.A. program in statistics; students in an undergraduate program must seek permission from the director of undergraduate studies in statistics. A general introduction to mathematical statistics and statistical decision theory. Elementary decision theory, Bayes inference, Neyman-Pearson theory, hypothesis testing, most powerful unbiased tests, confidence sets. Estimation: methods, theory, and asymptotic properties. Likelihood ratio tests, multivariate distribution. Elements of general linear hypothesis, invariance, nonparametric methods, sequential analysis. 
Ming Yuan  GR6203  Theoretical Statistics III  Large amounts of multidimensional data represented by multiway arrays or tensors are prevalent in modern applications across various fields such as chemometrics, genomics, physics, psychology, and signal processing. The structural complexity of such data provides vast new opportunities for modeling and analysis, but efficiently extracting information content from them, both statistically and computationally, presents unique and fundamental challenges. Addressing these challenges requires an interdisciplinary approach that brings together tools and insights from statistics, optimization and numerical linear algebra among other fields. Despite these hurdles, significant progress has been made in the last decade. In this course, we will examine some of the key advancements, identify common threads among them, and discuss some open problems. 
Anne Van Delft  GR6301  Probability Theory I  Prerequisites: A thorough knowledge of elementary real analysis and some previous knowledge of probability. Overview of measure and integration theory. Probability spaces and measures, random variables and distribution functions. Independence, Borel-Cantelli lemma, zero-one laws. Expectation, uniform integrability, sums of independent random variables, stopping times, Wald’s equations, elementary renewal theorems. Laws of large numbers. Characteristic functions. Central limit problem: Lindeberg-Feller theorem, infinitely divisible and stable distributions. Cramér’s theorem, introduction to large deviations. Law of the iterated logarithm, Brownian motion, heat equation. 
Nicolas Trillos  GR6303  Probability Theory III  In simple terms, optimal transport (OT) is the problem of finding the cheapest way to transport a given distribution of mass from some initial location to a different target location. The problem was mathematically formalized by Gaspard Monge in the 18th century and for a long time remained a relatively inaccessible mathematical problem with little theoretical development (and obviously no computational one either) until the work by Kantorovich in the 20th century. In recent decades, OT has become one of the most active areas of research in mathematics, and many interesting connections between OT and multiple areas of pure math have been revealed and developed, showing that, despite its simplicity, OT possesses a very rich mathematical structure with the potential to cross academic boundaries. Indeed, OT has become a powerful tool used in applications to economics, biology, physics, image analysis, and, more recently, statistics and data analysis. The main goal of this course is to introduce some of the most relevant theoretical and computational aspects of OT and to discuss some recent applications to statistics and data analysis. 
Genevera Allen  GR6701  Probabilistic Models and Machine Learning  Statistical Machine Learning is a PhD-level course on statistical and probabilistic foundations of machine learning. We will cover statistical machine learning methods, theory, and inference as well as how to apply such methods to real problems. We study both the foundations and modern methods in this field. Our goals are to understand statistical machine learning, to begin research that makes contributions to this field, and to develop good practices for building and applying these models in practice. 
Liam M Paninski  GR8201  Stat Analysis-Neural Data  This is a PhD-level topics course in statistical analysis of neural data. Students from statistics, neuroscience, and engineering are all welcome to attend. We will discuss modeling, prediction, and decoding of neural data, with applications to multielectrode recordings, calcium and voltage imaging, behavioral video recordings, and more. We will introduce a number of advanced statistical techniques relevant in neuroscience. Each technique will be illustrated via application to problems in neuroscience. The focus will be on the analysis of single and multiple spike train and calcium imaging data, with a few applications to analyzing intracellular voltage and dendritic imaging data. 
Cynthia Rush & Marco Avella Medina  GR9201  Seminar in Theoretical Statistics  Departmental colloquium in statistics. 
Ivan Corwin  GR9301  Seminar in Probability Theory  Departmental colloquium in probability theory. 
Chenyang Zhong & Victor H de la Pena & Graeme Baker  GR9302  Seminar in Applied Probability & Risk  A colloquium in applied probability and risk. 
Philip Protter & Marcel F Nutz & Steven Campbell  GR9303  Seminar in Mathematical Finance  A colloquium on topics in mathematical finance. 
Spring 2024 Semester PhD Courses
For the most up-to-date information on Statistics PhD courses, please go to Vergil.
Faculty Name  Course Number  Course Title  Course Description 
Yuqi Gu  GR6102  Applied Statistics II  This is a first-year Ph.D. course on statistical machine learning and Bayesian statistics, focusing mainly on the methodology and also covering some applications. Course contents include the following: Linear and nonlinear dimension reduction; Data-driven and model-based classification and clustering methods; Graphical models including Bayesian networks and Markov random fields; Latent variable models; Variational Bayesian inference; Introduction to deep learning and neural networks; Computational Bayesian statistics including Gibbs sampler and other MCMC algorithms; Bayesian hierarchical modeling. 
Liam Paninski  GR6104  Computational Statistics  Computation plays a central role in modern statistics and machine learning. This course aims to cover topics needed to develop a broad working knowledge of modern computational statistics. We seek to develop a practical understanding of how and why existing methods work, enabling effective use of modern statistical methods. Achieving these goals requires familiarity with diverse topics in statistical computing, computational statistics, computer science, and numerical analysis. Our choice of topics reflects our view of what is central to this evolving field, and what will be interesting and useful. A key theme is scalability to problems of high dimensionality, which are of most interest to many recent applications. 
Regina Dolgoarshinnykh  GR6105  Statistical Consulting  Prerequisites: STAT GR6102 or instructor permission. The Department’s doctoral student consulting practicum: Students undertake pro bono consulting activities for Columbia community researchers under the tutelage of a faculty mentor. 
Cindy Rush  GR6202  Theoretical Statistics II  Prerequisites: STAT GR6201. Continuation of STAT GR6201. 
Marcel Nutz  GR6302  Probability Theory II  Graduate-level introduction to stochastic processes in discrete and continuous time. Topics: Martingales: inequalities, convergence and closure properties, optimal stopping theorems, Burkholder-Gundy inequalities. Semimartingales: Doob-Meyer decomposition, stochastic integration, Ito’s formula. Brownian motion: construction, invariance principles and random walks, study of sample paths, martingale representation results, Girsanov theorem. Markov processes: semigroups and infinitesimal generators. Stochastic differential equations. Connections to partial differential equations: Feynman-Kac formula, Dirichlet problem. 
Genevera Allen  GR8101  Topics in Applied Statistics  TBD 
Jingchen Liu  GR8201  Topics in Theoretical Statistics  TBD 
Philip Protter  GR8301  Topics in Probability Theory  Usually when one thinks of Mathematical Finance one thinks of modeling the stock market, options, and hedging, almost invariably involving Brownian motion. A key concept is the absence of arbitrage which leads to the use of Girsanov’s Theorem and changes of measure. In this course we will of course touch on all that, more or less due to necessity, but the heart of the course will be devoted to the poorly understood subject of credit risk, taking advantage of recent advances of Coculescu and Nikeghbali. We will discuss the classification of stopping times and show how totally inaccessible stopping times arise naturally in the modeling of credit defaults. Such an analysis touches on Survival Analysis and the theory of Censored Data, especially when martingales are involved. 
David Blei  GR8401  Topics in Machine Learning  Field Experiments, Machine Learning, and Causality; Spring 2024; David Blei / Don Green. This course explores the challenges of drawing unbiased and generalizable inferences about cause and effect in policy-relevant domains. The technical level of the course is designed for doctoral students in social science, computer science, and statistics, but it will also be open to master’s students and undergraduates with sufficient preparation. The partnership between the two instructors (who are also research collaborators and coauthors) reflects a growing recognition that experimental designs deployed in field settings, although informative and influential, can only support causal generalizations with the help of supplementary assumptions; similarly, observational studies that draw on big data only provide reliable causal insights with the help of supplementary assumptions. The aim of this collaboration is to explore ways that innovative research design, modeling, and machine learning methods can advance the frontiers of knowledge in policy-relevant fields. While courses on causal inference focus on a handful of off-the-shelf techniques, the proposed course aims to innovate, offering new ways of thinking about what to study and how. With real-world experimental designs and real-world data, we will study how to evaluate the strengths and weaknesses of modeling choices and methods, and how to use model-based insights to suggest more informative design choices. 
Bianca Dumitrascu & Yuqi Gu  GR9201  Seminar in Theoretical Statistics  Departmental colloquium in statistics. 
Ivan Corwin  GR9301  Seminar in Probability Theory  This is a weekly seminar in probability theory involving mostly outside speakers who present on a variety of topics including stochastic analysis and PDEs, random matrix theory, random geometry, stochastic optimal control, statistical physics and many others. 
Chenyang Zhong & Sumit Mukherjee  GR9302  Seminar in Applied Probability and Risk  A colloquium in applied probability and risk. 
Marcel Nutz & Philip Protter  GR9303  Seminar in Mathematical Finance  Research seminar on mathematical finance featuring invited speakers. 
Version 12.6.23