Fall 2024 Semester PhD Courses
For the most up-to-date information on Statistics PhD courses, please go to Vergil.
Faculty Name | Course Number | Course Title | Course Description |
Andrew Gelman | GR6101 | APPLIED STATISTICS I | We will go through most of the book, Regression and Other Stories, by Andrew Gelman, Jennifer Hill, and Aki Vehtari, also connecting to important open questions in statistics research. Topics covered in the course include: applied regression (data collection, modeling and inference, linear regression, logistic regression, Bayesian inference, and poststratification); causal inference from experiments and observational studies using regression and other identification strategies; and simulation, model fitting, and programming in R. Key statistical problems include adjusting for differences between sample and population, adjusting for differences between treatment and control groups, extrapolating from past to future, and using observed data to learn about latent constructs of interest. Applied examples are mostly from social science and public health. |
John P Cunningham | GR6103 | APPLIED STATISTICS III | Modern machine learning requires adaptation and experimentation over large, expensive, and/or mixed-type search spaces. Bayesian optimization, which uses a probability model to reason about and carry out experimental design, has in the last four years seen a major shift in its capabilities and performance, and is now widely used throughout industry and academia. This course will first cover the statistical roots of this literature, its connection to Bayesian decision theory, and the required mechanics with Gaussian processes, kernel methods, and optimization. Second, the course will study the fundamentals of adaptive experimentation and Bayesian optimization. The third part of the course will cover very recent advances in the literature including trust region optimization, diverse optimization, latent space optimization, etc. Applications will include large scale machine learning systems, molecular design, and more. The first two components of the course will center around the recent book Bayesian Optimization by Garnett, and papers will fill out the remainder. Software will focus on BoTorch and related projects, and while the course does not expect any experience in BoTorch, some PyTorch familiarity is required. Course requirements include attendance, short weekly reader reports, and a final course project. I hope that students interested in Bayesian statistics, modern machine learning, and/or optimization will find this content exciting, relevant, and challenging. |
Tian Zheng | GR6105 | Statistical Consulting | Prerequisites: STAT GR6102 or instructor permission. The Department’s doctoral student consulting practicum: Students undertake pro bono consulting activities for Columbia community researchers under the tutelage of a faculty mentor. |
Cynthia Rush | GR6201 | Theoretical Statistics I | Prerequisites: Students in a master’s program must seek the permission of the director of the M.A. program in statistics; students in an undergraduate program must seek the permission of the director of undergraduate studies in statistics. A general introduction to mathematical statistics and statistical decision theory. Elementary decision theory, Bayes inference, Neyman-Pearson theory, hypothesis testing, most powerful unbiased tests, confidence sets. Estimation: methods, theory, and asymptotic properties. Likelihood ratio tests, multivariate distribution. Elements of general linear hypothesis, invariance, nonparametric methods, sequential analysis. |
Ming Yuan | GR6203 | Theoretical Statistics III | Large amounts of multidimensional data represented by multiway arrays or tensors are prevalent in modern applications across various fields such as chemometrics, genomics, physics, psychology, and signal processing. The structural complexity of such data provides vast new opportunities for modeling and analysis, but efficiently extracting information content from them, both statistically and computationally, presents unique and fundamental challenges. Addressing these challenges requires an interdisciplinary approach that brings together tools and insights from statistics, optimization and numerical linear algebra among other fields. Despite these hurdles, significant progress has been made in the last decade. In this course, we will examine some of the key advancements, identify common threads among them, and discuss some open problems. |
Anne Van Delft | GR6301 | Probability Theory I | Prerequisites: A thorough knowledge of elementary real analysis and some previous knowledge of probability. Overview of measure and integration theory. Probability spaces and measures, random variables and distribution functions. Independence, Borel-Cantelli lemma, zero-one laws. Expectation, uniform integrability, sums of independent random variables, stopping times, Wald’s equations, elementary renewal theorems. Laws of large numbers. Characteristic functions. Central limit problem; Lindeberg-Feller theorem, infinitely divisible and stable distributions. Cramer’s theorem, introduction to large deviations. Law of the iterated logarithm, Brownian motion, heat equation. |
Nicolas Trillos | GR6303 | Probability Theory III | In simple terms, optimal transport (OT) is the problem of finding the cheapest way to transport a given distribution of mass from some initial location to a different target location. The problem was mathematically formalized by Gaspard Monge in the 18th century and for a long time remained a relatively inaccessible mathematical problem with little theoretical development (and obviously no computational one either) until the work by Kantorovich in the 20th century. In the last decades, OT has become one of the most active areas of research in mathematics, and many interesting connections between OT and multiple areas of pure math have been revealed and developed, showing that, despite its simplicity, OT possesses a very rich mathematical structure with the potential to trespass academic boundaries. Indeed, OT has become a powerful tool used in applications to economics, biology, physics, image analysis, and, more recently, statistics and data analysis. The main goal of this course is to introduce some of the most relevant theoretical and computational aspects of OT and to discuss some recent applications to statistics and data analysis. |
Genevera Allen | GR6701 | Probabilistic Models and Machine Learning | Statistical Machine Learning is a PhD-level course on statistical and probabilistic foundations of machine learning. We will cover statistical machine learning methods, theory, and inference as well as how to apply such methods to real problems. We study both the foundations and modern methods in this field. Our goals are to understand statistical machine learning, to begin research that makes contributions to this field, and to develop good practices for building and applying these models in practice. |
Liam M Paninski | GR8201 | Stat Analysis-Neural Data | This is a PhD-level topics course in statistical analysis of neural data. Students from statistics, neuroscience, and engineering are all welcome to attend. We will discuss modeling, prediction, and decoding of neural data, with applications to multi-electrode recordings, calcium and voltage imaging, behavioral video recordings, and more. We will introduce a number of advanced statistical techniques relevant in neuroscience. Each technique will be illustrated via application to problems in neuroscience. The focus will be on the analysis of single and multiple spike train and calcium imaging data, with a few applications to analyzing intracellular voltage and dendritic imaging data. |
Cynthia Rush & Marco Avella Medina | GR9201 | Seminar in Theoretical Statistics | Departmental colloquium in statistics. |
Ivan Corwin | GR9301 | Seminar in Probability Theory | Departmental colloquium in probability theory. |
Chenyang Zhong & Victor H de la Pena & Graeme Baker | GR9302 | Seminar in Applied Probability & Risk | A colloquium in applied probability and risk. |
Philip Protter & Marcel F Nutz & Steven Campbell | GR9303 | Seminar in Mathematical Finance | A colloquium on topics in mathematical finance. |
Spring 2024 Semester PhD Courses
For the most up-to-date information on Statistics PhD courses, please go to Vergil.
Faculty Name | Course Number | Course Title | Course Description |
Yuqi Gu | GR6102 | Applied Statistics II | This is a first-year Ph.D. course on statistical machine learning and Bayesian statistics, focusing mainly on the methodology and also covering some applications. Course contents include the following: Linear and nonlinear dimension reduction; Data-driven and model-based classification and clustering methods; Graphical models including Bayesian networks and Markov random fields; Latent variable models; Variational Bayesian inference; Introduction to deep learning and neural networks; Computational Bayesian statistics including Gibbs sampler and other MCMC algorithms; Bayesian hierarchical modeling. |
Liam Paninski | GR6104 | Computational Statistics | Computation plays a central role in modern statistics and machine learning. This course aims to cover topics needed to develop a broad working knowledge of modern computational statistics. We seek to develop a practical understanding of how and why existing methods work, enabling effective use of modern statistical methods. Achieving these goals requires familiarity with diverse topics in statistical computing, computational statistics, computer science, and numerical analysis. Our choice of topics reflects our view of what is central to this evolving field, and what will be interesting and useful. A key theme is scalability to problems of high dimensionality, which are of most interest to many recent applications. |
Regina Dolgoarshinnykh | GR6105 | Statistical Consulting | Prerequisites: STAT GR6102 or instructor permission. The Department’s doctoral student consulting practicum: Students undertake pro bono consulting activities for Columbia community researchers under the tutelage of a faculty mentor. |
Cindy Rush | GR6202 | Theoretical Statistics II | Prerequisites: STAT GR6201. Continuation of STAT GR6201. |
Marcel Nutz | GR6302 | Probability Theory II | Graduate-level introduction to stochastic processes in discrete and continuous time. Topics: Martingales: inequalities, convergence and closure properties, optimal stopping theorems, Burkholder-Gundy inequalities. Semimartingales: Doob-Meyer decomposition, stochastic integration, Ito’s formula. Brownian motion: construction, invariance principles and random walks, study of sample paths, martingale representation results, Girsanov theorem. Markov processes: semigroups and infinitesimal generators. Stochastic differential equations. Connections to partial differential equations: Feynman-Kac formula, Dirichlet problem. |
Genevera Allen | GR8101 | Topics in Applied Statistics | TBD |
Jingchen Liu | GR8201 | Topics in Theoretical Statistics | TBD |
Philip Protter | GR8301 | Topics in Probability Theory | Usually when one thinks of Mathematical Finance one thinks of modeling the stock market, options, and hedging, almost invariably involving Brownian motion. A key concept is the absence of arbitrage which leads to the use of Girsanov’s Theorem and changes of measure. In this course we will of course touch on all that, more or less due to necessity, but the heart of the course will be devoted to the poorly understood subject of credit risk, taking advantage of recent advances of Coculescu and Nikeghbali. We will discuss the classification of stopping times and show how totally inaccessible stopping times arise naturally in the modeling of credit defaults. Such an analysis touches on Survival Analysis and the theory of Censored Data, especially when martingales are involved. |
David Blei | GR8401 | Topics in Machine Learning | Field Experiments, Machine Learning, and Causality; Spring 2024; David Blei / Don Green; This course explores the challenges of extracting unbiased and generalizable causal inferences in policy-relevant domains. The technical level of the course is designed for doctoral students in social science, computer science, and statistics, but it will also be open to master’s students and undergraduates with sufficient preparation. The partnership between the two instructors (who are also research collaborators and co-authors) reflects a growing recognition that experimental designs deployed in field settings, although informative and influential, can only support causal generalizations with the help of supplementary assumptions; similarly, observational studies that draw on big data only provide reliable causal insights with the help of supplementary assumptions. The aim of this collaboration is to explore ways that innovative research design, modeling, and machine learning methods can advance the frontiers of knowledge in policy-relevant fields. While courses on causal inference focus on a handful of off-the-shelf techniques, the proposed course aims to innovate, offering new ways of thinking about what to study and how. With real-world experimental designs and real-world data, we will study how to evaluate the strengths and weaknesses of modeling choices and methods, and how to use model-based insights to suggest more informative design choices. |
Bianca Dumitrascu & Yuqi Gu | GR9201 | Seminar in Theoretical Statistics | Departmental colloquium in statistics. |
Ivan Corwin | GR9301 | Seminar in Probability Theory | This is a weekly seminar in probability theory involving mostly outside speakers who present on a variety of topics including stochastic analysis and PDEs, random matrix theory, random geometry, stochastic optimal control, statistical physics and many others. |
Chenyang Zhong & Sumit Mukherjee | GR9302 | Seminar in Applied Probability and Risk | A colloquium in applied probability and risk. |
Marcel Nutz & Philip Protter | GR9303 | Seminar in Mathematical Finance | Research seminar on mathematical finance featuring invited speakers. |
Version 12.6.23