Statistics Seminar – Fall 2023

Schedule for Fall 2023

Seminars are on Mondays
Time: 4:00pm – 5:00pm

Location: Room 903 SSW, 1255 Amsterdam Avenue

9/11/23


Youjin Lee (Brown Biostatistics)

Title: Instrumental variables and replicable causal research

Abstract: In observational studies, unmeasured confounders can produce bias in causal estimates, and this bias is often systematic and recurs in replicated studies. Instrumental variables have been widely used to estimate the causal effect of a treatment on an outcome in the presence of unmeasured confounders. When several instrumental variables are available and the instruments are subject to biases that do not completely overlap, a careful analysis based on these several instruments can produce orthogonal pieces of evidence (i.e., evidence factors) that, when combined, strengthen causal conclusions while avoiding systematic bias. In this talk, I will introduce several strategies to construct evidence factors from multiple candidate instrumental variables when invalid instruments may be present. I will demonstrate the use of instrumental variables for replicable causal research in different applications, including a regression discontinuity design.
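
For readers less familiar with the instrumental-variable idea the abstract builds on, here is a minimal, self-contained sketch (not code from the talk): a toy simulation with an unmeasured confounder, where the naive regression is biased but the single-instrument (Wald) two-stage least squares estimator recovers the causal effect. The data-generating process and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Unmeasured confounder U affects both treatment D and outcome Y.
u = rng.normal(size=n)
z = rng.binomial(1, 0.5, size=n).astype(float)   # instrument: affects D, not Y directly
d = 0.8 * z + 0.5 * u + rng.normal(size=n)       # treatment
y = 2.0 * d + 1.0 * u + rng.normal(size=n)       # true causal effect of D on Y is 2.0

# Naive regression of Y on D is biased upward by the confounder.
naive = np.cov(d, y)[0, 1] / np.var(d)

# Wald / two-stage least squares estimator with a single instrument:
# effect = Cov(Z, Y) / Cov(Z, D).
iv = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(f"naive: {naive:.2f}, IV: {iv:.2f}")       # the IV estimate should be close to 2.0
```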

Bio: Youjin Lee is a Manning Assistant Professor in the Department of Biostatistics at Brown University. Her research focuses on developing robust and replicable causal inference methods with complex data. She received her PhD from Johns Hopkins University in 2019 and was a postdoctoral fellow at the University of Pennsylvania before joining Brown.

Special Date: Tuesday, 9/12/23

Time: 2:00 pm – 3:00 pm

Location: 903 SSW

Masashi Sugiyama (RIKEN/The University of Tokyo, Japan)

Title: Machine Learning from Weak, Noisy, and Biased Supervision

Abstract: In statistical inference and machine learning, we face a variety of uncertainties such as training data with insufficient information, label noise, and bias. In this talk, I will give an overview of our research on reliable machine learning, including weakly supervised classification (positive unlabeled classification, positive confidence classification, complementary label classification, etc.), noisy label classification (noise transition estimation, instance-dependent noise, clean sample selection, etc.), and transfer learning (joint importance-predictor estimation for covariate shift adaptation, dynamic importance estimation for full distribution shift, continuous distribution shift, etc.).
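
As background for the covariate-shift items in this overview, the sketch below illustrates the classical importance-weighting recipe: estimate the density ratio p_target/p_source with a domain classifier and reweight the source loss. It is a generic toy illustration with assumed synthetic data, not the joint importance-predictor estimation methods developed by the speaker.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(1)

# Covariate shift: source and target inputs differ in distribution,
# but the regression function y = f(x) + noise is shared.
x_src = rng.normal(0.0, 1.0, size=(500, 1))
x_tgt = rng.normal(1.5, 1.0, size=(500, 1))
y_src = np.sin(x_src).ravel() + 0.1 * rng.normal(size=500)

# Estimate the density ratio w(x) = p_tgt(x) / p_src(x) via a domain classifier:
# w(x) is proportional to P(target | x) / P(source | x).
domain = LogisticRegression().fit(
    np.vstack([x_src, x_tgt]),
    np.r_[np.zeros(500), np.ones(500)],
)
p_tgt = domain.predict_proba(x_src)[:, 1]
weights = p_tgt / (1.0 - p_tgt)

# Importance-weighted regression targets the risk under the target distribution.
model = Ridge(alpha=1e-3).fit(x_src, y_src, sample_weight=weights)
```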

Bio: Masashi Sugiyama received his Ph.D. in Computer Science from the Tokyo Institute of Technology in 2001. He has been a professor at the University of Tokyo since 2014, and also the director of the RIKEN Center for Advanced Intelligence Project (AIP) since 2016. He is (co-)author of Machine Learning in Non-Stationary Environments (MIT Press, 2012), Density Ratio Estimation in Machine Learning (Cambridge University Press, 2012), and Machine Learning from Weak Supervision (MIT Press, 2022). In 2022, he received the Award for Science and Technology from the Japanese Minister of Education, Culture, Sports, Science and Technology. He was program co-chair of the Neural Information Processing Systems (NeurIPS) conference in 2015, the International Conference on Artificial Intelligence and Statistics (AISTATS) in 2019, and the Asian Conference on Machine Learning (ACML) in 2010 and 2020.

9/18/23

Jessica Hullman (Northwestern CS)

Title: Evaluating Visualizations for Inference and Decision-Making

Abstract: Research and development in computer science and statistics have produced increasingly sophisticated software interfaces for interactive visual data analysis. Data visualizations have also become ubiquitous for communication in the news and scientific publishing. Despite these successes, our understanding of how to design effective visualizations for data-driven decision-making remains limited. Design philosophies that emphasize data exploration and hypothesis generation can encourage pattern-finding at the expense of quantifying uncertainty. Designing visualizations to maximize perceptual accuracy and self-reported satisfaction can lead people to adopt visualizations that promote overconfident interpretations. I will motivate a few alternative objectives for measuring the effectiveness of visualization, and show how a rational agent framework based in statistical decision theory can help us understand the value of a visualization in the abstract and in light of empirical study results.

Bio: Dr. Jessica Hullman is the Ginni Rometty Associate Professor of Computer Science at Northwestern University. Her research addresses challenges that arise when people draw inductive inferences from data interfaces. Hullman’s work has contributed visualization techniques, applications, and evaluative frameworks for improving data-driven decision-making in applications like visual data analysis, communication of experiment results, data privacy, and responsive design. Her work has received best paper awards at top visualization and HCI venues. She is the recipient of a Microsoft Faculty Fellowship and an NSF CAREER award, among others.

9/25/23

Krzysztof Choromanski (Google DeepMind & Columbia IEOR)

Title: The case for random features in modern Transformer architectures

Abstract: Transformer architectures have revolutionized modern machine learning, quickly overtaking regular deep neural networks in practically all of its fields: from large language models through vision to speech. One of the main challenges in using them to model long-range interactions (critical for applications such as bioinformatics, e.g. genome modeling) remains the prohibitively expensive quadratic time complexity (in the lengths of their input sequences) of their core attention modules. For the same reason, efficient deployment of massive Transformers on devices with limited computational resources (e.g. in robotics) is still a difficult problem. Random feature techniques led to one of the most mathematically rigorous ways to address this problem, and to the birth of various scalable Transformer architectures (such as the class of low-rank implicit-attention Transformers called Performers). In this talk, I will summarize the recent progress made on scaling up Transformers with random features (RFs) and present related open mathematical problems. The talk will cover, in particular: new RF-based methods for approximating softmax and Gaussian kernels (such as the FAVOR, FAVOR+ and FAVOR# mechanisms), hybrid random features, the role of quasi-Monte Carlo techniques, as well as more recent algorithms producing topologically-aware modulation of the regular attention modules in Transformers via RF-based linearizations of various graph kernels.
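
For orientation, here is a minimal sketch of the positive random-feature idea behind FAVOR+-style mechanisms: for Gaussian directions w_i, the features exp(w_i·x − ||x||²/2)/√m give an unbiased estimate of the softmax kernel exp(x·y), so attention can be computed in time linear in the sequence length. Dimensions and scalings below are illustrative assumptions, not the exact Performer implementation.

```python
import numpy as np

def positive_random_features(x, w):
    """FAVOR+-style positive features: exp(x @ w.T - ||x||^2 / 2) / sqrt(m).

    For rows w_i ~ N(0, I), E[phi(x) @ phi(y)] = exp(x @ y), the softmax kernel.
    """
    m = w.shape[0]
    return np.exp(x @ w.T - 0.5 * np.sum(x**2, axis=-1, keepdims=True)) / np.sqrt(m)

rng = np.random.default_rng(0)
L, d, m = 256, 32, 128                       # sequence length, head dim, #features
# Scaling queries/keys by d**0.25 reproduces the usual 1/sqrt(d) attention temperature.
q, k, v = (rng.normal(size=(L, d)) / d**0.25 for _ in range(3))
w = rng.normal(size=(m, d))                  # random projection directions

phi_q = positive_random_features(q, w)
phi_k = positive_random_features(k, w)

# Linear-time attention: never materialize the L x L matrix exp(Q K^T).
kv = phi_k.T @ v                             # (m, d)
normalizer = phi_q @ phi_k.sum(axis=0)       # (L,)
out = (phi_q @ kv) / normalizer[:, None]     # approximates the softmax attention output
```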

Bio: Krzysztof Choromanski is a staff research scientist at Google DeepMind and an adjunct assistant professor at Columbia University. He obtained his Ph.D. from the IEOR Department at Columbia University, where he worked on various problems in structural graph theory (in particular the celebrated Erdős-Hajnal Conjecture and random graphs). His current interests include robotics, scalable Transformer architectures (also for topologically-rich inputs), the theory of random features, and structured neural networks. Krzysztof is one of the co-creators of Performers, the first class of Transformer architectures providing efficient unbiased estimation of the regular softmax-kernel matrices used in Transformers.

10/2/23

Ben Recht (Berkeley)

Title: Statistics When n Equals 1

Abstract: 21st-century medicine embraces a population perspective on the implications of treatments and diseases. But such population inferences tell us little about what to do with any particular person. In this talk, I will first describe some of the drawbacks of applying population statistics to decision-making about individuals. As an alternative, I will outline how we might design treatments and interventions to help those individuals directly. I will present a series of parallel projects that link ideas from optimization, control, and experiment design to draw inferences and inform decisions about single units. Though most recent work in this vein has pursued precision by focusing on ever smaller statistical populations, I will explain why optimization might better guide personalization.

Bio: Benjamin Recht is a Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research has focused on applying mathematical optimization and statistics to problems in data analysis and machine learning. He is currently studying histories, methods, and theories of scientific validity and experimental design.

10/9/23

Nikita Zhivotovskiy (UC Berkeley Statistics)

Title: Sharper Risk Bounds for Statistical Aggregation

Abstract: In this talk, we revisit classical results in the theory of statistical aggregation, focusing on the transition from global complexity to a more manageable local one. The goal of aggregation is to combine several base predictors to achieve a prediction nearly as accurate as the best one, without assumptions on the class structure or target. Though aggregation has been studied in both the sequential and the statistical settings, the two analyses traditionally rely on the same “global” complexity measure. We highlight a lesser-known PAC-Bayes localization technique that enables us to prove a localized bound for the exponential weights estimator of Leung and Barron, and a deviation-optimal localized bound for Q-aggregation. Finally, we demonstrate that our improvements allow us to obtain bounds based on the number of near-optimal functions in the class, and achieve polynomial improvements in sample size in certain nonparametric situations. This is contrary to the common belief that localization does not benefit nonparametric classes. Joint work with Jaouad Mourtada and Tomas Vaškevičius.
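
As context, the generic exponential-weights aggregate that the localized analysis in this talk concerns looks roughly as follows; the temperature and toy data are illustrative assumptions, and this is not the exact Leung-Barron or Q-aggregation construction analyzed in the paper.

```python
import numpy as np

def exponential_weights_aggregate(preds, y, beta):
    """Combine base predictors by exponentially weighting their empirical risks.

    preds: (M, n) array; row j holds the predictions of base predictor j.
    y:     (n,) observed responses.
    beta:  inverse temperature; larger beta concentrates weight on the
           empirically best predictor, smaller beta averages more evenly.
    """
    risks = np.mean((preds - y) ** 2, axis=1)      # empirical squared risks
    logits = -beta * len(y) * risks
    weights = np.exp(logits - logits.max())        # numerically stable softmax
    weights /= weights.sum()
    return weights @ preds, weights

# Toy usage with three hypothetical base predictors of a noisy signal.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 200)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=x.size)
preds = np.vstack([np.sin(2 * np.pi * x), np.cos(2 * np.pi * x), np.zeros_like(x)])
aggregate, weights = exponential_weights_aggregate(preds, y, beta=1.0)
```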

Bio: Nikita Zhivotovskiy is an Assistant Professor in the Department of Statistics at the University of California, Berkeley. He previously held postdoctoral positions in the mathematics department at ETH Zürich, hosted by Afonso Bandeira, and at Google Research, Zürich, hosted by Olivier Bousquet. He also spent time in the mathematics department at the Technion I.I.T., hosted by Shahar Mendelson. Nikita completed his thesis at the Moscow Institute of Physics and Technology under the guidance of Vladimir Spokoiny and Konstantin Vorontsov.

10/16/23

Cosma Shalizi (CMU Statistics)

Title: Simulation-Based Inference by Matching Random Features

Abstract: We can, and should, do statistical inference on simulation models by adjusting the parameters in the simulation so that the values of randomly chosen functions of the simulation output match the values of those same functions calculated on the data. Results from the “random features” literature in machine learning suggest that using random functions of the data can be an efficient replacement for using optimal functions. Results from the “state-space reconstruction” or “geometry from a time series” literature in nonlinear dynamics indicate that just $2d+1$ such functions will typically suffice to identify a model with a $d$-dimensional parameter space. This talk will sketch the key arguments, show some successful numerical experiments on time series, and suggest directions for further work.
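
A toy version of the matching idea, assuming a one-parameter AR(1) simulator and 2d+1 = 3 random cosine features of the output (all choices here are illustrative, not the paper's), might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(theta, n=500, seed=1):
    """Toy simulator: AR(1) series x_t = theta * x_{t-1} + noise."""
    g = np.random.default_rng(seed)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = theta * x[t - 1] + g.normal()
    return x

def random_features(x, w, b):
    """Random functions of the output: cosines of random projections of lag pairs."""
    pairs = np.column_stack([x[:-1], x[1:]])
    return np.cos(pairs @ w + b).mean(axis=0)

# d = 1 unknown parameter, so 2d + 1 = 3 random features should typically suffice.
w = rng.normal(size=(2, 3))
b = rng.uniform(0, 2 * np.pi, size=3)

observed = simulate_ar1(0.7, seed=42)             # pretend this is the data
target = random_features(observed, w, b)

# Adjust theta so the simulation's random features match those of the data.
grid = np.linspace(-0.95, 0.95, 191)
losses = [np.sum((random_features(simulate_ar1(th), w, b) - target) ** 2) for th in grid]
theta_hat = grid[int(np.argmin(losses))]          # should land roughly near 0.7
```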

Paper: https://arxiv.org/abs/2111.09220

10/23/23

Murali Haran (PSU Statistics)

Title: Measuring Sample Quality in Asymptotically Inexact Monte Carlo Algorithms

Abstract: An important statistical computing problem is approximating expectations with respect to a given target distribution. Markov chain Monte Carlo algorithms produce asymptotically exact approximations, meaning the Markov chain’s stationary distribution is identical to the target distribution. Asymptotically inexact algorithms generate sequences without this property; even asymptotically the samples generated do not follow the target distribution. I will describe novel tools for analyzing the output from both asymptotically exact and asymptotically inexact Monte Carlo methods, providing a way to tune the algorithms and to compare them. I will begin my talk by explaining my motivating problem: probability models that have intractable normalizing functions. This is a large class of models where inexact algorithms are often more practical than asymptotically exact algorithms. I will conclude with a discussion of the practical application of our approach. (This research is joint with Bokgyeong Kang, John Hughes, and Jaewoo Park.)
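
To make the exact/inexact distinction concrete (a generic illustration, not the diagnostics developed in the talk): the unadjusted Langevin algorithm is asymptotically inexact, and for a standard normal target with step size h its stationary variance is 1/(1 - h/4) rather than 1, whereas a Metropolis-adjusted chain has the normal target as its stationary distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
h, n = 1.0, 200_000                    # a large step size exaggerates the bias

# Target: standard normal, so the gradient of the log-density is -x.
x_ula, x_mh = 0.0, 0.0
ula, mh = np.empty(n), np.empty(n)
for t in range(n):
    # Unadjusted Langevin (asymptotically inexact): discretization bias never vanishes.
    x_ula = x_ula - 0.5 * h * x_ula + np.sqrt(h) * rng.normal()
    ula[t] = x_ula
    # Random-walk Metropolis (asymptotically exact): accept/reject targets N(0, 1).
    prop = x_mh + np.sqrt(h) * rng.normal()
    if np.log(rng.uniform()) < 0.5 * (x_mh**2 - prop**2):
        x_mh = prop
    mh[t] = x_mh

# ULA's long-run variance is 1 / (1 - h/4) = 4/3 here; the adjusted chain gives ~1.
print(np.var(ula), np.var(mh))
```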

10/30/23

Nathan Ross (University of Melbourne Mathematics and Statistics)

Title: Gaussian random field approximation for wide neural networks

Abstract: It has been observed that wide neural networks (NNs) with randomly initialized weights may be well-approximated by Gaussian fields indexed by the input space of the NN, and taking values in the output space. There has been a flurry of recent work making this observation precise, since it sheds light on regimes where neural networks can perform effectively. In this talk, I will discuss recent work where we derive bounds on Gaussian random field approximation of wide random neural networks of any depth, assuming Lipschitz activation functions. The bounds are on a Wasserstein transport distance in function space equipped with a strong (supremum) metric, and are explicit in the widths of the layers and natural parameters such as moments of the weights. The result follows from a general approximation result using Stein’s method, combined with a novel Gaussian smoothing technique for random fields, which I will also describe. The talk covers joint works with Krishnakumar Balasubramanian, Larry Goldstein, and Adil Salim; and A.D. Barbour and Guangqu Zheng.
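
The limiting object can be checked numerically in a toy case: for a one-hidden-layer ReLU network with i.i.d. standard normal weights and sqrt(2/width) output scaling, the outputs at fixed inputs are approximately jointly Gaussian with covariance given by the first-order arc-cosine kernel. The sketch below (toy inputs and sizes assumed; not code from the paper) compares the empirical covariance across random draws with that kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
d, width, n_draws = 3, 500, 10_000

x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([0.6, 0.8, 0.0])

# Draw many independent random ReLU networks f(x) = sqrt(2/width) * v @ relu(W x)
# and evaluate each at the two fixed inputs.
outputs = np.empty((n_draws, 2))
for i in range(n_draws):
    W = rng.normal(size=(width, d))
    v = rng.normal(size=width)
    hidden = np.maximum(W @ np.column_stack([x1, x2]), 0.0)   # (width, 2)
    outputs[i] = np.sqrt(2.0 / width) * v @ hidden

def arccos_kernel(a, b):
    """First-order arc-cosine kernel: the limiting covariance for ReLU networks."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    theta = np.arccos(np.clip(a @ b / (na * nb), -1.0, 1.0))
    return (na * nb / np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

print(np.cov(outputs, rowvar=False))                           # empirical covariance
print([[arccos_kernel(a, b) for b in (x1, x2)] for a in (x1, x2)])
```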

11/6/23

Alexander Aue (UC Davis Statistics)  CANCELED

Title: Testing high-dimensional general linear hypotheses under a multivariate regression model with spiked noise covariance

Abstract: This talk considers the problem of testing linear hypotheses under a high-dimensional multivariate regression model with spiked noise covariance. The proposed family of tests consists of test statistics based on a weighted sum of projections of the data onto the factor directions, with the weights acting as the regularization parameters. We establish asymptotic normality of the proposed family of test statistics under the null hypothesis. We also establish the power characteristics of the tests under a family of probabilistic local alternatives and derive the minimax choice of the regularization parameters. The performance of the proposed tests is evaluated in comparison with several competing tests. Finally, the proposed tests are applied to the Human Connectome Project data to test for the presence of associations between volumetric measurements of the human brain and certain behavioral variables. The talk is based on joint work with Haoran Li, Debashis Paul & Jie Peng.

11/13/23 *Note: This talk will be online only

Tim van Erven (University of Amsterdam)

Title: The Risks of Recourse in Explainable Machine Learning

Abstract: Algorithmic recourse provides explanations that help users overturn an unfavorable decision by a machine learning system. For instance, customers whose loan application is denied might want to know how they can get a loan in the future. Instead of considering the effects of recourse on individual users, as is typical in the literature, we study the effects at the population level. Surprisingly, we find that the effect is typically negative, because providing recourse tends to reduce classification accuracy. In the case of loan applications, this would lead to many extra customers who end up defaulting. We further study whether the party deploying the classifier has an incentive to strategize in anticipation of having to provide recourse, and we find that sometimes they do, to the detriment of their users. Providing algorithmic recourse may therefore also be harmful at the systemic level. All in all, we conclude that the current concept of algorithmic recourse is not reliably beneficial, and therefore requires rethinking. 

This talk is based on:
H. Fokkema, D. Garreau and T. van Erven. The Risks of Recourse in Binary Classification. arXiv:2306.00497 preprint, 2023.

Join Zoom Meeting

https://columbiauniversity.zoom.us/j/99934594376?pwd=WlhZSnFNZ2VOYlBNQTRYVlBvYjBTdz09

Meeting ID: 999 3459 4376

Passcode: 570503


11/20/23

Amin Karbasi (Yale EE&CS)


Title: When we talk about reproducibility, what are we talking about?

Abstract: The reproducibility crisis, a significant challenge in fields like biology, chemistry, and artificial intelligence, compels us to confront a fundamental question: Can we truly trust the outcomes of scientific research? Reproducibility, the cornerstone of scientific credibility, necessitates that experiments conducted under similar conditions and methodologies yield statistically indistinguishable results.

In this talk, our focus is specifically on artificial intelligence, where we investigate reproducibility in the context of learning algorithms. We seek to identify which learning problems permit statistically indistinguishable learning algorithms. Furthermore, we explore a crucial, often overlooked aspect: the trade-off between reproducibility and the speed of convergence in optimization algorithms. Our goal is to understand how the quest for rapid algorithmic solutions may affect the reliability and replicability of results.

It’s important to highlight that this investigation is just the starting point of an algorithmic perspective on reproducibility. I hope this talk inspires more students/researchers to delve into this topic, advancing our collective understanding.


11/27/23

Chi Jin (Princeton ECE)

Title: Maximum Likelihood Estimation is All You Need for Well-Specified Covariate Shift

Abstract: A key challenge of modern machine learning systems is to achieve Out-of-Distribution (OOD) generalization, i.e., generalizing to target data whose distribution differs from that of source data. Despite its significant importance, the fundamental question of “what are the most effective algorithms for OOD generalization” remains open even under the standard setting of covariate shift. This talk will address this question by proving that, surprisingly, classical Maximum Likelihood Estimation (MLE) purely using source data (without any modification) achieves the minimax optimality for covariate shift under the well-specified setting. This result holds for a very large class of parametric models, including but not limited to linear regression, logistic regression, and phase retrieval, and does not require any boundedness condition on the density ratio. The talk further complements this result by proving that, in the misspecified setting, MLE can perform poorly, and that the Maximum Weighted Likelihood Estimator (MWLE) emerges as minimax optimal in specific scenarios, outperforming MLE.
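
A toy regression experiment, with a density ratio known by construction (an assumption made purely for illustration), captures the dichotomy described above: plain least squares, which is the Gaussian MLE, transfers well when the linear model is well specified, while importance weighting helps once the model is misspecified.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000
x_src = rng.normal(0.0, 1.0, size=n)             # source covariates ~ N(0, 1)
x_tgt = rng.normal(2.0, 0.5, size=n)             # shifted target covariates ~ N(2, 0.25)

def fit_ls(x, y, w=None):
    """(Weighted) least squares for y ~ a + b*x; the Gaussian MLE when unweighted."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones(len(x)) if w is None else w
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

def target_mse(coef, truth):
    X = np.column_stack([np.ones_like(x_tgt), x_tgt])
    return np.mean((X @ coef - truth(x_tgt)) ** 2)

# Density ratio p_target / p_source, known here only because the toy shift is known.
ratio = lambda x: (np.exp(-(x - 2.0) ** 2 / (2 * 0.25)) / 0.5) / np.exp(-x**2 / 2)

# Well-specified case: the linear model is correct, so plain MLE transfers well.
f_lin = lambda x: 1.0 + 2.0 * x
y = f_lin(x_src) + 0.5 * rng.normal(size=n)
print(target_mse(fit_ls(x_src, y), f_lin))                  # small
print(target_mse(fit_ls(x_src, y, ratio(x_src)), f_lin))    # also small, somewhat noisier

# Misspecified case: the truth is quadratic; weighting toward the target now helps.
f_quad = lambda x: x ** 2
y = f_quad(x_src) + 0.5 * rng.normal(size=n)
print(target_mse(fit_ls(x_src, y), f_quad))                  # large bias on the target
print(target_mse(fit_ls(x_src, y, ratio(x_src)), f_quad))    # substantially smaller
```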

Bio: Chi Jin is an assistant professor in the Electrical and Computer Engineering department of Princeton University. He obtained his PhD degree in Computer Science at the University of California, Berkeley, advised by Michael I. Jordan. His research mainly focuses on theoretical machine learning, especially nonconvex optimization, Reinforcement Learning (RL), and, more recently, representation learning. In nonconvex optimization, he provided the first proof showing that a first-order algorithm (stochastic gradient descent) is capable of escaping saddle points efficiently. In RL, he provided the first efficient learning guarantees for Q-learning and least-squares value iteration algorithms when exploration is necessary. His work also lays the theoretical foundations for RL with function approximation, multi-agent RL, and partially observable RL.

12/4/23

Matthew Reimherr (PSU Statistics)

Title: Adaptive Transfer Learning in Nonparametric Regression

Abstract: Kernel ridge regression (KRR), widely used in nonparametric regression, can suffer a slow learning rate given insufficient training samples. In this presentation, after giving an introduction to transfer learning, I discuss a new approach called Smoothness Adaptive Transfer Learning (SATL), a two-step KRR-based transfer learning algorithm. Notably, unlike the currently prevalent two-step TL algorithms that use the same kernel regularization across steps and rely on knowing a priori how smooth the target/source regression functions and their offset are, SATL uses the Gaussian kernel in both steps, which allows the estimators of the different functions to adapt to their respective smoothness levels. I present a minimax lower bound in L2-error and show that SATL enjoys a matching upper bound. The minimax convergence rate sheds light on the factors influencing transfer learning gains and demonstrates the superiority of SATL compared to target-only KRR. I also discuss an interesting auxiliary theoretical result for classic KRR, showing that if the true regression function lies in a given Sobolev space, employing a fixed-bandwidth Gaussian kernel in a target-only KRR can still attain the minimax optimal convergence rate.
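
A minimal sketch of the generic two-step recipe (a source fit plus an offset correction) that SATL refines, written with scikit-learn's Gaussian-kernel ridge regression; the bandwidths and regularization levels are ad hoc illustrations, not SATL's adaptive choices.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

# Source task has many samples; the target task has few and differs by a smooth offset.
f_src = lambda x: np.sin(2 * np.pi * x)
offset = lambda x: 0.5 * x                          # smoother/simpler than f_src
x_src, x_tgt = rng.uniform(0, 1, 400), rng.uniform(0, 1, 40)
y_src = f_src(x_src) + 0.1 * rng.normal(size=x_src.size)
y_tgt = f_src(x_tgt) + offset(x_tgt) + 0.1 * rng.normal(size=x_tgt.size)

# Step 1: estimate the source regression function with Gaussian-kernel ridge regression.
krr_src = KernelRidge(kernel="rbf", gamma=20.0, alpha=1e-2)
krr_src.fit(x_src[:, None], y_src)

# Step 2: estimate the offset from target residuals, again with a Gaussian kernel
# (its bandwidth/regularization can be tuned to the offset's own smoothness).
residuals = y_tgt - krr_src.predict(x_tgt[:, None])
krr_off = KernelRidge(kernel="rbf", gamma=2.0, alpha=1e-1)
krr_off.fit(x_tgt[:, None], residuals)

# Transfer estimate on new target points = source fit + estimated offset.
x_new = np.linspace(0, 1, 200)[:, None]
f_hat_target = krr_src.predict(x_new) + krr_off.predict(x_new)
```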

12/11/23

Nancy Zhang (University of Pennsylvania Statistics and Data Science)

Title: Signal recovery in single cell data integration

Abstract: Data integration to align cells across batches has become a cornerstone of single cell studies, critically affecting downstream analyses.  Yet, how much signal is erased during integration?  Currently, there are no guidelines for when biological signals are separable from batch effects, and thus, studies usually take a black-box, trial-and-error approach towards data integration.  I will show that current paradigms for single cell data integration are unnecessarily aggressive, removing biologically meaningful variation.  To remedy this, I will present a novel statistical model and computationally scalable algorithm, CellANOVA, to recover biological signals that are lost during data integration. CellANOVA utilizes a “pool-of-controls” design concept, common in single cell studies, to separate unwanted variation from biological variation of interest.  When applied with existing integration methods, CellANOVA allows the preservation of subtle biological signals and substantially corrects the distortion introduced by integration.  
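
The pool-of-controls idea can be illustrated with a toy sketch (this is not the CellANOVA model; it is closer in spirit to a simple RUV-style adjustment): directions of variation estimated from the control samples across batches are treated as unwanted and projected out of all cells, while variation specific to the non-control samples is largely preserved.

```python
import numpy as np

rng = np.random.default_rng(0)
n_batches, cells_per_batch, n_genes = 4, 200, 50

# Toy data: each batch adds its own technical shift; control cells share one biology.
batch_effects = rng.normal(0.0, 1.0, size=(n_batches, n_genes))
control, treated = [], []
for b in range(n_batches):
    base = rng.normal(0.0, 0.2, size=(cells_per_batch, n_genes)) + batch_effects[b]
    control.append(base)                                        # control pool
    signal = np.zeros(n_genes)
    signal[0] = 2.0                                             # biological effect in gene 0
    treated.append(base + signal)
control, treated = np.vstack(control), np.vstack(treated)

# Estimate unwanted-variation directions from the control pool only:
# variation among per-batch control means is, by design, technical rather than biological.
batch_means = control.reshape(n_batches, cells_per_batch, n_genes).mean(axis=1)
centered = batch_means - batch_means.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
unwanted = vt[: n_batches - 1]                                  # batch-effect directions

def remove_unwanted(x):
    """Project the estimated batch-effect directions out of every cell."""
    return x - (x @ unwanted.T) @ unwanted

control_clean, treated_clean = remove_unwanted(control), remove_unwanted(treated)
```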

Bio: Dr. Zhang is the Ge Li and Ning Zhao Professor of Statistics in The Wharton School at the University of Pennsylvania.  Dr. Zhang obtained her Ph.D. in Statistics in 2005 from Stanford University.  After one year of postdoctoral training at the University of California, Berkeley, she returned to the Department of Statistics at Stanford University as Assistant Professor in 2006.  She received the Sloan Fellowship in 2011 and formally moved to the University of Pennsylvania in 2012.  She was awarded the Medallion Lectureship by the Institute of Mathematical Statistics in 2021.  Her research focuses primarily on the development of statistical methods and computational algorithms for the analysis of genomic data.