Schedule for Fall 2023
Seminars are on Mondays
Time: 4:00 pm - 5:00 pm
Location: Room 903 SSW, 1255 Amsterdam Avenue
9/11/23 |
Youjin Lee (Brown Biostatistics) Abstract: In observational studies, unmeasured confounders can produce bias in causal estimates, and this bias is often systematic and recurs in replicated studies. Instrumental variables have been widely used to estimate the causal effect of a treatment on an outcome in the presence of unmeasured confounders. When several instrumental variables are available and the instruments are subject to biases that do not completely overlap, a careful analysis based on these several instruments can produce orthogonal pieces of evidence (i.e., evidence factors) that, when combined, would strengthen causal conclusions while avoiding systematic bias. In this talk, I will introduce several strategies to construct evidence factors from multiple candidate instrumental variables when invalid instruments may be present. I will demonstrate the use of instrumental variables for replicable causal research in different applications, including a regression discontinuity design. Bio: Youjin Lee is a Manning Assistant Professor in the Department of Biostatistics at Brown University. Her research focuses on developing robust and replicable causal inference methods with complex data. She received a PhD from Johns Hopkins University in 2019 and was a postdoctoral fellow at the University of Pennsylvania before joining Brown. |
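For readers unfamiliar with the idea of analyzing several instruments separately and then pooling the resulting evidence, here is a minimal toy sketch. It only illustrates the general strategy, not the evidence-factor construction from the talk: the data-generating process, the reduced-form test of no effect, and the use of Fisher's method to combine the two approximately independent p-values are all choices made for this example.

```python
# Toy sketch: two candidate instruments, each analyzed on its own, with the two
# (approximately independent) pieces of evidence pooled by Fisher's method.
# The data-generating process and tests are hypothetical illustrations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 5000
U = rng.normal(size=n)                               # unmeasured confounder
Z1 = rng.binomial(1, 0.5, size=n)                    # candidate instrument 1
Z2 = rng.binomial(1, 0.5, size=n)                    # candidate instrument 2
D = 0.7 * Z1 + 0.7 * Z2 + U + rng.normal(size=n)     # treatment
Y = 1.0 * D + U + rng.normal(size=n)                 # outcome; true effect = 1

def reduced_form_pvalue(Y, Z):
    """Reduced-form test of no treatment effect: if Z is a valid instrument and
    the treatment has no effect, Z should be uncorrelated with Y."""
    _, p = stats.pearsonr(Z, Y)
    return p

p1 = reduced_form_pvalue(Y, Z1)
p2 = reduced_form_pvalue(Y, Z2)

# Fisher's combination of the two pieces of evidence
fisher_stat = -2.0 * (np.log(p1) + np.log(p2))
p_combined = stats.chi2.sf(fisher_stat, df=4)
print(p1, p2, p_combined)
```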
Special Date: Tuesday 9/12/23 Time: 2:00 pm - 3:00 pm Location: 903 SSW |
Masashi Sugiyama (RIKEN/The University of Tokyo, Japan) Title: Machine Learning from Weak, Noisy, and Biased Supervision Abstract: In statistical inference and machine learning, we face a variety of uncertainties such as training data with insufficient information, label noise, and bias. In this talk, I will give an overview of our research on reliable machine learning, including weakly supervised classification (positive unlabeled classification, positive confidence classification, complementary label classification, etc.), noisy label classification (noise transition estimation, instance-dependent noise, clean sample selection, etc.), and transfer learning (joint importance-predictor estimation for covariate shift adaptation, dynamic importance estimation for full distribution shift, continuous distribution shift, etc.). Bio: Masashi Sugiyama received his Ph.D. in Computer Science from the Tokyo Institute of Technology in 2001. He has been a professor at the University of Tokyo since 2014, and also the director of the RIKEN Center for Advanced Intelligence Project (AIP) since 2016. He is (co-)author of Machine Learning in Non-Stationary Environments (MIT Press, 2012), Density Ratio Estimation in Machine Learning (Cambridge University Press, 2012), and Machine Learning from Weak Supervision (MIT Press, 2022). In 2022, he received the Award for Science and Technology from the Japanese Minister of Education, Culture, Sports, Science and Technology. He was program co-chair of the Neural Information Processing Systems (NeurIPS) conference in 2015, the International Conference on Artificial Intelligence and Statistics (AISTATS) in 2019, and the Asian Conference on Machine Learning (ACML) in 2010 and 2020. |
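As a concrete taste of the weakly supervised setting, the sketch below implements one well-known idea from the positive-unlabeled (PU) literature: a non-negative empirical risk built only from labeled positives and unlabeled data, assuming the positive class prior is known. The synthetic data, the linear scorer, and the logistic loss are hypothetical choices for illustration; this is a sketch of the general idea, not the speaker's specific algorithms.

```python
# Minimal sketch of a non-negative PU risk estimate:
#   pi_p * R_p^+ + max(0, R_u^- - pi_p * R_p^-)
# computed for a fixed linear scorer on synthetic 1-d data.
import numpy as np

rng = np.random.default_rng(1)

def logistic_loss(margin):
    return np.log1p(np.exp(-margin))

def nn_pu_risk(score_pos, score_unl, prior):
    r_pos_plus = logistic_loss(+score_pos).mean()    # positives treated as +1
    r_pos_minus = logistic_loss(-score_pos).mean()   # positives treated as -1
    r_unl_minus = logistic_loss(-score_unl).mean()   # unlabeled treated as -1
    return prior * r_pos_plus + max(0.0, r_unl_minus - prior * r_pos_minus)

# synthetic example: positives centered at +2, negatives at -2, prior 0.4
prior = 0.4
x_pos = rng.normal(2.0, 1.0, size=500)               # labeled positives
x_unl = np.concatenate([rng.normal(2.0, 1.0, size=int(1000 * prior)),
                        rng.normal(-2.0, 1.0, size=int(1000 * (1 - prior)))])
w = 1.0                                              # linear scorer x -> w * x
print(nn_pu_risk(w * x_pos, w * x_unl, prior))
```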
9/18/23 |
Jessica Hullman (Northwestern CS) Abstract: Research and development in computer science and statistics have produced increasingly sophisticated software interfaces for interactive visual data analysis. Data visualizations have also become ubiquitous for communication in the news and scientific publishing. Despite these successes, our understanding of how to design effective visualizations for data-driven decision-making remains limited. Design philosophies that emphasize data exploration and hypothesis generation can encourage pattern-finding at the expense of quantifying uncertainty. Designing visualizations to maximize perceptual accuracy and self-reported satisfaction can lead people to adopt visualizations that promote overconfident interpretations. I will motivate a few alternative objectives for measuring the effectiveness of visualization, and show how a rational agent framework based in statistical decision theory can help us understand the value of a visualization in the abstract and in light of empirical study results. Bio: Dr. Jessica Hullman is the Ginni Rometty Associate Professor of Computer Science at Northwestern University. Her research addresses challenges that arise when people draw inductive inferences from data interfaces. Hullman's work has contributed visualization techniques, applications, and evaluative frameworks for improving data-driven decision-making in applications like visual data analysis, communication of experiment results, data privacy, and responsive design. Her work has been awarded best paper awards at top visualization and HCI venues. She is the recipient of a Microsoft Faculty Fellowship and NSF CAREER award, among others. |
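One way to make the decision-theoretic framing concrete is a standard value-of-information calculation: compare a Bayesian agent's expected utility when acting on the signal conveyed by a visualization with its expected utility under the prior alone. The sketch below is a schematic example with made-up numbers; it is not drawn from the speaker's papers.

```python
# Schematic value-of-information calculation: how much better does a Bayesian
# decision-maker do with the (noisy) signal read off a visualization than with
# the prior alone? All numbers are hypothetical.
import numpy as np

prior = np.array([0.7, 0.3])               # P(state = 0), P(state = 1)
likelihood = np.array([[0.8, 0.2],         # P(signal | state): rows = state
                       [0.3, 0.7]])
utility = np.array([[1.0, 0.0],            # utility[action, state]
                    [0.0, 1.0]])

# baseline: best action under the prior alone
baseline = max(utility @ prior)

# with the visualization: act on the posterior for each possible signal
value_with_signal = 0.0
for s in range(2):
    p_signal = likelihood[:, s] @ prior
    posterior = likelihood[:, s] * prior / p_signal
    value_with_signal += p_signal * max(utility @ posterior)

print("value of the visualization:", value_with_signal - baseline)
```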
9/25/23 |
Krzysztof Choromanski (Google DeepMind & Columbia IEOR) Title: The case for random features in modern Transformer architectures Abstract: Transformer architectures have revolutionized modern machine learning, quickly overtaking regular deep neural networks in practically all of its subfields, from large language models through vision to speech. One of the main challenges in using them to model long-range interactions (critical for such applications as bioinformatics, e.g. genome modeling) remains the prohibitively expensive quadratic time complexity (in the lengths of their input sequences) of their core attention modules. For the same reason, efficient deployment of massive Transformers on devices with limited computational resources (e.g. in Robotics) is still a difficult problem. Random feature techniques led to one of the most mathematically rigorous ways to address this problem and to the birth of various scalable Transformer architectures (such as the class of low-rank implicit-attention Transformers called Performers). In this talk, I will summarize the recent progress made on scaling up Transformers with random features (RFs) and present related open mathematical problems. The talk will cover, in particular: new RF-based methods for approximating softmax and Gaussian kernels (such as the FAVOR, FAVOR+ and FAVOR# mechanisms), hybrid random features, the role of quasi-Monte Carlo techniques, as well as even more recent algorithms producing topologically-aware modulation of the regular attention modules in Transformers via RF-based linearizations of various graph kernels. Bio: Krzysztof Choromanski is a staff research scientist at Google DeepMind and an adjunct assistant professor at Columbia University. He obtained his Ph.D. from the IEOR Department at Columbia University, where he worked on various problems in structural graph theory (in particular the celebrated Erdos-Hajnal Conjecture and random graphs). His current interests include Robotics, scalable Transformer architectures (also for topologically-rich inputs), the theory of random features and structured neural networks. Krzysztof is one of the co-founders of the class of Performers, the first Transformer architectures providing efficient unbiased estimation of the regular softmax-kernel matrices used in Transformers. |
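For context on what an RF-based linearization of attention looks like, here is a minimal numerical check of the positive random features behind FAVOR+-style estimators of the softmax kernel exp(q·k). The dimensions and the single kernel evaluation below are illustrative assumptions; this is not the Performer codebase.

```python
# Positive random features for the softmax kernel: phi(x) = exp(Wx - ||x||^2/2)/sqrt(m)
# with i.i.d. Gaussian rows of W gives an unbiased estimate of exp(q . k).
import numpy as np

rng = np.random.default_rng(2)
d, m = 16, 10000                        # feature dim, number of random features
q = rng.normal(size=d) / np.sqrt(d)
k = rng.normal(size=d) / np.sqrt(d)

W = rng.normal(size=(m, d))             # i.i.d. Gaussian projections

def phi(x):
    return np.exp(W @ x - x @ x / 2.0) / np.sqrt(m)

exact = np.exp(q @ k)                   # softmax kernel value
approx = phi(q) @ phi(k)                # unbiased random-feature estimate
print(exact, approx)
```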
10/2/23 |
Ben Recht (Berkeley) Title: Statistics When n Equals 1 Abstract: 21st-century medicine embraces a population perspective on the implications of treatments and diseases. But such population inferences tell us little about what to do with any particular person. In this talk, I will first describe some of the drawbacks of applying population statistics to decision-making about individuals. As an alternative, I will outline how we might design treatments and interventions to help those individuals directly. I will present a series of parallel projects that link ideas from optimization, control, and experiment design to draw inferences and inform decisions about single units. Though most recent work in this vein has focused on precision, narrowing attention to smaller statistical populations, I will explain why optimization might better guide personalization. Bio: Benjamin Recht is a Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. His research has focused on applying mathematical optimization and statistics to problems in data analysis and machine learning. He is currently studying histories, methods, and theories of scientific validity and experimental design. |
10/9/23 |
Nikita Zhivotovskiy (UC Berkeley Statistics) Title: Sharper Risk Bounds for Statistical Aggregation Abstract: In this talk, we revisit classical results in the theory of statistical aggregation, focusing on the transition from global complexity to a more manageable local one. The goal of aggregation is to combine several base predictors to achieve a prediction nearly as accurate as the best one, without assumptions on the class structure or target. Though aggregation has been studied in both the sequential and the statistical settings, analyses in both traditionally rely on the same "global" complexity measure. We highlight the lesser-known PAC-Bayes localization technique, which enables us to prove a localized bound for the exponential weights estimator of Leung and Barron and a deviation-optimal localized bound for Q-aggregation. Finally, we demonstrate that our improvements allow us to obtain bounds based on the number of near-optimal functions in the class, and to achieve polynomial improvements in sample size in certain nonparametric situations. This is contrary to the common belief that localization does not benefit nonparametric classes. Joint work with Jaouad Mourtada and Tomas Vaškevičius. Bio: Nikita Zhivotovskiy is an Assistant Professor in the Department of Statistics at the University of California, Berkeley. He previously held postdoctoral positions at ETH Zürich in the Department of Mathematics, hosted by Afonso Bandeira, and at Google Research, Zürich, hosted by Olivier Bousquet. He also spent time at the Technion I.I.T. Department of Mathematics, hosted by Shahar Mendelson. Nikita completed his thesis at the Moscow Institute of Physics and Technology under the guidance of Vladimir Spokoiny and Konstantin Vorontsov. |
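To fix ideas about the estimator the localized bounds concern, here is a minimal sketch of exponential-weights aggregation: base predictors are combined with weights that decay exponentially in their empirical risk. The dictionary of base predictors, the synthetic data, and the temperature are hypothetical choices; the sketch only shows the object being analyzed, not the paper's procedure.

```python
# Exponential-weights aggregation over a small dictionary of base predictors.
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + 0.3 * rng.normal(size=n)

# base predictors: a few simple functions of x
base = [lambda t: np.zeros_like(t),
        lambda t: t,
        lambda t: np.sin(3 * t),
        lambda t: t ** 2]

risks = np.array([np.mean((y - f(x)) ** 2) for f in base])   # empirical risks
beta = n / 4.0                                                # temperature (illustrative)
w = np.exp(-beta * (risks - risks.min()))                     # shift for numerical stability
w /= w.sum()

def aggregate(t):
    """Weighted average of the base predictors."""
    return sum(wj * f(t) for wj, f in zip(w, base))

print("weights:", np.round(w, 3))
print("aggregate at x = 0.5:", aggregate(np.array([0.5]))[0])
```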
10/16/23 |
Cosma Shalizi (CMU Statistics) Title: Simulation-Based Inference by Matching Random Features Abstract: We can, and should, do statistical inference on simulation models by adjusting the parameters in the simulation so that the values of randomly chosen functions of the simulation output match the values of those same functions calculated on the data. Results from the "random features" literature in machine learning suggest that using random functions of the data can be an efficient replacement for using optimal functions. Results from the "state-space reconstruction" or "geometry from a time series" literature in nonlinear dynamics indicate that just $2d+1$ such functions will typically suffice to identify a model with a $d$-dimensional parameter space. This talk will sketch the key arguments, show some successful numerical experiments on time series, and suggest directions for further work. Paper: https://arxiv.org/abs/2111.09220 |
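The recipe in the abstract can be spelled out in a few lines: draw random functions of the output, then tune the simulator's parameter until those functions take (nearly) the same values on simulated output as on the data. The sketch below uses an AR(1) simulator, 2d+1 = 3 random cosine features of consecutive pairs, common random numbers, and a grid search; all of these are illustrative choices rather than the implementation in the linked paper.

```python
# Toy simulation-based inference by matching random features of the output.
import numpy as np

rng = np.random.default_rng(4)
T = 2000

def simulate_ar1(phi, noise):
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + noise[t]
    return x

# "data" generated at a parameter value treated as unknown below
true_phi = 0.6
data = simulate_ar1(true_phi, rng.normal(size=T))

# 2d + 1 = 3 random functionals of the series (d = 1 parameter)
W = rng.normal(size=(3, 2))
B = rng.uniform(0, 2 * np.pi, size=3)

def features(x):
    pairs = np.stack([x[:-1], x[1:]], axis=1)          # consecutive pairs (x_t, x_{t+1})
    return np.cos(pairs @ W.T + B).mean(axis=0)        # 3 random functionals

target = features(data)
sim_noise = rng.normal(size=T)                          # common random numbers
grid = np.linspace(-0.9, 0.9, 181)
errs = [np.sum((features(simulate_ar1(p, sim_noise)) - target) ** 2) for p in grid]
print("estimated phi:", grid[int(np.argmin(errs))])
```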
10/23/23 |
Murali Haran (PSU Statistics) Title: Measuring Sample Quality in Asymptotically Inexact Monte Carlo Algorithms Abstract: An important statistical computing problem is approximating expectations with respect to a given target distribution. Markov chain Monte Carlo algorithms produce asymptotically exact approximations, meaning the Markov chain's stationary distribution is identical to the target distribution. Asymptotically inexact algorithms generate sequences without this property; even asymptotically the samples generated do not follow the target distribution. I will describe novel tools for analyzing the output from both asymptotically exact and asymptotically inexact Monte Carlo methods, providing a way to tune the algorithms and to compare them. I will begin my talk by explaining my motivating problem: probability models that have intractable normalizing functions. This is a large class of models where inexact algorithms are often more practical than asymptotically exact algorithms. I will conclude with a discussion of the practical application of our approach. (This research is joint with Bokgyeong Kang, John Hughes, and Jaewoo Park.) |
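To make the exact/inexact distinction concrete, the toy example below runs Langevin dynamics on a standard normal target with and without a Metropolis correction: the adjusted chain is asymptotically exact, while the unadjusted chain with a coarse step size converges to a distribution with inflated variance. The samplers, step size, and the crude second-moment check are illustrative assumptions; they are not the sample-quality tools developed in the talk.

```python
# Asymptotically exact (MALA) vs. asymptotically inexact (ULA) sampling of N(0,1).
import numpy as np

rng = np.random.default_rng(5)
n_steps, h = 100000, 0.8               # iterations and a deliberately coarse step size

def log_density(x):
    return -0.5 * x * x                # standard normal target (up to a constant)

def grad_log_density(x):
    return -x

def langevin(adjust):
    x, out = 0.0, np.empty(n_steps)
    for i in range(n_steps):
        prop = x + 0.5 * h * grad_log_density(x) + np.sqrt(h) * rng.normal()
        if adjust:                      # Metropolis correction -> exact stationary law
            fwd = -((prop - x - 0.5 * h * grad_log_density(x)) ** 2) / (2 * h)
            rev = -((x - prop - 0.5 * h * grad_log_density(prop)) ** 2) / (2 * h)
            if np.log(rng.uniform()) < log_density(prop) - log_density(x) + rev - fwd:
                x = prop
        else:                           # unadjusted -> asymptotically inexact
            x = prop
        out[i] = x
    return out

print("E[X^2], MALA:", langevin(True).var())    # close to 1
print("E[X^2], ULA :", langevin(False).var())   # biased away from 1
```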
10/30/23 |
Nathan Ross (University of Melbourne Mathematics and Statistics) Title: Gaussian random field approximation for wide neural networks Abstract: It has been observed that wide neural networks (NNs) with randomly initialized weights may be well-approximated by Gaussian fields indexed by the input space of the NN, and taking values in the output space. There has been a flurry of recent work making this observation precise, since it sheds light on regimes where neural networks can perform effectively. In this talk, I will discuss recent work where we derive bounds on Gaussian random field approximation of wide random neural networks of any depth, assuming Lipschitz activation functions. The bounds are on a Wasserstein transport distance in function space equipped with a strong (supremum) metric, and are explicit in the widths of the layers and natural parameters such as moments of the weights. The result follows from a general approximation result using Stein's method, combined with a novel Gaussian smoothing technique for random fields, which I will also describe. The talk covers joint works with Krishnakumar Balasubramanian, Larry Goldstein, and Adil Salim; and A.D. Barbour and Guangqu Zheng. |
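A quick numerical companion to the limit being quantified: across random initializations, the outputs of a wide one-hidden-layer ReLU network at two fixed inputs have a covariance close to the single-neuron expectation E_w[ReLU(w·x) ReLU(w·x')], the covariance of the limiting Gaussian field. The architecture, widths, and Monte Carlo check below are illustrative assumptions, not the setting or the metric of the paper's bounds.

```python
# Empirical covariance of wide random-ReLU-network outputs vs. its wide-width limit.
import numpy as np

rng = np.random.default_rng(6)
d, width, n_init = 3, 1000, 2000

x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([0.6, 0.8, 0.0])

def relu(z):
    return np.maximum(z, 0.0)

# one forward pass per random initialization, same weights for both inputs
W = rng.normal(size=(n_init, width, d))        # hidden weights
v = rng.normal(size=(n_init, width))           # output weights
h1, h2 = relu(W @ x1), relu(W @ x2)            # hidden activations, shape (n_init, width)
f1 = (v * h1).sum(axis=1) / np.sqrt(width)     # network outputs at x1
f2 = (v * h2).sum(axis=1) / np.sqrt(width)     # network outputs at x2

# limiting covariance: single-neuron expectation E_w[relu(w.x1) relu(w.x2)]
w = rng.normal(size=(200000, d))
limit_cov = np.mean(relu(w @ x1) * relu(w @ x2))

print("empirical cov:", np.cov(f1, f2)[0, 1])
print("limiting cov :", limit_cov)
```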
11/6/23 |
Alexander Aue (UC Davis Statistics) |
11/13/23 |
Tim van Erven (University of Amsterdam) |
11/20/23 |
Amin Karbasi (Yale EE&CS) |
11/27/23 |
Chi Jin (Princeton ECE) |
12/4/23 |
Matthew Reimherr (PSU Statistics) |
12/11/23 |
Nancy Zhang (University of Pennsylvania Statistics and Data Science) |