Student Seminar Fall 2023

 

Seminars are on Wednesdays 

Time: 12:00 – 1:00 pm

Location: Room 903, 1255 Amsterdam Avenue

Contacts: Wribhu Banik, Seunghyun Lee, Anirban Nath

9/27/2023

Speaker: Kaizheng Wang (Columbia IEOR)

Title: Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift

Abstract: We develop and analyze a principled approach to kernel ridge regression under covariate shift. The goal is to learn a regression function with small mean squared error over a target distribution, based on unlabeled data from there and labeled data that may have a different feature distribution. We propose to split the labeled data into two subsets and conduct kernel ridge regression on them separately to obtain a collection of candidate models and an imputation model. We use the latter to fill the missing labels and then select the best candidate model accordingly. Our non-asymptotic excess risk bounds show that in quite general scenarios, our estimator adapts to the structure of the target distribution and the covariate shift. It achieves the minimax optimal error rate up to a logarithmic factor. The use of pseudo-labels in model selection does not have major negative impacts.
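The split-and-impute recipe described in the abstract can be sketched in a few lines. The kernel, ridge parameters, and toy data below are illustrative choices, not those of the paper:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def krr_fit(X, y, lam, gamma=1.0):
    """Kernel ridge regression: returns a predictor Xnew -> f(Xnew)."""
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)
    return lambda Xnew: rbf_kernel(Xnew, X, gamma) @ alpha

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x).ravel()

# Labeled source data and unlabeled target data (shifted covariates).
Xl = rng.uniform(-1, 1, (80, 1)); yl = f(Xl) + 0.1 * rng.standard_normal(80)
Xt = rng.uniform(0, 2, (200, 1))          # target features; labels unseen

# Step 1: split the labeled data into two subsets.
X1, y1, X2, y2 = Xl[:40], yl[:40], Xl[40:], yl[40:]

# Step 2: candidate models on the first subset (one per ridge parameter),
# and a single imputation model on the second subset.
lams = [1e-4, 1e-3, 1e-2, 1e-1]
candidates = [krr_fit(X1, y1, lam) for lam in lams]
imputer = krr_fit(X2, y2, 1e-3)

# Step 3: pseudo-label the target data, then select the candidate whose
# predictions best match the pseudo-labels over the target distribution.
pseudo = imputer(Xt)
errors = [np.mean((g(Xt) - pseudo) ** 2) for g in candidates]
best = candidates[int(np.argmin(errors))]
```

The point of the construction is that model selection happens on the target feature distribution, even though no target labels are available.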

10/4/2023

Speaker: Yongchan Kwon (Columbia)

Title: Data Valuation: Shapley Value and Beyond

Abstract: Data valuation is a powerful framework for providing statistical insights into which data are beneficial or detrimental to model training. Many existing Shapley-based data valuation methods have shown promising results in various downstream tasks; however, they are well known to be computationally challenging because they require training a large number of models. As a result, it has been recognized as infeasible to apply them to large datasets. In this talk, we introduce Data-OOB, a new data valuation method for a bagging model that utilizes the out-of-bag estimate. The proposed method is computationally efficient and can scale to millions of data points by reusing trained weak learners. Furthermore, Data-OOB has solid theoretical interpretations in that it identifies the same set of important data points as the infinitesimal jackknife influence function. We conduct comprehensive experiments using various classification datasets, each with thousands of samples. We demonstrate that the proposed method significantly outperforms existing state-of-the-art data valuation methods in identifying mislabeled data and finding a set of helpful (or harmful) data points, highlighting the potential for applying data values in real-world applications.
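A minimal sketch of the out-of-bag idea behind Data-OOB: each point is scored by how often the bagged weak learners that never saw it still predict its label. The 1-nearest-neighbour weak learner and toy data are stand-ins chosen for brevity, not the method's actual components:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binary classification data with a few deliberately flipped labels.
n = 200
X = rng.standard_normal((n, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
flipped = rng.choice(n, 10, replace=False)
y[flipped] = 1 - y[flipped]

def fit_1nn(Xtr, ytr):
    """Weak learner: 1-nearest-neighbour classifier on a bootstrap sample."""
    def predict(Xq):
        d = ((Xq[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
        return ytr[np.argmin(d, axis=1)]
    return predict

B = 200                 # number of bagged weak learners
correct = np.zeros(n)   # out-of-bag correct predictions per point
counts = np.zeros(n)    # how often each point was out of bag

for _ in range(B):
    idx = rng.choice(n, n, replace=True)       # bootstrap sample
    oob = np.setdiff1d(np.arange(n), idx)      # out-of-bag points
    pred = fit_1nn(X[idx], y[idx])(X[oob])
    correct[oob] += (pred == y[oob])
    counts[oob] += 1

values = correct / np.maximum(counts, 1)       # per-point data value
```

No model retraining is needed beyond the single bagging ensemble, which is the source of the computational savings; mislabeled points tend to receive low values because the out-of-bag learners vote for the neighbourhood's label rather than the flipped one.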

10/11/2023

Speaker: Two Sigma

Title: Two Sigma Quant Research Talk

Abstract: Curious about using data to forecast the future? Join our talk to learn how we create systematic tools and technologies to forecast the future of global markets.  We’ll cover topics such as how to leverage the scientific method and make data-driven decisions.

10/18/2023

Speaker: Chenyang Zhong (Columbia)

Title: Mallows permutation models with L1 and L2 distances

Abstract: Introduced by Mallows in statistical ranking theory, the Mallows permutation model is a class of non-uniform probability measures on permutations. The general model depends on a distance metric that can be chosen from a host of metrics on permutations. In this talk, I will focus on Mallows permutation models with L1 and L2 distances, known respectively as Spearman’s footrule and Spearman’s rank correlation in the statistics literature. The models have been widely applied in statistics and physics, but have largely resisted analysis because the normalizing constants are “uncomputable”.
A natural question regarding these Mallows models is: picking a random permutation from either of the models, what does it “look like”? This may involve various features of the permutation, such as its longest increasing subsequence or its cycles. In this talk, I will explain how multi-scale analysis and the hit and run algorithm, a Markov chain for sampling from both models, can be used to prove theorems regarding such questions.
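To make the L1 model concrete: it puts mass proportional to exp(-beta * d(sigma, id)) on each permutation, where d is Spearman's footrule. The sketch below samples from it with a simple Metropolis chain using adjacent transpositions, a cruder stand-in for the hit and run algorithm discussed in the talk:

```python
import numpy as np

rng = np.random.default_rng(2)

def footrule(sigma):
    """L1 (Spearman's footrule) distance from the identity permutation."""
    return int(np.abs(sigma - np.arange(len(sigma))).sum())

def mallows_sample(n, beta, steps=20000):
    """Metropolis chain targeting P(sigma) proportional to
    exp(-beta * footrule(sigma)), with adjacent-transposition proposals."""
    sigma = rng.permutation(n)
    d = footrule(sigma)
    for _ in range(steps):
        i = rng.integers(n - 1)
        new = sigma.copy()
        new[i], new[i + 1] = new[i + 1], new[i]
        dn = footrule(new)  # recomputed for clarity; an O(1) update exists
        if rng.random() < np.exp(-beta * (dn - d)):
            sigma, d = new, dn
    return sigma

near = mallows_sample(30, beta=1.0)   # concentrates near the identity
far = mallows_sample(30, beta=0.0)    # beta = 0 is the uniform distribution
```

Larger beta pulls the sample toward the identity permutation; at beta = 0 every proposal is accepted and the chain explores permutations uniformly.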

 

10/25/2023

Speaker: Eren Kizildag (Columbia)

Title: Statistical-Computational Tradeoffs in Random Optimization Problems

Abstract: Many algorithmic tasks in high-dimensional statistics and in random structures exhibit a ubiquitous phenomenon, dubbed a statistical-computational gap: known polynomial-time algorithms perform significantly worse than the existential guarantee; they break down beyond a certain point. Examples include stochastic block models, random graphs, and more. In general, there is no analogue of NP-theory for average-case models.

In this talk, I will outline a framework called the Overlap Gap Property (OGP), tailored for random optimization problems. Originating from statistical physics, in particular from spin glasses, the OGP is an intricate geometrical property that is known to be a barrier against important classes of algorithms. I’ll show the presence of the OGP in binary perceptron, discrepancy minimization, and number balancing models; I’ll then outline how the OGP yields nearly sharp algorithmic lower bounds against stable algorithms and online algorithms for these models.

Time permitting, I’ll also talk about original connections with statistical physics (in particular, showing the shattering phase for the Ising p-spin model) and outline extensions to planted models.

Based on https://arxiv.org/abs/2103.01369, https://arxiv.org/abs/2203.15667, https://arxiv.org/abs/2302.06485, https://arxiv.org/abs/2307.07461, and https://arxiv.org/abs/2309.15115, some of which are joint with David Gamarnik (MIT), Aukosh Jagannath (Waterloo), Will Perkins (Georgia Tech), and Changji Xu (Harvard).

 

11/1/2023

Speaker: Joe Suk (Columbia)

Title: Towards a More Optimistic View of Nonstationary Sequential Decision-Making

Abstract: A bevy of recent works tackle the problem of sequential decision-making (e.g., multi-armed bandits, contextual bandits, reinforcement learning) under changing environments, or distribution shifts. Ideally, one aims to automatically adapt/self-tune to unknown changes in distribution, and restart exploration as needed. While recent breakthroughs show this is possible in a broad sense, such works contend that the learner should restart procedures upon experiencing any non-stationarity, leading to worst-case (regret) rates. This leaves open whether faster rates are possible, adaptively, if few changes in distribution are actually severe, e.g., involve no change in best arm. In multi-armed bandits, we provide a positive answer, showing that a much weaker notion of change can be adapted to, which can yield significantly faster rates than previously known, whether expressed in terms of the number of best-arm switches (for which no adaptive procedure was known) or in terms of previously studied variation quantities. Time permitting, I’ll discuss a broader research program addressing this question in dueling bandits, contextual bandits, and RL.

 

 

11/8/2023

Speaker: Emmanuel Abbe (EPFL)

Title: Proof of the Reed-Muller Code Capacity Conjecture

Abstract: In 1948, Shannon used a probabilistic argument to prove the existence of codes achieving channel capacity. In 1954, Muller and Reed introduced a simple deterministic code construction, conjectured shortly after to achieve channel capacity. Significant progress was made towards establishing this conjecture over the last decades. In particular, the special case of the erasure channel was settled in 2015 by Kudekar et al., relying on Bourgain-Kalai’s sharp threshold theorem for symmetric monotone properties. The case of error channels remained, however, unsettled despite recent major progress. In this talk, we provide a proof of the general conjecture. The proof circumvents the use of Bourgain-Kalai’s theorem by first establishing a “weak threshold” property that relies solely on symmetries. The proof then proceeds with a new boosting framework for coding, improving the weak local error to a strong global error by relying on sunflower structures (as in Erdős-Rado 1960). Joint work with Colin Sandon.

 

11/15/2023

Speakers: John Cunningham, Marcel Nutz

Title: Data is as data does: the influence of computation on statistical inference

Abstract: Probabilistic models remain a hugely popular class of techniques in modern statistics and machine learning, and their expressiveness has been extended by modern large-scale compute. While exciting, these generalizations almost always come with approximations, and researchers typically ignore the fundamental influence of computational approximations. Thus, results from modern probabilistic methods become as much about the approximation method as they are about the data and the model, undermining both the Bayesian principle and the practical utility of inference in probabilistic models.

To expose this issue and to demonstrate how to do approximate inference correctly, in this talk we will focus on the Gaussian Process class of models. I will derive a new class of GP approximations that provide consistent estimation of the combined posterior arising from both the finite number of data observed *and* the finite amount of computation expended. The most common GP approximations map to an instance in this class, such as methods based on the Cholesky factorization, conjugate gradients, and inducing points. I will show the consequences of ignoring computational uncertainty, prove that implicitly modeling it improves generalization performance, and point to extensions of computational uncertainty beyond Gaussian Processes.
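For context on the kind of approximation the abstract refers to, here is a generic inducing-point (subset-of-regressors/Nyström) GP regression sketch next to the exact posterior mean. This illustrates finite computation through a small set of inducing inputs only; it is not the talk's computation-aware posterior, and the kernel, lengthscale, and data are arbitrary choices:

```python
import numpy as np

def k(A, B, ls=0.5):
    """Squared-exponential kernel between 1-D input arrays."""
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * ls**2))

rng = np.random.default_rng(4)
X = rng.uniform(-2, 2, 40)
y = np.sin(2 * X) + 0.1 * rng.standard_normal(40)
Z = np.linspace(-2, 2, 7)        # a handful of inducing inputs
noise = 0.01                     # observation noise variance

# Exact GP posterior mean at test points (O(n^3) in the data size).
Xq = np.linspace(-2, 2, 100)
Kxx = k(X, X) + noise * np.eye(40)
mean_exact = k(Xq, X) @ np.linalg.solve(Kxx, y)

# Nystrom-style approximation: every kernel evaluation k(a, b) is replaced
# by k(a, Z) Kzz^{-1} k(Z, b), routing all computation through Z.
Kzz = k(Z, Z) + 1e-8 * np.eye(7)
Kxz = k(X, Z)
Q = Kxz @ np.linalg.solve(Kzz, Kxz.T) + noise * np.eye(40)
mean_approx = k(Xq, Z) @ np.linalg.solve(Kzz, Kxz.T) @ np.linalg.solve(Q, y)
```

The gap between mean_exact and mean_approx is exactly the kind of computation-induced error that the talk argues should be modeled rather than ignored.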

Speaker: Marcel Nutz

Title: Entropic Selection in Monge’s Problem

Abstract: The optimal transport problem can admit many optimizers, for instance when the cost function is the Euclidean distance on R^d (Monge’s problem, Earth Mover’s problem). On the other hand, entropically regularized optimal transport (EOT) always has a unique optimizer. It is a longstanding open problem whether EOT selects some particular optimal transport coupling in the vanishing regularization limit, and what that coupling would be. We provide a surprising answer for Monge’s problem in dimension d>1. (Ongoing work with Chenyang Zhong.)
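For readers unfamiliar with EOT: the regularized problem is typically solved by Sinkhorn's fixed-point iterations, and its unique optimizer approaches an optimal transport coupling as the regularization eps shrinks. A minimal discrete sketch (the two-point example and eps value are illustrative only):

```python
import numpy as np

def sinkhorn(mu, nu, C, eps, iters=2000):
    """Entropic OT: unique optimal coupling for cost matrix C with
    entropic regularization eps, via Sinkhorn's alternating scalings."""
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)   # match the column marginal nu
        u = mu / (K @ v)     # match the row marginal mu
    return u[:, None] * K * v[None, :]

# Two points on the line; cost = Euclidean distance, as in Monge's problem.
x = np.array([0.0, 1.0]); yv = np.array([0.0, 1.0])
C = np.abs(x[:, None] - yv[None, :])
mu = np.array([0.5, 0.5]); nu = np.array([0.5, 0.5])

P = sinkhorn(mu, nu, C, eps=0.05)
# As eps -> 0, P approaches an optimal coupling (here, the identity one).
```

The question the talk addresses is which optimal coupling this eps -> 0 limit picks out when, unlike in this toy example, the unregularized problem has many optimizers.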

 

9/27/2023

Speaker: Collin Andrew Cadematori (Columbia)

Title: Bayesian Model Checking – Rejection, Expansion, and Power

Abstract: How should we check a Bayesian model for compatibility with some aspect of our observed data? Disagreement over this question has persisted in the model checking literature despite numerous attempts to diagnose and resolve limitations with the classical posterior predictive check. This talk will review the recent history of this topic before attempting yet another diagnosis and resolution of some of the main issues. Our diagnosis begins by pointing out how a simple application of the law of total variance reveals that the performance of the posterior predictive p-value tends to degrade when we extend some base model to a more complex model. This creates a need to scale the difficulty of our model test with the complexity of the underlying model. We propose that this can be achieved simply by testing multiple statistics simultaneously using a joint p-value, and we examine the conditions under which this strategy succeeds using copula models of the dependence between test statistics. Along the way, we will explore some broader questions, including the potential incompatibility of different goals we may have when model checking, and whether model checking in the Bayesian and frequentist domains amounts to fundamentally different exercises.
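As background for the talk, the classical posterior predictive check it revisits works as follows: draw parameters from the posterior, simulate replicated data, and compare a test statistic on the replicates against its observed value. A minimal conjugate-normal sketch (the model, statistics, and data are illustrative, not the talk's examples):

```python
import numpy as np

rng = np.random.default_rng(3)

# Observed data, modeled as Normal(mu, 1) with a flat prior on mu,
# so the posterior for mu is Normal(ybar, 1/n).
y = rng.standard_normal(50) + 0.5
y[:5] += 4.0                  # a few outliers the model cannot explain
n, ybar = len(y), y.mean()

def ppp(stat, draws=4000):
    """Posterior predictive p-value for a test statistic `stat`."""
    extreme = 0
    for _ in range(draws):
        mu = ybar + rng.standard_normal() / np.sqrt(n)   # posterior draw
        yrep = mu + rng.standard_normal(n)               # replicated data
        extreme += stat(yrep) >= stat(y)
    return extreme / draws

p_mean = ppp(np.mean)   # the mean is fit by the model: p near 0.5
p_max = ppp(np.max)     # the outliers make the observed max extreme
```

The contrast between the two p-values shows why the choice (and, per the talk, the number) of test statistics matters: a statistic the model fits directly is uninformative, while one probing an unmodeled feature flags the misfit.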