Student Seminar – Fall 2019

# Schedule for Fall 2019

Seminars are held on Wednesdays.
Time: 12:00pm – 1:00pm
Location: Room 1025, 1255 Amsterdam Avenue
Contacts: Yuling Yao, Owen Ward

**9/11/19 – Cynthia G. Rush (Columbia University)**
Title: Algorithmic Analysis of SLOPE via Approximate Message Passing
Abstract: SLOPE is a relatively new convex optimization procedure for high-dimensional linear regression via the sorted L1 penalty: the larger the rank of the fitted coefficient, the larger the penalty. This non-separable penalty renders many existing techniques invalid or inconclusive for analyzing the SLOPE solution. In this talk, we propose using approximate message passing (AMP) to provably solve the SLOPE problem in the regime of linear sparsity under Gaussian random designs. This algorithmic approach allows one to approximate the SLOPE solution via the much more amenable AMP iterates, and a consequence of this analysis is an asymptotically exact characterization of the SLOPE solution. Explicitly, we demonstrate that one can characterize the asymptotic dynamics of the AMP iterates by employing a recently developed state evolution analysis for non-separable penalties, thereby overcoming the difficulty caused by the sorted L1 penalty. We demonstrate the potential of such an asymptotic analysis by using it to study the false discovery rate and true positive proportion of SLOPE solutions. This is joint work with Zhiqi Bu, Jason Klusowski, and Weijie Su.

**9/18/19 – Summer 2019 Internships Recap**

**9/25/19**

**10/2/19 – Espen Bernton (Columbia)**
Title: Schrödinger bridge samplers
Abstract: Consider a reference Markov process with initial distribution $\pi_{0}$ and transition kernels $\{M_{t}\}_{t\in[1:T]}$, for some $T\in\mathbb{N}$. Suppose you are given a distribution $\pi_{T}$ that is not equal to the marginal distribution of the reference process at time $T$. In this scenario, Schrödinger addressed the problem of identifying the Markov process with initial distribution $\pi_{0}$ and terminal distribution $\pi_{T}$ that is closest to the reference process in terms of relative entropy.
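In the static, discrete setting, this kind of entropy-minimization problem reduces to matching prescribed marginals by alternately rescaling the rows and columns of a positive kernel, i.e., iterative proportional fitting. A minimal numpy sketch of that discrete case (the function name, toy cost matrix, and parameter choices are illustrative, not taken from the talk):

```python
import numpy as np

def iterative_proportional_fitting(mu, nu, C, eps=0.5, n_iter=200):
    """Alternately rescale rows and columns of a Gibbs kernel so the
    resulting coupling P has (approximately) row marginals mu and
    column marginals nu, for cost matrix C and regularization eps."""
    K = np.exp(-C / eps)          # positive kernel derived from the cost
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)          # rescale to match row marginals
        v = nu / (K.T @ u)        # rescale to match column marginals
    return u[:, None] * K * v[None, :]

# Toy example: two 4-point distributions on a line, squared-distance cost.
x = np.linspace(0, 1, 4)
C = (x[:, None] - x[None, :]) ** 2
mu = np.array([0.1, 0.4, 0.4, 0.1])
nu = np.array([0.25, 0.25, 0.25, 0.25])
P = iterative_proportional_fitting(mu, nu, C)
print(np.allclose(P.sum(axis=0), nu))  # → True: column marginals match
```

The samplers in the talk work in the dynamic, continuous setting, replacing these exact rescalings with regression-based approximations of the same recursion.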
This special case of the so-called Schrödinger bridge problem can be solved using iterative proportional fitting, also known as Sinkhorn's algorithm. We leverage these ideas to develop novel Monte Carlo schemes, termed Schrödinger bridge samplers, to approximate a target distribution $\pi$ on $\mathbb{R}^{d}$ and to estimate its normalizing constant. This is achieved by iteratively modifying the transition kernels of the reference Markov chain to obtain a process whose marginal distribution at time $T$ becomes closer to $\pi_T = \pi$, via regression-based approximations of the corresponding iterative proportional fitting recursion. We empirically demonstrate its performance in several applications and make connections with other problems arising in the optimal transport, optimal control, and physics literatures. Joint work with J. Heng, A. Doucet, and P. E. Jacob.

**10/9/19 – Mengye Ren (University of Toronto)**
Title: Meta-learning for more human-like learning algorithms
Abstract: Standard deep learning algorithms require tedious engineering effort curating large-scale datasets and tuning neural nets on every single task, which differs vastly from how humans learn and accumulate knowledge in an open world. Future learning algorithms will be more adaptive and human-like. In this talk, I will present a few of my recent papers that try to capture some aspects of more human-like learning. Specifically, I will show progress in using meta-learning to develop new learning algorithms for learning with sparse supervision, handling noisy training labels, and incrementally accumulating new knowledge.

**10/16/19 – Soledad Villar (Moore-Sloan Research Fellow, Center for Data Science, New York University)**
Title: Classification-aware dimensionality reduction and genetic marker selection
Abstract: Classification and dimensionality reduction are two fundamental problems in mathematical data science.
Given labeled points in a high-dimensional vector space, we seek a projection onto a low-dimensional subspace that maintains the classification structure of the data. Taking inspiration from large margin nearest neighbor classification, we introduce SqueezeFit, a semidefinite relaxation of this problem. Unlike its predecessors, this relaxation is amenable to theoretical analysis, allowing us to provably recover a planted projection operator from the data. We apply this framework to the marker selection problem, where we use linear programming to find the markers in genetic data that exhibit the classification structure of single-cell RNA sequencing data.

**10/23/19 – Miguel Ángel Garrido (Columbia University)**
Title: How to be a (statistically significant) good TA and not die trying

**10/30/19 – Jonathan Auerbach (Columbia University)**
Title: The Life and Death of Great American City Analytics
Description: The digitization of government records and the automation of government services are producing an ever-expanding volume of data. This data has the power to make cities more responsive and equitable. But traditional modes of civic engagement – the cornerstone of great American cities – are ill equipped to harness this power. How can ordinary citizens keep up and hold data-driven policy accountable when data is stored in petabytes but policy is communicated in sound bites?

**11/6/19 – Kamiar Rahnama Rad (Baruch College)**
Title: Scalable estimation of out-of-sample prediction error via approximate leave-one-out, with applications to neural data analysis
Abstract: This paper considers the problem of out-of-sample risk estimation in high-dimensional settings where standard techniques such as K-fold cross-validation suffer from large biases. Motivated by the low bias of the leave-one-out cross-validation (LO) method, we propose a computationally efficient closed-form approximate leave-one-out formula (ALO) for a large class of regularized estimators.
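For intuition on why closed-form leave-one-out formulas exist at all, consider ridge regression, where the shortcut is exact: the held-out residual is the ordinary residual inflated by the leverage, $e_i / (1 - H_{ii})$. The sketch below checks this classical identity against brute-force refitting; it is a toy illustration only, not the talk's ALO formula, which extends this idea approximately to a much broader class of regularized estimators.

```python
import numpy as np

def loo_ridge_shortcut(X, y, lam):
    """Exact leave-one-out residuals for ridge regression from a single
    fit, via the hat matrix H = X (X'X + lam*I)^{-1} X':
    e_loo_i = (y_i - yhat_i) / (1 - H_ii)."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    return resid / (1.0 - np.diag(H))

def loo_ridge_bruteforce(X, y, lam):
    """Reference implementation: refit with each observation held out."""
    n, p = X.shape
    out = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        Xi, yi = X[mask], y[mask]
        beta = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
        out[i] = y[i] - X[i] @ beta
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = X @ rng.normal(size=5) + rng.normal(size=40)
print(np.allclose(loo_ridge_shortcut(X, y, 1.0),
                  loo_ridge_bruteforce(X, y, 1.0)))  # → True
```

The shortcut costs one fit instead of n refits; the identity follows from the Sherman–Morrison formula applied to removing one observation.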
Given the regularized estimate, calculating ALO requires only minor computational overhead. Under mild assumptions about the data-generating process, we obtain a finite-sample upper bound for |LO − ALO|. Our theoretical analysis shows that |LO − ALO| → 0 with overwhelming probability as n, p → ∞, where the dimension p of the feature vectors may be comparable with, or even greater than, the number of observations n. Despite the high dimensionality of the problem, our theoretical results do not require any sparsity assumption on the vector of regression coefficients. Our extensive numerical experiments show that |LO − ALO| decreases as n and p increase, revealing the excellent finite-sample performance of ALO. We further illustrate the usefulness of our proposed out-of-sample risk estimation method with an example of real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rat.

**11/13/19 – Natalie Doss (Yale)**
Title: Optimal Estimation in the High-Dimensional Gaussian Mixture Model
Abstract: The Gaussian location mixture model is one of the most widely studied models in the statistical literature, yet rates of convergence in this model are not well understood when the model is high-dimensional. In this talk, I will discuss recent results on minimax rates of convergence for both parameter and density estimation. I will also discuss a fast algorithm for mixing distribution estimation that achieves the minimax rate. This is joint work with Yihong Wu, Pengkun Yang, and Harrison Zhou.

**11/20/19 – Siddhartha Dalal (Columbia University)**
Title: Deep Analytics: From NLP and Computer Vision to Sensors
Abstract: Inspired by new advances in deep learning, a new field called Deep Analytics is emerging that focuses on the analysis of large unconventional data collected from pictures, videos, written documents, mobile apps, sensors, and IoT (the Internet of Things). The information sources producing these kinds of data are exploding.
For example, according to Cisco, video alone is projected to make up 82% of total Internet traffic by 2020. Given the vast amount of this new data, it is critical for data scientists to be involved in developing techniques for its analysis. I will describe a number of advances and challenges in this field, with applications of computer vision, NLP (natural language processing), and sensors to drug safety, worker safety, and automated damage detection and prevention.
Bio: Siddhartha (Sid) Dalal is a Professor of Professional Practice at Columbia University. Prior to Columbia, he was Chief Data Scientist and Senior VP at AIG, in charge of R&D that included the creation and application of AI, statistics, and computer science to computer vision, natural language processing, and sensors/IoT for managing risks. He came to AIG from the RAND Corporation, where he was the CTO. Sid was also VP of Research at Xerox, overseeing their worldwide imaging and software services research, and served at Bell Labs and Bellcore/SAIC as Chief Scientist and Executive Director. Sid has an MBA and a Ph.D. from the University of Rochester, with over 100 peer-reviewed publications, patents, and monographs covering risk analysis, medical informatics, Bayesian statistics and economics, image processing, and sensor networks. At RAND, he was responsible for creating the technology behind, and spinning off, Praedicat, Inc., a casualty insurance analytics company. Sid is a member of the US Army Science Board, an advisory board of 20 scientists appointed by the Secretary of Defense to advise the US Army on technology. He has received several awards, including from the IEEE, ASA, and ASQ.

**11/27/19**

**12/4/19 – Anne Goldfield (PhD, Counseling and Psychological Services)**

**12/11/19 – Two Sigma**