Schedule for Fall 2019
Seminars are on Wednesdays
Time: 12:00pm – 1:00pm
Location: Room 1025, 1255 Amsterdam Avenue
Contacts: Yuling Yao, Owen Ward
Information for speakers: For information about schedule, directions, equipment, reimbursement and hotel, please click here.
9/11/19 
Cynthia G. Rush (Columbia University) Title: Algorithmic Analysis of SLOPE via Approximate Message Passing Abstract: SLOPE is a relatively new convex optimization procedure for high-dimensional linear regression via the sorted L1 penalty: the larger the rank of the fitted coefficient, the larger the penalty. This nonseparable penalty renders many existing techniques invalid or inconclusive in analyzing the SLOPE solution. In this talk, we propose using approximate message passing or AMP to provably solve the SLOPE problem in the regime of linear sparsity under Gaussian random designs. This algorithmic approach allows one to approximate the SLOPE solution via the much more amenable AMP iterates, and a consequence of this analysis is an asymptotically exact characterization of the SLOPE solution. Explicitly, we demonstrate that one can characterize the asymptotic dynamics of the AMP iterates by employing a recently developed state evolution analysis for nonseparable penalties, thereby overcoming the difficulty caused by the sorted L1 penalty. We demonstrate the potential of such an asymptotic analysis by using it to study the false discovery rate and true positive proportion of SLOPE solutions. This is joint work with Zhiqi Bu, Jason Klusowski, and Weijie Su.
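For readers unfamiliar with the sorted L1 penalty mentioned in the abstract, the following is a minimal sketch of how it can be evaluated. It is not taken from the talk: the function name `slope_penalty` and the example numbers are illustrative assumptions, and the weights are assumed to be nonnegative and non-increasing so that the largest weight is paired with the largest coefficient magnitude.

```python
import numpy as np

def slope_penalty(beta, lam):
    """Sorted-L1 penalty: sum_i lam[i] * |beta|_(i), where |beta|_(1) >= |beta|_(2) >= ...

    `slope_penalty` is a hypothetical helper name used only for this illustration.
    """
    beta = np.asarray(beta, dtype=float)
    lam = np.asarray(lam, dtype=float)
    if beta.shape != lam.shape:
        raise ValueError("beta and lam must have the same shape")
    if np.any(lam < 0) or np.any(np.diff(lam) > 0):
        raise ValueError("lam must be nonnegative and non-increasing")
    # Sort coefficient magnitudes in decreasing order and pair them with the weights.
    abs_sorted = np.sort(np.abs(beta))[::-1]
    return float(np.dot(lam, abs_sorted))

beta = [1.0, -3.0, 2.0]
print(slope_penalty(beta, [3.0, 2.0, 1.0]))  # 3*3 + 2*2 + 1*1 = 14.0
print(slope_penalty(beta, [1.0, 1.0, 1.0]))  # equal weights give the ordinary L1 norm: 6.0
```

Because each weight depends on the rank of a coefficient rather than on its position, the penalty cannot be written as a sum of per-coordinate terms, which is the nonseparability the abstract refers to.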
9/18/19 
Summer 2019 Internships Recap 
9/25/19 

10/2/19 
Espen Bernton (Columbia) Title: Schrödinger bridge samplers Abstract: Consider a reference Markov process with initial distribution $\pi_{0}$ and transition kernels $\{M_{t}\}_{t\in[1:T]}$, for some $T\in\mathbb{N}$. Assume that you are given a distribution $\pi_{T}$, which is not equal to the marginal distribution of the reference process at time $T$. In this scenario, Schrödinger addressed the problem of identifying the Markov process with initial distribution $\pi_{0}$ and terminal distribution equal to $\pi_{T}$ which is the closest to the reference process in terms of relative entropy. This special case of the so-called Schrödinger bridge problem can be solved using iterative proportional fitting, also known as Sinkhorn’s algorithm. We leverage these ideas to develop novel Monte Carlo schemes, termed Schrödinger bridge samplers, to approximate a target distribution $\pi$ on $\mathbb{R}^{d}$ and to estimate its normalizing constant. This is achieved by iteratively modifying the transition kernels of the reference Markov chain to obtain a process whose marginal distribution at time $T$ becomes closer to $\pi_T = \pi$, via regression-based approximations of the corresponding iterative proportional fitting recursion. We empirically demonstrate its performance in several applications, and make connections with other problems arising in the optimal transport, optimal control and physics literatures. Joint work with J. Heng, A. Doucet and P. E. Jacob.
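As a rough illustration of the iterative proportional fitting (Sinkhorn) recursion named in the abstract, the sketch below treats only a simplified finite-state, two-time-point case. It is not the sampler from the talk: the function name `sinkhorn_ipf`, the reference joint distribution `K`, and the marginals `a` and `b` are all assumptions made for this example.

```python
import numpy as np

def sinkhorn_ipf(K, a, b, n_iter=1000, tol=1e-10):
    """Rescale a positive reference matrix K so its row sums are a and column sums are b.

    Returns P = diag(u) K diag(v). Hypothetical helper for illustration only.
    """
    K = np.asarray(K, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)      # match the row (initial) marginal
        v = b / (K.T @ u)    # match the column (terminal) marginal
        P = u[:, None] * K * v[None, :]
        # Column sums are exact after the v update, so only the rows need checking.
        if np.max(np.abs(P.sum(axis=1) - a)) < tol:
            break
    return P

# Assumed toy reference joint distribution over (start state, end state).
K = np.array([[0.4, 0.1],
              [0.1, 0.4]])
a = np.array([0.5, 0.5])  # prescribed initial distribution
b = np.array([0.3, 0.7])  # prescribed terminal distribution
P = sinkhorn_ipf(K, a, b)
print(P.sum(axis=1), P.sum(axis=0))
```

In this discrete setting, the limit of the recursion is the joint distribution with the prescribed marginals that is closest to `K` in relative entropy, which mirrors the two-marginal problem described in the abstract.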
10/9/19 
Mengye Ren (University of Toronto) “Meta-learning for more human-like learning algorithms.” Abstract: Standard deep learning algorithms require tedious engineering effort curating large-scale datasets and tuning neural nets on every single task, which differs vastly from how humans learn to accumulate knowledge in an open world. Future learning algorithms will be more adaptive and human-like. In this talk, I will present a few of my recent papers that try to capture some aspects of more human-like learning. Specifically, I will show some progress in using meta-learning to develop new learning algorithms for learning with sparse supervision, handling noisy training labels, and incrementally accumulating new knowledge.
10/16/19 
Soledad Villar (Moore-Sloan Research Fellow at the Center for Data Science at New York University) “Classification-aware dimensionality reduction and genetic marker selection.”
Abstract: 
10/23/19 
Miguel Ángel Garrido (Columbia University) “How to be a (statistically significant) good TA and not die trying”.

10/30/19 
Jonathan Auerbach (Columbia University)
Title: The Life and Death of Great American City Analytics
Description: The digitization of government records and automation of government services are producing an ever-expanding volume of data. This data has the power to make cities more responsive and equitable. But traditional modes of civic engagement – the cornerstone of great American cities – are ill-equipped to harness this power. How can ordinary citizens keep up and hold data-driven policy accountable when data is stored in petabytes but policy is communicated in sound bites?
11/6/19 
Kamiar Rahnama Rad (Baruch College) “Scalable estimation of out-of-sample prediction error via approximate leave-one-out with applications to neural data analysis.” Abstract: The paper considers the problem of out-of-sample risk estimation in high-dimensional settings, where standard techniques such as K-fold cross validation suffer from large biases. Motivated by the low bias of the leave-one-out cross validation (LO) method, we propose a computationally efficient closed-form approximate leave-one-out formula (ALO) for a large class of regularized estimators. Given the regularized estimate, calculating ALO requires minor computational overhead. With minor assumptions about the data generating process, we obtain a finite-sample upper bound for LO − ALO. Our theoretical analysis illustrates that LO − ALO → 0 with overwhelming probability, when n, p → ∞, where the dimension p of the feature vectors may be comparable with or even greater than the number of observations, n. Despite the high dimensionality of the problem, our theoretical results do not require any sparsity assumption on the vector of regression coefficients. Our extensive numerical experiments show that LO − ALO decreases as n, p increase, revealing the excellent finite-sample performance of ALO. We further illustrate the usefulness of our proposed out-of-sample risk estimation method by an example of real recordings from spatially sensitive neurons (grid cells) in the medial entorhinal cortex of a rat.
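To give a feel for why a closed-form leave-one-out formula can be cheap, the sketch below uses ridge regression, a special case in which the classical leave-one-out shortcut is exact. This is not the ALO formula from the talk, which covers a broader class of regularized estimators; the function names and the simulated data are assumptions made for this illustration.

```python
import numpy as np

def ridge_loo_shortcut(X, y, lam):
    """Leave-one-out residuals for ridge regression from a single fit.

    Uses the identity y_i - x_i' beta^(-i) = (y_i - x_i' beta) / (1 - H_ii),
    where H = X (X'X + lam I)^{-1} X'. Hypothetical helper for illustration.
    """
    n, p = X.shape
    A = X.T @ X + lam * np.eye(p)
    beta = np.linalg.solve(A, X.T @ y)
    H = X @ np.linalg.solve(A, X.T)
    return (y - X @ beta) / (1.0 - np.diag(H))

def ridge_loo_brute_force(X, y, lam):
    """Leave-one-out residuals obtained by refitting n times (for comparison)."""
    n, p = X.shape
    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta_i = np.linalg.solve(Xi.T @ Xi + lam * np.eye(p), Xi.T @ yi)
        out[i] = y[i] - X[i] @ beta_i
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 40))  # assumed toy data with p > n
y = rng.normal(size=30)
fast = ridge_loo_shortcut(X, y, lam=1.0)
slow = ridge_loo_brute_force(X, y, lam=1.0)
print(np.max(np.abs(fast - slow)))  # difference is at the level of floating-point error
```

The shortcut needs one fit instead of n refits, which is the kind of saving the abstract describes for ALO in more general settings.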
11/13/19 
Natalie Doss (Yale) “Optimal Estimation in the High-Dimensional Gaussian Mixture Model.” The Gaussian location mixture model is one of the most widely studied models in the statistical literature, yet rates of convergence in this

11/20/19 
Siddhartha Dalal (Columbia University) Title: Deep Analytics: From NLP, Computer Vision to Sensors Abstract: Inspired by new advances in Deep Learning, a new field is emerging called Deep Analytics that focuses on analysis of large unconventional data collected from pictures, videos, written documents, mobile apps, sensors and IoT (Internet of Things). The information sources with these kinds of data are exploding. For example, video traffic alone is projected to reach 82% of total Internet traffic by 2020, according to Cisco. Given the vast amount of this new data, it is critical for data scientists to be involved in developing techniques for analysis of such data. I will describe a number of advances and challenges in this field, with applications to Computer Vision, NLP (Natural Language Processing) and sensors. The applications include drug safety, worker safety, automated damage detection and prevention. Bio: Siddhartha (Sid) Dalal is a Professor of Professional Practice at Columbia University. Prior to Columbia he was Chief Data Scientist and Sr. VP at AIG in charge of R&D that included creation and application of AI, Statistics and CS to Computer Vision, Natural Language Processing and Sensors/IoT for managing risks. He came to AIG from the RAND Corporation, where he was the CTO. Sid was also VP of Research at Xerox, overseeing their worldwide imaging and software services research, and worked at Bell Labs and Bellcore/SAIC as Chief Scientist and Executive Director. Sid has an MBA and a Ph.D. from the University of Rochester with over 100 peer-reviewed publications, patents and monographs covering the areas of risk analysis, medical informatics, Bayesian statistics and economics, image processing and sensor networks. At RAND he was responsible for the creation of technology and spinning off of Praedicat, Inc., a casualty insurance analytics company.
Sid is a member of the US Army Science Board, an advisory board of 20 scientists appointed by the Secretary of Defense to advise the US Army on technology. He has received several awards, including from IEEE, ASA and ASQ.
11/27/19 

12/4/19 
Anne Goldfield (PhD/Counseling and Psychological Services) 
12/11/19 
Two Sigma
