Student Seminar – Spring 2023

Schedule for Spring 2023

Attention: The Student Seminar will be in hybrid mode this semester. Most talks and events will be held in person, and people can also join via Zoom. In-person participation is only available to Columbia affiliates with building access.

Seminars are on Wednesdays 
Time: 12:00 – 1:00pm

Location: Room 903, 1255 Amsterdam Avenue

Zoom Link: columbiauniversity.zoom.us/j/99159893951?pwd=VlNIUXFIR3ZDMzJuakZUWW0zM2pOdz09
Meeting ID: 991 5989 3951
Passcode: 284676

Contacts: Jaesung Son, Luhuan Wu

Information for speakers: For information about the schedule, directions, equipment, reimbursement, and hotel, please click here.

1/25/23

Prof. David Banks (Duke) 

Title: Snakes and Ladders: Strategies for Professional Success

Abstract: Everyone wants to climb the ladder, but we all encounter obstacles. This tutorial provides a number of tips for presenting yourself and your work in ways that favor your interests. It also describes some useful habits and strategies for growing one’s career. Not all comments are applicable to all people, but as Eisenhower said, “It isn’t the plan; it’s the planning.”


2/1/23

Two Sigma <> Columbia PhD Recruiting event: “Papers We Love + Q&A Session”.
Details:
The goal of the event is for Two Sigma quant researchers to share their favorite papers and research interests. They will give an overview of the selected papers and topics, then discuss the details and potential applications together. The article will be shared a week before the event so that attendees can read it and come ready to discuss. A Q&A session will follow.

2/8/23
Dr. Brian Trippe (Columbia)

Title: Probabilistic protein design with diffusion generative models

Abstract: Computational design of novel proteins with structures not found in nature has applications across biomedicine and materials design. Probabilistic machine learning methods that leverage datasets of naturally occurring protein structures have shown considerable promise in this endeavor. In the first part of the talk, I will describe some of the statistical and computational challenges in this application area, and how a class of generative machine learning methods known as diffusion probabilistic models provides a useful platform for addressing these challenges. In the second part of the talk, I will describe recent results that enable combining protein structure prediction neural networks with diffusion models to design new proteins with desired functional characteristics.

2/15/23
Prof. Yuqi Gu (Columbia)

Title: Sharing Academic Journey and Research Experience: Exploring the Unknown and Uncovering the Unobserved

Abstract: In this experience-sharing talk, a format suggested by the student seminar organizers, I will spend the first half discussing my own academic journey and my personal take on finding career goals, developing research interests, and building research skills. I will touch upon my reason for choosing academia: the joy of thinking freely and deeply to explore the unknown. I will also offer a few personal suggestions for graduate students interested in pursuing academic careers.

In the second half, I will talk about my research experience and interests in uncovering the unobserved, organized around the keyword “latent structures.” I will briefly sketch three project directions I am currently working on. The first is latent variable modeling in deep generative modeling and representation learning. Here I have been working to understand the identifiability and other properties of deep nonlinear models, and I am interested in developing new, more interpretable models and even discovering potential causal explanations. The second direction is high-dimensional statistics in the presence of latent structure. Both high dimensionality and unobserved latent structures challenge traditional statistical methods, and I am interested in developing high-dimensional estimation and inference approaches with theoretical guarantees; one concrete example I am working on is spectral methods. The third direction is methodology and applications in psychometrics. Here I have been working to develop principled statistical methods and theory to model educational and psychological data with latent traits.

2/22/23

Prof. Tian Zheng (Columbia) 

Title: Applied Statistics: from backyard to living room

Abstract: In this talk, I will discuss my “growing up” story as an applied statistician and data scientist. It used to be, as John Tukey famously said, that “The best thing about being a statistician is that you get to play in everyone’s backyard.” Nowadays, statisticians are being invited into “the living room” to work directly on fun and exciting projects. I will discuss a few of my current research projects as examples.

3/1/23

Alumni Dr. Yang Kang and Dr. Elliott Rodriguez (D. E. Shaw)

Title: Alumni event: From campus to the quantitative finance industry

Abstract: Founded in 1988 over a small bookstore in downtown New York City, the D. E. Shaw group began with six employees and $28 million in capital and quickly became a pioneer in computational finance. Today, we operate hundreds of independent intelligent trading engines in nearly every liquid electronic marketplace. Our efforts include deploying statistical arbitrage models to monetize market inefficiencies, developing new technologies when existing solutions won’t do, and launching innovative businesses across industries.

Yang and Elliott graduated from our department in 2017 and 2022, respectively. For his PhD, Yang worked on distributionally robust optimization with Jose Blanchet. He is now in the macro group, working on systematic forecasts in rates and currencies. Elliott’s PhD research was in ML and bioinformatics, advised by John Cunningham. He now works in the equities group, where he uses a wide range of statistical learning techniques and state-of-the-art computational tools with the goal of predicting the stock market.

3/8/23

Speakers: Long Zhao, Joe Suk, Arnab Auddy, Casey Bradshaw, Collin Cademartori, and Ye Tian

Title: PhD panel session

Abstract: Panelists will share their experiences and tips from their PhD journeys.

3/15/23

Spring Break – No Seminar

3/22/23
Prof. Molei Liu (Columbia)

Title: Realizing the Potential of Electronic Health Record Data

Abstract: Electronic health records (EHR) linked with biobank databases offer immense potential for biomedical research, personalized risk prediction, and improved clinical practice. However, realizing this potential poses significant challenges due to methodological obstacles such as data heterogeneity, high dimensionality, privacy concerns, and the paucity of accurate outcome data. In this talk, I will present a framework of statistical methods I have developed to address these challenges, including high-dimensional and semi-parametric inference, federated learning, semi-supervised learning, and transfer learning. These methods have been applied in real-world biomedical studies, and I will discuss their efficacy in addressing the unique challenges posed by EHR data. The talk aims to provide a comprehensive overview of the statistical challenges in analyzing EHR data and offer insights into potential solutions to these problems.

3/29/23

Dr. Haoda Fu (Eli Lilly) 

Title: Our Recent Development on Cost Constraint Machine Learning Models

Abstract: This talk addresses the problem of selecting the optimal combination of biomarkers for diagnosing a disease subtype under a cost constraint. For example, if the total cost cannot exceed a certain amount, one may have to choose between measuring 10 cheap biomarkers and 2 expensive ones. The problem can be formulated with an L0 penalty, which is equivalent to the best subset selection problem. However, traditional algorithms can only handle best subset selection with up to ~35 variables, and until recently, no good solution existed even for this special case. We have modified and extended a recently developed algorithm to handle cost-constrained problems with thousands of variables.

The talk covers the background of the problem, method development, and theoretical results. We will present an example of dynamic programming that illustrates how algorithms can make a difference in computing. Through this talk, we aim to showcase the combination of modern statistics, computer science, and algorithms.
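To give a feel for why exhaustive search breaks down and how dynamic programming can help, here is a toy Python sketch. It is not the speaker’s method: it assumes each biomarker has an integer cost and an independent, additive “value” (a hypothetical stand-in for diagnostic usefulness), which turns the selection into a 0/1 knapsack problem. The function names brute_force and knapsack_dp are illustrative. The brute-force search checks all 2^p subsets, while the dynamic program runs in O(p × budget) time.

```python
from itertools import combinations


def brute_force(costs, values, budget):
    """Exhaustively check all 2**p subsets (only feasible for small p)."""
    p = len(costs)
    best_value, best_subset = 0, ()
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            if sum(costs[i] for i in subset) <= budget:
                value = sum(values[i] for i in subset)
                if value > best_value:
                    best_value, best_subset = value, subset
    return best_value, best_subset


def knapsack_dp(costs, values, budget):
    """0/1 knapsack dynamic program over integer budgets: O(p * budget) time."""
    # best[b] = highest total value achievable with total cost <= b
    best = [0] * (budget + 1)
    for c, v in zip(costs, values):
        # Iterate budgets downward so each biomarker is used at most once.
        for b in range(budget, c - 1, -1):
            best[b] = max(best[b], best[b - c] + v)
    return best[budget]


# Hypothetical example: 10 cheap biomarkers (cost 1) vs. 2 expensive ones (cost 5).
costs = [1] * 10 + [5, 5]
values = [1] * 10 + [4, 4]
print(brute_force(costs, values, budget=10))
print(knapsack_dp(costs, values, budget=10))
```

With these made-up numbers and a budget of 10, both functions reach a total value of 10 by selecting all ten cheap biomarkers. Real biomarker selection is not additive, since the usefulness of one marker depends on which others are measured, which is what makes the problem described in the talk much harder than this toy version.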

4/5/23

Prof. Chris Wiggins (Columbia)

Title: The Differential Geometry of Smooth Bandits

Abstract: In this talk, we delve into the intriguing world of stochastic optimization algorithms, specifically multi-armed and contextual bandits, discussing the problem’s background, its 90-year history (though the name is only 68 years old), and the recent explosion in industrial interest. We will explore selected examples of how techniques from statistical physics and approximate sampling methods can inspire innovative, performant, and interpretable approaches. I’ll introduce how the problem can be approximated using the language of differential geometry as a low-dimensional dynamical system, offering an interpretable and analytically tractable method for understanding the performance of bandits. Throughout the talk, we will emphasize aspects that are particularly suitable for graduate students seeking research topics in this interdisciplinary domain.

4/12/23
Bianca Dumitrascu (Columbia)

Title: Learning statistical representations of embryonic development

Research Abstract: During embryonic development, single cells read in local information from their environments and use this information to move, divide, and specialize. As a result, the environments themselves change. However, it remains unclear how gene expression programs interact with cell morphology and mechanical forces to orchestrate organogenesis in early embryos. Recent advances in single-cell techniques and in toto imaging open unique avenues for exploring this link between genomics and biophysics, which dynamically maps cells to organisms. I will describe statistical machine learning frameworks aimed at understanding how tissue-level mechanical and morphometric information impacts gene expression patterns in spatio-temporal contexts. We use these tools to understand boundary formation in the early development of mouse embryos and to align data from light-sheet recordings of pre-gastrulation development.

Bonus/Informal Abstract:

I will attempt to give a short history of statistical thought in genetics and genomics over the past 100+ years, with a focus on how methods have been informed by data availability. I will particularly focus on parts of this history that are relevant to the directions of my research group. I will provide a preliminary biological primer to help you navigate the talk, and I will talk about my unconventional (but exciting to me!) path to statistics. We will have ample time for Q&A, so bring all your questions about computational biology.

4/19/23
Charles Y. Tan (Pfizer)

Title: Roles of Statisticians in the Biopharmaceutical Industry

Abstract: Statisticians have contributed to the rise of the modern R&D-driven biopharmaceutical industry. Today, together with other quantitatively trained scientists and engineers, statisticians are making deeper inroads into this dynamic industry. This talk attempts to describe the landscape from three different perspectives. First, a ground-level view through one person’s career path. Second, a general survey of the different job roles of statisticians in the biopharmaceutical industry. Third, a “war story” from the trenches of the fight against the COVID pandemic. In the final analysis, I’d argue that statisticians are uniquely positioned to be “guardians of the scientific method.”

Bio: Dr. Tan is an Executive Director of Biostatistics at Pfizer. He obtained his bachelor’s degree in applied mathematics from Fudan University and a PhD in statistics from Temple University.

4/26/23
Morgane Austern (Harvard)

5/3/23

Speaker: Milad Bakhshizadeh (Stanford)

Title: Exponential tail bounds and Large Deviation Principle for Heavy-Tailed U-Statistics

Abstract: We study deviations of U-statistics when samples have heavy-tailed distributions, so that the kernel of the U-statistic does not have finite exponential moments at any positive point. We obtain an exponential upper bound for the tail of the U-statistics which clearly exhibits two regions of tail decay: the first is a Gaussian decay, and the second behaves like the tail of the kernel. For several common U-statistics, we also show that the upper bound has the right rate of decay, with sharp constants, by obtaining rough logarithmic limits, which in turn can be used to develop an LDP for U-statistics. In contrast to the usual LDP results in the literature, the processes we consider in this work have an LDP speed slower than their sample size n.
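For readers new to the term: a U-statistic of order 2 with a symmetric kernel h averages h over all pairs of distinct observations, U_n = (n choose 2)^(-1) · Σ_{i<j} h(X_i, X_j). The short Python sketch below computes this definition directly. The function name u_statistic and the variance kernel h(x, y) = (x − y)²/2 are illustrative choices, not taken from the talk; with that kernel the U-statistic equals the usual unbiased sample variance. The Pareto sample (shape 2.5, chosen arbitrarily) only illustrates the heavy-tailed setting, where the kernel has no finite exponential moment at any positive point; the sketch says nothing about the tail bounds themselves.

```python
import random
import statistics
from itertools import combinations


def u_statistic(sample, kernel):
    """Order-2 U-statistic: average of kernel(x_i, x_j) over all pairs i < j."""
    pairs = list(combinations(sample, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)


def variance_kernel(x, y):
    """Symmetric kernel whose U-statistic is the unbiased sample variance."""
    return (x - y) ** 2 / 2


data = [1.0, 2.0, 3.0, 4.0]
# Both values agree: the U-statistic with this kernel is the sample variance.
print(u_statistic(data, variance_kernel), statistics.variance(data))

# Heavy-tailed illustration: Pareto samples (shape alpha = 2.5, chosen arbitrarily).
random.seed(0)
heavy = [random.paretovariate(2.5) for _ in range(200)]
print(u_statistic(heavy, variance_kernel))
```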