Schedule for Spring 2023
Seminars are on Wednesdays
Time: 12:00 - 1:00pm
Location: Room 903, 1255 Amsterdam Avenue
Zoom Link: columbiauniversity.zoom.us/j/
Meeting ID: 991 5989 3951
Contacts: Jaesung Son, Luhuan Wu
Information for speakers: For information about the schedule, directions, equipment, reimbursement, and hotels, please click here.
Prof. David Banks (Duke)
Title: Snakes and Ladders: Strategies for Professional Success
Abstract: Everyone wants to climb the ladder, but we all encounter obstacles. The tutorial provides a number of tips for presenting yourself and your work in ways that favor your interests. It also describes some useful habits and strategies to grow one’s career. Not all comments are applicable to all people, but as Eisenhower said, “It isn’t the plan---it’s the planning.”
Two Sigma <> Columbia PhD Recruiting event: "Papers We Love + Q&A Session".
The goal of the event is to share our quant researchers' favorite papers and interests. We will provide an overview of the research papers/topics and further discuss the details and share potential applications together. I’ll share the article a week prior to the event so that folks can read it and be ready to discuss. Afterwards, we can have Q&A.
Dr. Brian Trippe (Columbia)
Title: Probabilistic protein design with diffusion generative models
Abstract: Computational design of novel proteins with structures not found in nature has applications across biomedicine and material design. Probabilistic machine learning methods that leverage datasets of naturally occurring protein structures have shown considerable promise in this endeavor. In the first part of the talk, I will describe some of the statistical and computational challenges in this application area, and how a class of generative machine learning methods, known as diffusion probabilistic models, provides a useful platform for addressing these challenges. In the second part of the talk, I will describe recent results that enable combining protein structure prediction neural networks with diffusion models to design new proteins with desired functional characteristics.
Prof. Yuqi Gu (Columbia)
Title: Sharing Academic Journey and Research Experience: Exploring the Unknown and Uncovering the Unobserved
Abstract: In this experience-sharing talk, suggested by the student seminar organizers, I will spend the first half talking about my own academic journey and my personal take on finding career goals, developing research interests, building up research skills, etc. I will touch upon my reason for choosing academia – the joy of thinking freely and deeply to explore the unknown. I will provide a few personal suggestions for graduate students interested in pursuing academic careers.
In the second half, I will talk about my research experience and interests in uncovering the unobserved, with the keyword “latent structures”. I will briefly sketch three project directions I am currently working on. The first is using latent variable modeling in deep generative modeling and representation learning. Here I have been working to understand the identifiability and other properties of deep nonlinear models, and I am interested in developing more interpretable new models and even discovering potential causal explanations. The second direction is high-dimensional statistics when latent structure exists. Both the high dimensionality and the unobserved latent structures challenge traditional statistical methods, and I am interested in developing high-dimensional estimation and inference approaches with theoretical guarantees; one concrete example I am working on is spectral methods. The third direction is methodology and applications in psychometrics. Here I have been working to develop principled statistical methods and theory to model educational and psychological data with latent traits.
Prof. Tian Zheng (Columbia)
Title: Applied Statistics: from backyard to living room
Abstract: In this talk, I will discuss my "growing up" story as an applied statistician and data scientist. It used to be, as John Tukey famously said, that "The best thing about being a statistician is that you get to play in everyone's backyard." Nowadays, statisticians are being invited into "the living room", to work directly on fun and exciting projects. I will discuss a few of my current research projects as examples.
Alumni Dr. Yang Kang and Dr. Elliott Rodriguez (D.E. Shaw)
Title: Alumni event: from campus to quantitative finance industry.
Abstract: Founded in 1988 over a small bookstore in downtown New York City, the D. E. Shaw group began with six employees and $28 million in capital and quickly became a pioneer in computational finance. Today, we operate hundreds of independent intelligent trading engines in nearly every liquid electronic marketplace. Our efforts include deploying statistical arbitrage models to monetize market inefficiencies, developing new technologies when existing solutions won’t do, and launching innovative businesses across industries.
Yang and Elliott graduated from our department in 2017 and 2022, respectively. For his PhD, Yang was working on distributionally robust optimization with Jose Blanchet. He is now in the macro group working on systematic forecasts in rates and currencies. Elliott’s PhD research was in ML and bioinformatics, advised by John Cunningham. He now works in the equities group, where he uses a wide range of statistical learning techniques and state-of-the-art computational tools with the goal of predicting the stock market.
Speakers: Long Zhao, Joe Suk, Arnab Auddy, Casey Bradshaw, Collin Cademartori, and Ye Tian
Title: PhD panel session
Abstract: Sharing experiences and tips from PhD journeys.
Spring Break - No Seminar
Prof. Molei Liu (Columbia)
Title: Realizing the Potential of Electronic Health Record Data
Abstract: Electronic health records (EHR) linked with biobank databases offer immense potential for biomedical research, personalized risk prediction, and improved clinical practice. However, realizing this potential poses significant challenges due to methodological obstacles such as data heterogeneity, high-dimensionality, privacy concerns, and the paucity of accurate outcomes. In this talk, I will present a framework of statistical methods I have developed to address these challenges, including high-dimensional and semi-parametric inference, federated learning, semi-supervised learning, and transfer learning. These methods have been applied in real-world biomedical studies, and I will discuss their efficacy in addressing the unique challenges posed by EHR data. This talk aims to provide a comprehensive overview of the statistical challenges in analyzing EHR data and offer insights into the potential solutions to these problems.
Dr. Haoda Fu (Eli Lilly)
Title: Our Recent Development on Cost Constraint Machine Learning Models
Abstract: This talk addresses the problem of selecting the optimal combination of biomarkers for diagnosing a disease subtype when there is a cost constraint. For example, if the total cost cannot exceed a certain amount, the choice may be between measuring 10 cheap biomarkers or 2 expensive ones. The problem can be formulated with an L0 penalty, which is equivalent to the best subset selection problem. However, traditional algorithms can only solve best subset selection for up to ~35 variables, and until recently, no good solution existed even for this special case. We have modified and extended a recently developed algorithm to handle cost constraint problems with thousands of variables.
The talk covers the background of the problem, method development, and theoretical results. We will present an example of dynamic programming that illustrates how algorithms can make a difference in computing. Through this talk, we aim to showcase the combination of modern statistics, computer science, and algorithms.
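For readers unfamiliar with budgeted selection, the "10 cheap biomarkers or 2 expensive ones" trade-off can be illustrated with a textbook 0/1 knapsack dynamic program. This is only a minimal sketch and not the speaker's algorithm: it assumes each biomarker has an integer cost and an additive usefulness score, whereas real best subset selection is not additive. The function name `select_within_budget` and the example costs and scores are hypothetical.

```python
from typing import List, Tuple


def select_within_budget(
    costs: List[int], scores: List[float], budget: int
) -> Tuple[float, List[int]]:
    """0/1 knapsack: maximize total score subject to total cost <= budget.

    Assumes integer costs and additive scores (a simplification made
    for illustration only).
    """
    n = len(costs)
    # best[i][b] = best total score using only the first i items with budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, score = costs[i - 1], scores[i - 1]
        for b in range(budget + 1):
            # Option 1: skip item i-1
            best[i][b] = best[i - 1][b]
            # Option 2: take item i-1 if it fits and improves the score
            if cost <= b and best[i - 1][b - cost] + score > best[i][b]:
                best[i][b] = best[i - 1][b - cost] + score

    # Walk back through the table to recover which items were taken
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    chosen.reverse()
    return best[n][budget], chosen


# Hypothetical example: ten cheap markers (cost 1, score 1.0 each)
# versus two expensive markers (cost 5, score 6.0 each), budget 10.
costs = [1] * 10 + [5, 5]
scores = [1.0] * 10 + [6.0, 6.0]
total, picked = select_within_budget(costs, scores, budget=10)
print(total, picked)
```

The table has (number of items + 1) × (budget + 1) entries, so the running time grows with the budget rather than with the number of subsets, which is the kind of difference an algorithmic reformulation can make.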