Student Seminar Series

Schedule for Spring 2024

Seminars are on Wednesdays 

Time: 12:00 - 1:00 pm

Location: Room 903, 1255 Amsterdam Avenue

Contacts: Wribhu Banik, Seunghyun Lee, Anirban Nath

1/24/2024

Speakers: Samory Kpotufe & Bodhi Sen (Columbia Stats)

Title: TBA

Abstract: Samory and Bodhi will each talk about their current research interests and what it is like to do research with them.

1/31/2024, 2/7/2024

NO SEMINAR

2/14/2024

Speaker: Dan Lacker (Columbia IEOR)

Title: The (projected) Langevin dynamics: sampling, optimal transport, and variational inference

Abstract: This is a talk in two parts. The first part will survey a classical picture of the Langevin diffusion, with a focus on its applications to sampling and optimization. The second part will discuss my recent work on one or two (as time permits) analogous diffusion dynamics, which are designed to sample from probability measures arising in (1) entropic optimal transport and (2) mean field variational inference.
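
As background for the first part, here is a minimal sketch of the unadjusted Langevin algorithm, the standard discretization of the Langevin diffusion for sampling. The standard-Gaussian target, step size, and run length are illustrative assumptions, not details from the talk.

```python
# Unadjusted Langevin algorithm (ULA) sketch:
#   x_{k+1} = x_k + h * grad_log_pi(x_k) + sqrt(2h) * xi_k,  xi_k ~ N(0, I).
# Target (assumed for illustration): standard Gaussian, so grad log pi(x) = -x.
import numpy as np

def grad_log_pi(x):
    return -x  # gradient of the log-density of N(0, I)

def ula_sample(n_steps=10_000, h=0.01, dim=2, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    samples = np.empty((n_steps, dim))
    for k in range(n_steps):
        x = x + h * grad_log_pi(x) + np.sqrt(2 * h) * rng.standard_normal(dim)
        samples[k] = x
    return samples

samples = ula_sample()
print(samples.mean(axis=0), samples.var(axis=0))  # should be near 0 and 1
```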

2/21/2024

Speaker: Genevera Allen (Rice)

Title: Statistical Machine Learning for Scientific Discovery

Abstract: In this talk, I will give an overview of my research program, which develops new statistical machine learning techniques to help scientists make reproducible and reliable data-driven discoveries from large and complex data.

The first part will focus on an example of my research motivated by neuroscience: understanding how large populations of neurons communicate in the brain at rest, in response to stimuli, or to produce behavior is a fundamental open question in neuroscience. Many approach this by estimating the intrinsic functional neuronal connectivity using probabilistic graphical models, but major statistical and computational hurdles remain for graph learning from new large-scale calcium imaging technologies. I will highlight a new graph learning strategy my group has developed, which we call Graph Quilting, to address a critical unsolved neuroscience challenge: graph learning from the partial covariances that result from non-simultaneously recorded neurons.

The second part will focus on an example of my research in interpretable machine learning: Feature importance inference has been a long-standing statistical problem that helps promote scientific discoveries. Instead of testing for parameters that are only interpretable for specific models, there has been increasing interest in model-agnostic methods that can be applied to any statistical or machine learning model.  I will highlight a new approach to feature occlusion or leave-one-covariate-out (LOCO) inference that leverages minipatch ensemble learning to increase statistical power and improve computational efficiency without making any limiting assumptions on the model or data distribution.   

Finally, I will conclude by highlighting current and future research directions in my group related to modern multivariate analysis, graphical models, ensemble learning, machine learning interpretability and fairness, and applications in neuroscience and genomics. 
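
For orientation on the second part, the sketch below shows the plain refit-without-feature-j version of leave-one-covariate-out (LOCO) importance: the increase in held-out error when a feature is dropped. This is the baseline the talk improves on, not the minipatch-ensemble method itself; the random-forest model and simulated data are illustrative assumptions.

```python
# Plain LOCO feature importance: refit without each feature and compare
# held-out error. Model-agnostic; any fitted predictor could be swapped in.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
y = 2 * X[:, 0] + X[:, 1] + rng.standard_normal(500)  # only features 0, 1 matter

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def heldout_mse(cols):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_tr[:, cols], y_tr)
    return np.mean((model.predict(X_te[:, cols]) - y_te) ** 2)

full_mse = heldout_mse(list(range(X.shape[1])))
for j in range(X.shape[1]):
    loco = heldout_mse([c for c in range(X.shape[1]) if c != j]) - full_mse
    print(f"feature {j}: LOCO importance {loco:.3f}")
```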

2/28/2024

Speaker: Dr. Simon Tavare (Columbia Stats and Bioscience)

Title: Cancer by the Numbers

Abstract: After a brief overview of the history of cancer, I will illustrate how the mathematical sciences can contribute to cancer research through a number of examples. Cancer development is characterized by occurrences of genomic alterations ranging in extent and impact, and the complex interdependence between these genomic events shapes the selection landscape. Stochastic modeling can help evaluate the role of each mutational process during tumor progression, but existing frameworks only capture certain aspects of tumorigenesis. I will outline CINner, a stochastic framework for modeling genomic diversity and selection during tumor evolution. The main advantage of CINner is its flexibility to incorporate many genomic events that directly impact cellular fitness, from driver gene mutations to copy number alterations (CNAs), including focal amplifications and deletions, mis-segregations, and whole-genome duplication. CINner raises a number of difficult statistical inference problems due to the lack of a feasible way to compute likelihoods. I will outline a new approach to approximate Bayesian computation (ABC) that exploits distributional random forests. I will give some examples of how this ABC-DRF methodology works in practice, and try to convince you that ABC has really come of age.
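
As a reference point for the inference part of the talk, here is a minimal rejection-ABC sketch: draw parameters from the prior, simulate data, and keep draws whose summary statistic lands near the observed one. The talk's ABC-DRF approach replaces this accept/reject step with distributional random forests; the toy Poisson model, prior, and tolerance below are illustrative assumptions.

```python
# Rejection ABC on a toy problem: infer a Poisson rate without ever
# evaluating a likelihood, using only forward simulation.
import numpy as np

rng = np.random.default_rng(0)
observed = rng.poisson(lam=4.0, size=100)  # stand-in for real data
obs_stat = observed.mean()                 # summary statistic

accepted = []
for _ in range(100_000):
    lam = rng.uniform(0, 10)                          # draw from the prior
    sim_stat = rng.poisson(lam=lam, size=100).mean()  # simulate a dataset
    if abs(sim_stat - obs_stat) < 0.1:                # tolerance check
        accepted.append(lam)

print(f"ABC posterior mean {np.mean(accepted):.2f} "
      f"from {len(accepted)} accepted draws")
```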

3/6/2024

Speaker: Zhongyuan Lyu (Columbia DSI)

Title: Optimal Clustering of Multi-layer Networks

Abstract: We study the fundamental limits of clustering multi-layer networks. Under the mixture multi-layer stochastic block model (MMSBM), we show that the minimax optimal network clustering error rate takes an exponential form and is characterized by the Rényi-1/2 divergence between the edge probability distributions of the component networks. We propose a novel two-stage network clustering method consisting of a tensor-based initialization and a one-step refinement by the likelihood-based Lloyd's algorithm. Our proposed algorithm achieves the minimax optimal network clustering error rate and allows extreme network sparsity under the MMSBM. We also extend our methodology and analysis framework to study the minimax optimal clustering error rate for mixtures of discrete distributions, including Binomial, Poisson, and multi-layer Poisson networks.
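
To make the refinement stage concrete, the sketch below implements one likelihood-based Lloyd update for a single-layer stochastic block model: estimate the block connection probabilities from the current labels, then reassign each node to the community that maximizes the Bernoulli log-likelihood of its edges. The single-layer simplification (the talk treats the multi-layer MMSBM) is my assumption for illustration.

```python
# One likelihood-based Lloyd refinement step for a single-layer SBM
# (simplified from the multi-layer setting of the talk).
import numpy as np

def lloyd_refine_step(A, z, K, eps=1e-6):
    """A: (n, n) binary adjacency matrix; z: length-n integer labels in
    {0, ..., K-1}. Every community is assumed non-empty."""
    n = A.shape[0]
    # Step 1: estimate block connection probabilities from current labels
    # (diagonal entries of A are included here for simplicity).
    B = np.empty((K, K))
    for a in range(K):
        for b in range(K):
            B[a, b] = np.clip(A[np.ix_(z == a, z == b)].mean(), eps, 1 - eps)
    logB, log1mB = np.log(B), np.log(1 - B)
    # Step 2: reassign each node to the community maximizing the Bernoulli
    # log-likelihood of its edges to all other nodes.
    z_new = z.copy()
    for i in range(n):
        others = np.arange(n) != i
        scores = [
            (A[i, others] * logB[k, z[others]]
             + (1 - A[i, others]) * log1mB[k, z[others]]).sum()
            for k in range(K)
        ]
        z_new[i] = int(np.argmax(scores))
    return z_new
```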

3/13/2024

NO SEMINAR
