Schedule for Spring 2019
Seminars are on Wednesdays
Time: 12:00pm – 1:00pm
Location: Room 1025, 1255 Amsterdam Avenue
Contacts: Yuling Yao, Owen Ward
Information for speakers: For information about the schedule, directions, equipment, reimbursement, and hotels, please click here.
Wenda Zhou “Statistical Computing Essentials”
Shira Mitchell (NYC Mayor’s Office of Data Analytics)
“Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions”
Elisa Perrone (MIT)
Title: Geometric structure in dependence models and applications
Abstract: The growing availability of data makes it challenging yet crucial to model complex dependence traits. For example, hydrological and financial data typically display tail dependences, non-exchangeability, or stochastic monotonicity. Copulas serve as tools for capturing these complex traits and constructing accurate dependence models which resemble the underlying distributions of data. This talk explores the geometric properties of copulas to address dependence modeling challenges in several applications, such as hydrology and finance. In particular, we study the class of discrete copulas, i.e., restrictions of copulas to uniform grid domains, which admits representations as convex polytopes. In the first part of the talk, we give a geometric characterization of discrete copulas with desirable stochastic constraints in terms of the properties of their associated convex polytopes. In doing so, we draw connections to the popular Birkhoff polytopes, thereby unifying and extending results from both the statistics and the discrete geometry literature. In the second part of the talk, we further consolidate the bridge between statistics and discrete geometry by showing the significance of our geometric findings for (1) constructing entropy-copula models useful in hydrology, and (2) designing test statistics for stochastic monotonicity properties of interest in finance.
Linxi Liu (Columbia)
Title: Exploring RNA-protein interactions at amino-acid level via a multinomial logistic regression model with latent responses
Abstract: In eukaryotic cells, alternative splicing occurs during RNA processing and greatly increases the biodiversity of proteins encoded by the genome. It is already known that RNA-binding proteins (RBPs) play a central role in the regulation of splicing, but at the molecular level it is still unclear how proteins interact and crosslink with RNA. The recently developed method of high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation (HITS-CLIP) allows genome-wide mapping of RBP-binding footprint regions at single-nucleotide resolution. Together with information about the 3-dimensional structures of protein-RNA complexes, we can use statistical models to infer crosslinking at the amino-acid-nucleotide level, where interactions can generally hardly be detected experimentally.
In this work, we introduce a multinomial logistic regression with latent responses to model the potential crosslinking between 20 different amino acids and the nucleotide. We also introduce a set of variable selection indicators for each category. Under a Bayesian framework, we can make inferences about the latent responses and about the association between the explanatory variables and the response based on the posterior distribution. The results agree well with our current understanding of RBPs.
2/27/19: Timothy Jones (Columbia)
Sharon Lohr (Arizona State University)
“Measuring Crime: Behind the Statistics”
In 1915, the Chicago City Council asked statistician Edith Abbott to report “upon the frequency of murder, assault, burglary, robbery, theft and like crimes in Chicago.” Her report, drawing on published and unpublished statistics from the courts, probation office, house of correction, and police department, set the stage for subsequent collections and evaluations of crime statistics. Her conclusions, that the quality of statistics depends on the systems of data collection and that multiple sources of data are needed to study crime, still hold today.
Drawing on Abbott’s insights, I set out eight questions to ask about a statistic before you rely on it. I then go through these questions for three sources of statistics about sexual assault: the Uniform Crime Reports, the National Crime Victimization Survey, and the National Intimate Partner and Sexual Violence Survey.
“Optimization for the Working Statistician”
Statistical problems are defined via optimization problems. For instance, finding an MLE (or, more broadly, performing M-estimation/empirical risk minimization), finding a most powerful hypothesis test, and finding a minimax/admissible estimator in decision theory all amount to solving an optimization problem. In statistics, we usually take for granted that such an optimal value can be found; in some respects, we suppose that the ‘optimization error’ and the ‘statistical error’ can be neatly decoupled, and we concern ourselves only with the latter. I’ll argue that this perspective is not useful for the pragmatist and, perhaps even worse, is incorrect and uninteresting. In particular, I’ll talk about non-convex optimization from both an optimization and a statistical point of view, and highlight some interesting aspects of both.
Lauren Kennedy (Columbia)
Samory Kpotufe (Columbia)
4/10/19: Joshua Gordon (Google/TensorFlow)
Marco Avella (Columbia)