Schedule for Fall 2020
The Student Seminar has migrated to Zoom for the Fall 2020 semester.
Seminars are on Wednesdays
Time: 12:00 – 1:00pm
Contacts: Diane Lu, Leon Fernandes
Information for speakers: For information about schedules, directions, equipment, reimbursement, and hotels, please click here.
Welcome to the New Academic Year & Campuswire Workshop
11:30am – 12:00pm: Welcome to the New Academic Year.
Elliott Rodriguez, Ding Zhou, Zhi Wang, Yuanzhe Xu and others (Columbia)
“Sharing Summer Internship Experiences”
George Hripcsak (Columbia)
Title: Drawing reproducible conclusions from observational medical data with OHDSI
Title: Two Sigma Quant Talk
At Two Sigma, our community of scientists, technologists and academics collaborates to solve some of the most challenging economic problems.
We rely on the scientific method, rooted in hypothesis, analysis, and experimentation, to drive data-driven decisions, to manage risk, and to expand into new areas of focus. In this way, we create systematic tools and technologies to forecast the future of global markets.
If you’re interested in hearing more about the scientific method to modeling, please join our Quant Talk. We hope to see you there!
Our Quant Researchers Include:
James Roger (Metrum Research Group)
Title: Pharmacometrics is Like This
Scientists working in biomedical research often have some sense of what to expect from a proper “biostatistician”, but relatively few know what to make of a statistician who calls himself or herself a “pharmacometrician”. Thus freed from the shackles of other people’s expectations, the pharmacometric statistician encounters problems and opportunities that are different from those encountered by the more conventionally branded biostatistician. Generally speaking, “pharmacometric analyses” put greater emphasis on understanding data generating mechanisms and evaluating associated causal narratives. In this talk I will try to convey the spirit and the value of pharmacometric approaches by way of three real examples.
I won’t have time to discuss any single application in depth, but I will try to convey the broad contours of each model and give a sense of the value proposition associated with each analysis.
Prof. Victor H. de la Pena (Columbia)
Title: Some Open Problems in Probability and Statistics
Abstract: In this talk I will discuss a few open problems. The main references for the talk are:
Sumit Mukherjee (Columbia)
Title: Viewing a permutation as a copula
Abstract: The idea of viewing a permutation as a copula (i.e., a probability measure on the unit square with uniform marginals) originated in combinatorics. Using this representation, we can compute limiting properties of various statistics under non-uniform probability models on the space of permutations. Examples include the number of fixed points, the number of cycles of a given length, and the number of inversions. Focusing on statistics, we analyze a class of non-uniform probability measures on permutations, which includes the celebrated Mallows models. We compute the limiting log normalizing constant for such models, and give an iterative algorithm for computing this limit. We also show consistency of the MLE and the pseudo-likelihood estimator in these models.
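As a rough illustration of the objects named in this abstract, the sketch below maps a permutation of 1..n to the n points (i/n, π(i)/n) in the unit square (placing mass 1/n on each point gives marginals that are uniform on the grid {1/n, ..., 1}) and computes the three statistics mentioned: fixed points, cycle lengths, and inversions. The function names are illustrative choices and are not taken from the talk.

```python
from typing import List, Tuple


def copula_points(perm: List[int]) -> List[Tuple[float, float]]:
    """Map a permutation of 1..n to the points (i/n, perm(i)/n) in the unit square."""
    n = len(perm)
    return [((i + 1) / n, perm[i] / n) for i in range(n)]


def fixed_points(perm: List[int]) -> int:
    """Count positions i with perm(i) == i (1-based)."""
    return sum(1 for i, p in enumerate(perm, start=1) if p == i)


def inversions(perm: List[int]) -> int:
    """Count pairs i < j with perm(i) > perm(j); a simple O(n^2) loop."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])


def cycle_lengths(perm: List[int]) -> List[int]:
    """Return the length of every cycle in the cycle decomposition of perm."""
    n = len(perm)
    seen = [False] * n
    lengths = []
    for start in range(n):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j] - 1  # follow the permutation (convert to 0-based index)
            length += 1
        lengths.append(length)
    return lengths


perm = [2, 1, 3, 5, 4]  # the permutation (1 2)(3)(4 5)
points = copula_points(perm)
n_fixed = fixed_points(perm)
n_inv = inversions(perm)
cycles = cycle_lengths(perm)
```

For the example permutation (1 2)(3)(4 5), the only fixed point is 3, the two transpositions contribute one inversion each, and the cycle lengths are 2, 1, and 2.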
Kobi Abayomi (Seton Hall University)
Title: What is Data
Abstract: An intentional singularization to illustrate some examples from business use cases where pseudo-experimental is as good as it gets.
Ph.D. Student Town Hall
Start time: 11:30 am
End Time: 12:30 pm
Marco Avella Medina (Columbia)
Title: Differentially private inference via noisy optimization
Abstract: Over the last two decades, differential privacy has emerged in the computer science community as a promising, rigorous paradigm for the release of sensitive data. It assumes there is a trusted curator who holds the data of individuals in a database, and the goal of privacy is to protect individual data while still allowing statistical analysis of the database as a whole. In this talk, we will discuss a general optimization-based approach for computing differentially private M-estimators and confidence intervals. In particular, we will show how robust statistics can be used in conjunction with noisy gradient descent and noisy Newton-type methods in order to obtain optimal private estimators. Our convergence analysis demonstrates that our algorithms converge with high probability to a neighborhood of the non-private M-estimators. The radius of this neighborhood is optimal in the sense that it corresponds to the statistical minimax cost of differential privacy. We will then turn to the problem of inference and propose a differentially private estimator of the asymptotic variance of our private M-estimators. This naturally leads to the use of approximate pivotal statistics for the construction of confidence intervals and hypothesis testing. We demonstrate the good small-sample empirical performance of our methods in simulations and real data examples.
This is based on joint work with Casey Bradshaw and Po-Ling Loh.
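To make the idea of noisy gradient descent for a robust M-estimator concrete, here is a minimal sketch for a Huber location estimate. It is not the speakers' algorithm: the step size, the number of iterations, and in particular the noise level noise_sd are hypothetical parameters chosen for illustration and are not calibrated to any privacy budget. The point it illustrates is that clipping each residual at c bounds every per-observation gradient term, so replacing one observation changes the averaged gradient by at most 2c/n, which is what makes adding noise of a fixed scale meaningful.

```python
import random
from typing import List


def huber_psi(r: float, c: float = 1.345) -> float:
    """Derivative of the Huber loss: the residual clipped to [-c, c]."""
    return max(-c, min(c, r))


def noisy_gradient_descent(
    data: List[float],
    steps: int = 200,
    lr: float = 0.5,
    noise_sd: float = 0.05,  # hypothetical noise level, NOT calibrated for privacy
    c: float = 1.345,
    seed: int = 0,
) -> float:
    """Gradient descent on the averaged Huber loss with Gaussian noise added to each gradient."""
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    for _ in range(steps):
        # Averaged gradient; each summand lies in [-c, c] because of the clipping.
        grad = -sum(huber_psi(x - theta, c) for x in data) / n
        theta -= lr * (grad + rng.gauss(0.0, noise_sd))
    return theta


data = [1.0, 2.0, 3.0, 4.0, 5.0]
theta_plain = noisy_gradient_descent(data, noise_sd=0.0)   # non-private Huber estimate
theta_noisy = noisy_gradient_descent(data, noise_sd=0.05)  # noisy iterate near the same point
```

With noise_sd set to zero the iteration settles at the ordinary Huber estimate (3.0 for this symmetric sample); with noise it ends in a small neighborhood of that value, which mirrors the "neighborhood of the non-private M-estimator" described in the abstract.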
Title: A Trajectorial Approach to Gradient Flow properties of Conservative Diffusions and Markov Chains
Abstract: We provide a detailed, probabilistic interpretation for the variational characterization of conservative diffusion as entropic gradient flow. Jordan, Kinderlehrer, and Otto showed in 1998 that, for diffusions of Langevin-Smoluchowski type, the Fokker-Planck probability density flow minimizes the rate of relative entropy dissipation, as measured by the distance traveled in terms of the quadratic Wasserstein metric in the ambient space of configurations. Using a very direct perturbation analysis we obtain novel, stochastic-process versions of such features; these are valid along almost every trajectory of the motion in both the forward and, most transparently, the backward directions of time. The original results then follow simply by “aggregating”, i.e., taking expectations. As a bonus, the HWI inequality of Otto and Villani, relating relative entropy, Fisher information, and Wasserstein distance, falls in our lap; and with it the celebrated log-Sobolev, Talagrand and Poincaré inequalities of functional analysis. Similar ideas work in the context of continuous-time Markov chains; but now both the functional analysis and the geometry are considerably more involved.
Bodhisattva Sen (Columbia)
Title: Measuring Association on Topological Spaces Using Kernels and Geometric Graphs
Abstract: In this work, we propose a class of simple, nonparametric, yet interpretable measures of association between two random variables X and Y taking values in general topological spaces. These nonparametric measures, defined using the theory of reproducing kernel Hilbert spaces, capture the strength of dependence between X and Y and have the property that they are 0 if and only if the variables are independent and 1 if and only if one variable is a measurable function of the other. Further, these population measures can be consistently estimated using the general framework of geometric graphs, which includes k-nearest neighbor graphs and minimum spanning trees. Moreover, a subclass of these estimators is also shown to adapt to the intrinsic dimensionality of the underlying distribution. Some of these empirical measures can also be computed in near-linear time. If X and Y are independent, these empirical measures (properly normalized) have a standard normal limiting distribution and hence can be readily used to test for independence. In fact, as far as we are aware, these are the only procedures that possess all the above-mentioned desirable properties. The correlation coefficients proposed in Dette et al. (2013), Chatterjee (2019), and Azadkia and Chatterjee (2019) can be seen as special cases of this general class of measures. If time permits, I will also describe how the same ideas can be effectively used to measure the strength of conditional dependence.
This is joint work with Nabarun Deb and Promit Ghosal.
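The kernel and geometric-graph estimators from the talk are not reproduced here. As a small illustration of the special case cited in the abstract, the sketch below computes Chatterjee's rank correlation for real-valued data, assuming no ties: sort the pairs by X, let r_i be the rank of the i-th Y in that order, and set xi = 1 - 3 * sum |r_{i+1} - r_i| / (n^2 - 1). The function name and the example data are illustrative choices.

```python
from typing import List


def chatterjee_xi(x: List[float], y: List[float]) -> float:
    """Chatterjee's rank correlation, assuming no ties in x or y."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])          # indices sorted by x
    rank_of = {v: r for r, v in enumerate(sorted(y), 1)}  # 1-based rank of each y value
    ranks = [rank_of[y[i]] for i in order]                # y-ranks in x-sorted order
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


xs = [float(i) for i in range(1, 11)]
xi_monotone = chatterjee_xi(xs, xs)                             # y is an increasing function of x
xi_parabola = chatterjee_xi(xs, [(v - 5.3) ** 2 for v in xs])   # y is a non-monotone function of x
```

For a perfectly monotone relationship the formula reduces to (n - 2) / (n + 1), which approaches 1 as n grows. Unlike Pearson or Spearman correlation, the coefficient also stays clearly positive for the non-monotone parabola, reflecting that Y is still a function of X.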
Richard Davis (Columbia)
Title: Applications of Distance Correlation to Time Series
Charles Margossian (Columbia)
Title: Bayesian inference for latent Gaussian models: MCMC, approximate methods, and hybrids
This year, I would really like a scalable algorithm to do Bayesian inference on latent Gaussian models (LGMs). LGMs are a class of multilevel models, which can be used to pool information across different groups of data. Examples include Gaussian processes and GLMs with a sparsity-inducing prior. We will consider two applications: (i) a disease map of Finland (where Santa lives) and (ii) a genomic study amongst patients with prostate cancer. Candidate inference methods include Hamiltonian Monte Carlo (HMC) sampling, a gradient-based MCMC method, and variational inference. There are also benefits to combining sampling and approximation methods, by embedding a Laplace approximation inside an HMC sampler, as discussed in a recent paper (see also this complimentary notebook).