# Adji Bousso Dieng

I am a Ph.D. student in the Department of Statistics at Columbia University, where I am jointly advised by David Blei and John Paisley. I am interested in using Statistics and Machine Learning tools to build flexible models for high-dimensional data and to derive generic, scalable algorithms for inferring the hidden quantities described by these models. To that end, I work on the following topics: Machine Learning, Deep Learning, Probabilistic Modeling, Variational Methods, and applications in NLP. I hold a Diplome d'Ingenieur from Telecom ParisTech, one of France's "Grandes Ecoles". I spent the third year of Telecom ParisTech's curriculum at Cornell University, where I earned a Master's in Statistics. In my spare time I like acting and photography.

# News

March 2017: I received an ICLR 2017 Travel Award. I will also be volunteering at the conference.

March 2017: I presented a poster on Maximin Variational Inference at the NYAS ML Symposium.

February 2017: Our TopicRNN paper was accepted to ICLR 2017.

November 2016: I received a Microsoft Azure Research Award.

May 2016: This summer I will be joining Microsoft Research for an internship with Chong Wang and Jianfeng Gao.

# Publications

#### TopicRNN: A Recurrent Neural Network With Long-Range Semantic Dependency

Adji B. Dieng, Chong Wang, Jianfeng Gao, and John Paisley

International Conference on Learning Representations, 2017

Neural network-based language models have achieved state-of-the-art results on many NLP tasks. One difficult problem, as motivated in the introduction of the paper, is capturing long-range dependencies. We propose to solve this by integrating latent topics as context and jointly training these contextual features with the parameters of an RNN language model. We provide a natural way of doing this integration by modeling stop words, which are excluded by topic models but needed for sequential language models. This is done via binary classification, where the probability of being a stop word is dictated by the hidden layer of the RNN. This modeling approach is possible when the contextual features provided by the topics are passed directly to the softmax output layer of the RNN as an additional bias. We report results comparable to the state of the art on the Penn TreeBank and IMDB datasets.
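As a rough illustration of the output layer described above, here is a minimal NumPy sketch. It assumes a hidden state `h`, a topic vector `theta`, an output matrix `V`, a topic-to-vocabulary matrix `B`, and stop-word classifier weights `gamma`. All of these names and shapes are hypothetical choices for this sketch, not code from the paper, and the RNN that produces `h` and the inference of `theta` are left out.

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def next_word_distribution(h, theta, V, B, gamma):
    """Next-word distribution with a topic bias that is switched off for stop words.

    h:     (hidden,)        RNN hidden state (assumed given)
    theta: (topics,)        topic vector for the document (assumed given)
    V:     (vocab, hidden)  output projection
    B:     (vocab, topics)  topic-to-vocabulary weights
    gamma: (hidden,)        weights of the binary stop-word classifier
    """
    # Probability that the next word is a stop word, driven by the hidden state.
    p_stop = sigmoid(gamma @ h)
    logits = V @ h
    # The topic features enter the softmax directly as an additional bias.
    topic_bias = B @ theta
    p_if_stop = softmax(logits)                  # stop word: no topic bias
    p_if_content = softmax(logits + topic_bias)  # content word: topic bias applied
    # Marginalize over the binary stop-word indicator.
    return p_stop * p_if_stop + (1.0 - p_stop) * p_if_content, p_stop

rng = np.random.default_rng(0)
hidden, topics, vocab = 4, 3, 6
probs, p_stop = next_word_distribution(
    h=rng.normal(size=hidden),
    theta=rng.normal(size=topics),
    V=rng.normal(size=(vocab, hidden)),
    B=rng.normal(size=(vocab, topics)),
    gamma=rng.normal(size=hidden),
)
```

The mixture over the stop-word indicator is one way to read the description: when the classifier believes the next word is a stop word, the topic bias has little influence on the prediction.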

#### The $\chi$-Divergence For Approximate Inference

Adji B. Dieng, Dustin Tran, Rajesh Ranganath, John Paisley, and David M. Blei

International Conference on Machine Learning, 2017 (Submitted)

Variational inference with the traditional KL(q || p) divergence can run into pathologies. For example, it typically underestimates posterior uncertainty. We propose CHIVI, a complementary algorithm to traditional variational inference. CHIVI is a black box algorithm that minimizes the $\chi$-divergence from the posterior to the family of approximating distributions and provides an upper bound on the model evidence. CHIVI performs well on different probabilistic models. On Bayesian probit regression and Gaussian process classification, it yielded better classification error rates than expectation propagation (EP) and classical variational inference (VI). When modeling basketball data with a Cox process, it gave better estimates of posterior uncertainty. Finally, the CHIVI upper bound (CUBO) can be used alongside the classical VI lower bound (ELBO) to sandwich-estimate the model evidence.
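To make the sandwich idea concrete, here is a small Monte Carlo sketch on a toy conjugate Gaussian model whose evidence is known in closed form. The model ($z \sim N(0, 1)$, $x \mid z \sim N(z, 1)$, observed $x = 1$), the fixed approximation $q = N(0.5, 0.9^2)$, and all function names are assumptions made for this illustration, not the experiments from the paper. It uses the $\chi^2$ case, $\mathrm{CUBO}_2 = \tfrac{1}{2} \log E_q[(p(x, z)/q(z))^2]$, and does not optimize $q$.

```python
import numpy as np

def log_normal_pdf(value, mean, std):
    return -0.5 * np.log(2.0 * np.pi * std**2) - 0.5 * ((value - mean) / std) ** 2

def sandwich_evidence(x=1.0, q_mean=0.5, q_std=0.9, num_samples=100_000, seed=0):
    """Estimate ELBO and CUBO_2 for the toy model and return the exact log evidence."""
    rng = np.random.default_rng(seed)
    z = rng.normal(q_mean, q_std, size=num_samples)  # samples from q
    # log p(x, z) = log p(z) + log p(x | z)
    log_joint = log_normal_pdf(z, 0.0, 1.0) + log_normal_pdf(x, z, 1.0)
    log_w = log_joint - log_normal_pdf(z, q_mean, q_std)  # log importance weights
    # ELBO = E_q[log p(x, z) - log q(z)]
    elbo = log_w.mean()
    # CUBO_2 = 0.5 * log E_q[w^2], computed with a log-sum-exp shift for stability.
    shift = np.max(2.0 * log_w)
    cubo = 0.5 * (shift + np.log(np.mean(np.exp(2.0 * log_w - shift))))
    # Marginally x ~ N(0, 2) in this toy model, so the evidence is available exactly.
    log_evidence = log_normal_pdf(x, 0.0, np.sqrt(2.0))
    return elbo, log_evidence, cubo

elbo, log_evidence, cubo = sandwich_evidence()
print(f"ELBO {elbo:.4f} <= log p(x) {log_evidence:.4f} <= CUBO {cubo:.4f}")
```

The approximation is deliberately wider than the exact posterior so that the importance weights stay bounded. The Monte Carlo estimate of the CUBO is biased downward because it takes the log of a sample mean, so with few samples or a poorly matched $q$ the estimated upper bound can fall below the true evidence.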