Variational minimax estimation of discrete distributions under KL loss
To appear, Neural Information Processing Systems 2004
We develop a family of upper and lower bounds on the worst-case
expected KL loss for estimating a discrete distribution on a finite
number $m$ of points, given $N$ i.i.d.~samples. Our upper bounds are
approximation-theoretic, similar to recent bounds for estimating
discrete entropy; the lower bounds are based on Bayesian averages of
the KL loss under Dirichlet distributions. The upper bounds are
convex in their parameters and thus can be minimized by descent
methods to provide estimators with low worst-case error; the lower
bounds are indexed by a one-dimensional parameter and are thus easily
maximized. Asymptotic analysis of the bounds demonstrates the uniform
KL-consistency of a wide class of estimators as $c = N/m \to \infty$
(no matter how slowly), and shows that no estimator is consistent for
$c$ bounded (in contrast to entropy estimation). Moreover, the bounds
are asymptotically tight as $c\to 0$ or $\infty$, and are shown
numerically to be tight within a factor of two for all $c$. Finally,
in the sparse-data limit $c\to 0$, we find that the Dirichlet-Bayes
(add-constant) estimator with parameter scaling like $-c\log(c)$
optimizes both the upper and lower bounds, suggesting an optimal
choice of the ``add-constant'' parameter in this regime.
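As a brief illustration of the objects discussed above (the notation below is ours and is not fixed by the abstract): writing $n_i$ for the number of the $N$ samples falling on point $i$, the ``add-constant'' (Dirichlet-Bayes) estimator with parameter $\alpha > 0$ is
$$\hat{p}_{\alpha,i} = \frac{n_i + \alpha}{N + m\alpha}, \qquad i = 1, \ldots, m,$$
and the worst-case expected KL loss of an estimator $\hat{p}$ is
$$\sup_{p} \; E_p \, D(p \,\|\, \hat{p}) = \sup_{p} \; E_p \sum_{i=1}^{m} p_i \log \frac{p_i}{\hat{p}_i},$$
with the supremum taken over all distributions $p$ on the $m$ points. In the sparse-data limit $c = N/m \to 0$, the abstract suggests taking $\alpha$ on the order of $-c\log(c) = c\log(1/c)$.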
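The following short Python sketch (illustrative only, not code from the paper) computes the add-constant estimate from observed counts and evaluates its KL loss against a known truth, using the sparse-regime choice $\alpha \approx -c\log(c)$ mentioned above; the function names and the small floor keeping $\alpha$ positive are our own choices.

import numpy as np

def add_constant_estimate(counts, alpha):
    """Dirichlet-Bayes (add-constant) estimate: (n_i + alpha) / (N + m*alpha)."""
    counts = np.asarray(counts, dtype=float)
    N, m = counts.sum(), counts.size
    return (counts + alpha) / (N + m * alpha)

def kl_loss(p, p_hat):
    """KL divergence D(p || p_hat) = sum_i p_i log(p_i / p_hat_i), in nats."""
    p = np.asarray(p, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / p_hat[mask])))

# Example in the sparse regime: c = N/m small, alpha ~ -c log(c).
rng = np.random.default_rng(0)
m, N = 10_000, 500
c = N / m
alpha = max(-c * np.log(c), 1e-12)  # floor is our choice, keeps alpha > 0

p_true = rng.dirichlet(np.ones(m))      # an arbitrary "true" distribution
counts = rng.multinomial(N, p_true)     # N i.i.d. samples, summarized as counts
p_hat = add_constant_estimate(counts, alpha)
print(f"alpha = {alpha:.4f}, KL loss = {kl_loss(p_true, p_hat):.4f} nats")

Since $\alpha > 0$, every estimated probability is strictly positive, so the KL loss is finite even for symbols that were never observed; this is precisely the regularizing role of the add-constant parameter in the undersampled ($c \to 0$) regime.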