Estimating entropy on m bins given fewer than m samples
Appeared in IEEE Transactions on Information Theory 50: 2200-2203, 2004.
Consider a sequence $p_N$ of discrete probability measures, supported
on $m_N$ points, and assume that we observe $N$ i.i.d. samples from
each $p_N$. We demonstrate the existence of an estimator of the
entropy, $H(p_N)$, which is consistent even if the ratio $N/m_N$ is
bounded (and, as a corollary, even if this ratio tends to zero, albeit
at a sufficiently slow rate).
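To see why the regime $N < m_N$ is hard, the following is a minimal numerical sketch (not the estimator constructed in the paper): it contrasts the naive plug-in entropy estimate with the classical Miller-Madow bias correction on a uniform distribution over $m$ bins sampled with $N < m$ draws. The uniform target, the sample sizes, and the function names are illustrative assumptions only.

import numpy as np

def plugin_entropy(counts):
    # Naive maximum-likelihood ("plug-in") entropy estimate, in nats.
    # Severely biased downward when N < m, since many bins go unobserved.
    n = counts.sum()
    p = counts[counts > 0] / n
    return -np.sum(p * np.log(p))

def miller_madow_entropy(counts):
    # Plug-in estimate plus the Miller-Madow correction (m_hat - 1) / (2N),
    # where m_hat is the number of occupied bins. This correction still
    # fails to give consistency when N / m_N merely stays bounded; the
    # paper shows a cleverer estimator can remain consistent there.
    n = counts.sum()
    m_hat = np.count_nonzero(counts)
    return plugin_entropy(counts) + (m_hat - 1) / (2 * n)

rng = np.random.default_rng(0)
m, N = 10_000, 1_000                    # undersampled regime: N / m = 0.1
samples = rng.integers(0, m, size=N)    # N i.i.d. draws from uniform on m bins
counts = np.bincount(samples, minlength=m)

true_H = np.log(m)                      # entropy of the uniform law, in nats
print(f"true H       = {true_H:.3f}")
print(f"plug-in      = {plugin_entropy(counts):.3f}")
print(f"Miller-Madow = {miller_madow_entropy(counts):.3f}")

Running this, the plug-in estimate falls far below $\log m$ because most bins are empty, and the Miller-Madow correction recovers only part of the gap, which is the bias problem the paper's estimator is designed to overcome.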