entropy - Entropy calculation


Return entropy estimate and uncertainty from a random sample.


Use sklearn.mixture.BayesianGaussianMixture to estimate entropy.


Entropy estimate from covariance matrix C


Weighted Kozachenko-Leonenko nearest-neighbour entropy calculation.


Multivariate normal entropy approximation.

Estimate entropy after a fit.

The gmm_entropy() function computes the entropy from a Gaussian mixture model. This provides a reasonable estimate even for non-Gaussian distributions. This is the recommended method for estimating the entropy of a sample.

The cov_entropy() method computes the entropy associated with the covariance matrix. This covariance matrix can be estimated during the fitting procedure (BFGS updates an estimate of the Hessian matrix for example), or computed by estimating derivatives when the fit is complete.

The MVNEntropy class estimates the covariance from an MCMC sample and uses this covariance to estimate the entropy. This gives a better estimate of the entropy than the equivalent direct calculation, which requires many more samples for a good kernel density estimate. The reject_normal attribute is True if the MCMC sample is significantly different from normal. Unfortunately, this almost always the case for any reasonable sample size that isn’t strictly gaussian.

The entropy() function computes the entropy directly from a set of MCMC samples, normalized by a scale factor computed from the kernel density estimate at a subset of the points.[1]

There are many other entropy calculations implemented within this file, as well as a number of sampling distributions for which the true entropy is known. Furthermore, entropy was computed against dream output and checked for consistency. None of the methods is truly excellent in terms of minimum sample size, maximum dimensions and speed, but many of them are pretty good.

The following is an informal summary of the results from different algorithms applied to DREAM output:

from .entropy import Timer as T

# Try MVN ... only good for normal distributions, but very fast
with T(): M = entropy.MVNEntropy(drawn.points)
print("Entropy from MVN: %s"%str(M))

# Try wnn ... no good.
with T(): S_wnn, Serr_wnn = entropy.wnn_entropy(drawn.points, n_est=20000)
print("Entropy from wnn: %s"%str(S_wnn))

# Try wnn with bootstrap ... still no good.
with T(): S_wnn, Serr_wnn = entropy.wnn_bootstrap(drawn.points)
print("Entropy from wnn bootstrap: %s"%str(S_wnn))

# Try wnn entropy with thinning ... still no good.
#drawn = self.draw(portion=portion, vars=vars,
#                  selection=selection, thin=10)
with T(): S_wnn, Serr_wnn = entropy.wnn_entropy(points)
print("Entropy from wnn: %s"%str(S_wnn))

# Try wnn with gmm ... still no good
with T(): S_wnn, Serr_wnn = entropy.wnn_entropy(drawn.points, n_est=20000, gmm=20)
print("Entropy from wnn with gmm: %s"%str(S_wnn))

# Try pure gmm ... pretty good
with T(): S_gmm, Serr_gmm = entropy.gmm_entropy(drawn.points, n_est=10000)
print("Entropy from gmm: %s"%str(S_gmm))

# Try kde from statsmodels ... pretty good
with T(): S_kde_stats = entropy.kde_entropy_statsmodels(drawn.points, n_est=10000)
print("Entropy from kde statsmodels: %s"%str(S_kde_stats))

# Try kde from sklearn ... pretty good
with T(): S_kde = entropy.kde_entropy_sklearn(drawn.points, n_est=10000)
print("Entropy from kde sklearn: %s"%str(S_kde))

# Try kde from sklearn at points from gmm ... pretty good
with T(): S_kde_gmm = entropy.kde_entropy_sklearn_gmm(drawn.points, n_est=10000)
print("Entropy from kde+gmm: %s"%str(S_kde_gmm))

# Try Kramer ... pretty good, but doesn't support marginal entropy
with T(): S, Serr = entropy.entropy(drawn.points, drawn.logp, N_entropy=n_est)
print("Entropy from Kramer: %s"%str(S))
class bumps.dream.entropy.MVNEntropy(x, alpha=0.05, max_points=1000)[source]

Bases: object

Multivariate normal entropy approximation.

Uses Mardia’s multivariate skewness and kurtosis test to estimate normality.

x is a set of points

alpha is the cutoff for the normality test.

max_points is the maximum number of points to use when checking normality. Since the normality test is \(O(n^2)\) in memory and time, where \(n\) is the number of points, max_points defaults to 1000. The entropy is computed from the full dataset.

The returned object has the following attributes:

p_kurtosis is the p-value for the kurtosis normality test

p_skewness is the p-value for the skewness normality test

reject_normal is True if either the the kurtosis or the skew test fails

entropy is the estimated entropy of the best normal approximation to the distribution


Entropy estimate from covariance matrix C

bumps.dream.entropy.entropy(points, logp, N_entropy=10000, N_norm=2500)[source]

Return entropy estimate and uncertainty from a random sample.

points is a set of draws from an underlying distribution, as returned by a Markov chain Monte Carlo process for example.

logp is the log-likelihood for each draw.

N_norm is the number of points \(k\) to use to estimate the posterior density normalization factor \(P(D) = \hat N\), converting from \(\log( P(D|M) P(M) )\) to \(\log( P(D|M)P(M)/P(D) )\). The relative uncertainty \(\Delta\hat S/\hat S\) scales with \(\sqrt{k}\), with the default N_norm=2500 corresponding to 2% relative uncertainty. Computation cost is \(O(nk)\) where \(n\) is number of points in the draw.

N_entropy is the number of points used to estimate the entropy \(\hat S = - \int P(M|D) \log P(M|D)\) from the normalized log likelihood values.

bumps.dream.entropy.gmm_entropy(points, n_est=None, n_components=None)[source]

Use sklearn.mixture.BayesianGaussianMixture to estimate entropy.

points are the data points in the sample.

n_est are the number of points to use in the estimation; default is 10,000 points, or 0 for all the points.

n_components are the number of Gaussians in the mixture. Default is \(5 \sqrt{d}\) where \(d\) is the number of dimensions.

Returns estimated entropy and uncertainty in the estimate.

This method uses BayesianGaussianMixture from scikit-learn to build a model of the point distribution, then uses Monte Carlo sampling to determine the entropy of that distribution. The entropy uncertainty is computed from the variance in the MC sample scaled by the number of samples. This does not incorporate any uncertainty in the sampling that generated the point distribution or the uncertainty in the GMM used to model that distribution.

bumps.dream.entropy.wnn_entropy(points, k=None, weights=True, n_est=None, gmm=None)[source]

Weighted Kozachenko-Leonenko nearest-neighbour entropy calculation.

k is the number of neighbours to consider, with default \(k=n^{1/3}\)

n_est is the number of points to use for estimating the entropy, with default \(n_\rm{est} = n\)

weights is True for default weights, False for unweighted (using the distance to the kth neighbour only), or a vector of weights of length k.

gmm is the number of gaussians to use to model the distribution using a gaussian mixture model. Default is 0, and the points represent an empirical distribution.

Returns entropy H in bits and its uncertainty.

Berrett, T. B., Samworth, R.J., Yuan, M., 2016. Efficient multivariate entropy estimation via k-nearest neighbour distances. DOI:10.1214/18-AOS1688 https://arxiv.org/abs/1606.00304