entropy - Entropy calculation
entropy 
entropy
    Return entropy estimate and uncertainty from a random sample.
gmm_entropy
    Use sklearn.mixture.BayesianGaussianMixture to estimate entropy.
cov_entropy
    Entropy estimate from a covariance matrix C.
wnn_entropy
    Weighted Kozachenko-Leonenko nearest-neighbour entropy calculation.
MVNEntropy
    Multivariate normal entropy approximation.
Estimate entropy after a fit.
The gmm_entropy() function computes the entropy from a Gaussian mixture
model. This provides a reasonable estimate even for non-Gaussian
distributions, and it is the recommended method for estimating the entropy
of a sample.
The cov_entropy() method computes the entropy associated with the
covariance matrix. This covariance matrix can be estimated during the
fitting procedure (for example, BFGS maintains an estimate of the Hessian
matrix), or computed from numerical derivatives after the fit is complete.
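The covariance-based estimate has a closed form: a normal approximation with covariance \(C\) in \(d\) dimensions has entropy \(H = \frac{1}{2} \log\left((2\pi e)^d \det C\right)\). A minimal numpy sketch of that formula (the name cov_entropy_nats is ours, and it reports nats; the library's cov_entropy signature and units may differ):

```python
import numpy as np

def cov_entropy_nats(C):
    """Entropy of a multivariate normal with covariance C, in nats:
    H = 0.5 * log((2*pi*e)^d * det(C))."""
    d = C.shape[0]
    _, logdet = np.linalg.slogdet(C)  # stable log-determinant
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

# A 3-d standard normal has H = 1.5 * log(2*pi*e), about 4.26 nats
print(cov_entropy_nats(np.eye(3)))
```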
The MVNEntropy
class estimates the covariance from an MCMC sample and
uses this covariance to estimate the entropy. This gives a better
estimate of the entropy than the equivalent direct calculation, which requires
many more samples for a good kernel density estimate. The reject_normal
attribute is True if the MCMC sample is significantly different from normal.
Unfortunately, this is almost always the case for any reasonable sample size
unless the distribution is strictly Gaussian.
The entropy()
function computes the entropy directly from a set
of MCMC samples, normalized by a scale factor computed from the kernel density
estimate at a subset of the points.[1]
There are many other entropy calculations implemented in this file, as well as a number of sampling distributions for which the true entropy is known. The estimates were also computed against DREAM output and checked for consistency. None of the methods is truly excellent in terms of minimum sample size, maximum dimensions, and speed, but many of them are pretty good.
The following is an informal summary of the results from different algorithms applied to DREAM output:
from bumps.dream import entropy
from bumps.dream.entropy import Timer as T

# Try MVN ... only good for normal distributions, but very fast
with T(): M = entropy.MVNEntropy(drawn.points)
print("Entropy from MVN: %s" % str(M))
# Try wnn ... no good.
with T(): S_wnn, Serr_wnn = entropy.wnn_entropy(drawn.points, n_est=20000)
print("Entropy from wnn: %s" % str(S_wnn))
# Try wnn with bootstrap ... still no good.
with T(): S_wnn, Serr_wnn = entropy.wnn_bootstrap(drawn.points)
print("Entropy from wnn bootstrap: %s" % str(S_wnn))
# Try wnn entropy with thinning ... still no good.
#drawn = self.draw(portion=portion, vars=vars,
#                  selection=selection, thin=10)
with T(): S_wnn, Serr_wnn = entropy.wnn_entropy(drawn.points)
print("Entropy from wnn: %s" % str(S_wnn))
# Try wnn with gmm ... still no good
with T(): S_wnn, Serr_wnn = entropy.wnn_entropy(drawn.points, n_est=20000, gmm=20)
print("Entropy from wnn with gmm: %s" % str(S_wnn))
# Try pure gmm ... pretty good
with T(): S_gmm, Serr_gmm = entropy.gmm_entropy(drawn.points, n_est=10000)
print("Entropy from gmm: %s" % str(S_gmm))
# Try kde from statsmodels ... pretty good
with T(): S_kde_stats = entropy.kde_entropy_statsmodels(drawn.points, n_est=10000)
print("Entropy from kde statsmodels: %s" % str(S_kde_stats))
# Try kde from sklearn ... pretty good
with T(): S_kde = entropy.kde_entropy_sklearn(drawn.points, n_est=10000)
print("Entropy from kde sklearn: %s" % str(S_kde))
# Try kde from sklearn at points from gmm ... pretty good
with T(): S_kde_gmm = entropy.kde_entropy_sklearn_gmm(drawn.points, n_est=10000)
print("Entropy from kde+gmm: %s" % str(S_kde_gmm))
# Try Kramer ... pretty good, but doesn't support marginal entropy
with T(): S, Serr = entropy.entropy(drawn.points, drawn.logp, N_entropy=10000)
print("Entropy from Kramer: %s" % str(S))
[1] Kramer, A., Hasenauer, J., Allgower, F., Radde, N., 2010. Computation of the posterior entropy in a Bayesian framework for parameter estimation in biological networks, in: 2010 IEEE International Conference on Control Applications (CCA), pp. 493-498. doi:10.1109/CCA.2010.5611198

bumps.dream.entropy.entropy(points, logp, N_entropy=10000, N_norm=2500)
Return entropy estimate and uncertainty from a random sample.
points is a set of draws from an underlying distribution, as returned by a Markov chain Monte Carlo process for example.
logp is the loglikelihood for each draw.
N_norm is the number of points \(k\) to use to estimate the posterior density normalization factor \(P(D) = \hat N\), converting from \(\log( P(D|M) P(M) )\) to \(\log( P(D|M)P(M)/P(D) )\). The relative uncertainty \(\Delta\hat S/\hat S\) scales as \(1/\sqrt{k}\), with the default N_norm=2500 corresponding to 2% relative uncertainty. Computation cost is \(O(nk)\) where \(n\) is the number of points in the draw.
N_entropy is the number of points used to estimate the entropy \(\hat S = -\int P(M|D) \log P(M|D) \, dM\) from the normalized log likelihood values.
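The normalization step described above can be sketched directly: since \(P(M|D) = e^{\log p}/P(D)\) and a kernel density estimate of the sample approximates \(P(M|D)\), each point yields an estimate \(\log P(D) \approx \log p - \log \text{kde}\), which is averaged over a subset of N_norm points. This is only an illustrative sketch of the idea, not the library's implementation; the function name kramer_entropy_sketch is ours.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kramer_entropy_sketch(points, logp, n_norm=2500, seed=None):
    """Entropy (nats) from samples plus unnormalized log posterior:
    estimate log P(D) at a subset via a KDE, then
    H = log P(D) - mean(logp)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    idx = rng.choice(n, size=min(n_norm, n), replace=False)
    kde = gaussian_kde(points.T)              # gaussian_kde wants (d, n)
    log_norm = np.mean(logp[idx] - np.log(kde(points[idx].T)))
    return log_norm - np.mean(logp)

# 1-d standard normal: logp is already normalized, so log P(D) is near 0
# and H should approach 0.5*log(2*pi*e), about 1.42 nats.
rng = np.random.default_rng(1)
x = rng.standard_normal((5000, 1))
logp = -0.5 * x[:, 0] ** 2 - 0.5 * np.log(2 * np.pi)
H = kramer_entropy_sketch(x, logp, seed=1)
```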

bumps.dream.entropy.gmm_entropy(points, n_est=None, n_components=None)
Use sklearn.mixture.BayesianGaussianMixture to estimate entropy.
points are the data points in the sample.
n_est is the number of points to use in the estimation; default is 10,000 points, or 0 for all the points.
n_components is the number of Gaussians in the mixture. Default is \(5 \sqrt{d}\) where \(d\) is the number of dimensions.
Returns estimated entropy and uncertainty in the estimate.
This method uses BayesianGaussianMixture from scikit-learn to build a model of the point distribution, then uses Monte Carlo sampling to determine the entropy of that distribution. The entropy uncertainty is computed from the variance in the MC sample scaled by the number of samples. This does not incorporate any uncertainty in the sampling that generated the point distribution or the uncertainty in the GMM used to model that distribution.
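A minimal sketch of that two-step procedure, assuming scikit-learn is available (this illustrates the idea, not the library's exact choices of n_components, priors, or units):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
points = rng.standard_normal((2000, 2))       # stand-in for an MCMC draw

# Step 1: model the point distribution with a Bayesian Gaussian mixture q.
gmm = BayesianGaussianMixture(n_components=8, random_state=0).fit(points)

# Step 2: Monte Carlo the entropy H = -E[log q(x)] with draws x ~ q.
draws, _ = gmm.sample(10000)
log_q = gmm.score_samples(draws)
H = -log_q.mean()                             # entropy in nats
H_err = log_q.std() / np.sqrt(len(log_q))     # MC uncertainty only
```

For this 2-d standard normal the true entropy is \(\log(2\pi e) \approx 2.84\) nats, so the estimate should land nearby; note H_err only reflects the Monte Carlo variance, not the model-fit uncertainty, just as the paragraph above describes.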

bumps.dream.entropy.wnn_entropy(points, k=None, weights=True, n_est=None, gmm=None)
Weighted Kozachenko-Leonenko nearest-neighbour entropy calculation.
k is the number of neighbours to consider, with default \(k = n^{1/3}\).
n_est is the number of points to use for estimating the entropy, with default \(n_{\rm est} = n\).
weights is True for default weights, False for unweighted (using the distance to the k-th neighbour only), or a vector of weights of length k.
gmm is the number of Gaussians to use to model the distribution with a Gaussian mixture model. Default is 0, in which case the points represent an empirical distribution.
Returns entropy H in bits and its uncertainty.
Berrett, T. B., Samworth, R. J., Yuan, M., 2016. Efficient multivariate entropy estimation via k-nearest neighbour distances. doi:10.1214/18-AOS1688, https://arxiv.org/abs/1606.00304
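The unweighted variant (weights=False, distance to the k-th neighbour only) is the classical Kozachenko-Leonenko estimator, which can be sketched with a k-d tree. The name kl_entropy_nats is ours, and it reports nats rather than the bits returned by wnn_entropy; the weighted estimator of Berrett et al. additionally combines neighbours \(j = 1 \ldots k\) with bias-cancelling weights.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy_nats(points, k=None):
    """Unweighted Kozachenko-Leonenko estimate, in nats:
    H ~ psi(n) - psi(k) + log c_d + (d/n) * sum_i log eps_i,
    with eps_i the distance from point i to its k-th neighbour
    and c_d the volume of the unit d-ball."""
    n, d = points.shape
    if k is None:
        k = max(1, int(n ** (1 / 3)))         # default from the docstring above
    dist, _ = cKDTree(points).query(points, k=k + 1)  # k+1: self is nearest
    eps = dist[:, -1]
    log_cd = 0.5 * d * np.log(np.pi) - gammaln(0.5 * d + 1)
    return digamma(n) - digamma(k) + log_cd + d * np.mean(np.log(eps))

rng = np.random.default_rng(0)
# 2-d standard normal: true entropy is log(2*pi*e), about 2.84 nats
H = kl_entropy_nats(rng.standard_normal((4000, 2)))
```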

class bumps.dream.entropy.MVNEntropy(x, alpha=0.05, max_points=1000)
Bases: object
Multivariate normal entropy approximation.
Uses Mardia’s multivariate skewness and kurtosis test to estimate normality.
x is a set of points.
alpha is the cutoff for the normality test.
max_points is the maximum number of points to use when checking normality. Since the normality test is \(O(n^2)\) in memory and time, where \(n\) is the number of points, max_points defaults to 1000. The entropy is computed from the full dataset.
The returned object has the following attributes:
p_kurtosis is the p-value for the kurtosis normality test
p_skewness is the p-value for the skewness normality test
reject_normal is True if either the kurtosis or the skewness test fails
entropy is the estimated entropy of the best normal approximation to the distribution
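Mardia's test statistics behind p_skewness and p_kurtosis can be sketched in a few lines of numpy; the \(n \times n\) matrix of Mahalanobis inner products is what makes the test \(O(n^2)\) in memory and time, motivating the max_points cap above. This is an illustrative sketch with our own function name, not the library's implementation.

```python
import numpy as np
from scipy import stats

def mardia_pvalues(x):
    """p-values for Mardia's multivariate skewness and kurtosis tests."""
    n, d = x.shape
    xc = x - x.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(x, rowvar=False, bias=True))
    D = xc @ Sinv @ xc.T                      # n x n Mahalanobis products
    b1 = (D ** 3).mean()                      # multivariate skewness
    b2 = (np.diag(D) ** 2).mean()             # multivariate kurtosis
    # n*b1/6 is approximately chi-squared with d(d+1)(d+2)/6 dof
    p_skew = stats.chi2.sf(n * b1 / 6, d * (d + 1) * (d + 2) / 6)
    # b2 is approximately normal with mean d(d+2), variance 8d(d+2)/n
    z_kurt = (b2 - d * (d + 2)) / np.sqrt(8 * d * (d + 2) / n)
    p_kurt = 2 * stats.norm.sf(abs(z_kurt))
    return p_skew, p_kurt

rng = np.random.default_rng(0)
p_skew, p_kurt = mardia_pvalues(rng.standard_normal((500, 2)))
```

A strongly skewed sample (e.g. exponential marginals) should drive both p-values toward zero, triggering reject_normal at any reasonable alpha.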