# stats - Statistics helper functions

| Name | Description |
| --- | --- |
| VarStats | |
| var_stats | |
| format_vars | |
| parse_var | Parse a line returned by format_vars back into the statistics for the variable on that line. |
| stats | Find mean and standard deviation of a set of weighted samples. |
| credible_interval | Find the credible interval covering the portion ci of the data. |
| shortest_credible_interval | Find the credible interval covering the portion ci of the data. |

Statistics helper functions.

class bumps.dream.stats.VarStats(**kw)

Bases: object

bumps.dream.stats.var_stats(draw, vars=None)
bumps.dream.stats.format_vars(all_vstats)
bumps.dream.stats.parse_var(line)

Parse a line returned by format_vars back into the statistics for the variable on that line.

bumps.dream.stats.stats(x, weights=None)

Find mean and standard deviation of a set of weighted samples.

Note that the median is not strictly correct: when the median falls between two values in the sample, one of those two values is chosen rather than a value between them. This is good enough when the sample size is large.

bumps.dream.stats.credible_interval(x, ci, weights=None)

Find the credible interval covering the portion ci of the data.

x are samples from the posterior distribution.

ci is a set of interval sizes, each in [0,1]. For a 1-sigma interval use ci=erf(1/sqrt(2)), or approximately 0.68.

About 1e5 samples are needed for 2 digits of precision on a 1-sigma credible interval, and about 1e6 samples for 2 digits of precision on a 95% interval. At least 1000 points are needed for an unbiased result; with fewer points the resulting interval will be shorter than expected (tested on a variety of distributions including exponential, Cauchy, Gaussian, beta and gamma).
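The expression ci=erf(1/sqrt(2)) can be evaluated with the standard library. The helper name `sigma_to_ci` is hypothetical and only serves this example.

```python
import math

def sigma_to_ci(n_sigma):
    """Probability mass of a Gaussian within n_sigma standard deviations."""
    return math.erf(n_sigma / math.sqrt(2))

ci_1sigma = sigma_to_ci(1)  # about 0.6827
ci_2sigma = sigma_to_ci(2)  # about 0.9545
```

Either value can then be passed as an entry of ci.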

weights is a vector of weights for each x, or None for unweighted. One could weight points according to temperature in a parallel tempering dataset.

Returns an array [[x1_low, x1_high], [x2_low, x2_high], …] where [xi_low, xi_high] are the starting and ending values for credible interval i.

This function is faster if the inputs are already sorted.
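One way to picture a weighted credible interval is the sketch below. It assumes an equal-tailed (central) interval, which the text above does not state explicitly, and it handles a single ci value rather than a set. The name `central_interval` is hypothetical and the code is not the bumps implementation.

```python
def central_interval(x, ci, weights=None):
    """Equal-tailed interval holding a fraction ci of the weighted samples.

    Hypothetical helper for illustration; not the bumps implementation.
    """
    if weights is None:
        weights = [1.0] * len(x)
    # Sort samples together with their weights.
    pairs = sorted(zip(x, weights))
    total = sum(w for _, w in pairs)
    # Cumulative weight at which the lower and upper end points sit.
    lo_target = (1 - ci) / 2 * total
    hi_target = (1 + ci) / 2 * total

    low = high = pairs[-1][0]
    found_low = False
    cumulative = 0.0
    for xi, w in pairs:
        cumulative += w
        if not found_low and cumulative >= lo_target:
            low, found_low = xi, True
        if cumulative >= hi_target:
            high = xi
            break
    return low, high

samples = list(range(1, 101))  # 1, 2, ..., 100
low, high = central_interval(samples, 0.5)
```

The sort dominates the cost here, which is consistent with the remark that already sorted inputs are faster.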

bumps.dream.stats.shortest_credible_interval(x, ci=0.95, weights=None)

Find the shortest credible interval covering the portion ci of the data.

x are samples from the posterior distribution. ci is the interval size in (0,1], and defaults to 0.95. For a 1-sigma interval use ci=erf(1/sqrt(2)). weights is a vector of weights for each x, or None for unweighted.

Returns the minimum and maximum values of the interval. If ci is a vector, return a vector of intervals.

This function is faster if the inputs are already sorted.

About 1e6 samples are needed for 2 digits of precision on a 95% credible interval, or 1e5 for 2 digits on a 1-sigma credible interval.

To remove the bias toward smaller intervals, the midpoints between the surrounding intervals are used as the end points.
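The idea of a shortest interval can be sketched with a sliding window over the sorted samples. This is a simplified, unweighted illustration under the assumption that the interval must contain ceil(ci * n) consecutive samples; it omits the midpoint correction described above and is not the bumps implementation. The name `shortest_interval` is hypothetical.

```python
import math

def shortest_interval(x, ci=0.95):
    """Narrowest window of sorted samples holding a fraction ci of them.

    Hypothetical, unweighted helper; expects a non-empty sample.
    """
    xs = sorted(x)
    n = len(xs)
    # Number of samples the window must contain (at least 2, at most n).
    k = min(n, max(2, math.ceil(ci * n)))
    # Slide a window of k consecutive samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]

# Skewed sample: most points cluster near zero, with a long right tail.
samples = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 5.0, 9.0]
low, high = shortest_interval(samples, ci=0.8)
```

For a skewed sample like this one, the narrowest window stays within the cluster near zero, whereas an interval with equal tails would have to reach into the long right tail.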