`mozanalysis.bayesian_stats.bayesian_bootstrap`

mozanalysis.bayesian_stats.bayesian_bootstrap.bb_mean(values, prob_weights)[source]

Calculate the mean of a bootstrap replicate.

Parameters:

values (pd.Series, ndarray) – One dimensional array of observed values
prob_weights (pd.Series, ndarray) – Equally shaped array of the probability weight associated with each value.

Returns:

The mean as a np.float.
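
As a rough illustration of the arithmetic, the mean of a replicate is the weighted sum of the observed values. The sketch below uses plain Python lists instead of a Series or ndarray, assumes the weights already sum to 1, and uses a hypothetical helper name (`weighted_mean`) rather than the library's own code.

```python
def weighted_mean(values, prob_weights):
    """Mean of one bootstrap replicate: the sum of value * weight."""
    if len(values) != len(prob_weights):
        raise ValueError("values and prob_weights must have the same length")
    # Assumption: prob_weights sum to 1, so no normalization is applied.
    return sum(v * w for v, w in zip(values, prob_weights))


values = [1.0, 2.0, 4.0]
prob_weights = [0.5, 0.25, 0.25]
replicate_mean = weighted_mean(values, prob_weights)
```

With these weights the first observation counts for half of the replicate, so the result sits closer to 1.0 than the unweighted mean does.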

mozanalysis.bayesian_stats.bayesian_bootstrap.make_bb_quantile_closure(quantiles)[source]

Return a function to calculate quantiles for a bootstrap replicate.

Parameters:

quantiles (float, list of floats) – Quantiles to compute

Returns a function that calculates quantiles for a bootstrap replicate:

Args:

values (pd.Series, ndarray):
One dimensional array of observed values

prob_weights (pd.Series, ndarray):
Equally shaped array of the probability weight associated with each value.

Returns:

A quantile as a np.float, or

several quantiles as a dict keyed by the quantiles
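
The closure pattern described above might look roughly like the following. This is a sketch under assumptions, not the library's implementation: `make_quantile_closure` is a hypothetical stand-in, it works on plain lists, and it picks the smallest value whose cumulative weight reaches the requested quantile, whereas the real function may interpolate differently.

```python
def make_quantile_closure(quantiles):
    """Hypothetical stand-in for make_bb_quantile_closure."""
    scalar_input = isinstance(quantiles, (int, float))
    wanted = [quantiles] if scalar_input else list(quantiles)

    def weighted_quantiles(values, prob_weights):
        # Sort the values and carry each weight along with its value.
        pairs = sorted(zip(values, prob_weights))
        results = {}
        for q in wanted:
            cumulative = 0.0
            chosen = pairs[-1][0]  # fallback in case rounding keeps the sum below q
            for value, weight in pairs:
                cumulative += weight
                if cumulative >= q:
                    chosen = value
                    break
            results[q] = chosen
        # One quantile in, one number out; several in, a dict out.
        return results[wanted[0]] if scalar_input else results

    return weighted_quantiles


values = [10.0, 20.0, 30.0, 40.0]
prob_weights = [0.1, 0.2, 0.3, 0.4]
median = make_quantile_closure(0.5)(values, prob_weights)
spread = make_quantile_closure([0.25, 0.75])(values, prob_weights)
```

Fixing the quantiles when the closure is created means the returned function takes only `values` and `prob_weights`, which is the two-parameter shape that `stat_fn` is required to have.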

mozanalysis.bayesian_stats.bayesian_bootstrap.compare_branches(df, col_label, ref_branch_label='control', stat_fn=<function bb_mean>, num_samples=10000, threshold_quantile=None, individual_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995), comparative_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))[source]

Jointly sample bootstrapped statistics then compare them.

Parameters:

df – a pandas DataFrame of queried experiment data in the standard format (see mozanalysis.experiment).
col_label (str) – Label for the df column containing the metric to be analyzed.
ref_branch_label (str, optional) – String in df['branch'] that identifies the branch with respect to which we want to calculate uplifts - usually the control branch.
stat_fn (callable, optional) –
A function that either:
- Aggregates each resampled population to a scalar (e.g. the default, bb_mean), or
- Aggregates each resampled population to a dict of scalars (e.g. the func returned by make_bb_quantile_closure when given multiple quantiles).
In both cases, this function must accept two parameters:
- a one-dimensional ndarray or pandas Series of values,
- an identically shaped object of weights for these values
num_samples (int, optional) – The number of bootstrap iterations to perform.
threshold_quantile (float, optional) – An optional threshold quantile, above which to discard outliers. E.g. 0.9999.
individual_summary_quantiles (list, optional) – Quantiles to determine the credible intervals on individual branch statistics. Change these when making Bonferroni corrections.
comparative_summary_quantiles (list, optional) – Quantiles to determine the credible intervals on comparative branch statistics (i.e. the change relative to the reference branch, probably the control). Change these when making Bonferroni corrections.
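
For the Bonferroni remark in the two quantile parameters, one common recipe is to split the overall error level evenly across the comparisons and across the two tails. The helper below is hypothetical (it is not part of mozanalysis) and simply builds a tuple that could be passed as either summary-quantiles argument.

```python
def bonferroni_quantiles(alpha, num_comparisons):
    """Lower tail, median, and upper tail for a Bonferroni-corrected interval."""
    tail = alpha / (2 * num_comparisons)
    return (tail, 0.5, 1 - tail)


# A single comparison at alpha = 0.05 gives the usual 95% interval.
single = bonferroni_quantiles(0.05, 1)
# Three comparisons use wider intervals to keep the overall level at 0.05.
corrected = bonferroni_quantiles(0.05, 3)
```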

Returns:

If stat_fn returns a scalar (this is the default), then this function returns a dictionary that has the following keys and values:

'individual': dictionary mapping each branch name to a pandas Series that holds the expected value for the bootstrapped stat_fn, and credible intervals.

'comparative': dictionary mapping each branch name to a pandas Series of summary statistics for the possible uplifts of the bootstrapped stat_fn relative to the reference branch.

Otherwise, when stat_fn returns a dict, then this function returns a similar dictionary, except the Series are replaced with DataFrames. Each row in each DataFrame corresponds to one output of stat_fn, and is the Series that would be returned if stat_fn computed only this statistic.
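
To make "jointly sample, then compare" concrete, here is a dependency-free sketch of the idea. Everything in it is an assumption made for illustration: the data are made up, the helper names are hypothetical, relative uplift is taken to be the ratio to the reference branch minus one, and the summary is a plain dict, not the Series or DataFrame that compare_branches returns.

```python
import random


def dirichlet_weights(n, rng):
    """Flat Dirichlet(1, ..., 1) weights from normalized Gamma(1, 1) draws."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def weighted_mean(values, prob_weights):
    return sum(v * w for v, w in zip(values, prob_weights))


def empirical_quantile(sorted_samples, q):
    """Nearest-rank quantile of samples that are already sorted."""
    index = min(int(q * len(sorted_samples)), len(sorted_samples) - 1)
    return sorted_samples[index]


# Made-up data: the treatment branch is the control branch shifted up by 1.
branches = {
    "control": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "treatment": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
}
ref_branch_label = "control"
num_samples = 2000
rng = random.Random(0)

# One bootstrapped statistic per branch per iteration.
samples = {
    label: [
        weighted_mean(data, dirichlet_weights(len(data), rng))
        for _ in range(num_samples)
    ]
    for label, data in branches.items()
}

# Compare iteration by iteration against the reference branch.
ref = samples[ref_branch_label]
rel_uplift = sorted(t / r - 1 for t, r in zip(samples["treatment"], ref))
prob_win = sum(t > r for t, r in zip(samples["treatment"], ref)) / num_samples
summary = {q: empirical_quantile(rel_uplift, q) for q in (0.025, 0.5, 0.975)}
```

Pairing the samples by iteration gives a full distribution of uplifts, so interval endpoints for the comparison are read from that distribution and not derived from the two branch intervals separately.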

mozanalysis.bayesian_stats.bayesian_bootstrap.bootstrap_one_branch(data, stat_fn=<function bb_mean>, num_samples=10000, seed_start=None, threshold_quantile=None, summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))[source]

Bootstrap stat_fn for one branch on its own.

Computes stat_fn for num_samples resamples of data, then returns summary statistics for the results.

Parameters:

data – The data as a list, 1D numpy array, or pandas Series
stat_fn (callable, optional) –
A function that either:
- Aggregates each resampled population to a scalar (e.g. the default, bb_mean), or
- Aggregates each resampled population to a dict of scalars (e.g. the func returned by make_bb_quantile_closure when given multiple quantiles).
In both cases, this function must accept two parameters:
- a one-dimensional ndarray or pandas Series of values,
- an identically shaped object of weights for these values
num_samples – The number of bootstrap iterations to perform
seed_start – An int with which to seed numpy’s RNG. It must be unique within this set of calculations.
threshold_quantile (float, optional) – An optional threshold quantile, above which to discard outliers. E.g. 0.9999.
summary_quantiles (list, optional) – Quantiles to determine the credible intervals on the branch statistics. Change these when making Bonferroni corrections.
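
The flow described for this function (optionally discard outliers, resample, then summarize) can be sketched without any dependencies as follows. The helper names, the nearest-rank quantile rule, and the shape of the returned dict are assumptions for illustration; the library's own outlier filter and summary format may differ.

```python
import random


def discard_above_quantile(data, threshold_quantile):
    """Keep values at or below the nearest-rank threshold quantile."""
    ordered = sorted(data)
    index = min(int(threshold_quantile * len(ordered)), len(ordered) - 1)
    cutoff = ordered[index]
    return [x for x in data if x <= cutoff]


def bootstrap_mean_summary(data, num_samples, summary_quantiles, seed):
    """Bayesian-bootstrap the mean, then summarize the replicates."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(num_samples):
        # Flat Dirichlet weights via normalized Gamma(1, 1) draws.
        draws = [rng.gammavariate(1.0, 1.0) for _ in data]
        total = sum(draws)
        replicates.append(sum(x * d / total for x, d in zip(data, draws)))
    replicates.sort()
    summary = {"mean": sum(replicates) / num_samples}
    for q in summary_quantiles:
        index = min(int(q * num_samples), num_samples - 1)
        summary[q] = replicates[index]
    return summary


# Made-up data with one extreme value.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]
trimmed = discard_above_quantile(data, 0.85)
summary = bootstrap_mean_summary(trimmed, 1000, (0.025, 0.5, 0.975), seed=1)
```

Here the value 1000.0 is dropped before resampling, so it cannot dominate any replicate.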

mozanalysis.bayesian_stats.bayesian_bootstrap.get_bootstrap_samples(data, stat_fn=<function bb_mean>, num_samples=10000, seed_start=None, threshold_quantile=None)[source]

Return stat_fn evaluated on resampled data.

Parameters:

data – The data as a list, 1D numpy array, or pandas Series
stat_fn (callable, optional) –
A function that either:
- Aggregates each resampled population to a scalar (e.g. the default, bb_mean), or
- Aggregates each resampled population to a dict of scalars (e.g. the func returned by make_bb_quantile_closure when given multiple quantiles).
In both cases, this function must accept two parameters:
- a one-dimensional ndarray or pandas Series of values,
- an identically shaped object of weights for these values
num_samples – The number of samples to return
seed_start –
A seed for the random number generator; this function will use seeds in the range:
```
[seed_start, seed_start + num_samples)
```
and these particular seeds must not be used elsewhere in this calculation. By default, use a random seed.
threshold_quantile (float, optional) – An optional threshold quantile, above which to discard outliers. E.g. 0.9999.

Returns:

A Series or DataFrame with one row per sample and one column per output of stat_fn.
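
The seed range documented for seed_start suggests one seed per replicate. The sketch below assumes exactly that (how the library consumes seeds internally is not shown here) and uses a hypothetical helper, `sample_means`, to show why the ranges used for different calculations should not overlap.

```python
import random


def sample_means(data, num_samples, seed_start):
    """One weighted mean per seed in [seed_start, seed_start + num_samples)."""
    results = []
    for seed in range(seed_start, seed_start + num_samples):
        rng = random.Random(seed)  # a fresh generator for each replicate
        draws = [rng.gammavariate(1.0, 1.0) for _ in data]
        total = sum(draws)
        results.append(sum(x * d / total for x, d in zip(data, draws)))
    return results


data = [3.0, 1.0, 4.0, 1.0, 5.0]
first = sample_means(data, 5, seed_start=0)     # seeds 0..4
repeat = sample_means(data, 5, seed_start=0)    # same seeds, same replicates
overlap = sample_means(data, 5, seed_start=3)   # seeds 3..7 reuse seeds 3 and 4
disjoint = sample_means(data, 5, seed_start=5)  # seeds 5..9, nothing reused
```

Reusing a seed reproduces the same weights, so `first[3:]` and `overlap[:2]` are identical. For calculations that are meant to be independent, that shared randomness is what a disjoint seed range avoids.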

References

Rubin, Donald B. The Bayesian Bootstrap. Ann. Statist. 9 (1981), no. 1, 130–134. https://dx.doi.org/10.1214/aos/1176345338
