mozanalysis.bayesian_stats.bayesian_bootstrap
- mozanalysis.bayesian_stats.bayesian_bootstrap.bb_mean(values, prob_weights)
Calculate the mean of a bootstrap replicate.
- Parameters:
values (pd.Series, ndarray) – One-dimensional array of observed values
prob_weights (pd.Series, ndarray) – Equally shaped array of the probability weight associated with each value.
- Returns:
The mean as a np.float.
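In the Bayesian bootstrap, a replicate reweights the observed values instead of resampling them. The sketch below assumes prob_weights is a flat Dirichlet draw (non-negative, summing to 1) and shows what a weighted-mean statistic with the same (values, prob_weights) signature computes; weighted_mean_sketch is a hypothetical name, not part of mozanalysis.

```python
import numpy as np


def weighted_mean_sketch(values, prob_weights):
    """Mean of one Bayesian bootstrap replicate (hypothetical helper).

    Assumes prob_weights is non-negative and sums to 1, as a Dirichlet
    draw does, so the weighted mean reduces to a dot product.
    """
    values = np.asarray(values, dtype=float)
    prob_weights = np.asarray(prob_weights, dtype=float)
    return float(np.dot(values, prob_weights))


rng = np.random.default_rng(42)
values = np.array([1.0, 2.0, 3.0, 10.0])
# One replicate: flat Dirichlet(1, ..., 1) weights over the observations.
weights = rng.dirichlet(np.ones(len(values)))
replicate_mean = weighted_mean_sketch(values, weights)
```

With equal weights of 1/n this reduces to the ordinary sample mean; drawing fresh Dirichlet weights for each replicate is what turns repeated evaluations into samples from the posterior of the mean.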
- mozanalysis.bayesian_stats.bayesian_bootstrap.make_bb_quantile_closure(quantiles)
Return a function to calculate quantiles for a bootstrap replicate.
- Parameters:
quantiles (float, list of floats) – Quantiles to compute
- Returns:
A function that calculates quantiles for a bootstrap replicate. The returned function takes:
values (pd.Series, ndarray) – One-dimensional array of observed values
prob_weights (pd.Series, ndarray) – Equally shaped array of the probability weight associated with each value.
It returns a single quantile as a np.float, or several quantiles as a dict keyed by the quantiles.
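make_bb_quantile_closure returns a stat_fn with the same (values, prob_weights) signature as bb_mean. The sketch below shows one way such a closure can work, using the inverse-CDF definition of a weighted quantile (the smallest value whose cumulative weight reaches q). mozanalysis may use a different interpolation rule, and make_weighted_quantile_closure is a hypothetical name.

```python
import numpy as np


def make_weighted_quantile_closure(quantiles):
    """Return a stat_fn computing weighted quantiles (hypothetical helper).

    Returns a float for a single quantile, or a dict keyed by quantile
    when several quantiles are requested.
    """
    single = np.isscalar(quantiles)
    qs = [quantiles] if single else list(quantiles)

    def stat_fn(values, prob_weights):
        values = np.asarray(values, dtype=float)
        prob_weights = np.asarray(prob_weights, dtype=float)
        order = np.argsort(values)
        sorted_values = values[order]
        # Normalise in case the weights do not sum exactly to 1.
        cum_weights = np.cumsum(prob_weights[order]) / prob_weights.sum()
        results = {}
        for q in qs:
            # First position where the cumulative weight reaches q.
            idx = int(np.searchsorted(cum_weights, q, side="left"))
            idx = min(idx, len(sorted_values) - 1)  # guard against rounding
            results[q] = float(sorted_values[idx])
        return results[qs[0]] if single else results

    return stat_fn


values = np.array([1.0, 2.0, 3.0, 4.0])
weights = np.full(4, 0.25)
median_fn = make_weighted_quantile_closure(0.5)
tails_fn = make_weighted_quantile_closure([0.1, 0.9])
median = median_fn(values, weights)  # 2.0
tails = tails_fn(values, weights)  # {0.1: 1.0, 0.9: 4.0}
```

Returning a dict for multiple quantiles is what makes the closure a dict-valued stat_fn in the sense used by the functions below.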
- mozanalysis.bayesian_stats.bayesian_bootstrap.compare_branches(df, col_label, ref_branch_label='control', stat_fn=<function bb_mean>, num_samples=10000, threshold_quantile=None, individual_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995), comparative_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))
Jointly sample bootstrapped statistics then compare them.
- Parameters:
df – a pandas DataFrame of queried experiment data in the standard format (see mozanalysis.experiment).
col_label (str) – Label for the df column containing the metric to be analyzed.
ref_branch_label (str, optional) – String in df['branch'] that identifies the branch with respect to which we want to calculate uplifts; usually the control branch.
stat_fn (callable, optional) –
A function that either:
- Aggregates each resampled population to a scalar (e.g. the default, bb_mean), or
- Aggregates each resampled population to a dict of scalars (e.g. the function returned by make_bb_quantile_closure when given multiple quantiles).
In both cases, this function must accept two parameters:
- a one-dimensional ndarray or pandas Series of values,
- an identically shaped object of weights for these values.
num_samples (int, optional) – The number of bootstrap iterations to perform.
threshold_quantile (float, optional) – An optional threshold quantile, above which to discard outliers. E.g. 0.9999.
individual_summary_quantiles (list, optional) – Quantiles to determine the credible intervals on individual branch statistics. Change these when making Bonferroni corrections.
comparative_summary_quantiles (list, optional) – Quantiles to determine the credible intervals on comparative branch statistics (i.e. the change relative to the reference branch, probably the control). Change these when making Bonferroni corrections.
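The default quantiles (0.005, 0.025, 0.5, 0.975, 0.995) give the median plus two-sided 95% and 99% intervals. A minimal sketch of a Bonferroni adjustment, assuming several treatment branches are each compared against the reference branch at a family-wise level: split the tail mass alpha evenly across the comparisons, then across the two tails. bonferroni_quantiles is a hypothetical helper, not part of mozanalysis.

```python
def bonferroni_quantiles(alpha, num_comparisons):
    """Two-sided interval quantiles with a Bonferroni correction.

    Each of the num_comparisons intervals gets tail mass
    alpha / num_comparisons, split evenly between the lower and upper
    tails. The median (0.5) is kept as a point summary.
    """
    per_comparison_alpha = alpha / num_comparisons
    lower = per_comparison_alpha / 2
    return (lower, 0.5, 1 - lower)


# Three treatment branches vs. control at a family-wise 95% level.
quantiles = bonferroni_quantiles(0.05, 3)
```

The resulting tuple could then be passed as comparative_summary_quantiles (and, if wanted, individual_summary_quantiles).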
- Returns:
If stat_fn returns a scalar (the default), this function returns a dictionary with the following keys and values:
'individual': dictionary mapping each branch name to a pandas Series that holds the expected value for the bootstrapped stat_fn, and credible intervals.
'comparative': dictionary mapping each branch name to a pandas Series of summary statistics for the possible uplifts of the bootstrapped stat_fn relative to the reference branch.
Otherwise, when stat_fn returns a dict, this function returns a similar dictionary, except the Series are replaced with DataFrames. Each row in each DataFrame corresponds to one output of stat_fn, and is the Series that would be returned if stat_fn computed only this statistic.
- mozanalysis.bayesian_stats.bayesian_bootstrap.bootstrap_one_branch(data, stat_fn=<function bb_mean>, num_samples=10000, seed_start=None, threshold_quantile=None, summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))
Bootstrap stat_fn for one branch on its own.
Computes stat_fn for num_samples resamples of data, then returns summary statistics for the results.
- Parameters:
data – The data as a list, 1D numpy array, or pandas Series
stat_fn (callable, optional) –
A function that either:
- Aggregates each resampled population to a scalar (e.g. the default, bb_mean), or
- Aggregates each resampled population to a dict of scalars (e.g. the function returned by make_bb_quantile_closure when given multiple quantiles).
In both cases, this function must accept two parameters:
- a one-dimensional ndarray or pandas Series of values,
- an identically shaped object of weights for these values.
num_samples (int, optional) – The number of bootstrap iterations to perform.
seed_start (int, optional) – An integer with which to seed numpy's RNG. It must be unique within this set of calculations.
threshold_quantile (float, optional) – An optional threshold quantile, above which to discard outliers. E.g. 0.9999.
summary_quantiles (list, optional) – Quantiles to determine the confidence bands on the branch statistics. Change these when making Bonferroni corrections.
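The threshold_quantile option can be read as a one-sided trim of the upper tail. The sketch below assumes the cut-off is computed once on the raw data with numpy's default (linear) quantile interpolation and that values above it are dropped before any resampling; filter_above_quantile is a hypothetical helper, and mozanalysis's exact rule may differ.

```python
import numpy as np


def filter_above_quantile(data, threshold_quantile):
    """Drop observations above the given quantile of the data (hypothetical).

    Values strictly greater than the threshold_quantile-th quantile are
    treated as outliers and removed; everything else is kept.
    """
    data = np.asarray(data, dtype=float)
    cutoff = np.quantile(data, threshold_quantile)
    return data[data <= cutoff]


data = np.arange(1.0, 10001.0)  # 1, 2, ..., 10000
filtered = filter_above_quantile(data, 0.9999)
# Only the single largest value (10000) lies above the 0.9999 quantile.
```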
- mozanalysis.bayesian_stats.bayesian_bootstrap.get_bootstrap_samples(data, stat_fn=<function bb_mean>, num_samples=10000, seed_start=None, threshold_quantile=None)
Return stat_fn evaluated on resampled data.
- Parameters:
data – The data as a list, 1D numpy array, or pandas Series
stat_fn (callable, optional) –
A function that either:
- Aggregates each resampled population to a scalar (e.g. the default, bb_mean), or
- Aggregates each resampled population to a dict of scalars (e.g. the function returned by make_bb_quantile_closure when given multiple quantiles).
In both cases, this function must accept two parameters:
- a one-dimensional ndarray or pandas Series of values,
- an identically shaped object of weights for these values.
num_samples (int, optional) – The number of samples to return.
seed_start (int, optional) – A seed for the random number generator. This function will use seeds in the range [seed_start, seed_start + num_samples), and these particular seeds must not be used elsewhere in this calculation. By default, a random seed is used.
threshold_quantile (float, optional) – An optional threshold quantile, above which to discard outliers. E.g. 0.9999.
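The seed_start contract can be sketched as one independently seeded generator per replicate. This is an assumption about the mechanism, not mozanalysis's code: get_bootstrap_samples_sketch is hypothetical, uses numpy.random.default_rng with flat Dirichlet weights, omits threshold_quantile, and returns a plain list instead of a Series or DataFrame.

```python
import numpy as np


def get_bootstrap_samples_sketch(data, stat_fn, num_samples=1000, seed_start=None):
    """Evaluate stat_fn on num_samples Bayesian bootstrap replicates.

    Replicate i is drawn with its own seed, seed_start + i, so the seeds
    used are exactly range(seed_start, seed_start + num_samples).
    """
    data = np.asarray(data, dtype=float)
    if seed_start is None:
        # No seed given: pick a random starting point.
        seed_start = int(np.random.default_rng().integers(0, 2**31))
    samples = []
    for seed in range(seed_start, seed_start + num_samples):
        rng = np.random.default_rng(seed)
        prob_weights = rng.dirichlet(np.ones(len(data)))
        samples.append(stat_fn(data, prob_weights))
    return samples


def mean_fn(values, prob_weights):
    return float(np.dot(values, prob_weights))


data = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
a = get_bootstrap_samples_sketch(data, mean_fn, num_samples=10, seed_start=0)
b = get_bootstrap_samples_sketch(data, mean_fn, num_samples=10, seed_start=5)
```

Seeds 5 to 9 appear in both calls, so a[5:] and b[:5] are identical replicates. This is why the seed ranges of different calls in one calculation must not overlap: shared seeds produce duplicated rather than independent samples.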
- Returns:
A Series or DataFrame with one row per sample and one column per output of stat_fn.
References
- Rubin, Donald B. The Bayesian Bootstrap. Ann. Statist. 9 (1981), no. 1, 130–134. https://dx.doi.org/10.1214/aos/1176345338