mozanalysis.bayesian_stats.bayesian_bootstrap

mozanalysis.bayesian_stats.bayesian_bootstrap.bb_mean(values, prob_weights)[source]

Calculate the mean of a bootstrap replicate.

Parameters:
  • values (pd.Series, ndarray) – One dimensional array of observed values

  • prob_weights (pd.Series, ndarray) – Equally shaped array of the probability weight associated with each value.

Returns:

The mean as a np.float.
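
As a rough illustration of what one replicate involves, the sketch below draws a single set of probability weights and takes the weighted mean. It uses only the Python standard library and does not call mozanalysis. The helper names `dirichlet_weights` and `weighted_mean` are hypothetical, and the sketch assumes the weights are non-negative and sum to 1.

```python
import random


def dirichlet_weights(n, rng):
    """Draw flat Dirichlet(1, ..., 1) weights by normalizing Exponential(1) draws."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def weighted_mean(values, prob_weights):
    """Hypothetical stand-in for a replicate mean: sum of value * weight."""
    if len(values) != len(prob_weights):
        raise ValueError("values and prob_weights must have the same length")
    return sum(v * w for v, w in zip(values, prob_weights))


rng = random.Random(0)
values = [1.0, 2.0, 3.0, 4.0]
weights = dirichlet_weights(len(values), rng)

# One bootstrap replicate of the mean; it always lies between min and max.
replicate_mean = weighted_mean(values, weights)
```

Repeating the weight draw many times and collecting the resulting means gives the bootstrapped distribution of the mean.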

mozanalysis.bayesian_stats.bayesian_bootstrap.make_bb_quantile_closure(quantiles)[source]

Return a function to calculate quantiles for a bootstrap replicate.

Parameters:
  • quantiles (float, list of floats) – Quantiles to compute

Returns a function that calculates quantiles for a bootstrap replicate. That function takes:

  • values (pd.Series, ndarray) – One dimensional array of observed values

  • prob_weights (pd.Series, ndarray) – Equally shaped array of the probability weight associated with each value.

Returns:

  • A quantile as a np.float, or

  • several quantiles as a dict keyed by the quantiles
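
The closure pattern described above can be sketched as follows. This is a simplified stand-in, not the library's implementation: `make_quantile_fn` is a hypothetical name, and the quantile rule (take the first sorted value whose cumulative weight reaches the requested quantile) is an assumption chosen for brevity; the real function may interpolate.

```python
def make_quantile_fn(quantiles):
    """Hypothetical sketch: return a function computing weighted quantiles."""
    single = isinstance(quantiles, (int, float))
    qs = [quantiles] if single else list(quantiles)

    def quantile_fn(values, prob_weights):
        # Sort the values and carry each weight along with its value.
        pairs = sorted(zip(values, prob_weights))
        results = {}
        for q in qs:
            cumulative = 0.0
            chosen = pairs[-1][0]  # fall back to the largest value
            for value, weight in pairs:
                cumulative += weight
                if cumulative >= q:
                    chosen = value
                    break
            results[q] = chosen
        # A scalar for one quantile, a dict keyed by quantile for several.
        return results[qs[0]] if single else results

    return quantile_fn


values = [10.0, 20.0, 30.0, 40.0]
weights = [0.1, 0.2, 0.3, 0.4]

median = make_quantile_fn(0.5)(values, weights)
spread = make_quantile_fn([0.25, 0.75])(values, weights)
```

The returned function has the same two-argument shape as a mean function, so either can serve as a stat_fn.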

mozanalysis.bayesian_stats.bayesian_bootstrap.compare_branches(df, col_label, ref_branch_label='control', stat_fn=<function bb_mean>, num_samples=10000, threshold_quantile=None, individual_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995), comparative_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))[source]

Jointly sample bootstrapped statistics then compare them.

Parameters:
  • df – a pandas DataFrame of queried experiment data in the standard format (see mozanalysis.experiment).

  • col_label (str) – Label for the df column containing the metric to be analyzed.

  • ref_branch_label (str, optional) – String in df['branch'] that identifies the branch with respect to which we want to calculate uplifts - usually the control branch.

  • stat_fn (callable, optional) –

    A function that either:

    • Aggregates each resampled population to a scalar (e.g. the default, bb_mean), or

    • Aggregates each resampled population to a dict of scalars (e.g. the func returned by make_bb_quantile_closure when given multiple quantiles).

    In both cases, this function must accept two parameters:

    • a one-dimensional ndarray or pandas Series of values,

    • an identically shaped object of weights for these values

  • num_samples (int, optional) – The number of bootstrap iterations to perform.

  • threshold_quantile (float, optional) – An optional threshold quantile, above which to discard outliers. E.g. 0.9999.

  • individual_summary_quantiles (list, optional) – Quantiles to determine the credible intervals on individual branch statistics. Change these when making Bonferroni corrections.

  • comparative_summary_quantiles (list, optional) – Quantiles to determine the credible intervals on comparative branch statistics (i.e. the change relative to the reference branch, probably the control). Change these when making Bonferroni corrections.

Returns:

If stat_fn returns a scalar (the default), then this function returns a dictionary with the following keys and values:

  • 'individual': dictionary mapping each branch name to a pandas Series that holds the expected value for the bootstrapped stat_fn, and credible intervals.

  • 'comparative': dictionary mapping each branch name to a pandas Series of summary statistics for the possible uplifts of the bootstrapped stat_fn relative to the reference branch.

Otherwise, when stat_fn returns a dict, then this function returns a similar dictionary, except the Series are replaced with DataFrames. Each row in each DataFrame corresponds to one output of stat_fn, and is the Series that would be returned if stat_fn computed only this statistic.
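
To make the comparison step concrete, here is a minimal standard-library sketch of bootstrapping a mean for two branches and summarizing the uplift against the reference branch. It does not call mozanalysis or model the DataFrame format. All helper names and the toy data are hypothetical, and expressing uplift as a relative change (treatment / control - 1) is an assumption.

```python
import random


def bootstrap_means(values, num_samples, rng):
    """Return num_samples replicate means using flat Dirichlet weights."""
    means = []
    for _ in range(num_samples):
        draws = [rng.expovariate(1.0) for _ in values]
        total = sum(draws)
        means.append(sum(v * d / total for v, d in zip(values, draws)))
    return means


def empirical_quantile(samples, q):
    """Simple index-based quantile of a list of samples (no interpolation)."""
    ordered = sorted(samples)
    idx = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[idx]


rng = random.Random(42)
num_samples = 2000
data = {
    "control": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "treatment": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
}

# Sample the statistic for every branch, then compare sample by sample.
samples = {b: bootstrap_means(v, num_samples, rng) for b, v in data.items()}
rel_uplift = [
    t / c - 1 for t, c in zip(samples["treatment"], samples["control"])
]

# Summarize the uplift distribution with a few quantiles.
summary = {q: empirical_quantile(rel_uplift, q) for q in (0.025, 0.5, 0.975)}
```

The outer quantiles of `summary` bound a 95% credible interval on the relative uplift under these assumptions.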

mozanalysis.bayesian_stats.bayesian_bootstrap.bootstrap_one_branch(data, stat_fn=<function bb_mean>, num_samples=10000, seed_start=None, threshold_quantile=None, summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))[source]

Bootstrap stat_fn for one branch on its own.

Computes stat_fn for num_samples resamples of data, then returns summary statistics for the results.

Parameters:
  • data – The data as a list, 1D numpy array, or pandas Series

  • stat_fn (callable, optional) –

    A function that either:

    • Aggregates each resampled population to a scalar (e.g. the default, bb_mean), or

    • Aggregates each resampled population to a dict of scalars (e.g. the func returned by make_bb_quantile_closure when given multiple quantiles).

    In both cases, this function must accept two parameters:

    • a one-dimensional ndarray or pandas Series of values,

    • an identically shaped object of weights for these values

  • num_samples (int, optional) – The number of bootstrap iterations to perform.

  • seed_start – An int with which to seed numpy’s RNG. It must be unique within this set of calculations.

  • threshold_quantile (float, optional) – An optional threshold quantile, above which to discard outliers. E.g. 0.9999.

  • summary_quantiles (list, optional) – Quantiles to determine the credible intervals on the branch statistics. Change these when making Bonferroni corrections.
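
One way to choose quantiles for a Bonferroni correction is sketched below. It assumes a two-sided interval at overall level `alpha`, split evenly across `num_comparisons` comparisons; `bonferroni_quantiles` is a hypothetical helper, not part of mozanalysis.

```python
def bonferroni_quantiles(alpha, num_comparisons):
    """Return (lower, median, upper) quantiles for a corrected two-sided interval."""
    if num_comparisons < 1:
        raise ValueError("num_comparisons must be at least 1")
    adjusted = alpha / num_comparisons  # per-comparison error budget
    return (adjusted / 2, 0.5, 1 - adjusted / 2)


# Five comparisons at an overall 5% level give roughly (0.005, 0.5, 0.995).
quantiles = bonferroni_quantiles(0.05, 5)
```

A tuple like this could then be passed as the summary quantiles argument in place of the defaults.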

mozanalysis.bayesian_stats.bayesian_bootstrap.get_bootstrap_samples(data, stat_fn=<function bb_mean>, num_samples=10000, seed_start=None, threshold_quantile=None)[source]

Return stat_fn evaluated on resampled data.

Parameters:
  • data – The data as a list, 1D numpy array, or pandas Series

  • stat_fn (callable, optional) –

    A function that either:

    • Aggregates each resampled population to a scalar (e.g. the default, bb_mean), or

    • Aggregates each resampled population to a dict of scalars (e.g. the func returned by make_bb_quantile_closure when given multiple quantiles).

    In both cases, this function must accept two parameters:

    • a one-dimensional ndarray or pandas Series of values,

    • an identically shaped object of weights for these values

  • num_samples – The number of samples to return

  • seed_start – A seed for the random number generator. This function uses the seeds in the range [seed_start, seed_start + num_samples), and these particular seeds must not be used elsewhere in this calculation. If omitted, a random seed is used.

  • threshold_quantile (float, optional) – An optional threshold quantile, above which to discard outliers. E.g. 0.9999.

Returns:

A Series or DataFrame with one row per sample and one column per output of stat_fn.
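
The seed range described for seed_start can be illustrated with a small standard-library sketch. It assumes one seed per replicate and uses Python's `random` module rather than numpy. The function name `replicate_means` is hypothetical, and offsetting the second branch by `num_samples` is just one way to keep seed ranges disjoint.

```python
import random


def replicate_means(values, num_samples, seed_start):
    """Return one weighted mean per seed in [seed_start, seed_start + num_samples)."""
    means = []
    for seed in range(seed_start, seed_start + num_samples):
        rng = random.Random(seed)  # each replicate gets its own seed
        draws = [rng.expovariate(1.0) for _ in values]
        total = sum(draws)
        means.append(sum(v * d / total for v, d in zip(values, draws)))
    return means


num_samples = 100
values = [1.0, 2.0, 3.0]

# Disjoint seed ranges: [0, 100) for one branch, [100, 200) for the other.
control = replicate_means(values, num_samples, seed_start=0)
treatment = replicate_means(values, num_samples, seed_start=num_samples)

# Reusing the same seed_start reproduces the same replicates exactly.
rerun = replicate_means(values, num_samples, seed_start=0)
```

Reusing a seed range for a second branch would give both branches identical weight draws, which is why the ranges must not overlap within one calculation.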

References

Rubin, Donald B. The Bayesian Bootstrap. Ann. Statist. 9 (1981), no. 1, 130–134. https://dx.doi.org/10.1214/aos/1176345338