mozanalysis.bayesian_stats.binary

mozanalysis.bayesian_stats.binary.compare_branches(df, col_label, ref_branch_label='control', num_samples=10000, individual_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995), comparative_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))

Jointly sample conversion rates for branches then compare them.

See compare_branches_from_agg for more details.

Parameters:
  • df (pd.DataFrame) – Queried experiment data in the standard format.

  • col_label (str) – Label for the df column containing the metric to be analyzed.

  • ref_branch_label (str, optional) – String in df['branch'] that identifies the branch with respect to which we want to calculate uplifts, usually the control branch.

  • num_samples (int, optional) – The number of samples to compute.

  • individual_summary_quantiles (list, optional) – Quantiles to determine the confidence bands on individual branch statistics. Change these when making Bonferroni corrections.

  • comparative_summary_quantiles (list, optional) – Quantiles to determine the confidence bands on comparative branch statistics (i.e. the change relative to the reference branch, usually the control). Change these when making Bonferroni corrections.

Returns a dictionary:

  • ‘individual’: dictionary mapping branch names to a pandas Series of summary stats for the posterior distribution over the branch’s conversion rate.

  • ‘comparative’: dictionary mapping branch names to a pandas Series of summary statistics for the possible uplifts of the conversion rate relative to the reference branch - see docs for mozanalysis.bayesian_stats.summarize_samples.summarize_joint_samples().
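The quantile parameters above are where Bonferroni corrections come in. As a minimal sketch (the helper `bonferroni_quantiles` is hypothetical and not part of mozanalysis), splitting a family-wise error rate `alpha` evenly across `num_comparisons` two-sided intervals gives the quantiles below. With one comparison and `alpha=0.05` it reproduces the 0.025 and 0.975 entries of the defaults; the 0.005 and 0.995 entries correspond to `alpha=0.01`.

```python
def bonferroni_quantiles(num_comparisons: int, alpha: float = 0.05) -> tuple:
    """Return (lower, median, upper) quantiles for Bonferroni-corrected intervals.

    Hypothetical helper: divides the family-wise error rate `alpha` evenly
    across `num_comparisons` two-sided intervals.
    """
    if num_comparisons < 1:
        raise ValueError("num_comparisons must be at least 1")
    per_comparison_alpha = alpha / num_comparisons
    # Two-sided interval: put half of the per-comparison alpha in each tail.
    lower = per_comparison_alpha / 2
    upper = 1 - per_comparison_alpha / 2
    return (lower, 0.5, upper)
```

For example, with two treatment branches each compared against the control, the result of `bonferroni_quantiles(2)` could be passed as `comparative_summary_quantiles`.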

mozanalysis.bayesian_stats.binary.aggregate_col(df, col_label)

Return the number of enrollments and conversions per branch.

Parameters:
  • df (pd.DataFrame) – Queried experiment data in the standard format.

  • col_label (str) – Label for the df column containing the metric to be analyzed.

Returns:

A DataFrame. The index is the list of branches. It has the following columns:

  • num_enrollments: The number of experiment subjects enrolled in this branch who were eligible for the metric.

  • num_conversions: The number of these enrolled experiment subjects who met the metric’s conversion criteria.
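A rough pandas sketch of this aggregation, assuming that the "standard format" means one row per enrolled subject with a `branch` column, and that the metric column holds 0/1 (or boolean) values with NaN for subjects who were not eligible for the metric. The name `aggregate_col_sketch` is hypothetical, and this is not necessarily how mozanalysis implements it.

```python
import pandas as pd


def aggregate_col_sketch(df: pd.DataFrame, col_label: str) -> pd.DataFrame:
    """Count enrollments and conversions per branch (hypothetical sketch)."""
    grouped = df.groupby("branch")[col_label]
    return pd.DataFrame({
        # count() skips NaN, so subjects with a missing value are treated as ineligible.
        "num_enrollments": grouped.count(),
        # The sum of a 0/1 (or boolean) column is the number of conversions.
        "num_conversions": grouped.sum().astype(int),
    })
```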

mozanalysis.bayesian_stats.binary.summarize_one_branch_from_agg(s, num_enrollments_label='num_enrollments', num_conversions_label='num_conversions', quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))

Return stats about a branch’s conversion rate.

Calculate and return a Series of summary stats for the posterior distribution over the branch’s conversion rate.

Parameters:
  • s (pd.Series) – Holds the number of enrollments and number of conversions for this branch and metric.

  • num_enrollments_label (str, optional) – The label in this Series for the number of enrollments.

  • num_conversions_label (str, optional) – The label in this Series for the number of conversions.

  • quantiles (list, optional) – The quantiles to return as summary statistics.

Returns:

A pandas Series; the index contains the stringified quantiles plus 'mean'.
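One way to produce such a Series, sketched under stated assumptions: the posterior is Beta(1 + conversions, 1 + enrollments - conversions), following the Beta(1, 1) prior described for compare_branches_from_agg; the quantiles are Monte Carlo estimates from NumPy draws; and the mean uses its closed form. The name `summarize_one_branch_sketch` and its `num_samples` and `seed` parameters are hypothetical, and mozanalysis may compute the quantiles analytically instead.

```python
import numpy as np
import pandas as pd


def summarize_one_branch_sketch(
    s: pd.Series,
    num_enrollments_label: str = "num_enrollments",
    num_conversions_label: str = "num_conversions",
    quantiles=(0.005, 0.025, 0.5, 0.975, 0.995),
    num_samples: int = 100_000,
    seed: int = 0,
) -> pd.Series:
    """Summarize a branch's posterior conversion rate (hypothetical sketch)."""
    n = int(s[num_enrollments_label])
    k = int(s[num_conversions_label])
    # Posterior parameters under a Beta(1, 1) prior and a binomial likelihood.
    a, b = 1 + k, 1 + n - k
    rng = np.random.default_rng(seed)
    samples = rng.beta(a, b, size=num_samples)
    # Index is the stringified quantiles, e.g. '0.025'.
    res = pd.Series(np.quantile(samples, quantiles), index=[str(q) for q in quantiles])
    # The posterior mean has a closed form, so it is computed exactly.
    res["mean"] = a / (a + b)
    return res
```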

mozanalysis.bayesian_stats.binary.compare_branches_from_agg(df, ref_branch_label='control', num_enrollments_label='num_enrollments', num_conversions_label='num_conversions', num_samples=10000, individual_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995), comparative_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))

Jointly sample conversion rates for the branches then compare them.

Calculates various quantiles on the uplift of each non-reference branch's sampled conversion rates with respect to the reference (usually control) branch's sampled conversion rates.

The data in df is modelled as being generated binomially, with a Beta(1, 1) (uniform) prior over the conversion rate parameter.
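Because the Beta prior is conjugate to the binomial likelihood, the posterior for a branch with n enrollments and k conversions is available in closed form. This is standard Beta-binomial algebra rather than anything specific to mozanalysis:

```latex
\begin{aligned}
p &\sim \mathrm{Beta}(1, 1), \qquad k \mid p \sim \mathrm{Binomial}(n, p), \\
\pi(p \mid k) &\propto p^{k} (1 - p)^{n - k}, \\
p \mid k &\sim \mathrm{Beta}(1 + k,\ 1 + n - k), \qquad
\mathbb{E}[p \mid k] = \frac{1 + k}{2 + n}.
\end{aligned}
```

Here n and k are the values in the num_enrollments_label and num_conversions_label columns for one row of df.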

Parameters:
  • df (pd.DataFrame) – A pandas DataFrame of integers:

    • df.index lists the experiment branches

    • df.columns is [num_enrollments_label, num_conversions_label]

  • ref_branch_label (str, optional) – Label for the df row containing data for the reference branch, usually the control branch.

  • num_enrollments_label – Label for the df column containing the number of enrollments in each branch.

  • num_conversions_label – Label for the df column containing the number of conversions in each branch.

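  • num_samples (int, optional) – The number of samples to compute.

  • individual_summary_quantiles (list, optional) – Quantiles to determine the confidence bands on individual branch statistics. Change these when making Bonferroni corrections.

  • comparative_summary_quantiles (list, optional) – Quantiles to determine the confidence bands on comparative branch statistics (i.e. the change relative to the reference branch, usually the control). Change these when making Bonferroni corrections.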

Returns a dictionary:

  • ‘individual’: dictionary mapping branch names to a pandas Series of summary stats for the posterior distribution over the branch’s conversion rate.

  • ‘comparative’: dictionary mapping branch names to a pandas Series of summary statistics for the possible uplifts of the conversion rate relative to the reference branch - see docs for mozanalysis.bayesian_stats.summarize_samples.summarize_joint_samples().
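Given joint samples shaped like the output of get_samples (one column per branch, one row per draw), the comparison step can be sketched as below. This is a hypothetical illustration: `compare_samples_sketch`, the relative-uplift definition `branch / reference - 1`, and the `prob_win` and `rel_uplift_*` labels are assumptions. The statistics mozanalysis actually reports come from summarize_joint_samples and are not reproduced here.

```python
import numpy as np
import pandas as pd


def compare_samples_sketch(samples: pd.DataFrame, ref_branch_label: str = "control",
                           quantiles=(0.025, 0.5, 0.975)) -> dict:
    """Summarize each branch's uplift relative to the reference (hypothetical sketch).

    `samples` has one column per branch and one row per posterior draw.
    """
    ref = samples[ref_branch_label]
    out = {}
    for branch in samples.columns:
        if branch == ref_branch_label:
            continue
        # Row-wise arithmetic keeps each draw paired with the reference draw.
        rel_uplift = samples[branch] / ref - 1
        summary = rel_uplift.quantile(list(quantiles))
        summary.index = [f"rel_uplift_{q}" for q in quantiles]
        # Posterior probability that this branch beats the reference.
        summary["prob_win"] = float((samples[branch] > ref).mean())
        out[branch] = summary
    return out
```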

mozanalysis.bayesian_stats.binary.get_samples(df, num_enrollments_label, num_conversions_label, num_samples)

Return samples from Beta distributions.

Assumes a Beta(1, 1) prior.

Parameters:
  • df (pd.DataFrame) – A pandas DataFrame of integers:

    • df.index lists the experiment branches

    • df.columns is (num_enrollments_label, num_conversions_label)

  • num_enrollments_label – Label for the df column containing the number of enrollments in each branch.

  • num_conversions_label – Label for the df column containing the number of conversions in each branch.

  • num_samples (int) – The number of samples to compute.

Returns a pandas.DataFrame of sampled conversion rates:

  • columns: list of branches

  • index: enumeration of samples
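A minimal NumPy sketch of drawing samples in the shape described above, assuming each branch's posterior is Beta(1 + conversions, 1 + enrollments - conversions) and that branches are sampled independently of one another. `get_samples_sketch` and its `seed` parameter are hypothetical, added here for reproducibility.

```python
import numpy as np
import pandas as pd


def get_samples_sketch(df: pd.DataFrame, num_enrollments_label: str,
                       num_conversions_label: str, num_samples: int,
                       seed=None) -> pd.DataFrame:
    """Draw posterior conversion-rate samples per branch (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    conversions = df[num_conversions_label].to_numpy()
    non_conversions = df[num_enrollments_label].to_numpy() - conversions
    # Beta(1, 1) prior: add one to both the successes and the failures.
    # Shape (num_samples, num_branches); column j uses branch j's parameters.
    draws = rng.beta(1 + conversions, 1 + non_conversions,
                     size=(num_samples, len(df)))
    # Columns are the branches; the default RangeIndex enumerates the samples.
    return pd.DataFrame(draws, columns=df.index)
```

Drawing all branches in one `rng.beta` call relies on NumPy broadcasting the per-branch parameter arrays across the `(num_samples, num_branches)` output shape.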