mozanalysis.bayesian_stats.binary
- mozanalysis.bayesian_stats.binary.compare_branches(df, col_label, ref_branch_label='control', num_samples=10000, individual_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995), comparative_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))
Jointly sample conversion rates for branches then compare them.
See compare_branches_from_agg for more details.
- Parameters:
df (pd.DataFrame) – Queried experiment data in the standard format.
col_label (str) – Label for the df column containing the metric to be analyzed.
ref_branch_label (str, optional) – String in df['branch'] that identifies the branch with respect to which we want to calculate uplifts, usually the control branch.
num_samples (int, optional) – The number of samples to compute.
individual_summary_quantiles (list, optional) – Quantiles to determine the confidence bands on individual branch statistics. Change these when making Bonferroni corrections.
comparative_summary_quantiles (list, optional) – Quantiles to determine the confidence bands on comparative branch statistics (i.e. the change relative to the reference branch, probably the control). Change these when making Bonferroni corrections.
Returns a dictionary:
‘individual’: dictionary mapping branch names to a pandas Series of summary stats for the posterior distribution over the branch’s conversion rate.
‘comparative’: dictionary mapping branch names to a pandas Series of summary statistics for the possible uplifts of the conversion rate relative to the reference branch; see docs for mozanalysis.bayesian_stats.summarize_samples.summarize_joint_samples().
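As a rough illustration of the Bonferroni adjustment mentioned for the two quantile parameters, the sketch below widens each two-sided band from level 1 - alpha to 1 - alpha/m for m comparisons and keeps the median. `bonferroni_quantiles` is a hypothetical helper, not part of mozanalysis, and it assumes each band's miss rate is split evenly between its two tails:

```python
def bonferroni_quantiles(alphas=(0.01, 0.05), num_comparisons=1):
    """Hypothetical helper: build a quantile tuple for compare_branches.

    Each two-sided band of level 1 - alpha becomes 1 - alpha / m,
    where m is the number of comparisons; the median (0.5) is kept.
    """
    if num_comparisons < 1:
        raise ValueError("num_comparisons must be at least 1")
    lower, upper = [], []
    for alpha in alphas:
        # Split the corrected miss rate evenly between the two tails
        tail = alpha / (2 * num_comparisons)
        lower.append(tail)
        upper.append(1 - tail)
    # Ascending order, matching the layout of the documented defaults
    return tuple(sorted(lower) + [0.5] + sorted(upper))


# With one comparison this reproduces the default 99% and 95% bands;
# with three comparisons every band is widened.
q3 = bonferroni_quantiles(num_comparisons=3)
print(q3)
```

Under these assumptions, three comparisons give a lowest quantile of 0.01 / 6 (about 0.00167), and the resulting tuple could be passed as individual_summary_quantiles or comparative_summary_quantiles.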
- mozanalysis.bayesian_stats.binary.aggregate_col(df, col_label)
Return the number of enrollments and conversions per branch.
- Parameters:
df (pd.DataFrame) – Queried experiment data in the standard format.
col_label (str) – Label for the df column containing the metric to be analyzed.
- Returns:
A DataFrame. The index is the list of branches. It has the following columns:
num_enrollments: The number of experiment subjects enrolled in this branch who were eligible for the metric.
num_conversions: The number of these enrolled experiment subjects who met the metric’s conversion criteria.
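A rough pandas equivalent of this aggregation, for illustration only. `aggregate_col_sketch` is a hypothetical name, not the library function, and it assumes the standard format means one row per enrolled subject with a 'branch' column and a 0/1 metric column, where a missing value marks a subject who was not eligible for the metric:

```python
import pandas as pd


def aggregate_col_sketch(df: pd.DataFrame, col_label: str) -> pd.DataFrame:
    """Hypothetical stand-in for aggregate_col: per-branch counts."""
    return df.groupby("branch")[col_label].agg(
        num_enrollments="count",  # non-null rows, assumed to mean "eligible"
        num_conversions="sum",  # rows where the 0/1 metric equals 1
    )


# Toy data: one row per enrolled subject
df = pd.DataFrame(
    {
        "branch": ["control", "control", "treatment", "treatment", "treatment"],
        "converted": [1, 0, 1, 1, 0],
    }
)
agg = aggregate_col_sketch(df, "converted")
print(agg)
```

Using "count" rather than the group size is the design choice that encodes the eligibility assumption; if every row is eligible, the two are identical.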
- mozanalysis.bayesian_stats.binary.summarize_one_branch_from_agg(s, num_enrollments_label='num_enrollments', num_conversions_label='num_conversions', quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))
Return stats about a branch’s conversion rate.
Calculate and return a Series of summary stats for the posterior distribution over the branch’s conversion rate.
- Parameters:
s (pd.Series) – Holds the number of enrollments and number of conversions for this branch and metric.
num_enrollments_label (str, optional) – The label in this Series for the number of enrollments
num_conversions_label (str, optional) – The label in this Series for the number of conversions
quantiles (list, optional) – The quantiles to return as summary statistics.
- Returns:
A pandas Series; the index contains the stringified quantiles plus 'mean'.
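Under the Beta(1, 1) model described for compare_branches_from_agg, a branch's posterior has a closed form, so one way to produce this kind of summary is analytically with scipy.stats.beta. The sketch below is a hypothetical reimplementation for illustration (`summarize_one_branch_sketch` is not a mozanalysis function) and assumes the posterior Beta(1 + conversions, 1 + enrollments - conversions):

```python
import pandas as pd
from scipy import stats


def summarize_one_branch_sketch(
    s: pd.Series,
    num_enrollments_label: str = "num_enrollments",
    num_conversions_label: str = "num_conversions",
    quantiles=(0.005, 0.025, 0.5, 0.975, 0.995),
) -> pd.Series:
    """Hypothetical sketch: summarize one branch's Beta posterior."""
    n = s[num_enrollments_label]
    k = s[num_conversions_label]
    # Beta(1, 1) prior + binomial likelihood -> Beta(1 + k, 1 + n - k) posterior
    posterior = stats.beta(1 + k, 1 + n - k)
    # Index labels are the stringified quantiles, e.g. '0.025'
    index = [str(q) for q in quantiles]
    res = pd.Series(posterior.ppf(quantiles), index=index)
    res["mean"] = posterior.mean()
    return res


summary = summarize_one_branch_sketch(
    pd.Series({"num_enrollments": 1000, "num_conversions": 120})
)
print(summary)
```

The posterior mean here is (120 + 1) / (1000 + 2), slightly pulled toward 0.5 by the uniform prior.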
- mozanalysis.bayesian_stats.binary.compare_branches_from_agg(df, ref_branch_label='control', num_enrollments_label='num_enrollments', num_conversions_label='num_conversions', num_samples=10000, individual_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995), comparative_summary_quantiles=(0.005, 0.025, 0.5, 0.975, 0.995))
Jointly sample conversion rates for each branch, then compare them.
Calculates various quantiles on the uplift of each non-reference branch’s sampled conversion rates with respect to the reference branch’s (usually the control branch’s) sampled conversion rates.
The data in df is modelled as being generated binomially, with a Beta(1, 1) (uniform) prior over the conversion rate parameter.
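Because the Beta prior is conjugate to the binomial likelihood, each branch's posterior has a closed form. Writing n for a branch's number of enrollments, k for its number of conversions, and p for its conversion rate (symbols introduced here for illustration):

```latex
\begin{aligned}
p &\sim \mathrm{Beta}(1, 1), \qquad k \mid p \sim \mathrm{Binomial}(n, p) \\
p \mid k &\sim \mathrm{Beta}(1 + k,\ 1 + n - k) \\
\mathbb{E}[p \mid k] &= \frac{k + 1}{n + 2}
\end{aligned}
```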
- Parameters:
df – A pandas dataframe of integers. df.index lists the experiment branches, and df.columns is [num_enrollments_label, num_conversions_label].
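ref_branch_label (str, optional) – Label for the df row containing data for the control branch.
num_enrollments_label – Label for the df column containing the number of enrollments in each branch.
num_conversions_label – Label for the df column containing the number of conversions in each branch.
num_samples – The number of samples to compute.
individual_summary_quantiles (list, optional) – Quantiles to determine the confidence bands on individual branch statistics. Change these when making Bonferroni corrections.
comparative_summary_quantiles (list, optional) – Quantiles to determine the confidence bands on comparative branch statistics (i.e. the change relative to the reference branch). Change these when making Bonferroni corrections.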
Returns a dictionary:
‘individual’: dictionary mapping branch names to a pandas Series of summary stats for the posterior distribution over the branch’s conversion rate.
‘comparative’: dictionary mapping branch names to a pandas Series of summary statistics for the possible uplifts of the conversion rate relative to the reference branch; see docs for mozanalysis.bayesian_stats.summarize_samples.summarize_joint_samples().
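A simplified sketch of the comparison step, assuming that "jointly sampling" amounts to drawing each branch's posterior independently (the branches' posteriors are independent given the data) and pairing draws index by index. `compare_rel_uplift_sketch` is hypothetical: it computes only one plausible comparative statistic, the relative uplift rate_branch / rate_ref - 1, and is not a substitute for the library's full comparative summary:

```python
import numpy as np
import pandas as pd


def compare_rel_uplift_sketch(
    agg: pd.DataFrame,
    ref_branch_label: str = "control",
    num_samples: int = 10000,
    quantiles=(0.005, 0.025, 0.5, 0.975, 0.995),
    seed: int = 0,
) -> dict:
    """Hypothetical sketch: relative-uplift quantiles per non-reference branch."""
    rng = np.random.default_rng(seed)
    n = agg["num_enrollments"]
    k = agg["num_conversions"]
    # Posterior draws of each branch's conversion rate, Beta(1 + k, 1 + n - k)
    samples = pd.DataFrame(
        {b: rng.beta(1 + k[b], 1 + n[b] - k[b], size=num_samples) for b in agg.index}
    )
    ref = samples[ref_branch_label]
    out = {}
    for branch in agg.index:
        if branch == ref_branch_label:
            continue
        # Pair draw i of this branch with draw i of the reference branch
        rel_uplift = samples[branch] / ref - 1
        out[branch] = rel_uplift.quantile(list(quantiles))
    return out


agg = pd.DataFrame(
    {"num_enrollments": [1000, 1000], "num_conversions": [100, 150]},
    index=["control", "treatment"],
)
result = compare_rel_uplift_sketch(agg)
print(result["treatment"])
```

With these toy counts the treatment's median relative uplift comes out near 0.5 (a 10% versus 15% conversion rate); the exact quantiles depend on the seed.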
- mozanalysis.bayesian_stats.binary.get_samples(df, num_enrollments_label, num_conversions_label, num_samples)
Return samples from Beta distributions.
Assumes a Beta(1, 1) prior.
- Parameters:
df – A pandas dataframe of integers: df.index lists the experiment branches, and df.columns is (num_enrollments_label, num_conversions_label).
num_enrollments_label – Label for the df column containing the number of enrollments in each branch.
num_conversions_label – Label for the df column containing the number of conversions in each branch.
num_samples – The number of samples to compute.
Returns a pandas.DataFrame of sampled conversion rates:
columns: list of branches
index: enumeration of samples
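A minimal sketch of the sampling described above using NumPy's random Generator. `get_samples_sketch` and its `seed` argument are hypothetical additions for illustration, and the sketch assumes the posterior for each branch is Beta(1 + conversions, 1 + enrollments - conversions):

```python
import numpy as np
import pandas as pd


def get_samples_sketch(
    df, num_enrollments_label, num_conversions_label, num_samples, seed=None
):
    """Hypothetical sketch of get_samples: posterior draws under a Beta(1, 1) prior."""
    rng = np.random.default_rng(seed)
    draws = {}
    for branch, row in df.iterrows():
        n = row[num_enrollments_label]
        k = row[num_conversions_label]
        # Posterior for a binomial rate with a uniform prior: Beta(1 + k, 1 + n - k)
        draws[branch] = rng.beta(1 + k, 1 + n - k, size=num_samples)
    # Columns are branches; the default index enumerates samples 0..num_samples-1
    return pd.DataFrame(draws, columns=list(df.index))


agg = pd.DataFrame(
    {"num_enrollments": [500, 480], "num_conversions": [40, 55]},
    index=["control", "treatment"],
)
samples = get_samples_sketch(agg, "num_enrollments", "num_conversions", 2000, seed=42)
print(samples.describe())
```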