mozanalysis.frequentist_stats.sample_size

Module for sample size calculations.

class mozanalysis.frequentist_stats.sample_size.ResultsHolder(*args, metrics: dict | None = None, params: dict | None = None, **kwargs)[source]

Object to hold results from different methods. It extends dict so that users can interact with it like a dictionary with the same keys and values as before, making it backward compatible.

property metrics

List of metrics used to generate the results. Defaults to None

property params

Parameters used to generate the results. Defaults to None

static make_friendly_name(ugly_name: str) str[source]

Turns a name into a friendly name by replacing underscores with spaces and capitalizing words other than small ones like “per”, “of”, etc.

Parameters:
  • ugly_name (str) – name to make pretty

Returns:

reformatted name

Return type:

str
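
Example (a minimal sketch; the exact output shown is inferred from the description above rather than verified against the implementation):

    from mozanalysis.frequentist_stats.sample_size import ResultsHolder

    # Underscores become spaces and words are capitalized, except small
    # words such as "per" and "of" (the expected output is an assumption).
    print(ResultsHolder.make_friendly_name("active_hours_per_client"))
    # e.g. "Active Hours per Client"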

class mozanalysis.frequentist_stats.sample_size.SampleSizeResultsHolder(*args, metrics: dict | None = None, params: dict | None = None, **kwargs)[source]

Object to hold results from different methods. It extends dict so that users can interact with it like a dictionary with the same keys and values as before, making it backward compatible. The dictionary functionality is extended with additional attributes that hold metadata, a method for plotting results, and a method for returning results as a DataFrame.

property dataframe: DataFrame

dataframe property

returns data as a dataframe rather than a dict

plot_results(result_name: str = 'sample_size_per_branch')[source]

plots the outputs of the sampling methods

Parameters:
  • result_name (str) – sample size method output to plot. Defaults to 'sample_size_per_branch'.
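
Example (a hedged sketch of working with a SampleSizeResultsHolder; results is assumed to come from one of the calculation functions documented below, and the metric name and nested keys shown are hypothetical):

    # results = z_or_t_ind_sample_size_calc(df, metrics_list)  # see below

    # Dict-style access, backward compatible with the old return value:
    per_branch = results["active_hours"]["sample_size_per_branch"]  # hypothetical keys

    # Metadata and convenience views:
    print(results.metrics)    # metrics used to generate the results
    print(results.params)     # parameters used to generate the results
    print(results.dataframe)  # the same results as a pandas DataFrame

    # Plot the chosen output for each metric:
    results.plot_results(result_name="sample_size_per_branch")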

class mozanalysis.frequentist_stats.sample_size.EmpiricalEffectSizeResultsHolder(*args, metrics: dict | None = None, params: dict | None = None, **kwargs)[source]

ResultsHolder for empirical_effect_size_sample_size_calc

style_empirical_sizing_result(empirical_sizing_df: DataFrame) Styler[source]

Pretty-print the DataFrame returned by empirical_sizing().

Returns a pandas Styler object.

property dataframe: DataFrame

dataframe property

returns results consolidated into a dataframe

get_styled_dataframe() Styler[source]

returns styled dataframe for results from empirical_effect_size_sample_size_calc

Parameters:
  • style (bool) – If true, return a cleaned up and formatted pandas Styler. Otherwise return a dataframe.

Returns:

styled dataframe for visualization

Return type:

Styler

class mozanalysis.frequentist_stats.sample_size.SampleSizeCurveResultHolder(*args, **kwargs)[source]

property dataframe: DataFrame

dataframe property

get_styled_dataframe(input_data: DataFrame | None = None, show_population_pct: bool = True, simulated_values: list[float] | None = None, append_stats: bool = False, highlight_lessthan: list[float] | None = None, trim_highlight_threshold: float = 0.15) Styler[source]

Returns styled dataframe useful for visualization

Parameters:
  • input_data (pd.DataFrame, optional) – Metric data used for summary stats. Defaults to None.

  • show_population_pct (bool, optional) – Controls whether output is a percent of population or a count. Defaults to True.

  • simulated_values (List[float], optional) – List of values that were varied to create curves. Defaults to None.

  • append_stats (bool, optional) – Controls whether or not to append summary stats to output dataframe. Defaults to False.

  • highlight_lessthan (List[float], optional) – list of sample size thresholds to highlight in the results. For each threshold, sample sizes lower than it (but higher than any other thresholds) are highlighted in a predefined colour. When show_population_pct is True, thresholds should be expressed as a percentage between 0 and 100, not a decimal between 0 and 1 (for example, to set a threshold for 5%, supply [5]). At most 3 different thresholds are supported: only the 3 lowest thresholds supplied will be used, and any others are silently ignored. Defaults to None.

  • trim_highlight_threshold (float, optional) – if summary stats are shown, cases for which the trimmed mean differs from the raw mean by more than this threshold are highlighted; these metrics are strongly affected by outliers. The threshold should be a relative difference value between 0 and 1. Defaults to 0.15.

Returns:

styled output

Return type:

Styler
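
Example (a sketch of combining these options; curve_results is assumed to be the holder returned by sample_size_curves() below, and df the historical metric data used to produce it):

    styled = curve_results.get_styled_dataframe(
        input_data=df,              # metric data used for the appended summary stats
        show_population_pct=True,   # report sizes as a percent of population
        append_stats=True,
        highlight_lessthan=[1, 5],  # highlight results under 1% and 5% of population
    )
    styled  # renders in a notebook; use styled.to_html() elsewhere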

mozanalysis.frequentist_stats.sample_size.sample_size_curves(df: DataFrame, metrics_list: list, solver, effect_size: float | ndarray | Series | list[float] = 0.01, power: float | ndarray | Series | list[float] = 0.8, alpha: float | ndarray | Series | list[float] = 0.05, **solver_kwargs) SampleSizeCurveResultHolder[source]

Loop over a list of different parameter values to produce sample size estimates for each. A single parameter in [effect_size, power, alpha] should be passed as a list; the sample size curve will be calculated with that parameter as the variable.

Parameters:
  • df – A pandas DataFrame of queried historical data.

  • metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • solver (any function that returns sample size as a function of effect_size, power, and alpha) – The solver being used to calculate sample size.

  • effect_size (float or ArrayLike, default .01) – For tests of differences in proportions, the absolute difference; for tests of differences in means, the percent change.

  • alpha (float or ArrayLike, default .05) – Significance level for the experiment.

  • power (float or ArrayLike, default .8) – Probability of detecting an effect, when a significant effect exists.

  • **solver_kwargs (dict) – Arguments necessary for the provided solver.

Returns:

The data attribute contains a dictionary of pd.DataFrame objects. An item in the dictionary is created for each metric in metrics_list, containing a DataFrame of sample size per branch, number of clients that satisfied targeting, and population proportion per branch at each value of the iterable parameter. Additional methods for ease of use are documented in the class.

Return type:

SampleSizeCurveResultHolder
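
Example (a minimal sketch of the intended call pattern, using synthetic data in place of a real HistoricalTarget result; the predefined desktop metrics imported here are an assumption and may need to be adjusted for your mozanalysis version):

    import numpy as np
    import pandas as pd

    from mozanalysis.frequentist_stats.sample_size import (
        sample_size_curves,
        z_or_t_ind_sample_size_calc,
    )
    # Assumed: predefined metrics whose names match the DataFrame columns.
    from mozanalysis.metrics.desktop import active_hours, uri_count

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "active_hours": rng.exponential(scale=2.0, size=10_000),
        "uri_count": rng.poisson(lam=25, size=10_000),
    })

    # Vary effect_size while holding power and alpha fixed.
    curves = sample_size_curves(
        df,
        metrics_list=[active_hours, uri_count],
        solver=z_or_t_ind_sample_size_calc,
        effect_size=[0.01, 0.02, 0.05, 0.1],
        power=0.8,
        alpha=0.05,
    )
    print(curves.dataframe)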

mozanalysis.frequentist_stats.sample_size.difference_of_proportions_sample_size_calc(df: DataFrame, metrics_list: list[Metric], effect_size: float = 0.01, alpha: float = 0.05, power: float = 0.9, outlier_percentile: float = 99.5) SampleSizeResultsHolder[source]

Perform sample size calculation for an experiment to test for a difference in proportions.

Parameters:
  • df – A pandas DataFrame of queried historical data.

  • metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • effect_size (float, default .01) – Difference in proportion for the minimum detectable effect – effect_size = p(event under alt) - p(event under null)

  • alpha (float, default .05) – Significance level for the experiment.

  • power (float, default .90) – Probability of detecting an effect, when a significant effect exists.

  • outlier_percentile (float, default 99.5) – Percentile at which to trim each column.

Returns:

The data attribute contains a dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric. Additional methods for ease of use are documented in the class.

Return type:

SampleSizeResultsHolder
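
Example (a brief sketch with synthetic 0/1 conversion data; the imported metric is a hypothetical stand-in for whatever binary metric you are sizing):

    import numpy as np
    import pandas as pd

    from mozanalysis.frequentist_stats.sample_size import (
        difference_of_proportions_sample_size_calc,
    )
    # Assumed: a predefined metric whose name matches the column below.
    from mozanalysis.metrics.desktop import ad_clicks

    rng = np.random.default_rng(1)
    # One row per client; 1 if the client triggered the event, 0 otherwise.
    df = pd.DataFrame({"ad_clicks": rng.binomial(n=1, p=0.2, size=50_000)})

    results = difference_of_proportions_sample_size_calc(
        df,
        metrics_list=[ad_clicks],
        effect_size=0.01,  # absolute difference in proportions to detect
        alpha=0.05,
        power=0.9,
    )
    print(results.dataframe)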

mozanalysis.frequentist_stats.sample_size.z_or_t_ind_sample_size_calc(df: DataFrame, metrics_list: list[Metric], test: str = 'z', effect_size: float = 0.01, alpha: float = 0.05, power: float = 0.9, outlier_percentile: float = 99.5) SampleSizeResultsHolder[source]

Perform sample size calculation for an experiment based on independent samples t or z tests.

Parameters:
  • df – A pandas DataFrame of queried historical data.

  • metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • test (str, default 'z') – 'z' or 't' to indicate which solver to use.

  • effect_size (float, default .01) – Percent change in metrics expected as a result of the experiment treatment

  • alpha (float, default .05) – Significance level for the experiment.

  • power (float, default .90) – Probability of detecting an effect, when a significant effect exists.

  • outlier_percentile (float, default 99.5) – Percentile at which to trim each column.

Returns:

The data attribute contains a dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric. Additional methods for ease of use are documented in the class.

Return type:

SampleSizeResultsHolder
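
Example (a sketch with synthetic continuous data; the metric import and the result keys printed are assumptions):

    import numpy as np
    import pandas as pd

    from mozanalysis.frequentist_stats.sample_size import z_or_t_ind_sample_size_calc
    # Assumed: a predefined desktop metric named "active_hours".
    from mozanalysis.metrics.desktop import active_hours

    rng = np.random.default_rng(2)
    df = pd.DataFrame({"active_hours": rng.exponential(scale=2.0, size=10_000)})

    results = z_or_t_ind_sample_size_calc(
        df,
        metrics_list=[active_hours],
        test="t",          # use the t-test solver instead of the default z
        effect_size=0.05,  # detect a 5% change in the mean
        outlier_percentile=99.5,
    )
    print(results["active_hours"])  # sizing results for this metric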

mozanalysis.frequentist_stats.sample_size.empirical_effect_size_sample_size_calc(res: TimeSeriesResult, bq_context: BigQueryContext, metric_list: list, quantile: float = 0.9, power: float = 0.8, alpha: float = 0.05, parent_distribution: str = 'normal', plot_effect_sizes: bool = False) EmpiricalEffectSizeResultsHolder[source]

Perform sample size calculation with empirical effect size and an asymptotic approximation of the Wilcoxon-Mann-Whitney U test. The empirical effect size is estimated using a quantile of week-to-week changes over the course of the study, and the variance in the test statistic is estimated as a quantile of the weekly variance in metrics. The sample size calculation is based on the asymptotic relative efficiency (ARE) of the U test to the t test (see Stapleton 2008, p. 266, or https://www.psychologie.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Psychologie/AAP/gpower/GPowerManual.pdf)

Parameters:
  • res – A TimeSeriesResult, generated by mozanalysis.sizing.HistoricalTarget.get_time_series_data.

  • bq_context – A mozanalysis.bq.BigQueryContext object that handles downloading time series data from BigQuery.

  • metric_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • quantile (float, default .90) – Quantile used to calculate the effect size as the quantile of week-to-week metric changes and the variance of the mean.

  • alpha (float, default .05) – Significance level for the experiment.

  • power (float, default .8) – Probability of detecting an effect, when a significant effect exists.

  • parent_distribution (str, default "normal") – Distribution of the parent data; must be normal, uniform, logistic, or laplace.

  • plot_effect_sizes (bool, default False) – Whether or not to plot the distribution of effect sizes observed in historical data.

Returns:

The data attribute contains a dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are dictionaries containing the required sample size per branch to achieve the desired power for that metric, along with additional information. Additional methods for ease of use are documented in the class.

Return type:

EmpiricalEffectSizeResultsHolder
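
Example (a call-pattern sketch only; it assumes BigQuery access, a BigQueryContext with placeholder constructor arguments, and a TimeSeriesResult produced by mozanalysis.sizing.HistoricalTarget.get_time_series_data, with that targeting step omitted; the metric imports are assumptions):

    from mozanalysis.bq import BigQueryContext
    from mozanalysis.frequentist_stats.sample_size import (
        empirical_effect_size_sample_size_calc,
    )
    # Assumed: predefined desktop metrics; adjust to your configuration.
    from mozanalysis.metrics.desktop import active_hours, uri_count

    # Placeholder dataset/project ids.
    bq_context = BigQueryContext(dataset_id="my_dataset", project_id="my_project")
    # `res` is assumed to be a TimeSeriesResult from a prior call to
    # HistoricalTarget(...).get_time_series_data(...); that setup is omitted here.

    results = empirical_effect_size_sample_size_calc(
        res,
        bq_context,
        metric_list=[active_hours, uri_count],
        quantile=0.9,
        power=0.8,
        alpha=0.05,
    )
    results.get_styled_dataframe()  # formatted view of the sizing results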

mozanalysis.frequentist_stats.sample_size.poisson_diff_solve_sample_size(df: DataFrame, metrics_list: list[Metric], effect_size: float = 0.01, alpha: float = 0.05, power: float = 0.9, outlier_percentile: float = 99.5) SampleSizeResultsHolder[source]

Sample size for test of difference of Poisson rates, based on Poisson rate’s asymptotic normality.

Parameters:
  • df – A pandas DataFrame of queried historical data.

  • metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • test (str, default z) – z or t to indicate which solver to use

  • effect_size (float, default .01) – Percent change in metrics expected as a result of the experiment treatment

  • alpha (float, default .05) – Significance level for the experiment.

  • power (float, default .90) – Probability of detecting an effect, when a significant effect exists.

  • outlier_percentile (float, default 99.5) – Percentile at which to trim each column.

Returns:

The data attribute contains a dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric. Additional methods for ease of use are documented in the class.

Return type:

SampleSizeResultsHolder
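
Example (a short sketch mirroring the pattern above; the metric import is an assumption):

    import numpy as np
    import pandas as pd

    from mozanalysis.frequentist_stats.sample_size import poisson_diff_solve_sample_size
    # Assumed: a predefined desktop metric named "uri_count".
    from mozanalysis.metrics.desktop import uri_count

    rng = np.random.default_rng(3)
    df = pd.DataFrame({"uri_count": rng.poisson(lam=25, size=10_000)})

    results = poisson_diff_solve_sample_size(df, metrics_list=[uri_count], effect_size=0.02)
    print(results.dataframe)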

mozanalysis.frequentist_stats.sample_size.variable_enrollment_length_sample_size_calc(bq_context: BigQueryContext, start_date: str | datetime, max_enrollment_days: int, analysis_length: int, metric_list: list[Metric], target_list: list[Segment], variable_window_length: int = 7, experiment_name: str | None = '', app_id: str | None = '', to_pandas: bool = True, **sizing_kwargs) dict[str, dict[str, int] | DataFrame][source]

Sample size calculation over a variable enrollment window. This function fetches a DataFrame with the metrics defined in metric_list for a target population defined in target_list, over an enrollment window of length max_enrollment_days. The sample size calculation is performed using clients enrolled in the first variable_window_length days of that enrollment window; the window is then widened incrementally by variable_window_length and the calculation repeated, until the last enrollment date is reached.

Parameters:
  • bq_context – A mozanalysis.bq.BigQueryContext object that handles downloading data from BigQuery.

  • start_date (str or datetime in %Y-%m-%d format) – First date of enrollment for sizing job.

  • max_enrollment_days (int) – Maximum number of dates to consider for the enrollment period for the experiment in question.

  • analysis_length (int) – Number of days to record metrics for each client in the experiment in question.

  • metric_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • target_list (list of mozanalysis.segments.Segment) – List of segments used to identify clients to include in the study.

  • variable_window_length (int) – Length of the intervals used to extend the enrollment period incrementally. Sample sizes are recalculated over each variable enrollment period.

  • experiment_name (str) – Optional name used to name the target and metric tables in BigQuery.

  • app_id (str) – Application that experiment will be run on.

  • **sizing_kwargs – Arguments to pass to z_or_t_ind_sample_size_calc

Returns:

A dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric.
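
Example (a hedged sketch; it assumes BigQuery access, and the metric, segment, experiment_name, and app_id values shown are placeholders to adjust for your own sizing job):

    from mozanalysis.bq import BigQueryContext
    from mozanalysis.frequentist_stats.sample_size import (
        variable_enrollment_length_sample_size_calc,
    )
    # Assumed: predefined desktop metrics and a predefined targeting segment.
    from mozanalysis.metrics.desktop import active_hours, uri_count
    from mozanalysis.segments.desktop import regular_users_v3

    # Placeholder dataset/project ids.
    bq_context = BigQueryContext(dataset_id="my_dataset", project_id="my_project")

    results = variable_enrollment_length_sample_size_calc(
        bq_context,
        start_date="2023-01-01",
        max_enrollment_days=28,
        analysis_length=7,
        metric_list=[active_hours, uri_count],
        target_list=[regular_users_v3],
        variable_window_length=7,
        experiment_name="my_sizing_job",  # placeholder name for BigQuery tables
        app_id="firefox_desktop",         # placeholder application id
        effect_size=0.05,                 # forwarded to z_or_t_ind_sample_size_calc
    )
    # `results` is a dictionary of sizing outputs; see the Returns description above.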