mozanalysis.frequentist_stats.sample_size

mozanalysis.frequentist_stats.sample_size.sample_size_curves(df: DataFrame, metrics_list: list, solver, effect_size: float | ndarray | Series | List[float] = 0.01, power: float | ndarray | Series | List[float] = 0.8, alpha: float | ndarray | Series | List[float] = 0.05, **solver_kwargs) Dict[str, DataFrame][source]

Loop over a list of different parameters to produce sample size estimates given those parameters. Exactly one of effect_size, power, or alpha should be passed as a list; the sample size curve will be calculated with that parameter as the variable.

Parameters:
  • df – A pandas DataFrame of queried historical data.

  • metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • solver (any function that returns sample size as a function of effect_size, power, alpha) – The solver used to calculate sample size.

  • effect_size (float or ArrayLike, default .01) – For tests of differences in proportions, the absolute difference; for tests of differences in means, the percent change.

  • alpha (float or ArrayLike, default .05) – Significance level for the experiment.

  • power (float or ArrayLike, default .80) – Probability of detecting an effect, when a significant effect exists.

  • **solver_kwargs (dict) – Arguments necessary for the provided solver.

Returns:

A dictionary of pd.DataFrame objects. An item in the dictionary is created for each metric in metrics_list, containing a DataFrame of the sample size per branch, the number of clients that satisfied targeting, and the population proportion per branch at each value of the iterable parameter.
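The looping behavior can be sketched in plain Python. This is illustrative only, not mozanalysis's implementation: `toy_solver` is a hypothetical stand-in for a real solver such as z_or_t_ind_sample_size_calc, and the results are returned as plain dicts rather than DataFrames.

```python
# Sketch of sample_size_curves: call the solver once per value of the
# varied parameter and collect one row of results per call.

def toy_solver(df, metrics_list, effect_size, power, alpha):
    # Hypothetical solver: sample size shrinks quadratically with effect size.
    return {m: {"sample_size_per_branch": int(1000 / effect_size ** 2)}
            for m in metrics_list}

def sample_size_curve_sketch(df, metrics_list, solver,
                             effect_sizes, power=0.8, alpha=0.05):
    results = {m: [] for m in metrics_list}
    for es in effect_sizes:
        res = solver(df, metrics_list, effect_size=es, power=power, alpha=alpha)
        for m in metrics_list:
            results[m].append({"effect_size": es, **res[m]})
    return results

curves = sample_size_curve_sketch(None, ["active_hours"], toy_solver,
                                  [0.01, 0.02, 0.05])
```

Each metric maps to one row per effect size, mirroring the per-metric DataFrames the real function returns.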

mozanalysis.frequentist_stats.sample_size.difference_of_proportions_sample_size_calc(df: DataFrame, metrics_list: List[Metric], effect_size: float = 0.01, alpha: float = 0.05, power: float = 0.9, outlier_percentile: float = 99.5) dict[source]

Perform sample size calculation for an experiment to test for a difference in proportions.

Parameters:
  • df – A pandas DataFrame of queried historical data.

  • metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • effect_size (float, default .01) – Difference in proportion for the minimum detectable effect – effect_size = p(event under alt) - p(event under null)

  • alpha (float, default .05) – Significance level for the experiment.

  • power (float, default .90) – Probability of detecting an effect, when a significant effect exists.

  • outlier_percentile (float, default 99.5) – Percentile at which to trim each column.

Returns:

A dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric.
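The standard normal-approximation formula for a two-sample test of proportions can be written down directly. This is a sketch of the kind of calculation such a solver performs (assuming a baseline proportion, which in practice comes from the historical data), not mozanalysis's exact code.

```python
import math
from statistics import NormalDist

def diff_of_proportions_n(p_null: float, effect_size: float,
                          alpha: float = 0.05, power: float = 0.9) -> int:
    """Per-branch sample size for a two-sided test of
    p(event under alt) - p(event under null) = effect_size."""
    p_alt = p_null + effect_size
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # critical value
    z_b = NormalDist().inv_cdf(power)           # power quantile
    # Sum of Bernoulli variances under null and alternative.
    var = p_null * (1 - p_null) + p_alt * (1 - p_alt)
    return math.ceil((z_a + z_b) ** 2 * var / effect_size ** 2)

n = diff_of_proportions_n(p_null=0.10, effect_size=0.01)
```

Detecting a one-point lift from a 10% baseline at the default alpha and power requires roughly twenty thousand clients per branch.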

mozanalysis.frequentist_stats.sample_size.z_or_t_ind_sample_size_calc(df: DataFrame, metrics_list: List[Metric], test: str = 'z', effect_size: float = 0.01, alpha: float = 0.05, power: float = 0.9, outlier_percentile: float = 99.5) dict[source]

Perform sample size calculation for an experiment based on independent samples t or z tests.

Parameters:
  • df – A pandas DataFrame of queried historical data.

  • metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • test (str, default "z") – "z" or "t" to indicate which solver to use.

  • effect_size (float, default .01) – Percent change in metrics expected as a result of the experiment treatment

  • alpha (float, default .05) – Significance level for the experiment.

  • power (float, default .90) – Probability of detecting an effect, when a significant effect exists.

  • outlier_percentile (float, default 99.5) – Percentile at which to trim each column.

Returns:

A dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric.
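Because the minimum detectable effect is expressed as a percent change, the solver must first convert it to an absolute difference using the metric's historical mean. A minimal sketch of the z-test branch, assuming the mean and standard deviation come from the trimmed historical DataFrame (this is illustrative, not mozanalysis's exact code):

```python
import math
from statistics import NormalDist

def z_ind_sample_size(mean: float, std: float, rel_effect: float = 0.01,
                      alpha: float = 0.05, power: float = 0.9) -> int:
    """Per-branch n for a two-sample z test with a relative
    (percent-change) minimum detectable effect."""
    delta = rel_effect * mean                   # absolute difference to detect
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    # Two independent samples, equal variance in each branch.
    return math.ceil(2 * (std * (z_a + z_b) / delta) ** 2)

n = z_ind_sample_size(mean=5.0, std=10.0, rel_effect=0.01)
```

Note that for a fixed relative effect, noisier metrics (larger std relative to mean) drive the required sample size up quadratically.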

mozanalysis.frequentist_stats.sample_size.empirical_effect_size_sample_size_calc(res: TimeSeriesResult, bq_context: BigQueryContext, metric_list: list, quantile: float = 0.9, power: float = 0.8, alpha: float = 0.05, parent_distribution: str = 'normal', plot_effect_sizes: bool = False) dict[source]

Perform sample size calculation with empirical effect size and asymptotic approximation of the Wilcoxon-Mann-Whitney U test. Empirical effect size is estimated using a quantile of week-to-week changes over the course of the study, and the variance in the test statistic is estimated as a quantile of weekly variance in metrics. Sample size calculation is based on the asymptotic relative efficiency (ARE) of the U test to the t test (see Stapleton 2008, pg. 266, or https://www.psychologie.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Psychologie/AAP/gpower/GPowerManual.pdf).

Parameters:
  • res – A TimeSeriesResult, generated by mozanalysis.sizing.HistoricalTarget.get_time_series_data.

  • bq_context – A mozanalysis.bq.BigQueryContext object that handles downloading time series data from BigQuery.

  • metric_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • quantile (float, default .90) – Quantile used to calculate the effect size as the quantile of week-to-week metric changes and the variance of the mean.

  • alpha (float, default .05) – Significance level for the experiment.

  • power (float, default .80) – Probability of detecting an effect, when a significant effect exists.

  • parent_distribution (str, default "normal") – Distribution of the parent data; must be normal, uniform, logistic, or laplace.

  • plot_effect_sizes (bool, default False) – Whether or not to plot the distribution of effect sizes observed in historical data.

Returns:

A dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric.
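The ARE correction amounts to dividing a t-test sample size by the efficiency of the U test under the assumed parent distribution. The ARE values below are the standard tabulated ones (as in the G*Power manual); the helper is a sketch of the adjustment, not mozanalysis's exact code.

```python
import math

# Asymptotic relative efficiency of the Mann-Whitney U test vs. the t test,
# keyed by parent distribution.
ARE = {
    "normal": 3 / math.pi,          # ~0.955
    "uniform": 1.0,
    "logistic": math.pi ** 2 / 9,   # ~1.097
    "laplace": 1.5,
}

def u_test_n(t_test_n: float, parent_distribution: str = "normal") -> int:
    """Inflate (or deflate) a t-test sample size by the U test's ARE."""
    return math.ceil(t_test_n / ARE[parent_distribution])

n = u_test_n(1000, "normal")
```

Under a normal parent the U test needs about 5% more clients than the t test; under heavier-tailed parents like the Laplace it needs fewer.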

mozanalysis.frequentist_stats.sample_size.poisson_diff_solve_sample_size(df: DataFrame, metrics_list: List[Metric], effect_size: float = 0.01, alpha: float = 0.05, power: float = 0.9, outlier_percentile: float = 99.5) dict[source]

Sample size for test of difference of Poisson rates, based on Poisson rate’s asymptotic normality.

Parameters:
  • df – A pandas DataFrame of queried historical data.

  • metrics_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • effect_size (float, default .01) – Percent change in metrics expected as a result of the experiment treatment

  • alpha (float, default .05) – Significance level for the experiment.

  • power (float, default .90) – Probability of detecting an effect, when a significant effect exists.

  • outlier_percentile (float, default 99.5) – Percentile at which to trim each column.

Returns:

A dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric.
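The asymptotic-normality argument gives a closed form: the sample rate has variance lambda/n, so the test of a rate difference reduces to a z-style formula. This sketch assumes baseline and alternative rates as direct inputs (in practice the baseline comes from historical data); it is illustrative, not mozanalysis's exact code.

```python
import math
from statistics import NormalDist

def poisson_diff_n(rate_null: float, rate_alt: float,
                   alpha: float = 0.05, power: float = 0.9) -> int:
    """Per-branch n for a two-sided test of a difference in Poisson rates,
    using the asymptotic normality of the sample rate (Var = lambda / n)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    # Variance of the rate difference is (rate_null + rate_alt) / n.
    return math.ceil((z_a + z_b) ** 2 * (rate_null + rate_alt)
                     / (rate_alt - rate_null) ** 2)

n = poisson_diff_n(rate_null=1.0, rate_alt=1.1)
```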

mozanalysis.frequentist_stats.sample_size.variable_enrollment_length_sample_size_calc(bq_context: BigQueryContext, start_date: str | datetime, max_enrollment_days: int, analysis_length: int, metric_list: List[Metric], target_list: List[Segment], variable_window_length: int = 7, experiment_name: str | None = '', app_id: str | None = '', to_pandas: bool = True, **sizing_kwargs) Dict[str, Dict[str, int] | DataFrame][source]

Sample size calculation over a variable enrollment window. This function will fetch a DataFrame with metrics defined in metric_list for a target population defined in target_list over an enrollment window of length max_enrollment_days. Sample size calculation is performed using clients enrolled in the first variable_window_length days of that maximum enrollment window; the window is then widened incrementally by variable_window_length and the calculation repeated, until the last enrollment date is reached.

Parameters:
  • bq_context – A mozanalysis.bq.BigQueryContext object that handles downloading data from BigQuery.

  • start_date (str or datetime in %Y-%m-%d format) – First date of enrollment for sizing job.

  • max_enrollment_days (int) – Maximum number of dates to consider for the enrollment period for the experiment in question.

  • analysis_length (int) – Number of days to record metrics for each client in the experiment in question.

  • metric_list (list of mozanalysis.metrics.Metric) – List of metrics used to construct the results df from HistoricalTarget. The names of these metrics are used to return results for sample size calculation for each.

  • target_list (list of mozanalysis.segments.Segment) – List of segments used to identify clients to include in the study.

  • variable_window_length (int) – Length of the intervals used to extend the enrollment period incrementally. Sample sizes are recalculated over each variable enrollment period.

  • experiment_name (str) – Optional name used to name the target and metric tables in BigQuery.

  • app_id (str) – Application that experiment will be run on.

  • **sizing_kwargs – Arguments to pass to z_or_t_ind_sample_size_calc.

Returns:

A dictionary. Keys in the dictionary are the metrics column names from the DataFrame; values are the required sample size per branch to achieve the desired power for that metric.
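The incremental-widening loop can be sketched without any BigQuery machinery. This generator shows only the windowing logic described above (including a final partial window when max_enrollment_days is not a multiple of variable_window_length); the real function also fetches data and runs the sizing calculation for each window.

```python
def enrollment_windows(max_enrollment_days: int,
                       variable_window_length: int = 7):
    """Yield the incrementally widened enrollment window lengths (in days)
    over which sample sizes are recalculated."""
    days = variable_window_length
    while days <= max_enrollment_days:
        yield days
        days += variable_window_length
    # Final partial window, if the maximum is not an exact multiple.
    if days - variable_window_length < max_enrollment_days:
        yield max_enrollment_days

windows = list(enrollment_windows(30, 7))
```

With a 30-day maximum and the default 7-day increment, sample sizes are computed over 7-, 14-, 21-, 28-, and finally 30-day enrollment windows.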