mozanalysis.segments

class mozanalysis.segments.SegmentDataSource(name, from_expr, window_start: int = 0, window_end: int = 0, client_id_column: str = 'client_id', submission_date_column: str = 'submission_date', default_dataset: str | None = None, app_name: str | None = None, group_id_column: str = 'profile_group_id')[source]

Represents a table or view, from which segments may be defined.

window_start and window_end define the window of data used to determine whether each client fits a segment. Ideally this window ends at or before the moment of enrollment, so that a user’s branch cannot bias the segment assignment. window_start and window_end are integers representing the number of days before or after enrollment.
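As a rough illustration of the day offsets, the sketch below turns an enrollment date plus window_start and window_end into calendar dates. It assumes that negative values mean days before enrollment and that both ends of the window are inclusive; the helper name segment_window is hypothetical and is not part of mozanalysis.

```python
from datetime import date, timedelta


def segment_window(
    enrollment_date: date, window_start: int = 0, window_end: int = 0
) -> tuple[date, date]:
    """Hypothetical helper: map day offsets to calendar dates.

    Assumption: offsets are relative to the enrollment date, with
    negative values meaning "before enrollment".
    """
    if window_start > window_end:
        raise ValueError("window_start must not be after window_end")
    first_day = enrollment_date + timedelta(days=window_start)
    last_day = enrollment_date + timedelta(days=window_end)
    return first_day, last_day


# A window covering the 28 days that end the day before enrollment.
first_day, last_day = segment_window(date(2024, 3, 15), window_start=-28, window_end=-1)
print(first_day, last_day)
```

A window such as this one closes before enrollment, so nothing a client does after being assigned to a branch can affect which segment they land in.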

Parameters:
  • name (str) – Name for the Data Source. Should be unique to avoid confusion.

  • from_expr (str) – FROM expression - often just a fully-qualified table name. Sometimes a subquery. May contain the string {dataset} which will be replaced with an app-specific dataset for Glean apps. If the expression is templated on dataset, default_dataset is mandatory.

  • window_start (int, optional) – See above.

  • window_end (int, optional) – See above.

  • client_id_column (str, optional) – Name of the column that contains the client_id (join key). Defaults to ‘client_id’.

  • submission_date_column (str, optional) – Name of the column that contains the submission date (as a date, not timestamp). Defaults to ‘submission_date’.

  • default_dataset (str, optional) – The value to use for {dataset} in from_expr if a value is not provided at runtime. Mandatory if from_expr contains a {dataset} parameter.

  • app_name (str, optional) – app_name used with metric-hub, used for validation.

  • group_id_column (str, optional) – Name of the column that contains the group_id (join key). Defaults to ‘profile_group_id’.

from_expr_for(dataset: str | None) str[source]

Expands the from_expr template for the given dataset. If from_expr is not a template, returns from_expr.

Parameters:
  • dataset (str or None) – Dataset name to substitute into the from expression.
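The template expansion described here can be sketched with plain string formatting. This is a minimal stand-in, not the library's implementation: the function name expand_from_expr, the fallback order (runtime dataset first, then default_dataset), and the table and dataset names are all assumptions made for illustration.

```python
from typing import Optional


def expand_from_expr(
    from_expr: str, dataset: Optional[str], default_dataset: Optional[str] = None
) -> str:
    """Hypothetical stand-in for the {dataset} substitution.

    Assumption: a dataset given at runtime wins over default_dataset,
    and with neither available the expression is returned unchanged.
    """
    effective_dataset = dataset or default_dataset
    if effective_dataset is None:
        return from_expr
    return from_expr.format(dataset=effective_dataset)


# Illustrative names only; these are not real tables.
templated = "`my-project.{dataset}.baseline`"
print(expand_from_expr(templated, "my_app_dataset"))
print(expand_from_expr(templated, None, default_dataset="fallback_dataset"))
print(expand_from_expr("`my-project.telemetry.main`", "my_app_dataset"))
```

The last call shows the non-template case: with no {dataset} placeholder present, the expression comes back as it went in.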

build_query(segment_list, time_limits, experiment_slug, from_expr_dataset=None, analysis_unit: AnalysisUnit = AnalysisUnit.CLIENT)[source]

Return a nearly self-contained SQL query.

The query takes a list of {analysis_id}s from raw_enrollments, and adds one non-NULL boolean column per segment: True if the client is in the segment, False otherwise.
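To make the shape of such a query concrete, the sketch below assembles a simplified SQL string with one boolean column per segment. It is not the SQL that mozanalysis generates: the function sketch_segment_query, the join layout, the table alias names, and the example columns are hypothetical, and the date-window filtering is left out entirely.

```python
def sketch_segment_query(from_expr: str, segments: dict, analysis_id: str = "client_id") -> str:
    """Hypothetical sketch: one aggregated boolean column per segment.

    `segments` maps a segment name to its select expression. Each
    expression is expected to aggregate and to return a non-NULL BOOL.
    """
    columns = ",\n".join(
        f"    {select_expr} AS {name}" for name, select_expr in segments.items()
    )
    return (
        "SELECT\n"
        f"    e.{analysis_id},\n"
        f"{columns}\n"
        "FROM raw_enrollments e\n"
        f"LEFT JOIN {from_expr} ds\n"
        f"    ON ds.{analysis_id} = e.{analysis_id}\n"
        f"GROUP BY e.{analysis_id}"
    )


# Illustrative segments; the column names are made up.
query = sketch_segment_query(
    "`my-project.my_dataset.clients_daily`",
    {
        "regular_users": "COALESCE(LOGICAL_OR(ds.is_regular_user), FALSE)",
        "new_profiles": "COALESCE(LOGICAL_OR(ds.is_new_profile), FALSE)",
    },
)
print(query)
```

Wrapping each aggregate in COALESCE(..., FALSE) is one way to keep the column non-NULL for clients that have no rows in the data source.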

build_query_target(target, time_limits, from_expr_dataset=None, analysis_unit: AnalysisUnit = AnalysisUnit.CLIENT)[source]

Return a nearly self-contained SQL query, for use with mozanalysis.sizing.HistoricalTarget.

This query returns all distinct client IDs that satisfy the criteria for inclusion in a historical analysis using this datasource. Separate sub-queries are constructed for each additional Segment in the analysis.

class mozanalysis.segments.Segment(name: str, data_source, select_expr: str, friendly_name: str | None = None, description: str | None = None, app_name: str | None = None)[source]

Represents an experiment Segment.

Parameters:
  • name (str) – The segment’s name; will be a column name.

  • data_source (SegmentDataSource) – Data source that provides the columns referenced in select_expr.

  • select_expr (str) – A SQL select expression that includes an aggregation function (we GROUP BY {analysis_unit}). Returns a non-NULL BOOL: True if the user is in the segment, False otherwise.

  • friendly_name (str, optional) – A human-readable dashboard title for this segment.

  • description (str, optional) – A paragraph of Markdown-formatted text describing the segment in more detail, to be shown on dashboards.

  • app_name (str, optional) – app_name used with metric-hub, used for validation.