mozetl.addon_aggregates package

Submodules

mozetl.addon_aggregates.addon_aggregates module

ETL code for the addon_aggregates dataset

mozetl.addon_aggregates.addon_aggregates.add_addon_columns(df)[source]

Constructs additional indicator columns describing the add-on/theme present in a given record. The columns are

  • is_self_install

  • is_shield_addon

  • is_foreign_install

  • is_system

  • is_web_extension

Each column maps True -> 1 and False -> 0.

Parameters

df – SparkDF, exploded on active_addons, each record maps to a single add-on

:return df with the above columns added
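The True -> 1 / False -> 0 mapping can be illustrated without Spark. This is a minimal pure-Python sketch, not the module's implementation: the helper name to_indicators is hypothetical, and it assumes the boolean flags have already been derived for each record (how the module derives them is not documented here).

```python
# Indicator column names as listed in the docstring above.
INDICATOR_COLUMNS = (
    "is_self_install",
    "is_shield_addon",
    "is_foreign_install",
    "is_system",
    "is_web_extension",
)


def to_indicators(flags):
    """Map boolean flags to 0/1 indicators (hypothetical helper).

    `flags` is a dict of column name -> bool; missing flags count as False.
    """
    return {name: int(bool(flags.get(name, False))) for name in INDICATOR_COLUMNS}


record = to_indicators({"is_system": True, "is_web_extension": False})
print(record)
```

Storing the indicators as integers rather than booleans means a later aggregation step can simply sum them to obtain counts.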

mozetl.addon_aggregates.addon_aggregates.aggregate_addons(df)[source]

Aggregates add-on indicators by client, channel, version and locale. The result is a DataFrame with the additional aggregate columns:

  • n_self_installed_addons (int)

  • n_shield_addons (int)

  • n_foreign_installed_addons (int)

  • n_system_addons (int)

  • n_web_extensions (int)

  • first_addon_install_date (str %Y%m%d)

  • profile_creation_date (str %Y%m%d)

for each of the above facets.

Parameters

df – an exploded instance of main_summary by active_addons with various additional indicator columns

Return SparkDF

an aggregated dataset with each of the above columns
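The grouping described above can be sketched in plain Python, under stated assumptions. The facet field names (client_id, channel, app_version, locale) and the helper name aggregate are hypothetical, the pairing of each indicator with its n_* column is inferred from the names, and the two date columns are left out because the docstring does not say how they are derived.

```python
from collections import defaultdict

# Hypothetical field names for the client/channel/version/locale facets.
FACETS = ("client_id", "channel", "app_version", "locale")

# Indicator column -> aggregate column (pairing inferred from the names).
SUMS = {
    "is_self_install": "n_self_installed_addons",
    "is_shield_addon": "n_shield_addons",
    "is_foreign_install": "n_foreign_installed_addons",
    "is_system": "n_system_addons",
    "is_web_extension": "n_web_extensions",
}


def aggregate(records):
    """Sum the 0/1 indicators per facet combination (illustrative only)."""
    totals = defaultdict(lambda: dict.fromkeys(SUMS.values(), 0))
    for rec in records:
        key = tuple(rec[f] for f in FACETS)
        for indicator, out_col in SUMS.items():
            totals[key][out_col] += rec.get(indicator, 0)
    return dict(totals)


rows = [
    {"client_id": "a", "channel": "release", "app_version": "57", "locale": "en-US",
     "is_system": 1, "is_web_extension": 1},
    {"client_id": "a", "channel": "release", "app_version": "57", "locale": "en-US",
     "is_self_install": 1, "is_web_extension": 1},
]
print(aggregate(rows))
```

In Spark the same shape would typically be a groupBy on the facet columns followed by sum aggregations; the dictionary version only shows what the resulting counts mean.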

mozetl.addon_aggregates.addon_aggregates.get_dest(output_bucket, output_prefix, output_version, date=None, sample_id=None)[source]

Stitches together an s3 destination.

Parameters
  • output_bucket – s3 output_bucket

  • output_prefix – s3 output_prefix (within output_bucket)

  • output_version – dataset output_version

:return str -> s3://output_bucket/output_prefix/output_version/submission_date_s3=[date]/sample_id=[sid]
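A minimal sketch of how such a destination string might be assembled, following the format in the return description. The name build_dest is hypothetical, and the handling of the optional arguments is an assumption: here the date and sample_id partitions are appended only when a value is supplied, which may differ from the module's actual behaviour.

```python
def build_dest(output_bucket, output_prefix, output_version, date=None, sample_id=None):
    """Assemble an s3 destination path (illustrative, not the module's code)."""
    parts = ["s3://" + output_bucket, output_prefix, output_version]
    # Assumption: optional partitions are skipped when their value is None.
    if date is not None:
        parts.append("submission_date_s3={}".format(date))
    if sample_id is not None:
        parts.append("sample_id={}".format(sample_id))
    return "/".join(parts)


print(build_dest("example-bucket", "addons/agg", "v2", date="20180101", sample_id=42))
```

The bucket, prefix and version values in the usage line are placeholders chosen for the example.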

mozetl.addon_aggregates.addon_aggregates.load_main_summary(spark, input_bucket, input_prefix, input_version)[source]

Loads main_summary from the bucket constructed from input_bucket, input_prefix, input_version

Parameters
  • spark – SparkSession object

  • input_bucket – s3 bucket (telemetry-parquet)

  • input_prefix – s3 prefix (main_summary)

  • input_version – dataset version (v4)

:return SparkDF
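As a rough sketch, loading could amount to building a path from the three inputs and reading it as parquet. The helper names main_summary_path and load are hypothetical, and both the s3:// scheme and the bucket/prefix/version layout are assumptions; the real function may use a different scheme or extra reader options.

```python
def main_summary_path(input_bucket, input_prefix, input_version):
    """Build the dataset location (assumed layout: s3://bucket/prefix/version)."""
    return "s3://{}/{}/{}".format(input_bucket, input_prefix, input_version)


def load(spark, input_bucket, input_prefix, input_version):
    """Read the dataset with the given SparkSession (illustrative only)."""
    path = main_summary_path(input_bucket, input_prefix, input_version)
    # spark.read.parquet is the standard DataFrameReader call for parquet data.
    return spark.read.parquet(path)


# With the example values from the parameter list:
print(main_summary_path("telemetry-parquet", "main_summary", "v4"))
```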

mozetl.addon_aggregates.addon_aggregates.ms_explode_addons(ms)[source]

Explodes the active_addons object in the ms DataFrame and selects relevant fields

Parameters

ms – a subset of main_summary

:return SparkDF
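What "exploding" active_addons means can be shown with plain Python data: each input row carrying a list of add-ons becomes one output record per add-on. The helper name explode_addons and the field names other than active_addons (client_id, addon_id) are hypothetical, and the fields the module actually selects are not listed here.

```python
def explode_addons(rows, keep=("client_id",)):
    """Emit one record per add-on in each row's active_addons list (illustrative)."""
    exploded = []
    for row in rows:
        # Rows with a missing or empty list produce no output records,
        # which mirrors how Spark's explode treats null or empty arrays.
        for addon in row.get("active_addons") or []:
            record = {name: row.get(name) for name in keep}
            record.update(addon)
            exploded.append(record)
    return exploded


rows = [
    {"client_id": "a", "active_addons": [{"addon_id": "x"}, {"addon_id": "y"}]},
    {"client_id": "b", "active_addons": []},
]
print(explode_addons(rows))
```

After this step each record maps to a single add-on, which is the input shape add_addon_columns expects.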

Module contents