mozetl package

Subpackages

Submodules

mozetl.cli module

mozetl.constants module

mozetl.main module

mozetl.main.etl_job(sc, sqlContext)[source]

This is the function that will be executed on the cluster.

mozetl.main.get_data(sc)[source]
mozetl.main.ping_to_row(ping)[source]
mozetl.main.transform_pings(pings)[source]

Take a dataframe of main pings and summarize OS share.

mozetl.schemas module

mozetl.utils module

mozetl.utils.delete_from_s3(bucket_name, keys_to_delete)[source]
mozetl.utils.extract_submission_window_for_activity_day(frame, date, lag_days)[source]

Extract rows with an activity_date of date minus lag_days and a submission_date between activity_date and date (inclusive).

:date ‘Y-m-d’ of the end of the target period
:lag_days number of days after date in the target period
:frame DataFrame homologous with main_summary

Note that the start_date will be lag_days days before date. For example, if you pass in 2017-01-20 and set lag_days to 5, the aggregation will be processed for day 2017-01-15. The resulting data will cover submission dates from the activity day itself through 5 days of lag, for a total of 6 days.
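The date arithmetic described above can be sketched with the standard library alone. This is a minimal illustration, not the mozetl implementation: the helper name `submission_window` is hypothetical, and it only computes the two boundary dates rather than filtering a DataFrame.

```python
from datetime import date, datetime, timedelta


def submission_window(end_date, lag_days):
    """Hypothetical helper: return (activity_day, last_submission_day).

    end_date is a 'Y-m-d' string marking the end of the target period;
    the activity day is lag_days days earlier, and submissions are
    accepted from the activity day through end_date, inclusive.
    """
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    activity_day = end - timedelta(days=lag_days)
    return activity_day, end


activity_day, last_day = submission_window("2017-01-20", 5)
# Inclusive day count: the activity day itself plus the lag days.
days_covered = (last_day - activity_day).days + 1
print(activity_day, last_day, days_covered)  # 2017-01-15 2017-01-20 6
```

The inclusive count is why a lag of 5 yields a 6-day submission window.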

mozetl.utils.format_as_submission_date(date)[source]
mozetl.utils.format_spark_path(bucket, prefix)[source]
mozetl.utils.generate_filter_parameters(end_date, days_back)[source]
mozetl.utils.parse_as_submission_date(date_string)[source]
mozetl.utils.send_ses('me@example.com', 'greetings', "Hi!", 'you@example.com')[source]

Raises a RuntimeError if the message did not send correctly.

mozetl.utils.stop_session_safely(spark_session)[source]

Safely stops the Spark session. This is a no-op when running on Databricks: the session there is managed by the platform, and stopping it fails the job.

mozetl.utils.upload_file_to_s3(client, filepath, bucket, key, ACL='bucket-owner-full-control')[source]
mozetl.utils.write_csv(dataframe, path, header=True)[source]

Write a dataframe to local disk.

Disclaimer: Do not write CSV files larger than driver memory. This is ~15GB for an EC2 c3.xlarge (due to caching overhead).
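As a rough sketch of what writing tabular rows to local disk with an optional header can look like, the example below uses only the standard `csv` module. The helper `write_rows_to_csv` and its `columns`/`rows` parameters are hypothetical and are not part of mozetl. With a Spark DataFrame, one might pass `dataframe.columns` and `dataframe.toLocalIterator()`, but that pairing is an assumption, not something this page states about `write_csv`.

```python
import csv
import os
import tempfile


def write_rows_to_csv(columns, rows, path, header=True):
    """Hypothetical helper: stream rows to a local CSV file.

    rows may be any iterable, so it is consumed one row at a time
    rather than being materialized as a list first.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        if header:
            writer.writerow(columns)
        for row in rows:
            writer.writerow(row)


# Example data is made up for illustration.
path = os.path.join(tempfile.mkdtemp(), "os_share.csv")
write_rows_to_csv(["os", "count"], iter([("Windows_NT", 3), ("Darwin", 1)]), path)
```

Everything still passes through the driver process, so the size caveat above applies however the rows are produced.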

mozetl.utils.write_csv_to_s3(dataframe, bucket, key, header=True)[source]

Module contents