mozetl.landfill package

Submodules

mozetl.landfill.sampler module

Landfill Sampler

Take a stratified sample of documents sent to ingestion from the raw data store used for platform backfill.

Changelog:

v1 - Initial schema used for edge-validator integration
v2 - Addition of document version as a partition value
v3 - Retain whitelisted metadata fields and simplify schema

mozetl.landfill.sampler.extract(sc, submission_date, sample=0.01)
mozetl.landfill.sampler.save(submission_date, bucket, prefix, df)
mozetl.landfill.sampler.transform(landfill, n_documents=1000)
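The signatures suggest two sampling knobs: a fraction passed to extract (sample=0.01) and a per-group cap passed to transform (n_documents=1000). The plain-Python sketch below illustrates that idea as a stratified sample: a Bernoulli pre-filter keeps roughly `fraction` of the input, then reservoir sampling keeps at most `n_documents` uniformly chosen documents per stratum. This is an illustration under assumptions, not the module's implementation (which takes a SparkContext and works on Spark data); the function name `stratified_sample`, the `key` callable, and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

T = TypeVar("T")


def stratified_sample(
    documents: Iterable[T],
    key: Callable[[T], Hashable],
    fraction: float = 0.01,
    n_documents: int = 1000,
    seed: int = 42,
) -> Dict[Hashable, List[T]]:
    """Hypothetical sketch: pre-sample by `fraction`, then cap each stratum.

    `fraction` plays the role of extract's `sample` argument and
    `n_documents` the role of transform's cap (both are assumptions).
    """
    rng = random.Random(seed)
    reservoirs: Dict[Hashable, List[T]] = defaultdict(list)
    seen: Dict[Hashable, int] = defaultdict(int)

    for doc in documents:
        # Bernoulli pre-filter: keep each document with probability `fraction`.
        if rng.random() >= fraction:
            continue

        stratum = key(doc)
        i = seen[stratum]  # number of kept documents seen so far in this stratum
        seen[stratum] += 1
        reservoir = reservoirs[stratum]

        if i < n_documents:
            # Reservoir not yet full: take the document unconditionally.
            reservoir.append(doc)
        else:
            # Reservoir sampling (Algorithm R): replace a random slot with
            # probability n_documents / (i + 1), so each kept document is
            # equally likely to end up in the final sample.
            j = rng.randrange(i + 1)
            if j < n_documents:
                reservoir[j] = doc

    return dict(reservoirs)
```

For example, with documents keyed by a hypothetical (namespace, doctype) pair, a large "main" stratum is capped at `n_documents` while a small "crash" stratum is kept whole, so rare document types stay represented in the sample. Reservoir sampling is used here because it caps each stratum in a single pass without knowing the stratum sizes in advance.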

Module contents