GCP Ingestion
GCP Ingestion is a monorepo for documentation and implementation of the Mozilla telemetry ingestion system deployed to Google Cloud Platform (GCP).
The components are:
- ingestion-edge: a simple Python service for accepting HTTP messages and delivering to Google Cloud Pub/Sub (deployment docs 🔒)
- ingestion-beam: a Java module defining Apache Beam jobs for streaming and batch transformations of ingested messages (deployment docs 🔒)
- ingestion-sink: a Java application that runs in Kubernetes, reading input from Google Cloud Pub/Sub and emitting records to outputs like GCS or BigQuery (deployment docs 🔒)
The design behind the system along with various trade offs are documented in the architecture section.
This project requires Java 11.
To manage multiple local JDKs, consider jenv and the
jenv enable-plugin maven
command.
Also consider reading through
Apache Beam's wiki article on IntelliJ IDEA setup
for some ideas on configuring an IDE environment.
Feel free to ask us in #data-help
on Slack or #telemetry
on chat.mozilla.org
if you have specific questions.