GCP Ingestion
GCP Ingestion is a monorepo for documentation and implementation of the Mozilla telemetry ingestion system deployed to Google Cloud Platform (GCP).
The components are:
- ingestion-edge: a simple Python service for accepting HTTP messages and delivering to Google Cloud Pub/Sub (deployment docs 🔒)
- ingestion-beam: a Java module defining Apache Beam jobs for streaming and batch transformations of ingested messages (deployment docs 🔒)
- ingestion-sink: a Java application that runs in Kubernetes, reading input from Google Cloud Pub/Sub and emitting records to outputs like GCS or BigQuery (deployment docs 🔒)
- ingestion-core: shared Java code used by both ingestion-beam and ingestion-sink; not a separately deployed service
The design behind the system, along with its various trade-offs, is documented in the architecture section.
This project requires Java 11.
To manage multiple local JDKs, consider jenv and the
jenv enable-plugin maven command.
Also consider reading through
Apache Beam's wiki article on IntelliJ IDEA setup
for some ideas on configuring an IDE environment.
ingestion-core is not consumed as a published jar; the root pom.xml co-compiles
its sources into ingestion-beam and ingestion-sink via build-helper's
add-source goal. IDEs that don't recognize this setup report phantom "cannot
resolve symbol" errors; enabling the opt-in IDE profile (in both module poms)
fixes them.
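For readers unfamiliar with this pattern, the co-compilation described above might look roughly like the following build-helper-maven-plugin execution. This is a minimal sketch, not a copy of the repository's pom.xml: the execution id, the plugin version, and the relative path to the ingestion-core sources are all assumptions made for illustration.

```xml
<!-- Illustrative sketch only; check the actual root pom.xml for the real configuration. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>build-helper-maven-plugin</artifactId>
  <!-- Assumed version; use whichever version the repository pins. -->
  <version>3.4.0</version>
  <executions>
    <execution>
      <!-- Hypothetical execution id. -->
      <id>add-ingestion-core-sources</id>
      <!-- add-source registers extra source roots before compilation starts. -->
      <phase>generate-sources</phase>
      <goals>
        <goal>add-source</goal>
      </goals>
      <configuration>
        <sources>
          <!-- Assumed path: ingestion-core sits next to the consuming module. -->
          <source>${project.basedir}/../ingestion-core/src/main/java</source>
        </sources>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Because the extra source root is only registered when Maven runs, an IDE that imports the module without executing this goal sees the ingestion-core classes as missing, which is the likely source of the phantom errors mentioned above.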
Feel free to ask us in #data-help on Slack or #telemetry on chat.mozilla.org
if you have specific questions.