Rally Decoder Job
The Rally decoder job is a variant of the decoder job
defined in the com.mozilla.telemetry.decoder.rally
package (source).
The decoder supports the Rally data donation and sharing platform.
More information can be found on the mana page.
See bug 1628539 for the initial implementation of the pioneer-v2 decoder and
bug 1697342 for the implementation of the Glean.js encrypted pings.
Overview
The Rally decoder includes three new options: --pioneerEnabled,
--pioneerMetadataLocation, and --pioneerKmsEnabled. An example of running the
job follows; it is captured in the bin/run-pioneer-benchmark script.
mvn compile exec:java -Dexec.mainClass=com.mozilla.telemetry.Decoder -Dexec.args="\
--runner=Dataflow \
--profilingAgentConfiguration='{\"APICurated\": true}' \
--project=$project \
--autoscalingAlgorithm=NONE \
--workerMachineType=n1-standard-1 \
--gcpTempLocation=$bucket/tmp \
--numWorkers=2 \
--region=us-central1 \
--pioneerEnabled=true \
--pioneerMetadataLocation=$bucket/$prefix/metadata/metadata.json \
--pioneerKmsEnabled=false \
--pioneerDecompressPayload=false \
--geoIspDatabase=$bucket/$prefix/metadata/GeoIP2-ISP.mmdb \
--geoCityDatabase=$bucket/$prefix/metadata/GeoLite2-City.mmdb \
--geoCityFilter=$bucket/$prefix/metadata/cities15000.txt \
--schemasLocation=$bucket/$prefix/metadata/schemas.tar.gz \
--inputType=file \
--input=$bucket/$prefix/input/ciphertext/'part-*' \
--outputType=file \
--output=$bucket/$prefix/output/ciphertext/ \
--errorOutputType=file \
--errorOutput=$bucket/$prefix/error/ciphertext/ \
"
The --pioneerEnabled flag enables the transform in the decoder pipeline, which
comes before schema validation and payload processing. It uses the document
specified by --pioneerMetadataLocation to locate information for the KeyStore.
The metadata document takes the following form, validated by this schema:
[
{
"private_key_id": "rally-study-foo",
"private_key_uri": "src/test/resources/jwe/rally-study-foo.private.json",
"kms_resource_id": "projects/DUMMY_PROJECT_ID/locations/global/keyRings/test-ingestion-beam-integration/cryptoKeys/study-foo"
},
....
]
The decoder reads the JSON Web Key into memory from the location given in
private_key_uri. The key can be stored encrypted with Cloud Key Management
Service by specifying kms_resource_id and enabling the --pioneerKmsEnabled flag.
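As an illustration of how the metadata document maps to a key lookup, here is a minimal Python sketch. It is not the decoder's KeyStore implementation (which lives in the Java codebase); the function name index_metadata and the choice of required fields are assumptions made for this example, and the referenced schema remains the authoritative definition.

```python
import json

# Sample document copied from the example above, with the elided entries omitted.
METADATA = """
[
  {
    "private_key_id": "rally-study-foo",
    "private_key_uri": "src/test/resources/jwe/rally-study-foo.private.json",
    "kms_resource_id": "projects/DUMMY_PROJECT_ID/locations/global/keyRings/test-ingestion-beam-integration/cryptoKeys/study-foo"
  }
]
"""

# Assumption: these two fields are always needed; kms_resource_id is only
# checked when KMS is enabled. The real schema may differ.
BASE_FIELDS = ("private_key_id", "private_key_uri")


def index_metadata(text, kms_enabled=False):
    """Parse the metadata JSON and index the entries by private_key_id."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("metadata must be a JSON array of key entries")

    required = BASE_FIELDS + (("kms_resource_id",) if kms_enabled else ())
    index = {}
    for entry in entries:
        missing = [field for field in required if field not in entry]
        if missing:
            raise ValueError(f"entry is missing fields: {missing}")
        key_id = entry["private_key_id"]
        if key_id in index:
            raise ValueError(f"duplicate private_key_id: {key_id}")
        index[key_id] = entry
    return index


index = index_metadata(METADATA, kms_enabled=True)
print(index["rally-study-foo"]["private_key_uri"])
```

Indexing by private_key_id up front means a lookup for each incoming ping is a single dictionary access, and malformed metadata fails at startup rather than on the first ping.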
The decoder decrypts pings that follow conventions for Rally or Pioneer pings. All encryption and decryption takes place using JSON Web Encryption (JWE). An envelope is a piece of metadata that surrounds the encrypted data. The Rally envelope is an object with a payload field containing a JWE compact object. After decrypting the payload, the ping takes the form of a Glean ping. The document namespace (as per the HTTP Edge Server Specification) is used to fetch the key from memory.
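To make the shape of a Rally envelope concrete, the sketch below builds a dummy envelope and splits its payload into the five base64url segments of a JWE compact serialization (protected header, encrypted key, initialization vector, ciphertext, authentication tag). It only inspects structure and does not decrypt anything. The header values and the dummy bytes are illustrative assumptions, not what Rally clients necessarily send.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as JWE requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Decode an unpadded base64url segment by restoring the padding."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def split_jwe_compact(token: str) -> dict:
    """Split a JWE compact serialization into its five named parts."""
    parts = token.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 segments, got {len(parts)}")
    header, encrypted_key, iv, ciphertext, tag = parts
    return {
        "protected_header": json.loads(b64url_decode(header)),
        "encrypted_key": b64url_decode(encrypted_key),
        "iv": b64url_decode(iv),
        "ciphertext": b64url_decode(ciphertext),
        "tag": b64url_decode(tag),
    }


# Illustrative header only; the encrypted-key segment is left empty here.
header = b64url_encode(json.dumps({"alg": "ECDH-ES", "enc": "A256GCM"}).encode())
token = ".".join([
    header,
    "",                               # encrypted key (empty in this dummy)
    b64url_encode(b"\x00" * 12),      # dummy initialization vector
    b64url_encode(b"not-real-data"),  # dummy ciphertext
    b64url_encode(b"\x00" * 16),      # dummy authentication tag
])

# The Rally envelope: an object whose payload field holds the compact JWE.
envelope = {"payload": token}

parts = split_jwe_compact(envelope["payload"])
print(parts["protected_header"])
```

Actual decryption requires a JOSE library and the private key selected by document namespace; this sketch stops at the structural check that would precede it.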
The Pioneer ping's envelope uses the legacy mechanism of sending data through
the Telemetry pipeline as a telemetry.pioneer-study.4 ping. In addition, the
envelope explicitly specifies the routing information for the ping. Finally, the
decoder constructs a PubSub message that includes the routing information and
decrypted message.
Notable design decisions
Data SRE allocates each Rally study a JWK pair. The client must encrypt messages with the study's key for them to reach the analysis environment. The client may not always have the key, so the enrollment and deletion pings are exceptions: for these document types the decoder ignores the payload and extracts only the pioneer id.
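The exception for enrollment and deletion pings can be pictured as a branch in front of the decryption step. The following is a hedged sketch, not the decoder's actual logic: the document type names in EXEMPT_DOC_TYPES, the pioneer_id field name, and the handle_ping function are all hypothetical, chosen only to illustrate the decision.

```python
# Hypothetical document type names; the real decoder may use different ones.
EXEMPT_DOC_TYPES = frozenset({"enrollment", "deletion-request"})


def handle_ping(doc_type, envelope, decrypt):
    """Return the pioneer id and, for non-exempt pings, the decrypted payload.

    `decrypt` is any callable that turns the envelope payload into plaintext;
    it is never invoked for exempt document types.
    """
    if doc_type in EXEMPT_DOC_TYPES:
        # The client may not hold the study key yet, so skip the payload.
        return {"pioneer_id": envelope["pioneer_id"], "payload": None}
    return {
        "pioneer_id": envelope.get("pioneer_id"),
        "payload": decrypt(envelope["payload"]),
    }


calls = []


def fake_decrypt(payload):
    """Stand-in for JWE decryption that records what it was asked to decrypt."""
    calls.append(payload)
    return "plain:" + payload


enrollment = handle_ping(
    "enrollment", {"pioneer_id": "abc", "payload": "ignored"}, fake_decrypt
)
study = handle_ping(
    "study-ping", {"pioneer_id": "abc", "payload": "secret"}, fake_decrypt
)
print(enrollment, study, calls)
```

Keeping the exempt types in one set makes the policy easy to audit: anything not listed must decrypt successfully.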