Skip to content

Republisher Job

A job for republishing subsets of decoded messages to new destinations. Defined in the com.mozilla.telemetry.Republisher class (source).

The primary intention is to produce smaller derived Pub/Sub topics so that consumers that only need a specific subset of messages don't incur the cost of reading the entire stream of decoded messages.

Capabilities

Debug Republishing

If --enableDebugDestination is set, messages containing an x_debug_id attribute will be republished to a destination that's configurable at runtime. This is currently expected to be a feature specific to structured ingestion, so should not be set for telemetry-decoded input.

Per-docType Republishing

If --perDocTypeEnabledList is provided, a separate producer will be created for each docType specified in the given comma-separated list. See the --help output for details on format.

Per-Channel Sampled Republishing

If --perChannelSampleRatios is provided, a separate producer will be created for each specified release channel. The messages will be randomly sampled according to the ratios provided per channel. This is currently intended as a feature only for telemetry data, so should not be set for structured-decoded input. See the --help output for details on format.

Executing

Republisher jobs are executed the same way as sink jobs but with a few differences in flags. You'll need to set the mainClass:

  • -Dexec.mainClass=com.mozilla.telemetry.Republisher
  • For Dataflow Flex Templates, change the docker-compose build argument to --build-arg FLEX_TEMPLATE_JAVA_MAIN_CLASS=com.mozilla.telemetry.Republisher

The --outputType flag is still required as in the sink, but the --output configuration is ignored for the Republisher. Instead, there is a separate destination configuration flag for each of the three republishing types. For each type, there is an compile-time option that affects what publishers are generated in the graph for the Dataflow job along with a runtime option that determines the specific location (usually a topic name) for each publisher.

To enable debug republishing:

  • --enableDebugDestination (compile-time)
  • --debugDestination=/some/pubsub/topic/path

To enable per-docType republishing:

  • --perDocTypeDestination='{"/some/pubsub/topic/path/per-doctype-name":["activity-stream/impression-stats"]}' (compile-time)

To enable per-channel sampled republishing:

  • --perChannelSampleRatios='{"nightly":1.0,"beta":0.1,"release":0.01}' (compile-time)
  • --perChannelDestination=/some/pubsub/topic/path/per-channel-${channel} (compile-time)

Example:

# create a test input file
mkdir -p tmp/
echo '{"payload":"dGVzdA==","attributeMap":{"x_debug_id":"mysession"}}' > tmp/input.json

# Republish only messages with x_debug_id attribute to stdout.
./bin/mvn compile exec:java -Dexec.mainClass=com.mozilla.telemetry.Republisher -Dexec.args="\
    --inputType=file \
    --input=tmp/input.json \
    --outputType=stdout \
    --errorOutputType=stderr \
    --enableDebugDestination
"

# check the RepublisherOptions help page for options specific to Republisher
./bin/mvn compile exec:java -Dexec.args=--help=RepublisherOptions
"