Worker

The worker is implemented as a Django management command, and runs out of the same Docker contains as the webserver. The instance 1 is separate from the webhead instances, though.

There are two functionalities the worker fulfills.

  1. Update local clones in the shared storage and reflect it in the database.

  2. Retrieve data from Taskcluster about finished compare-locales results and reflect it in the database.

Running

The worker is run via

./manage.py consume-pulse

The required configuration setting is PULSE_USER, which is also used to create the queue names on pulse, f"queue/{settings.PULSE_USER}/*", and PULSE_PASSWORD.

hg.m.o

Elmo also consumes hg push messages from Pulse. The worker updates the local clone and the repository metadata. This is done in parallel to the builds triggered in Taskcluster by the pushes.

The generated data is modeled by the life.models package. The models are Changeset and File, as well as Repository and Push.

The Pulse exchange we use is exchange/hgpushes/v2, which sends three types, changegroup.1 for new pushes to existing repositories, and newrepo.1 for new repositories. obsolete.1 is the third, which elmo ignores at the moment. The data models don’t have support for flags or obsolescense.

The newrepo.1 notification is of interest to elmo when new repositories are added to l10n-central. That’s modeled as a Forest. New repositories can be created with existing content but no pushes. Thus the worker clones the upstream repo and processes all heads.

The changegroup.1 notification is of interest for repositories known to elmo, modeled by Repository. The data is processed into Push objects.

Note

Ensure to handle race conditions between push messages and build results.

Taskcluster

To get results from Taskcluster, the worker will consume the exchange/taskcluster-queue/v1/task-completed exchange with the routing key chose for elmo, for example project.l10n.elmo.v1.

It will acknowledge the message once the meta data for all comparisons has been created and are entered in the database.

Note

The Taskcluster artifacts API returns a redirect. Proposal: do not try to resolve the redirect immediately, but possibly do so on the first request through the webserver.

To be able to hook up the revisions, the worker needs to (softly) rely on updating the clones in the jobs below.

With ‘softly’ we mean that we create lightweight placeholder objects for revisions, with just the hash. The metadata and parents etc would be set up independently.

Github

Similar infrastructure exists for Github.

Note

The details here will be filled out once we track projects on Github.

1

It needs investigation if it’s safe to run the worker in parallel.