Orchestrators
An orchestrator is responsible for workflow management and parallelization.
Supported orchestrators:
- Taskcluster - Mozilla's task execution framework, which is also used for Firefox CI. It provides access to hybrid cloud workers (GCP + on-prem) with increased scalability and observability. See the usage instructions.
- Snakemake - a file-based orchestrator that can be used to run the pipeline locally or on a Slurm cluster. See the usage instructions.
- Metaflow - Outerbounds Metaflow is used for experimental workflows that require fast iteration or access to high-end GPUs. We use it to generate synthetic finetuning data with LLMs (see LLM generated data).
Mozilla has switched to Taskcluster for model training, and the Snakemake pipeline is no longer maintained. Contributions are welcome if you find bugs in it.
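
To make "workflow management and parallelization" concrete, here is a minimal sketch of what any orchestrator does at its core: it runs tasks in dependency order and executes independent tasks concurrently. This is not how Taskcluster, Snakemake, or Metaflow are implemented or configured. The function `run_workflow` and the task names are hypothetical, chosen only for illustration, and the sketch assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, max_workers=4):
    """Run callables in dependency order, parallelizing independent tasks.

    tasks: dict mapping a task name to a zero-argument callable.
    dependencies: dict mapping a task name to the set of names it depends on
        (every name mentioned here must also be a key of `tasks`).
    Returns a dict mapping each task name to its callable's return value.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle

    results = {}
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies have all finished.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            # Block until at least one running task completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                results[name] = future.result()  # re-raises task errors
                sorter.done(name)
    return results


if __name__ == "__main__":
    # Hypothetical stage names: "clean_a" and "clean_b" are independent,
    # so they may run in parallel; "merge" waits for both of them.
    stages = {
        "clean_a": lambda: "a.clean",
        "clean_b": lambda: "b.clean",
        "merge": lambda: "merged",
    }
    deps = {"merge": {"clean_a", "clean_b"}}
    print(run_workflow(stages, deps))
```

Real orchestrators add what this sketch leaves out, such as scheduling on remote workers, caching of completed steps, retries, and logging, which is why the pipeline relies on one of the tools listed above rather than an ad hoc runner.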