Home Previous - Acoustic Model and Language Model Next - Training your model

Setting up your environment for training using DeepSpeech


This section of the Playbook assumes you are comfortable installing DeepSpeech and using it with a pre-trained model, and that you are comfortable setting up a Python virtual environment.

Here, we provide information on setting up a Docker environment for training your own speech recognition model using DeepSpeech. We also cover dependencies Docker has for NVIDIA GPUs, so that you can use your GPU(s) for training a model.

** Do not train using only CPU(s) **

This Playbook assumes that you will be using NVIDIA GPU(s). Training a DeepSpeech speech recognition model on CPU(s) only will take a very, very, very long time. Do not train on your CPU(s).

Installing dependencies for working with GPUs under Docker

Before we install Docker, we are going to make sure that we have all the Ubuntu Linux dependencies required for working with NVIDIA GPUs and Docker.


Although non-NVIDIA GPUs exist, they are currently rare, and we do not aim to support them in this Playbook.

GPU drivers

By default, your machine should already have GPU drivers installed. A good way to check is with the nvidia-smi tool. If your drivers are installed correctly, nvidia-smi will report the driver version and CUDA version.

$ nvidia-smi

Sat Jan  9 11:48:50 2021       
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  GeForce GTX 1060    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   70C    P0    27W /  N/A |    766MiB /  6069MiB |      2%      Default |
|                               |                      |                  N/A |

| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================| |                                                                             |

If your drivers are not installed correctly, you will likely see this warning:

$ nvidia-smi

Command 'nvidia-smi' not found, but can be installed with:

sudo apt install nvidia-utils-440         # version 440.100-0ubuntu0.20.04.1, or
sudo apt install nvidia-340               # version 340.108-0ubuntu2
sudo apt install nvidia-utils-435         # version 435.21-0ubuntu7
sudo apt install nvidia-utils-390         # version 390.141-0ubuntu0.20.04.1
sudo apt install nvidia-utils-450         # version 450.102.04-0ubuntu0.20.04.1
sudo apt install nvidia-utils-450-server  # version 450.80.02-0ubuntu0.20.04.3
sudo apt install nvidia-utils-460         # version 460.32.03-0ubuntu0.20.04.1
sudo apt install nvidia-utils-418-server  # version 418.152.00-0ubuntu0.20.04.1
sudo apt install nvidia-utils-440-server  # version 440.95.01-0ubuntu0.20.04.1

Follow this guide to install your GPU drivers.

Once you’ve installed your drivers, use nvidia-smi to prove that they are installed correctly.

Note that you may need to restart your host after installing the GPU drivers.

Ideally, you should not be running any other processes on your GPU(s) before you start training.

Next, we will install the utility nvtop so that you can monitor the performance of your GPU(s). We will also use nvtop to prove that Docker is able to use your GPU(s) later in this document.

$ sudo apt install nvtop

Note that you may need to restart your host after installing nvtop.

If you run nvtop you will see a graph similar to this:

Screenshot of nvtop

You are now ready to install Docker.

Docker is virtualization software that allows a consistent collection of software, dependencies and environments to be packaged into a container which is then run on a host, or many hosts. It is one way to manage the many software dependencies which are required for training a model with DeepSpeech, particularly if using an NVIDIA GPU.

Install Docker

First, you must install Docker on your host. Follow the instructions on the Docker website.

Ensure that you create a docker group and that you add yourself to this group

Once you have installed Docker, be sure to follow the post-installation steps. These include setting up a docker group and adding your user account to this group. If you do not follow this step, you will need to use sudo with every Docker command, and this can have unexpected results.

If you try to use docker commands and constantly receive permission warnings, it’s likely that you have forgotten this step.

Install the nvidia-container-toolkit

Next, we need to install nvidia-container-toolkit. This is necessary to allow Docker to be able to access the GPU(s) on your machine for training.

First, add the repository for your distribution, following the instructions on the NVIDIA Docker GitHub page. For example:

curl -s -L | \
  sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update

Next, install nvidia-container-toolkit:

$ sudo apt-get install -y nvidia-container-toolkit

Pulling down a pre-built DeepSpeech Docker image

Once you have installed Docker and the nvidia-container-toolkit, you are ready to build a Docker image. Although it’s possible to build your own Docker image from scratch, we’re going to use a pre-built DeepSpeech training image which is hosted on Docker Hub. Once the image is pulled down, you can then create a Docker container from the image to perform training.

As you become more proficient with using DeepSpeech, you can use the pre-built Docker image as the basis for your own images.

Running this command will download several gigabytes of data. Do not perform this command if you are on a limited or metered internet connection

$ docker pull mozilla/deepspeech-train:v0.9.3
v0.9.3: Pulling from mozilla/deepspeech-train
f08d8e2a3ba1: Already exists
3baa9cb2483b: Already exists
94e5ff4c0b15: Already exists
1860925334f9: Already exists
05cc64cc481f: Already exists
b11f037be8e8: Already exists
24379c211bf5: Already exists
53981215c263: Already exists
c0ceb2f35c41: Already exists
561bb56e4cdc: Already exists
234039146e10: Already exists
337aa0f03969: Already exists
dcdea3197954: Already exists
c8801cd7156e: Already exists
5d257706d831: Already exists
82712d12e970: Already exists
906db168d174: Pull complete
c10704302cc5: Pull complete
e2e47c9348fd: Pull complete
abe58eddcb1d: Pull complete
39d09406a9b6: Pull complete
e7951492be3a: Pull complete
9fa61ebf6d03: Pull complete
e0ddd6e8433a: Pull complete
d1063ba3c721: Pull complete
08b710a537f3: Pull complete
e2d3cd841a00: Pull complete
c053d59fe6ba: Pull complete
03f8cdcdf89b: Pull complete
Digest: sha256:9af7b131e1114aed685917a1faf61198e36abdf33c8ef3ae960ca375a3e0643f
Status: Downloaded newer image for mozilla/deepspeech-train:v0.9.3

If you do not which to use the v0.9.3 DeepSpeech image, a list of previous images is available.

You will now see the mozilla/deepspeech-train image when you run the command docker image ls:

$ docker image ls
REPOSITORY                             TAG              IMAGE ID       CREATED         SIZE
mozilla/deepspeech-train               v0.9.3           7cdc0bb1fe2a   7 days ago      4.77GB

Testing the image by creating a container and running a script

Now that you have your Docker image pulled down, you can create a container from the image. Here, we’re going to create a container and run a simple test to make sure that the image is working correctly.

Note that you can refer to Docker images by id - such as 7cdc0bb1fe2a in the example above, or by the image’s name and tag. Here, we will be using the image name and tag - ie mozilla/deepspeech-train:v0.9.3.

$ docker run  -it --name ds-test --entrypoint /bin/bash mozilla/deepspeech-train:v0.9.3

The entrypoint instruction following docker run tells Docker to run the /bin/bash (ie shell) after creating the container.

This command assumes that /bin/bash will be invoked as the root user. This is necessary, as the Docker container needs to make changes to the filesystem. If you use the -u $(id -u):$(id -g) switches, you will tell Docker to invoke /bin/bash as the current user of the host that is running the Docker container. You will likely encounter permission denied errors while running training.

When you run the above command, you should see the following prompt:

________                               _______________                
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ /
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/

WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...


In a separate terminal, you can see that you now have a Docker image running:

$ docker ps
CONTAINER ID   IMAGE          COMMAND       CREATED              STATUS              PORTS     NAMES
d14b2d062526   7cdc0bb1fe2a   "/bin/bash"   About a minute ago   Up About a minute             compassionate_rhodes

DeepSpeech includes a number of convenience scripts in the bin directory. They are named for the corpus they are configured for. To ensure that your Docker environment is functioning correctly, run one of these scripts (in the terminal session where your container is running).

root@d14b2d062526:/DeepSpeech/bin# ./bin/

This will train on a single audio file for 200 epochs.

We’ve now proved that the image is working correctly.

Setting up a bind mount to store persistent data

Now that we have a Docker image pulled down, we can create a container from the image, and do training from within the image.

However, Docker containers are not persistent. This means that if the host on which the container is running reboots, or there is a fatal error within the container, all the results stored within the container will be lost. We need to set up persistent storage so that the checkpoints and exported model are stored outside the container.

To do this we create a bind mount for Docker. A bind mount allows Docker to store files externally to the container, on your local filesystem.

First, stop and remove the container we created above.

$ docker rm -f ds-test

Next, we will create a new container, except this time we will also create a bind mount so that it can store persistent data.

First, we create a directory on our local system for the bind mount.

$ mkdir deepspeech-data

Next, we create a container and instruct it to use a bind mount to the directory.

$ docker run  -it \
  --entrypoint /bin/bash \
  --name deepspeech-training \
  --gpus all \
  --mount type=bind,source="$(pwd)"/deepspeech-data,target=/DeepSpeech/deepspeech-data \

We all pass the --gpus all parameter here to instruct Docker to use all available GPUs. If you need to restrict the use of GPUs, then please consult the Docker documentation. You can also restrict the amount of memory or CPU(s) that the Docker container consumes. This might be useful if you need to use the host that you’re training on at the same time as the training is occurring, or if you’re on a shared host or cluster (for example at a university).

From within the container, the deepspeech-data directory will now be available:

root@e964b1e5a60c:/DeepSpeech# ls | grep deepspeech-data

You are now ready to begin training your model.

Extending the base deepspeech-training Docker image for your needs

As you become more comfortable training speech recognition models with DeepSpeech, you may wish to extend the base Docker image. You can do this using the FROM instruction in a Dockerfile, for example:

# Custom Dockerfile for training models using DeepSpeech

# Get the latest DeepSpeech image
FROM mozilla/deepspeech-train:v0.9.3

# Install nano editor
RUN apt-get -y update && apt-get install -y nano

# Install sox for inference and for processing Common Voice data
RUN apt-get -y update && apt-get install -y sox

You can then use docker build with this Dockerfile to build your own custom Docker image.

Home Previous - Acoustic Model and Language Model Next - Training your model