DeepSpeech Playbook

A crash course on training speech recognition models using DeepSpeech.


Start here. This section sets your expectations for what you can achieve with the DeepSpeech Playbook, and lists the prerequisites you’ll need before you start training your own speech recognition models.

About DeepSpeech

Once you know what you can achieve with the DeepSpeech Playbook, this section provides an overview of DeepSpeech itself, its component parts, and how it differs from other speech recognition engines you may have used in the past.

Formatting your training data

Before you can train a model, you will need to collect and format your corpus of data. This section provides an overview of the data format required for DeepSpeech, and walks through an example of preparing a dataset from Common Voice.
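As a rough illustration of what "formatting" involves, the sketch below writes a CSV index of audio clips and their transcripts. It assumes the three-column layout (`wav_filename`, `wav_filesize`, `transcript`) used by DeepSpeech's training CSVs; the helper name `write_manifest` is hypothetical, and lowercasing the transcript is an assumption of this example, not a stated requirement. Check the data formatting section for the authoritative format.

```python
import csv
import os

# Column names assumed from the DeepSpeech training CSV layout.
FIELDNAMES = ["wav_filename", "wav_filesize", "transcript"]


def write_manifest(clips, out_path):
    """Write one CSV row per (wav_path, transcript) pair.

    `write_manifest` is a hypothetical helper for illustration only.
    """
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for wav_path, transcript in clips:
            writer.writerow({
                "wav_filename": wav_path,
                # File size in bytes, read from disk.
                "wav_filesize": os.path.getsize(wav_path),
                # Assumption: transcripts are trimmed and lowercased.
                "transcript": transcript.strip().lower(),
            })
```

You would typically produce one such file for each of your training, validation and test splits.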

The alphabet.txt file

If you are training a model for a language whose alphabet differs from that of English, for example a language with diacritical marks, then you will need to modify the alphabet.txt file.
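One way to see which characters your alphabet needs is to collect every distinct character that appears in your transcripts. The sketch below does only that; `build_alphabet` is a hypothetical helper, and writing the result one character per line is an assumption about the alphabet.txt layout that you should confirm in the alphabet.txt section.

```python
def build_alphabet(transcripts):
    """Return the sorted list of distinct characters used in the transcripts.

    Hypothetical helper for illustration; not part of DeepSpeech.
    """
    chars = set()
    for text in transcripts:
        chars.update(text)
    # Line breaks are separators, not symbols the model should predict.
    chars.discard("\n")
    chars.discard("\r")
    return sorted(chars)


if __name__ == "__main__":
    alphabet = build_alphabet(["caf\u00e9 au lait", "th\u00e9 vert"])
    # Assumed layout: one character per line.
    print("\n".join(alphabet))
```

Reviewing the output by hand is worthwhile, because stray punctuation or symbols in the transcripts will otherwise end up in the alphabet.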

Building your own scorer

Learn what the scorer does, and how you can go about building your own.

Acoustic model and language model

Learn about the differences between DeepSpeech’s acoustic model and language model, and how they combine to provide end-to-end speech recognition.

Setting up your training environment

This section walks you through building a Docker image and spawning DeepSpeech in a Docker container with persistent storage. This approach avoids the complexity of managing dependencies such as TensorFlow yourself.

Training a model

Once you have your training data formatted, and your training environment established, this section will show you how to train a model, and provide guidance for overcoming common pitfalls.

Testing a model

Once you’ve trained a model, you will need to validate that it works for the context it’s been designed for. This section walks you through this process.

Deploying your model

Once your model is trained and tested, it is ready to be deployed. This section provides an overview of how you can deploy your model.

Applying DeepSpeech to real-world problems

This section covers specific use cases where DeepSpeech can be applied to real-world problems, such as transcription, keyword searching and voice-controlled applications.

Setting up Continuous Integration

Learn how to set up Continuous Integration (CI) for your own fork of DeepSpeech. Intended for developers who are utilising DeepSpeech for their own specific use cases.

Introductory courses on machine learning

Providing an introduction to machine learning is beyond the scope of this Playbook; however, an understanding of machine learning and deep learning concepts will aid your efforts in training speech recognition models with DeepSpeech.

Here, we’ve linked to several resources that you may find helpful; they’re listed in the order we recommend reading them in.

How you can help by providing feedback on the DeepSpeech Playbook

You can help to make the DeepSpeech Playbook even better by providing feedback via a GitHub Issue.