End-to-End Speech Processing Toolkit

ESPnet is a state-of-the-art toolkit covering end-to-end speech recognition, text-to-speech, speech translation, speech enhancement, speaker diarization, spoken language understanding, and more.

Get started with ESPnet!

Running inference on existing ESPnet models

Run pip install espnet espnet-model-zoo, then start using pretrained models immediately.

Fine-tuning ESPnet models

Run pip install espnet, then use the espnetez module for fine-tuning.

High-performance training and full experiment replication

Complete the full installation and use the existing recipes.
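
As a rough illustration of the first option above (inference with an existing model), the sketch below transcribes one audio file with a pretrained ASR model. It assumes espnet, espnet-model-zoo, and soundfile are installed. The helper name transcribe and its arguments are hypothetical, and the audio is assumed to match the sampling rate the chosen model was trained on.

```python
def transcribe(wav_path: str, model_tag: str) -> str:
    """Return the top ASR hypothesis for a single audio file.

    `model_tag` is a placeholder for a model name from the ESPnet model zoo;
    no specific tag is assumed here.
    """
    # Imported lazily so this helper can be defined before ESPnet is installed.
    import soundfile
    from espnet2.bin.asr_inference import Speech2Text

    # from_pretrained fetches the model through espnet-model-zoo on first use.
    speech2text = Speech2Text.from_pretrained(model_tag)

    # Read the waveform; the sample rate is not checked in this sketch.
    speech, _sample_rate = soundfile.read(wav_path)

    # The call returns an n-best list; each entry begins with the decoded text.
    nbests = speech2text(speech)
    text, *_ = nbests[0]
    return text
```

To try it, call transcribe with the path to a local WAV file and a model tag taken from the model zoo listing. Loading the model inside the function keeps the sketch short; for repeated calls it would be more efficient to build the Speech2Text object once and reuse it.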

Comprehensive Task Coverage

We offer complete recipes for a wide range of speech processing tasks.

ASR: Automatic Speech Recognition

TTS: Text-to-Speech

Speech Enhancement

Weakly-supervised Learning

Speaker Embedding

Speech-to-Text Translation

Singing Voice Synthesis

Discrete Unit ASR

Speech Codec

ASR with Speech Enhancement

... And much more!

Tutorials

ESPnet2

Leveraging ESPnet2 recipes for full replication

ESPnet1

Documents on ESPnet1 recipes (Legacy)

Training configurations

Understanding and updating training configurations

Recipe tips

Various tips for using run.sh in ESPnet recipes

Audio formatting

Formatting audio files into wav.scp for ESPnet recipes

Task class and data input system

Common task/data interface for ESPnet2

Docker

Running ESPnet with Docker

Job scheduling system

Distributing jobs in a multi-machine environment

Distributed training

Handling multiple GPUs for training

Document Generation

Details on fixing the ESPnet documentation
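
For the audio formatting topic listed above, it may help to see what a wav.scp file looks like. In Kaldi-style data directories, wav.scp maps each utterance ID to an audio source, one pair per line, sorted by ID. The sketch below builds such a file from a directory of WAV files. The helper name build_wav_scp and the choice of the file stem as the utterance ID are assumptions for illustration; actual recipes may derive IDs differently or use command pipes in place of plain paths.

```python
from pathlib import Path


def build_wav_scp(audio_dir: str, scp_path: str) -> int:
    """Write '<utt_id> <absolute_path>' lines for every .wav under audio_dir.

    Returns the number of entries written.
    """
    entries = {}
    for wav in Path(audio_dir).rglob("*.wav"):
        # Assumption: the file name without extension serves as the utterance ID.
        utt_id = wav.stem
        if utt_id in entries:
            raise ValueError(f"duplicate utterance ID: {utt_id}")
        entries[utt_id] = wav.resolve()

    # Sort by utterance ID so the file has a stable, ordered layout.
    with open(scp_path, "w", encoding="utf-8") as f:
        for utt_id in sorted(entries):
            f.write(f"{utt_id} {entries[utt_id]}\n")
    return len(entries)
```

Raising on duplicate IDs is a deliberate choice in this sketch: silently overwriting an entry would drop audio from the data set without warning.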

How to cite ESPnet

@inproceedings{watanabe18_interspeech,
  title     = {ESPnet: End-to-End Speech Processing Toolkit},
  author    = {Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
  year      = {2018},
  booktitle = {Proc. Interspeech},
  pages     = {2207--2211},
  doi       = {10.21437/Interspeech.2018-1456},
  issn      = {2958-1796},
}

To cite individual modules, models, or recipes, please refer to Additional Citations.