End-to-end Speech Processing toolkit
ESPnet is a state-of-the-art toolkit that covers end-to-end speech recognition, text-to-speech, speech translation, speech enhancement, speaker diarization, spoken language understanding, and more.
Get started with ESPnet!
Running inference on existing ESPnet models
Run `pip install espnet espnet-model-zoo`, then start using it immediately.
Fine-tuning ESPnet models
Run `pip install espnet`, then use the `espnetez` module for fine-tuning.
High-performance training and full experiment replication
Complete the full installation and use the existing recipes.
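As a rough illustration of the first option above (inference on an existing model), the sketch below wraps the `Speech2Text` interface from `espnet2` in a small helper. The helper name `transcribe` and its parameters are hypothetical, introduced only for this example; `model_tag` must be the name of a real pretrained model chosen from the model zoo, and the input audio is assumed to match the sampling rate the model was trained on.

```python
def transcribe(wav_path, model_tag):
    """Return the top transcription hypothesis for one audio file.

    `transcribe` is a hypothetical helper for illustration, not part of ESPnet.
    """
    # Deferred imports: both packages are third-party dependencies installed
    # alongside espnet and espnet-model-zoo, so nothing heavy loads until the
    # function is actually called.
    import soundfile
    from espnet2.bin.asr_inference import Speech2Text

    # Fetches the pretrained model on first use (requires espnet-model-zoo).
    speech2text = Speech2Text.from_pretrained(model_tag)

    # Read the waveform; the sample rate is assumed to match the model's.
    speech, sample_rate = soundfile.read(wav_path)

    # The call returns an n-best list; the first field of each entry is text.
    nbests = speech2text(speech)
    text, *_ = nbests[0]
    return text
```

Calling `transcribe("speech.wav", model_tag)` with a tag taken from the model zoo listing would then return the recognized text as a string.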
Comprehensive Task Coverage
We offer complete recipes for a wide range of speech processing tasks.
Tutorials
ESPnet2
Leveraging ESPnet2 recipes for full replication
ESPnet1
Documents on ESPnet1 recipes (Legacy)
Training configurations
Understanding and updating training configurations
Recipe tips
Various tips for using `run.sh` in ESPnet recipes
Audio formatting
Formatting audio files into `wav.scp` for ESPnet recipes
Task class and data input system
Common task/data interface for ESPnet2
Docker
Running ESPnet with Docker
Job scheduling system
Distributing jobs in a multi-machine environment
Distributed training
Handling multiple GPUs for training
Document generation
Details on fixing the ESPnet documentation
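To give a feel for what the Audio formatting tutorial listed above deals with, here is a minimal sketch that builds a Kaldi-style `wav.scp`, where each line has the form `<utterance_id> <path>`. The helper name `build_wav_scp`, the choice of the file stem as utterance ID, and the flat `.wav`-only scan are assumptions made for this example; real recipes usually derive IDs from corpus metadata and may use command pipes instead of plain paths.

```python
from pathlib import Path


def build_wav_scp(audio_dir, out_path):
    """Write one '<utterance_id> <absolute_path>' line per .wav file.

    Hypothetical helper for illustration; not part of ESPnet.
    """
    entries = {}
    for wav in Path(audio_dir).rglob("*.wav"):
        # Assumption: the file name without extension is a usable utterance
        # ID (unique, and free of whitespace).
        utt_id = wav.stem
        if utt_id in entries:
            raise ValueError(f"duplicate utterance ID: {utt_id}")
        entries[utt_id] = wav.resolve()

    # Kaldi-style tools expect entries sorted by utterance ID.
    lines = [f"{utt_id} {path}" for utt_id, path in sorted(entries.items())]
    Path(out_path).write_text(
        "".join(line + "\n" for line in lines), encoding="utf-8"
    )
    return lines
```

For example, `build_wav_scp("downloads/audio", "data/train/wav.scp")` would scan a (hypothetical) `downloads/audio` directory and write the index to `data/train/wav.scp`, provided the `data/train` directory already exists.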
How to cite ESPnet
@inproceedings{watanabe18_interspeech,
title = {ESPnet: End-to-End Speech Processing Toolkit},
author = {Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson {Enrique Yalta Soplin} and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
year = {2018},
booktitle = {Proc. Interspeech},
pages = {2207--2211},
doi = {10.21437/Interspeech.2018-1456},
issn = {2958-1796},
}
To cite individual modules, models, or recipes, please refer to Additional Citations.