2024
Can Synthetic Speech Improve End-to-End Conversational Speech Translation?
Bismarck Bamfo Odoom | Nathaniel Robinson | Elijah Rippeth | Luis Tavarez-Arce | Kenton Murray | Matthew Wiesner | Paul McNamee | Philipp Koehn | Kevin Duh
Proceedings of the 16th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track)
Conversational speech translation is an important technology that fosters communication among people of different language backgrounds. Three-way parallel data in the form of source speech, source transcript, and target translation is usually required to train end-to-end systems. However, such datasets are not readily available and are expensive to create as this involves multiple annotation stages. In this paper, we investigate the use of synthetic data from generative models, namely machine translation and text-to-speech synthesis, for training conversational speech translation systems. We show that adding synthetic data to the training recipe increasingly improves end-to-end training performance, especially when limited real data is available. However, when no real data is available, no amount of synthetic data helps.
Speech Data from Radio Broadcasts for Low Resource Languages
Bismarck Bamfo Odoom | Leibny Paola Garcia Perera | Prangthip Hansanti | Loic Barrault | Christophe Ropers | Matthew Wiesner | Kenton Murray | Alexandre Mourachko | Philipp Koehn
Proceedings of the 21st International Conference on Spoken Language Translation (IWSLT 2024)
We created a collection of speech data for 48 low-resource languages. The corpus is extracted from radio broadcasts and processed with novel speech detection and language identification models based on a manually vetted subset of the audio for 10 languages. The data is made publicly available.
2023
JHU IWSLT 2023 Multilingual Speech Translation System Description
Henry Li Xinyuan | Neha Verma | Bismarck Bamfo Odoom | Ujvala Pradeep | Matthew Wiesner | Sanjeev Khudanpur
Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023)
We describe the Johns Hopkins ACL 60-60 Speech Translation systems submitted to the IWSLT 2023 Multilingual track, where we were tasked to translate ACL presentations from English into 10 languages. We developed cascaded speech translation systems for both the constrained and unconstrained subtracks. Our systems make use of pre-trained models as well as domain-specific corpora for this highly technical evaluation-only task. We find that the specific technical domain which ACL presentations fall into presents a unique challenge for both ASR and MT, and we present an error analysis and an ACL-specific corpus we produced to enable further work in this area.