DOI: 10.3115/1220835.1220837

Do we need phrases?: challenging the conventional wisdom in statistical machine translation

Published: 04 June 2006

Abstract

We begin by exploring theoretical and practical issues with phrasal SMT, several of which are addressed by syntax-based SMT. Next, to address problems not handled by syntax, we propose the concept of a Minimal Translation Unit (MTU) and develop MTU sequence models. Finally, we incorporate these models into a syntax-based SMT system and demonstrate that it improves on state-of-the-art translation quality within a theoretically more desirable framework.

      Information

      Published In

      HLT-NAACL '06: Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
      June 2006
      522 pages

      Publisher

      Association for Computational Linguistics

      United States

      Qualifiers

      • Article

      Acceptance Rates

      HLT-NAACL '06 Paper Acceptance Rate 62 of 257 submissions, 24%;
      Overall Acceptance Rate 240 of 768 submissions, 31%

      Article Metrics

      • Downloads (Last 12 months): 57
      • Downloads (Last 6 weeks): 11
      Reflects downloads up to 24 Oct 2024

      Cited By

      • (2014) Training Phrase-Based SMT without Explicit Word Alignment. Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 8404, pp. 233-241. DOI: 10.1007/978-3-642-54903-8_20. Online publication date: 6-Apr-2014
      • (2012) Forced derivation tree based model training to statistical machine translation. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 445-454. DOI: 10.5555/2390948.2391002. Online publication date: 12-Jul-2012
      • (2011) Rule Markov models for fast tree-to-string translation. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pp. 856-864. DOI: 10.5555/2002472.2002580. Online publication date: 19-Jun-2011
      • (2009) A study of translation rule classification for syntax-based statistical machine translation. Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation, pp. 45-50. DOI: 10.5555/1626344.1626350. Online publication date: 5-Jun-2009
      • (2009) Word lattices for multi-source translation. Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 719-727. DOI: 10.5555/1609067.1609147. Online publication date: 30-Mar-2009
      • (2009) Two-Stage Hypotheses Generation for Spoken Language Translation. ACM Transactions on Asian Language Information Processing (TALIP) 8:1, pp. 1-22. DOI: 10.1145/1482343.1482347. Online publication date: 1-Mar-2009
      • (2008) Tera-scale translation models via pattern matching. Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, pp. 505-512. DOI: 10.5555/1599081.1599145. Online publication date: 18-Aug-2008
      • (2006) Microsoft research treelet translation system. Proceedings of the Workshop on Statistical Machine Translation, pp. 158-161. DOI: 10.5555/1654650.1654676. Online publication date: 8-Jun-2006
