DOI: 10.5555/3016100.3016285

Character-aware neural language models

Published: 12 February 2016

Abstract

We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.
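The composition the abstract describes can be sketched in miniature: a character-level convolution with max-over-time pooling turns a word's characters into a fixed-size feature vector, and a highway layer then mixes a nonlinear transform of that vector with the vector itself before it is fed (in place of a word embedding) to the LSTM. The following is a minimal pure-Python sketch under assumed toy dimensions, not the paper's implementation; the function names, weights, and the tanh filter nonlinearity are illustrative.

```python
import math

def char_cnn_features(char_embs, filters):
    """Narrow convolution over a word's characters, max-over-time pooled.

    char_embs: list of l character embedding vectors (each of length d).
    filters:   list of kernels; each kernel is a list of w vectors of length d.
    Returns one scalar feature per kernel, forming a small word representation.
    """
    d = len(char_embs[0])
    feats = []
    for kernel in filters:
        w = len(kernel)
        scores = []
        for i in range(len(char_embs) - w + 1):
            # dot product of the kernel with a width-w window of characters
            s = sum(kernel[j][k] * char_embs[i + j][k]
                    for j in range(w) for k in range(d))
            scores.append(math.tanh(s))
        feats.append(max(scores))  # max over all positions ("time")
    return feats

def highway(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = t * relu(W_h x + b_h) + (1 - t) * x,
    where the transform gate is t = sigmoid(W_t x + b_t)."""
    def affine(W, v, b):
        return [sum(wi * vi for wi, vi in zip(row, v)) + bi
                for row, bi in zip(W, b)]
    t = [1.0 / (1.0 + math.exp(-z)) for z in affine(w_t, x, b_t)]
    h = [max(0.0, z) for z in affine(w_h, x, b_h)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

When the gate t is near zero the highway layer simply carries its input through unchanged, which is what lets the character features reach the LSTM unmodified when no transformation is needed; in the full model the pooled features pass through one or more such layers before entering the LSTM language model.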




Published In

AAAI'16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence
February 2016
4406 pages

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press
