DOI: 10.5555/3016100.3016285

Character-aware neural language models

Published: 12 February 2016

Abstract

We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.
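The composition the abstract describes can be sketched in miniature: a character-level convolution with max-over-time pooling turns a word's characters into a fixed-size feature vector, and a highway layer then mixes a nonlinear transform of that vector with the vector itself before it is fed (in place of a word embedding) to the LSTM. The following is a minimal pure-Python sketch under assumed toy dimensions, not the paper's implementation; the function names, weights, and the tanh filter nonlinearity are illustrative.

```python
import math

def char_cnn_features(char_embs, filters):
    """Narrow convolution over a word's characters, max-over-time pooled.

    char_embs: list of l character embedding vectors (each of length d).
    filters:   list of kernels; each kernel is a list of w vectors of length d.
    Returns one scalar feature per kernel, forming a small word representation.
    """
    d = len(char_embs[0])
    feats = []
    for kernel in filters:
        w = len(kernel)
        scores = []
        for i in range(len(char_embs) - w + 1):
            # dot product of the kernel with a width-w window of characters
            s = sum(kernel[j][k] * char_embs[i + j][k]
                    for j in range(w) for k in range(d))
            scores.append(math.tanh(s))
        feats.append(max(scores))  # max over all positions ("time")
    return feats

def highway(x, w_h, b_h, w_t, b_t):
    """One highway layer: y = t * relu(W_h x + b_h) + (1 - t) * x,
    where the transform gate is t = sigmoid(W_t x + b_t)."""
    def affine(W, v, b):
        return [sum(wi * vi for wi, vi in zip(row, v)) + bi
                for row, bi in zip(W, b)]
    t = [1.0 / (1.0 + math.exp(-z)) for z in affine(w_t, x, b_t)]
    h = [max(0.0, z) for z in affine(w_h, x, b_h)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]
```

When the gate t is near zero the highway layer simply carries its input through unchanged, which is what lets the character features reach the LSTM unmodified when no transformation is needed; in the full model the pooled features pass through one or more such layers before entering the LSTM language model.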




Published In

AAAI'16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence
February 2016
4406 pages

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press
