Article

Distributed representations of words and phrases and their compositionality

Authors:

Ilya Sutskever,

Jeffrey Dean

NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2

Pages 3111 - 3119

Published: 05 December 2013

Abstract

The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling frequent words, we obtain a significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling.
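The abstract names two training techniques without giving their formulas. The sketch below follows the definitions in the full paper as we understand them: during subsampling, each occurrence of a word w is discarded with probability P(w) = 1 - sqrt(t / f(w)), where f(w) is the word's relative frequency and t is a chosen threshold (the paper suggests values around 1e-5); negative sampling replaces the softmax with a logistic objective on the observed (input, context) pair plus k noise words drawn from the unigram distribution raised to the 3/4 power. It is plain Python for readability, not the authors' implementation. All function names and toy counts are hypothetical, and the gradient updates that training would apply to this objective are omitted.

```python
import math
import random

def discard_probability(word_count, total_count, t=1e-5):
    """Chance of dropping one occurrence of a word during subsampling.

    f is the word's relative frequency. For words rarer than t the formula
    goes negative, so it is clamped at 0 (such words are always kept).
    """
    f = word_count / total_count
    return max(0.0, 1.0 - math.sqrt(t / f))

def subsample(tokens, counts, rng, t=1e-5):
    """Randomly drop occurrences of frequent words from a token stream."""
    total = sum(counts.values())
    return [w for w in tokens
            if rng.random() >= discard_probability(counts[w], total, t)]

def noise_distribution(counts, power=0.75):
    """Unigram distribution raised to the 3/4 power, then renormalised."""
    weights = {w: c ** power for w, c in counts.items()}
    z = sum(weights.values())
    return {w: wt / z for w, wt in weights.items()}

def sample_negatives(noise, k, rng):
    """Draw k noise words (with replacement) from the noise distribution."""
    words = list(noise)
    return rng.choices(words, weights=[noise[w] for w in words], k=k)

def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def negative_sampling_objective(v_in, v_out, v_negatives):
    """Quantity to maximise for one (input word, context word) pair:
    log s(v_out . v_in) + sum over noise words of log s(-v_neg . v_in),
    where s is the logistic sigmoid.
    """
    total = log_sigmoid(dot(v_out, v_in))
    for v_neg in v_negatives:
        total += log_sigmoid(-dot(v_neg, v_in))
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy counts; real training uses corpus-wide counts.
    counts = {"the": 50_000, "air": 300, "canada": 200, "flight": 500}
    noise = noise_distribution(counts)
    print(sample_negatives(noise, k=5, rng=rng))
    print(negative_sampling_objective([0.1, 0.2], [0.3, -0.1], [[0.0, 0.5]]))
```

Raising counts to the 3/4 power flattens the noise distribution, so rare words are drawn as negatives somewhat more often than their raw frequency would suggest.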

An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
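The phrase-finding method is a data-driven bigram score. The full paper scores each bigram as score(a, b) = (count(a b) - delta) / (count(a) * count(b)), where the discount delta keeps phrases made of very infrequent words from scoring highly; bigrams scoring above a threshold are then treated as single tokens, and a few passes with a decreasing threshold allow longer phrases to form. A minimal sketch under those assumptions follows. The greedy left-to-right merge, the function names, and the delta and threshold values are illustrative choices, not taken from the paper.

```python
from collections import Counter

def phrase_scores(tokens, delta=5.0):
    """Score each adjacent bigram as (count(a b) - delta) / (count(a) * count(b))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {
        (a, b): (n_ab - delta) / (unigrams[a] * unigrams[b])
        for (a, b), n_ab in bigrams.items()
    }

def merge_phrases(tokens, scores, threshold):
    """Greedily join adjacent pairs scoring above threshold into one token."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and scores.get(pair, float("-inf")) > threshold:
            merged.append(pair[0] + "_" + pair[1])  # e.g. "air_canada"
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

if __name__ == "__main__":
    tokens = "fly air canada to canada on air canada".split()
    scores = phrase_scores(tokens, delta=1.0)
    print(merge_phrases(tokens, scores, threshold=0.1))
    # ['fly', 'air_canada', 'to', 'canada', 'on', 'air_canada']
```

Running `merge_phrases` again on its own output with a lower threshold is one way to approximate the multi-pass procedure and build phrases longer than two words.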

References

[1]

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137-1155, 2003.

[2]

Ronan Collobert and Jason Weston. A unified architecture for natural language processing: deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning, pages 160-167. ACM, 2008.

[3]

Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Domain adaptation for large-scale sentiment classification: A deep learning approach. In ICML, pages 513-520, 2011.

[4]

Michael U Gutmann and Aapo Hyvärinen. Noise-contrastive estimation of unnormalized statistical models, with applications to natural image statistics. The Journal of Machine Learning Research, 13:307-361, 2012.

[5]

Tomas Mikolov, Stefan Kombrink, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. Extensions of recurrent neural network language model. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 5528-5531. IEEE, 2011.

[6]

Tomas Mikolov, Anoop Deoras, Daniel Povey, Lukas Burget, and Jan Cernocky. Strategies for Training Large Scale Neural Network Language Models. In Proc. Automatic Speech Recognition and Understanding, 2011.

[7]

Tomas Mikolov. Statistical Language Models Based on Neural Networks. PhD thesis, Brno University of Technology, 2012.

[8]

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. ICLR Workshop, 2013.

[9]

Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.

[10]

Andriy Mnih and Geoffrey E Hinton. A scalable hierarchical distributed language model. Advances in neural information processing systems, 21:1081-1088, 2009.

[11]

Andriy Mnih and Yee Whye Teh. A fast and simple algorithm for training neural probabilistic language models. arXiv preprint arXiv:1206.6426, 2012.

[12]

Frederic Morin and Yoshua Bengio. Hierarchical probabilistic neural network language model. In Proceedings of the international workshop on artificial intelligence and statistics, pages 246-252, 2005.

[13]

David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533-536, 1986.

[14]

Holger Schwenk. Continuous space language models. Computer Speech and Language, vol. 21, 2007.

[15]

Richard Socher, Cliff C. Lin, Andrew Y. Ng, and Christopher D. Manning. Parsing natural scenes and natural language with recursive neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML), volume 2, 2011.

[16]

Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. Semantic Compositionality Through Recursive Matrix-Vector Spaces. In Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2012.

[17]

Joseph Turian, Lev Ratinov, and Yoshua Bengio. Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384-394. Association for Computational Linguistics, 2010.

[18]

Peter D. Turney and Patrick Pantel. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141-188, 2010.

[19]

Peter D. Turney. Distributional semantics beyond words: Supervised learning of analogy and paraphrase. Transactions of the Association for Computational Linguistics (TACL), 1:353-366, 2013.

[20]

Jason Weston, Samy Bengio, and Nicolas Usunier. Wsabie: Scaling up to large vocabulary image annotation. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Volume Three, pages 2764-2770. AAAI Press, 2011.

Cited By

Zheng Y, Li F, Li C, Zhang Z, Cao R, Sohail N (2024). A Natural Language Processing Model for Automated Organization and Analysis of Intangible Cultural Heritage. Journal of Organizational and End User Computing, 36:1 (1-27). DOI: 10.4018/JOEUC.349736. Online publication date: 30-Jul-2024.
https://dl.acm.org/doi/10.4018/JOEUC.349736
Fu S, Wang L, Yang J (2024). A review on network representation learning with multi-granularity perspective. Intelligent Data Analysis, 28:1 (3-32). DOI: 10.3233/IDA-227328. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.3233/IDA-227328
Lyu C, Fan Q, Guyard P, Diao Y (2024). A Spark Optimizer for Adaptive, Fine-Grained Parameter Tuning. Proceedings of the VLDB Endowment, 17:11 (3565-3579). DOI: 10.14778/3681954.3682021. Online publication date: 1-Jul-2024.
https://dl.acm.org/doi/10.14778/3681954.3682021

Recommendations

Vietnamese Paraphrase Identification Using Matching Duplicate Phrases and Similar Words
Future Data and Security Engineering
Abstract
Paraphrase identification is a core component for many significant tasks in natural language processing (e.g., text summarization, headline generation). A method suggested by Bach et al. for detecting Vietnamese paraphrase text using nine ...
Learning Distributed Representations of Uyghur Words and Morphemes
Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data
Abstract
While distributed representations have proven to be very successful in a variety of NLP tasks, learning distributed representations for agglutinative languages such as Uyghur still faces a major challenge: most words are composed of many morphemes ...
Extension of Zipf's law to words and phrases
COLING '02: Proceedings of the 19th international conference on Computational linguistics - Volume 1

Zipf's law states that the frequency of word tokens in a large corpus of natural language is inversely proportional to the rank. The law is investigated for two languages English and Mandarin and for n-gram word phrases as well as for single words. The ...

Information & Contributors

Information

Published In

NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2

December 2013

3236 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Qualifiers

Article

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total citations: 2,681
Total downloads: 1
Downloads (last 12 months): 0
Downloads (last 6 weeks): 0

Reflects downloads up to 04 Sep 2024

Citations

Ikeuchi K, Wake N, Sasabuchi K, Takamatsu J (2024). Semantic constraints to represent common sense required in household actions for multimodal learning-from-observation robot. International Journal of Robotics Research, 43:2 (134-170). DOI: 10.1177/02783649231212929. Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.1177/02783649231212929
Weng C, Qin Y, Lin B, Liu P, Chen L (2024). MatsVD: Boosting Statement-Level Vulnerability Detection via Dependency-Based Attention. Proceedings of the 15th Asia-Pacific Symposium on Internetware (115-124). DOI: 10.1145/3671016.3674807. Online publication date: 24-Jul-2024.
https://dl.acm.org/doi/10.1145/3671016.3674807
Feng Y, Li H, Cao Y, Wang Y, Feng H (2024). CRABS-former: CRoss-Architecture Binary Code Similarity Detection based on Transformer. Proceedings of the 15th Asia-Pacific Symposium on Internetware (11-20). DOI: 10.1145/3671016.3671390. Online publication date: 24-Jul-2024.
https://dl.acm.org/doi/10.1145/3671016.3671390
Zhang Z, Tawsif F, Ryu K, Yu T, Halfond W (2024). Mobile Bug Report Reproduction via Global Search on the App UI Model. Proceedings of the ACM on Software Engineering, 1:FSE (2656-2676). DOI: 10.1145/3660824. Online publication date: 12-Jul-2024.
https://dl.acm.org/doi/10.1145/3660824
Göpfert C, Haig A, Hsu C, Chow Y, Vendrov I, Lu T, Ramachandran D, Pham H, Ghavamzadeh M, Boutilier C (2024). Discovering Personalized Semantics for Soft Attributes in Recommender Systems Using Concept Activation Vectors. ACM Transactions on Recommender Systems, 2:4 (1-37). DOI: 10.1145/3658675. Online publication date: 16-Apr-2024.
https://dl.acm.org/doi/10.1145/3658675
Sun W, Fang C, Ge Y, Hu Y, Chen Y, Zhang Q, Ge X, Liu Y, Chen Z (2024). A Survey of Source Code Search: A 3-Dimensional Perspective. ACM Transactions on Software Engineering and Methodology, 33:6 (1-51). DOI: 10.1145/3656341. Online publication date: 28-Jun-2024.
https://dl.acm.org/doi/10.1145/3656341
Zheng C, Jiang G, Yan X, Yin P, Zhou Q, Cheng J (2024). GE2: A General and Efficient Knowledge Graph Embedding Learning System. Proceedings of the ACM on Management of Data, 2:3 (1-27). DOI: 10.1145/3654986. Online publication date: 30-May-2024.
https://dl.acm.org/doi/10.1145/3654986
