Distributed representations of sentences and documents

Authors:

Tomas Mikolov

ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32

Pages II-1188 - II-1196

Published: 21 June 2014

Abstract

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore the semantics of the words. For example, "powerful," "strong" and "Paris" are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.
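
The two bag-of-words weaknesses named in the abstract can be seen in a few lines. The following is a minimal sketch, not code from the paper: the toy vocabulary, the helper names `bag_of_words` and `euclidean`, and the example sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary; a real system would build it from a corpus.
vocabulary = ["dog", "bites", "man", "powerful", "strong", "paris"]


def bag_of_words(text, vocab):
    """Map a text to a fixed-length vector of word counts over vocab."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Weakness 1: word order is lost, so different sentences that use
# the same words collapse to the same vector.
order_a = bag_of_words("dog bites man", vocabulary)
order_b = bag_of_words("man bites dog", vocabulary)
same_vector = order_a == order_b

# Weakness 2: semantics are ignored. Each distinct word occupies its own
# dimension, so any two different single words are equally far apart.
powerful = bag_of_words("powerful", vocabulary)
strong = bag_of_words("strong", vocabulary)
paris = bag_of_words("Paris", vocabulary)

d_powerful_strong = euclidean(powerful, strong)
d_powerful_paris = euclidean(powerful, paris)

print("same vector for reordered sentences:", same_vector)
print("distance powerful-strong:", d_powerful_strong)
print("distance powerful-paris: ", d_powerful_paris)
```

Under these assumptions, both sentences map to the same count vector, and the single-word documents are all the same Euclidean distance apart (the square root of 2), so the representation cannot tell that "powerful" is closer in meaning to "strong" than to "Paris".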

Index Terms
1. Computing methodologies
  1. Artificial intelligence
2. Hardware
  1. Power and energy
    1. Power estimation and optimization


Published In

ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32

June 2014

2786 pages

Publisher

JMLR.org
