DOI: 10.5555/3044805.3045025

Distributed representations of sentences and documents

Published: 21 June 2014

Abstract

Many machine learning algorithms require the input to be represented as a fixed-length feature vector. For text, one of the most common fixed-length representations is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words, and they ignore the semantics of the words. For example, "powerful," "strong," and "Paris" are all equally distant from one another. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of text, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representation. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks.
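The two ideas in the abstract can be illustrated with a small sketch. The first part shows why one-hot word features place "powerful," "strong," and "Paris" at the same distance from one another. The second part trains one dense vector per document to predict the words that document contains, using a full softmax and plain SGD. This is a simplified illustration under stated assumptions, not the authors' implementation: the toy corpus, the function name `train_doc_vectors`, and the hyperparameters (`dim`, `epochs`, `lr`) are all hypothetical, and context words are left out entirely.

```python
import itertools
import numpy as np

# Part 1: one-hot word features. Every pair of distinct words is
# separated by the same Euclidean distance, whatever the words mean.
vocab = ["powerful", "strong", "paris"]
one_hot = np.eye(len(vocab))
distances = {
    (a, b): float(np.linalg.norm(one_hot[i] - one_hot[j]))
    for (i, a), (j, b) in itertools.combinations(enumerate(vocab), 2)
}

# Part 2: a dense vector per document, trained to predict the
# document's own words (illustrative sketch, hypothetical names).
def train_doc_vectors(docs, dim=8, epochs=200, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    words = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(words)}
    D = rng.normal(scale=0.1, size=(len(docs), dim))   # document vectors
    U = rng.normal(scale=0.1, size=(len(words), dim))  # output word weights
    losses = []
    for _ in range(epochs):
        total = 0.0
        for d, doc in enumerate(docs):
            for w in doc:
                logits = U @ D[d]
                logits -= logits.max()          # numerical stability
                p = np.exp(logits)
                p /= p.sum()                    # softmax over the vocabulary
                total += -np.log(p[index[w]])   # cross-entropy for the true word
                g = p.copy()
                g[index[w]] -= 1.0              # gradient w.r.t. the logits
                grad_D = U.T @ g                # computed before U is updated
                U -= lr * np.outer(g, D[d])
                D[d] -= lr * grad_D
        losses.append(total)
    return D, losses

# Toy corpus (an assumption made for illustration only).
docs = [
    "the engine is powerful and strong".split(),
    "a strong and powerful machine".split(),
    "paris is the capital of france".split(),
]
doc_vectors, losses = train_doc_vectors(docs)
```

After training, each row of `doc_vectors` is a fixed-length representation of one variable-length document, which could then be passed to any standard classifier. Whether such vectors are useful on real data depends on corpus size and training details that this sketch does not cover.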

Published In

ICML'14: Proceedings of the 31st International Conference on Machine Learning - Volume 32
June 2014
2786 pages

Publisher

JMLR.org