DOI: 10.1145/3292500.3330955
research-article
Open access

Assessing The Factual Accuracy of Generated Text

Published: 25 July 2019

Abstract

We propose a model-based metric to estimate the factual accuracy of generated text that is complementary to typical scoring schemes like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy). We introduce and release a new large-scale dataset based on Wikipedia and Wikidata to train relation classifiers and end-to-end fact extraction models. The end-to-end models are shown to be able to extract complete sets of facts from datasets with full pages of text. We then analyse multiple models that estimate factual accuracy on a Wikipedia text summarization task, and show their efficacy compared to ROUGE and other model-free variants by conducting a human evaluation study.
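To illustrate why a fact-based score can complement n-gram overlap, here is a minimal sketch. It is not the paper's implementation: the abstract does not give a formula, so `fact_precision` (the share of generated fact tuples that also appear in the reference) is an assumed definition, and the fact tuples are written by hand where the paper would use a trained extraction model. `unigram_recall` is a simplified ROUGE-1-style recall, not the official ROUGE package.

```python
from collections import Counter


def unigram_recall(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1-style recall: share of reference tokens
    that also occur in the candidate (counts are clipped)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / max(sum(ref.values()), 1)


def fact_precision(reference_facts: set, generated_facts: set) -> float:
    """Assumed definition: share of generated (subject, relation, object)
    tuples that also appear among the reference tuples."""
    if not generated_facts:
        return 0.0
    return len(generated_facts & reference_facts) / len(generated_facts)


reference = "marie curie was born in warsaw"
candidate = "marie curie was born in paris"

# Hand-written tuples standing in for the output of a fact extraction model.
reference_facts = {("Marie Curie", "place of birth", "Warsaw")}
generated_facts = {("Marie Curie", "place of birth", "Paris")}

print(unigram_recall(reference, candidate))             # 5 of 6 reference tokens match
print(fact_precision(reference_facts, generated_facts)) # the single generated fact is wrong
```

In this toy case the candidate shares five of six tokens with the reference, yet its only fact is wrong, so the overlap score is high while the fact-based score is zero.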

Published In

KDD '19: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining
July 2019
3305 pages
ISBN: 9781450362016
DOI: 10.1145/3292500
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning
  2. factual correctness
  3. generative models
  4. metric
  5. transformers

Qualifiers

  • Research-article

Conference

KDD '19
Acceptance Rates

KDD '19 Paper Acceptance Rate: 110 of 1,200 submissions, 9%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%

Cited By

  • (2024) Text summarization based on semantic graphs: an abstract meaning representation graph-to-text deep learning approach. Journal of Big Data 11:1. DOI: 10.1186/s40537-024-00950-5. Online publication date: 14-Jul-2024.
  • (2024) MTAS: A Reference-Free Approach for Evaluating Abstractive Summarization Systems. Proceedings of the ACM on Software Engineering 1:FSE (2561-2583). DOI: 10.1145/3660820. Online publication date: 12-Jul-2024.
  • (2024) Explaining the Unexplainable: The Impact of Misleading Explanations on Trust in Unreliable Predictions for Hardly Assessable Tasks. Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization (36-46). DOI: 10.1145/3627043.3659573. Online publication date: 22-Jun-2024.
  • (2024) Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (1-17). DOI: 10.1145/3613904.3642596. Online publication date: 11-May-2024.
  • (2024) A Culturally Sensitive Test to Evaluate Nuanced GPT Hallucination. IEEE Transactions on Artificial Intelligence 5:6 (2739-2751). DOI: 10.1109/TAI.2023.3332837. Online publication date: Jun-2024.
  • (2024) Learning Latent Variable for Logical Reasoning in Table-Based Fact Verification. 2024 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN60899.2024.10651044. Online publication date: 30-Jun-2024.
  • (2024) Prog-TAPAS: Enabling Table and Program Representation Consistency for Fact Verification. 2024 International Joint Conference on Neural Networks (IJCNN) (1-7). DOI: 10.1109/IJCNN60899.2024.10650908. Online publication date: 30-Jun-2024.
  • (2024) Asynchronous Event Driven Brain Teaser Using Node.js. 2024 5th International Conference on Electronics and Sustainable Communication Systems (ICESC) (113-118). DOI: 10.1109/ICESC60852.2024.10689905. Online publication date: 7-Aug-2024.
  • (2024) Factual consistency evaluation of summarization in the Era of large language models. Expert Systems with Applications 254 (124456). DOI: 10.1016/j.eswa.2024.124456. Online publication date: Nov-2024.
  • (2024) A systematic literature review of deep learning-based text summarization: Techniques, input representation, training strategies, mechanisms, datasets, evaluation, and challenges. Expert Systems with Applications 252 (124153). DOI: 10.1016/j.eswa.2024.124153. Online publication date: Oct-2024.
