research-article

Automatically Temporal Labeled Data Generation Using Positional Lexicon Expansion for Focus Time Estimation of News Articles

Published: 10 May 2024

Abstract

Many facts change over time, a fundamental aspect of our physical environment. For articles about a pandemic, for example, the user is interested not in the creation date of the document but in the facts and causes of the last pandemic. Fake news can be combated more effectively when documents carry a temporal focus. Currently, neither the sequence of events nor the temporal focus is considered when retrieving news documents. Because the available datasets contain few temporal aspects, it is difficult to test and evaluate the temporal conclusions of a model. The goal of this work is to develop a temporal-focus news article retrieval model based on co-training, advancing research in semi-supervised learning. The dataset is mapped using (1) the evolving focus time of news articles and (2) a semi-supervised method based on co-occurrence contexts that learns low-dimensional continuous vectors for neural contrast embedding models. These models generate focus-time-based queries over sequential news articles to facilitate temporal understanding. A diverse dataset of news articles is used to evaluate the effectiveness of the proposed method. With semi-supervised learning and lexicon expansion, the developed model achieves 89%. The method outperforms previous baselines and traditional machine learning models, with improvements of 12.65% and 4.7%, respectively.
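To make the distinction between a document's creation date and its focus time concrete, the following is a minimal sketch, not the method of the article. It assumes that focus time can be approximated by the year mentioned most prominently in the text, and that earlier sentences matter more (an inverted-pyramid assumption). The function name `estimate_focus_year`, the `1 / (index + 1)` weighting, and the sample article are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Matches four-digit years from 1800 to 2099 (an assumed range for this sketch).
YEAR_PATTERN = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def estimate_focus_year(sentences):
    """Return the year with the highest position-weighted mention score, or None."""
    scores = defaultdict(float)
    for index, sentence in enumerate(sentences):
        # Hypothetical inverted-pyramid weight: earlier sentences count more.
        weight = 1.0 / (index + 1)
        for year in YEAR_PATTERN.findall(sentence):
            scores[int(year)] += weight
    if not scores:
        return None
    return max(scores, key=scores.get)


# Toy article written long after the events it describes.
article = [
    "The 1918 influenza pandemic reshaped public health policy.",
    "Historians writing in 2020 compared it with later outbreaks.",
    "By the end of 1918 most cities had lifted restrictions.",
]

print(estimate_focus_year(article))  # prints 1918
```

Here the toy article might carry a recent creation date, yet the explicit year mentions place its focus on 1918. A learned model, such as the co-training approach described in the abstract, would also need to handle implicit temporal cues that a regular expression cannot capture.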

Cited By

  • (2024) Leveraging Hybrid Adaptive Sine Cosine Algorithm with Deep Learning for Arabic Poem Meter Detection. ACM Transactions on Asian and Low-Resource Language Information Processing. DOI: 10.1145/3676963. Online publication date: 10-Jul-2024
  • (2023) A comprehensive review on automatic detection of fake news on social media. Multimedia Tools and Applications 83:16 (47319-47352). DOI: 10.1007/s11042-023-17377-4. Online publication date: 26-Oct-2023

Information

Published In

ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 5
May 2024
297 pages
EISSN: 2375-4702
DOI: 10.1145/3613584
Editor: Imed Zitouni

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 May 2024
Online AM: 19 October 2022
Accepted: 11 October 2022
Revised: 03 September 2022
Received: 03 June 2022
Published in TALLIP Volume 23, Issue 5

Author Tags

  1. Information retrieval
  2. temporal information retrieval
  3. focus time
  4. inverted pyramid
  5. news retrieval

Qualifiers

  • Research-article
