research-article

HumourHindiNet: Humour detection in Hindi web series using word embedding and convolutional neural network

Published: 26 June 2024

Abstract

Humour is a crucial aspect of human speech, which makes it imperative to build systems that can detect it. While data on humour in English speech is plentiful, the same cannot be said for a low-resource language like Hindi. In this article, we introduce two multimodal datasets for humour detection in Hindi web series. The datasets were collected from over 500 minutes of conversations among the characters of the Hindi web series Kota-Factory and Panchayat. Each dialogue is manually annotated as Humour or Non-Humour. Along with presenting these new Hindi humour detection datasets, we propose an improved framework for detecting humour in Hindi conversations. We start by preprocessing both datasets to obtain uniformity across the dialogues and datasets. The processed dialogues are then passed through the Skip-gram model to generate Hindi word embeddings. The generated embeddings are then passed simultaneously to three convolutional neural network (CNN) architectures, each with a different filter size, for feature extraction. The extracted features are then passed through stacked Long Short-Term Memory (LSTM) layers for further processing, and the dialogues are finally classified as Humour or Non-Humour. We conduct intensive experiments on both proposed Hindi datasets and report several standard performance metrics. We also compare the performance of the proposed framework with several baselines and contemporary algorithms for humour detection. The results demonstrate the suitability of our datasets as standard datasets for humour detection in Hindi web series. The proposed model achieves accuracies of 91.79% and 87.32% and F1 scores of 91.64% and 87.04% on the Kota-Factory and Panchayat datasets, respectively.
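
The abstract describes the pipeline only at a high level, so the following Python sketch is illustrative rather than the authors' implementation. It shows how Skip-gram (centre, context) training pairs can be formed from a tokenised dialogue, and how three convolution branches with different filter widths each scan the same sequence of word vectors. The function names, the window size, the filter widths (2, 3, 4), the ReLU activation, and the fixed averaging kernels are all assumptions made for illustration; in a trained CNN the kernels are learned, and here the branch outputs would go on to the stacked LSTM layers.

```python
from typing import Dict, List, Sequence, Tuple


def skipgram_pairs(tokens: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (centre, context) pairs: each word paired with its neighbours."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


def conv1d_valid(vectors: List[List[float]], kernel: List[List[float]]) -> List[float]:
    """Slide one filter over a sequence of word vectors (stride 1, no padding)."""
    width = len(kernel)
    outputs = []
    for start in range(len(vectors) - width + 1):
        total = 0.0
        for k in range(width):
            for x, w in zip(vectors[start + k], kernel[k]):
                total += x * w
        outputs.append(max(0.0, total))  # ReLU activation (assumed)
    return outputs


def parallel_branches(
    vectors: List[List[float]], filter_sizes: Sequence[int] = (2, 3, 4)
) -> Dict[int, List[float]]:
    """Apply one filter per branch; every branch sees the same embeddings."""
    dim = len(vectors[0])
    features = {}
    for size in filter_sizes:
        # Placeholder averaging kernel; a real model learns these weights.
        kernel = [[1.0 / (size * dim)] * dim for _ in range(size)]
        features[size] = conv1d_valid(vectors, kernel)
    return features


if __name__ == "__main__":
    # Hypothetical five-token dialogue with 2-dimensional embeddings.
    tokens = ["w1", "w2", "w3", "w4", "w5"]
    vectors = [[1.0, 1.0] for _ in tokens]
    print(skipgram_pairs(tokens, window=1))
    for size, feats in parallel_branches(vectors).items():
        print(size, len(feats))
```

Each branch yields a feature sequence whose length depends on its filter width, which is why models of this kind usually pad or pool the branch outputs before combining them. How the paper combines the three branches is not stated in the abstract.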



      Information

      Published In

      ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 7
      July 2024
      254 pages
      EISSN: 2375-4702
      DOI: 10.1145/3613605
      Editor: Imed Zitouni

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 26 June 2024
      Online AM: 27 April 2024
      Accepted: 10 April 2024
      Revised: 04 April 2024
      Received: 01 January 2024
      Published in TALLIP Volume 23, Issue 7

      Author Tags

      1. Convolutional Neural Network (CNN)
      2. Hindi web series
      3. humour detection
      4. Long Short-Term Memory (LSTM)
      5. low-resource languages
      6. social networks
      7. skip-gram Hindi word embedding
