HindiPersonalityNet: Personality Detection in Hindi Conversational Data Using Deep Learning with Static Embedding

Published: 07 August 2024

Abstract

Personality detection, together with other behavioral and cognitive assessments, can help explain why people act the way they do and is useful to online applications such as recommender systems, job screening, matchmaking, and counseling. Psychometric natural language processing draws on textual cues and distinctive markers of writing style in conversational utterances, which reveal signs of individual personality. This work presents HindiPersonalityNet, a text-based deep neural model that detects personality in Hindi conversational data by classifying conversations into three categories: ambivert, extrovert, and introvert. The model uses a gated recurrent unit (GRU) with BioWordVec embeddings for text classification and is trained and tested on a novel dataset, शख्सियत (pronounced Shakhsiyat), curated from dialogues in the Indian crime-thriller drama series Aarya. The model achieves an F1-score of 0.701, showing the potential of conversational data from various sources for understanding and predicting a person's personality traits. It captures both semantic and long-distance dependencies in conversations and establishes the effectiveness of our dataset as a benchmark for personality detection in Hindi dialogue data. Further, various static and dynamic word embeddings are compared comprehensively on our standardized dataset to determine the most suitable embedding method for personality detection.
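
To make the architecture named in the abstract concrete, the following is a minimal sketch of a recurrent classifier of that general shape: a sequence of fixed (static) word vectors is fed through a gated recurrent unit, and the final hidden state is mapped to probabilities over the three personality classes. It is not the authors' implementation. The weights are random and untrained, the dimensions are arbitrary, random vectors stand in for the pretrained embedding lookup, and the function names (`init_params`, `gru_step`, `classify`) are hypothetical. The gate equations follow the common textbook GRU formulation, which the abstract does not confirm as the configuration used in the paper.

```python
import math
import random

# The three classes named in the abstract.
LABELS = ["ambivert", "extrovert", "introvert"]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def init_params(embed_dim, hidden_dim, n_classes, seed=0):
    """Random, untrained weights; every size here is an illustrative assumption."""
    rng = random.Random(seed)
    params = {}
    for gate in ("z", "r", "h"):  # update gate, reset gate, candidate state
        params["W_" + gate] = random_matrix(hidden_dim, embed_dim, rng)
        params["U_" + gate] = random_matrix(hidden_dim, hidden_dim, rng)
        params["b_" + gate] = [0.0] * hidden_dim
    params["W_out"] = random_matrix(n_classes, hidden_dim, rng)
    params["b_out"] = [0.0] * n_classes
    return params


def gate_input(x, h, p, name):
    """Pre-activation W x + U h + b for the gate called `name`."""
    wx = matvec(p["W_" + name], x)
    uh = matvec(p["U_" + name], h)
    return [a + b + c for a, b, c in zip(wx, uh, p["b_" + name])]


def gru_step(x, h, p):
    """One GRU update for input vector x and previous hidden state h."""
    z = [sigmoid(v) for v in gate_input(x, h, p, "z")]   # how much to update
    r = [sigmoid(v) for v in gate_input(x, h, p, "r")]   # how much history to expose
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in gate_input(x, reset_h, p, "h")]
    # Blend the old state and the candidate state using the update gate.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def classify(embedded_tokens, p):
    """Run the GRU over a sequence of word vectors; return class probabilities."""
    h = [0.0] * len(p["b_z"])
    for x in embedded_tokens:
        h = gru_step(x, h, p)
    logits = [a + b for a, b in zip(matvec(p["W_out"], h), p["b_out"])]
    peak = max(logits)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: random vectors stand in for the embeddings of a five-token utterance.
rng = random.Random(1)
utterance = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(5)]
params = init_params(embed_dim=8, hidden_dim=4, n_classes=len(LABELS))
probabilities = dict(zip(LABELS, classify(utterance, params)))
```

Because the hidden state is carried from token to token and the gates control how much of it is overwritten at each step, information from early in an utterance can still influence the final prediction. This is the mechanism behind the abstract's mention of long-distance dependencies. With untrained weights the probabilities are close to uniform and carry no meaning.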

        Published In

        ACM Transactions on Asian and Low-Resource Language Information Processing, Volume 23, Issue 8
        August 2024
        343 pages
        EISSN:2375-4702
        DOI:10.1145/3613611

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 07 August 2024
        Online AM: 29 September 2023
        Accepted: 16 September 2023
        Revised: 14 August 2023
        Received: 24 May 2023
        Published in TALLIP Volume 23, Issue 8

        Author Tags

        1. Personality
        2. low resource
        3. deep learning
        4. word embeddings
        5. NLP
        6. personality psychology
        7. natural language
        8. conversational data

        Qualifiers

        • Research-article
