
ChatGPT: Jack of all trades, master of none

Published: 01 November 2023

Abstract

    OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach to human–model interaction in artificial intelligence. The first contact with the chatbot reveals its ability to provide detailed and precise answers in various areas. Several publications on ChatGPT evaluation test its effectiveness on well-known natural language processing (NLP) tasks. However, the existing studies are mostly non-automated and tested on a very limited scale. In this work, we examined ChatGPT’s capabilities on 25 diverse analytical NLP tasks, most of them subjective even to humans, such as sentiment analysis, emotion recognition, offensiveness detection, and stance detection. The remaining tasks require more objective reasoning, such as word sense disambiguation, linguistic acceptability, and question answering. We also evaluated the GPT-4 model on five selected subsets of NLP tasks. We automated the ChatGPT and GPT-4 prompting process and analyzed more than 49k responses. Our comparison of the results with available State-of-the-Art (SOTA) solutions showed that the average loss in quality of the ChatGPT model was about 25% for zero-shot and few-shot evaluation. For the GPT-4 model, the loss on semantic tasks is significantly lower than for ChatGPT. We showed that the more difficult the task (the lower the SOTA performance), the higher the ChatGPT loss. This applies especially to pragmatic NLP problems such as emotion recognition. We also tested the ability to personalize ChatGPT responses for selected subjective tasks via Random Contextual Few-Shot Personalization, and we obtained significantly better user-based predictions. Additional qualitative analysis revealed a ChatGPT bias, most likely due to the rules imposed on human trainers by OpenAI. Our results provide the basis for a fundamental discussion of whether the high quality of recent predictive NLP models can indicate a tool’s usefulness to society and how the learning and validation procedures for such systems should be established.
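
    To make the evaluation procedure concrete, the loop below is a minimal sketch of automated zero-shot prompting plus a relative quality loss against SOTA, based only on the abstract's description. It is not the authors' code: the sentiment prompt template, the gpt-3.5-turbo model name, the legacy openai-python (v0.x) ChatCompletion interface, and the reading of "loss" as a relative difference are all illustrative assumptions.

```python
# A minimal sketch (not the authors' pipeline) of automated zero-shot
# prompting and a relative loss against SOTA; prompt template, model
# name, and API flavor are illustrative assumptions.
import openai

def zero_shot_prompt(text: str, labels: list[str]) -> str:
    # Generic zero-shot instruction: show the text and the allowed labels.
    return (
        f"Classify the sentiment of the following text as one of: "
        f"{', '.join(labels)}. Answer with the label only.\n\nText: {text}"
    )

def query_chatgpt(prompt: str) -> str:
    # Legacy openai-python (v0.x) chat call; newer SDK versions use
    # client.chat.completions.create instead.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output simplifies automated parsing
    )
    return response["choices"][0]["message"]["content"].strip().lower()

def loss_vs_sota(model_score: float, sota_score: float) -> float:
    # One plausible reading of "loss in quality": relative drop vs. SOTA, in %.
    return 100.0 * (sota_score - model_score) / sota_score

# Example: accuracy 0.60 against a SOTA of 0.80 gives the ~25% average
# loss the abstract reports for ChatGPT.
print(loss_vs_sota(0.60, 0.80))  # -> 25.0
```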

    Highlights

    The results of ChatGPT and GPT-4 evaluation on 25 tasks using 48k+ prompts.
    Context-awareness and personalization are valuable capabilities of ChatGPT (see the sketch after these highlights).
    ChatGPT and GPT-4 always perform worse than SOTA methods, by 4% to over 70%.
    ChatGPT loss tends to be higher for more difficult reasoning problems.
    ChatGPT can boost AI development and change our daily lives.
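
    The personalization highlight above, and the Random Contextual Few-Shot Personalization named in the abstract, can be pictured with the sketch below: few-shot examples are sampled at random from the querying user's own annotations, so the prompt conditions the model on that user's subjective perspective. All function names, labels, and prompt wording are hypothetical; the abstract confirms only the random, user-contextual few-shot idea.

```python
# A minimal sketch of Random Contextual Few-Shot Personalization as described
# in the abstract: the few-shot examples are sampled at random from texts
# annotated by the SAME user, so the prompt reflects that user's subjective
# labeling style. Names, labels, and prompt wording are hypothetical.
import random

def personalized_prompt(user_examples, query_text, k=3, seed=None):
    """Build a few-shot prompt from (text, label) pairs annotated by one user."""
    rng = random.Random(seed)
    shots = rng.sample(user_examples, min(k, len(user_examples)))
    lines = ["Here is how one annotator labeled several texts:"]
    for text, label in shots:
        lines.append(f'Text: "{text}"\nLabel: {label}')
    lines.append(
        f'Label the next text from the same annotator\'s point of view.\n'
        f'Text: "{query_text}"\nLabel:'
    )
    return "\n\n".join(lines)

# Usage: the shots come from a single user's history rather than a global
# pool, which is what makes the resulting prediction user-based.
history = [
    ("I hate Mondays", "not offensive"),
    ("You are an idiot", "offensive"),
    ("Nice weather today", "not offensive"),
    ("Shut up already", "offensive"),
]
print(personalized_prompt(history, "Get lost, loser", k=2, seed=42))
```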





            Published In

            Information Fusion, Volume 99, Issue C, November 2023, 688 pages

            Publisher

            Elsevier Science Publishers B.V., Netherlands


            Author Tags

            1. ChatGPT
            2. GPT-4
            3. Natural language processing (NLP)
            4. Semantic NLP tasks
            5. Pragmatic NLP tasks
            6. Subjective NLP tasks
            7. Natural language inference (NLI)
            8. Sentiment analysis
            9. Offensive content
            10. Emotion recognition
            11. Humor detection
            12. Stance detection
            13. Word sense disambiguation (WSD)
            14. Question answering (QA)
            15. Model personalization
            16. Text classification
            17. SOTA analysis
            18. Large language model
            19. Prompting

            Qualifiers

            • Research-article



            Cited By

            • (2024) Faceless Adversary, Feckless Colleague: The Many Sides of ChatGPT. Proceedings of the 26th Western Canadian Conference on Computing Education, pp. 1-6. DOI: 10.1145/3660650.3660656. Online publication date: 2-May-2024.
            • (2024) An Assessment of ML-based Sentiment Analysis for Intelligent Web Filtering. Proceedings of the 17th International Conference on PErvasive Technologies Related to Assistive Environments, pp. 80-87. DOI: 10.1145/3652037.3652039. Online publication date: 26-Jun-2024.
            • (2024) Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration. Proceedings of the 29th International Conference on Intelligent User Interfaces, pp. 370-384. DOI: 10.1145/3640543.3645198. Online publication date: 18-Mar-2024.
            • (2024) Shedding Light on Software Engineering-specific Metaphors and Idioms. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-13. DOI: 10.1145/3597503.3639585. Online publication date: 20-May-2024.
            • (2024) Uncovering the Causes of Emotions in Software Developer Communication Using Zero-shot LLMs. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, pp. 1-13. DOI: 10.1145/3597503.3639223. Online publication date: 20-May-2024.
            • (2024) Automated Claim Matching with Large Language Models: Empowering Fact-Checkers in the Fight Against Misinformation. Companion Proceedings of the ACM on Web Conference 2024, pp. 1441-1449. DOI: 10.1145/3589335.3651910. Online publication date: 13-May-2024.
            • (2024) FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs. Companion Proceedings of the ACM on Web Conference 2024, pp. 883-886. DOI: 10.1145/3589335.3651504. Online publication date: 13-May-2024.
            • (2024) TraQuLA: Transparent Question Answering Over RDF Through Linguistic Analysis. Web Engineering, pp. 19-33. DOI: 10.1007/978-3-031-62362-2_2. Online publication date: 17-Jun-2024.
            • (2024) The Appeal, Efficacy, and Ethics of Using Text- and Video-Generating AI in the Learning Process of College Students: Predictive Insights and Student Perceptions. Social Computing and Social Media, pp. 23-42. DOI: 10.1007/978-3-031-61305-0_2. Online publication date: 29-Jun-2024.
            • (2024) WisCompanion: Integrating the Socratic Method with ChatGPT-Based AI for Enhanced Explainability in Emotional Support for Older Adults. Artificial Intelligence in HCI, pp. 179-198. DOI: 10.1007/978-3-031-60606-9_11. Online publication date: 29-Jun-2024.
