Evaluation of the performance of GPT-3.5 and GPT-4 on the Medical Final Examination

M Rosoł, JS Gąsior, J Łaba, K Korzeniewski… - MedRxiv, 2023 - medrxiv.org
… Rao et al. reported that GPT-3.5 achieved … GPT-3.5 and GPT-4 in terms of European-based
medical final examinations. In this paper, we hence aimed to investigate the utility of GPT-3.5

OpenFact at CheckThat! 2023: head-to-head GPT vs. BERT - a comparative study of transformers language models for the detection of check-worthy claims

M Sawiński, K Węcel, EP Księżniak… - CEUR Workshop …, 2023 - bazawiedzy.ue.poznan.pl
2023, is also fine-tuned for instruction-following and for dialogue. It brought a significant
improvement over GPT-3.5 … used for fine-tuning and the GPT-3.5 and GPT-4 models were used …

Evaluation of the performance of GPT-3.5 and GPT-4 on the Polish Medical Final Examination

M Rosoł, JS Gąsior, J Łaba, K Korzeniewski… - Scientific Reports, 2023 - nature.com
… In this paper, we hence aimed to investigate the utility of GPT-3.5 and GPT-4 in the context
of the Polish Medical Final Examination in two language versions—Polish and English. By …

Can GPT-3.5 generate and code discharge summaries?

M Falis, AP Gema, H Dong, L Daines… - Journal of the …, 2024 - academic.oup.com
… the model version and the API version as “gpt-3.5-turbo-0613” and “2023-03-15-preview,” …
Second, we assessed the likelihood of harmful misuse to be low given the sensitive nature …

Conversational AI in education: use cases, challenges and opportunities

ΔΜ Κοσετσίδου - 2023 - dspace.uowm.gr
… 14 , on the other hand, is a series of fine-tuned GPT-3.5 models designed to follow prompt
instructions by users and give relevant replies. It enhances the human alignment capacity of …

A Systematic Literature Review on Large Language Models for Automated Program Repair

Q Zhang, C Fang, Y Xie, YX Ma, W Sun… - arXiv preprint arXiv …, 2024 - arxiv.org
… 2020 to 2023 shows a significant increase, with the number reaching 65 papers in 2023. It
… [110] conduct an empirical study to investigate the ability of CodeLlama, GPT-3.5, and GPT-…

Evaluating Open-QA evaluation

C Wang, S Cheng, Q Guo, Y Yue… - Advances in …, 2024 - proceedings.neurips.cc
2023. For ChatGPT-4 and Bing Chat, the Open-QA experiments were primarily conducted
in April 2023 … can be accessed with APIs (text-davinci-003 for GPT-3.5 and gpt-3.5-turbo for …

Evaluating Large Language Models as computer programming teaching assistants

MM Pol Pujadas - 2024 - diposit.ub.edu
online data on a range of topics. The model has a contextual window of 4,000 tokens and was
trained between January 2023 and July 2023… conducted with gpt-3.5 and qwen. There are …

Zero-shot Chain-of-Thought Reasoning across Datasets and Models

K Hebenstreit - 2023 - epub.jku.at
GPT-3.5-turbo is an improved version of text-davinci-003, fine-tuned for performing best
in chat dialogue situations. This model is currently used in the free version of ChatGPT. …

Towards Optimal NLP Solutions: Analyzing GPT and LLaMA-2 Models Across Model Scale, Dataset Size, and Task Diversity

A Kumar, R Sharma, P Bedi - Engineering, Technology & Applied Science …, 2024 - etasr.com
… The series continued to evolve with studies introducing a chat-based version, GPT-3.5 [8],
in 2022 and a larger multimodal model, GPT-4 [9], in 2023. Each iteration refined and …