From the course: GPT-4: The New GPT Release and What You Need to Know


How was GPT-4 trained?

- [Instructor] OpenAI hasn't released much information about what's under the hood of GPT-4, which datasets were used, or how it was trained, but the GPT-4 Technical Report does reveal that the model was fine-tuned using reinforcement learning from human feedback. Most models are trained by trying to minimize some loss function, but when we look at large language models and try to judge whether the text they generate is good or bad, that's not an easy quantity to measure. What if we could use feedback from people to measure how good the text generated by a large language model is, and then use that as a loss to optimize the model? That's the idea behind reinforcement learning from human feedback, or RLHF. It's likely that RLHF was used along with supervised learning when training GPT-4. Let's get into the details. In their 2022 paper titled "Training Language Models to Follow Instructions With Human Feedback," OpenAI…
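To make the "human feedback as a loss" idea concrete, here is a minimal sketch of the pairwise reward-model loss commonly used in RLHF. The transcript doesn't give this formula; it's the Bradley-Terry-style objective from OpenAI's InstructGPT work, where a reward model is trained so that the response a human preferred scores higher than the one they rejected:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model already scores the
    human-preferred response above the rejected one, and large when
    the ordering is wrong, so minimizing it pushes the model to
    agree with human preferences.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Model agrees with the human label -> low loss:
good = preference_loss(2.0, 0.5)
# Model disagrees -> high loss:
bad = preference_loss(0.5, 2.0)
```

In the full RLHF pipeline, a reward model trained with a loss like this is then used as the optimization signal for fine-tuning the language model itself with a reinforcement learning algorithm such as PPO.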
