From the course: Introduction to Large Language Models

Comparing large language models

- [Instructor] How do we even compare large language models? That's a great question. I don't think we have a perfect answer yet, but we have made great progress over the last few months. Usually, we focus only on how good a model is at a task, which tells us nothing about whether that same model generates false information. So instead of looking at just one metric, a Stanford University research team proposed HELM, the Holistic Evaluation of Language Models. With HELM, the Stanford team worked with the main large language model providers to benchmark the models across a variety of datasets and get a more holistic view of model performance. The HELM benchmark is a living benchmark and should change as new models are released. I'll just cover the first couple of benchmarks, and you can explore the rest if you're interested. So let me go ahead and scroll down a little bit. So here, each row…
