From the course: Using Lightweight AI with Small Language Models

What are small language models?

- First things first: what are small language models, and how do they differ from the large language models we've gotten used to? The clue is in their names, which are quite literal here. Large and small refer to the parameter count of each model. OpenAI's large language model GPT-4 has 1.76 trillion parameters, while Microsoft's small language model Phi-3 Mini has 3.8 billion parameters. So GPT-4 has hundreds of times more parameters and is therefore orders of magnitude larger than Phi-3 Mini. Visualized like this, you can see the difference is enormous: the dot for Phi-3 Mini is so small you can hardly see it, and the dot for GPT-4 is so big it takes up more space than the screen has on offer.

In plain language, this means large language models have bigger brains, and those bigger brains require more computing power, more electrical power, and more time to process more complex completions, which is why they need specialized hardware and usually run in the cloud. Small language models, by contrast, require less computing and electrical power and perform operations much faster, meaning they can run at the edge and even locally on devices like smartphones.

Size really is a succinct and accurate way of describing the difference between these models. The number of parameters in a language model is a measure of the breadth, depth, and complexity of the data and connections encoded in that model. The more parameters, the better the model is at mimicking natural language and producing completions that look to us like meaning. In other words, the bigger the parameter count, the more advanced the language the system can process and produce, and the smarter the AI system seems to be.

Except that's not the whole story. It turns out small language models, with their fewer parameters, can appear just as smart, just in smaller scopes and contexts. A useful mental model here is to think of a large language model as a university library with a vast breadth and depth of reference materials on hand, and a small language model as a small community library with a curated archive of the most common and some specialized materials. For specific questions and tasks, the small language model can be just as correct as the large language model, but it has less training data and fewer resources to draw from. Where the large language model is an expert generalist capable of handling a wide range of topics with depth and nuance, the small language model is more of a focused specialist, adept at managing a narrower scope of subjects with efficiency and effectiveness.

The smaller scope of small language models also means they have less room for variance and can therefore be more consistent and deterministic in their responses. And for enterprise applications, this is often exactly what we're looking for: smaller, less compute- and energy-demanding, faster, and more consistent. When described like this, I think you can see why small language models are starting to loom large in the AI world.
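To make the scale difference concrete, here is a quick back-of-the-envelope calculation using only the two parameter counts quoted above (a minimal Python sketch; the variable names are my own):

# Parameter counts quoted in the video
gpt4_params = 1.76e12     # GPT-4: 1.76 trillion parameters
phi3_mini_params = 3.8e9  # Phi-3 Mini: 3.8 billion parameters

ratio = gpt4_params / phi3_mini_params
print(f"GPT-4 has roughly {ratio:.0f}x the parameters of Phi-3 Mini")
# Output: GPT-4 has roughly 463x the parameters of Phi-3 Mini

That ratio, roughly 463, is where "hundreds of times more parameters" comes from.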
