When it comes to enterprise content, sharper focus through fewer parameters gives small language models (SLMs) an edge over their larger counterparts. Generative AI agents powered by SLMs can be more accurate, more secure, and require fewer computing resources to operate. And all of those benefits can lead to lower costs to implement and maintain.
To understand language models and how they’re used, it’s helpful to first understand the notion of generative artificial intelligence (gen AI). Gen AI is a form of AI built on deep learning, which relies heavily on neural networks. Gen AI is “trained” to analyze existing content (also known as training data or the training corpus) and to use statistical probability to generate new content in the form of text, imagery, sound, and other formats. Gen AI isn’t looking only at individual words, sounds, or images in existing content; it’s also looking for patterns in how the elements of language work together to create meaning (also known as the language model).
Gen AI analyzes what has happened before (as described in the training corpus) so that it can apply the language model to generate new content. How a gen AI tool responds to a request depends on how it’s been trained and the language model, or models, that it’s using.
Take this sentence as an example: “Bob went out to walk the __.” If we ask Microsoft Copilot, a generative AI tool, to finish the sentence for us, Copilot responds:
“…iguana. Why not, right? Bob could be a trendsetter in exotic pet ownership! Or perhaps there’s another creative spin you’d like to put on this—what do you think Bob’s walking?”
A different generative AI tool, trained on different content and/or using different language models, might come back with a more common suggestion:
“dog. Bob went out to walk the dog.”
The output of today’s generative AI tools is limited to the content they’ve been trained on, the language models underlying that content, and the task they’ve been asked to complete. Generative AI agents can’t come up with ideas they haven’t encountered in some way before.
In The Content Advantage, Colleen Jones explains the concept of a large language model (LLM).
An important concept in generative AI is the token. A token is simply a unit of content: a word, a punctuation mark, a character, an image, etc.
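To make the idea of a token concrete, here is a minimal sketch of tokenization in Python. It uses a simple word-and-punctuation split for illustration only; production models use subword tokenizers (such as byte-pair encoding), but the principle is the same: content is broken into discrete units the model can count and compare.

```python
import re

def simple_tokenize(text):
    # Split text into word tokens and punctuation tokens.
    # Real language models use subword schemes (e.g., byte-pair
    # encoding), but the core idea is identical: content becomes
    # a sequence of discrete units.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("Bob went out to walk the dog.")
print(tokens)  # ['Bob', 'went', 'out', 'to', 'walk', 'the', 'dog', '.']
```

Each item in the resulting list is one token; a model’s training corpus is, in effect, billions of such sequences.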
Another important concept is the parameter. A parameter is a variable or setting that language models use to “understand” what each token in the training corpus “means,” and to generate a response to a request (e.g., asking Microsoft Copilot to finish the sentence about Bob).
Small language models (SLMs) aren’t just like LLMs; they used to be LLMs. To create an SLM, developers use techniques such as distillation, pruning, and quantization to make an LLM smaller and more manageable while preserving as much of its original power as possible. Examples of SLMs and their LLM ancestors include:
| SLM | LLM Ancestor | Developer |
| --- | --- | --- |
| DistilBERT | BERT | Hugging Face |
| GPT-4o mini | GPT-4 | OpenAI |
| Gemma | Gemini | Google |
| Haiku | Claude | Anthropic |
| xGen-Sales | Agentforce | Salesforce |
| Granite 3.0 | Granite | IBM |
| Llama 3.2 | Llama | Meta |
| Phi-3 | Phi | Microsoft |
| Ministral 3B | Les Ministraux | Mistral AI |
For example, from Google’s Gemini came the SLM Gemma, and from BERT, DistilBERT. Microsoft’s Phi gave rise to Phi-3, and Mistral AI’s Les Ministraux to Ministral 3B.
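Of the compression techniques mentioned above, quantization is the easiest to illustrate in a few lines. The toy sketch below maps 32-bit float weights onto 8-bit integers with a shared scale; real quantization toolkits (and distillation or pruning) are far more sophisticated, but the space-for-precision trade is the same.

```python
def quantize_int8(weights):
    """Map float weights to the int8 range [-127, 127] with a shared scale."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(q_weights, scale):
    """Recover approximate float weights from the quantized integers."""
    return [q * scale for q in q_weights]

weights = [0.82, -1.54, 0.03, 0.97]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each weight now fits in 1 byte instead of 4, at the cost of
# small rounding errors in the restored values.
```

Applied across billions of weights, this is how a model’s storage and memory footprint shrinks to a fraction of the original while its behavior stays close to the ancestor LLM’s.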
The difference between an LLM and an SLM is in the number of parameters the model uses to do its job. While an LLM may have 140 billion parameters, an SLM may have fewer than 5 billion, for example.
As is the case with any effort, focus is a force multiplier. The more focused the task, the more likely it is that an SLM will be better suited to the job than an LLM. SLMs are ideal for powering narrowly defined tasks.
Unless it’s deployed carefully, responsibly, and intentionally, generative AI has the potential to introduce significant risks to the enterprise. Their smaller size means that SLMs provide organizations with important opportunities to mitigate those risks.
The more parameters a language model has, the more expensive it is to operate. LLMs are notoriously resource intensive, requiring massive amounts of computing power. For example, while an LLM can require specialized hardware and hundreds of gigabytes (GB) of random-access memory (RAM), an SLM can run on a single computer with a few GB of memory. In one recent study that compared the BERT, DistilBERT, and TinyBERT language models, the LLM required more than 3 times as much energy as the smallest model to generate similarly accurate responses.
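A back-of-envelope calculation shows why the memory gap is so stark. This sketch assumes each parameter is stored as a 16-bit float (2 bytes) and ignores activations, optimizer state, and runtime overhead, so the figures are illustrative floors, not precise requirements.

```python
BYTES_PER_PARAM = 2  # assumes fp16 storage; fp32 would double this

def min_weight_memory_gb(num_params):
    # Minimum memory just to hold the model weights, in gigabytes.
    return num_params * BYTES_PER_PARAM / 1e9

llm_gb = min_weight_memory_gb(140e9)  # a 140-billion-parameter LLM
slm_gb = min_weight_memory_gb(5e9)    # a 5-billion-parameter SLM
print(f"LLM: ~{llm_gb:.0f} GB, SLM: ~{slm_gb:.0f} GB")
```

Even before counting any runtime overhead, the LLM needs hundreds of GB just to hold its weights, while the SLM’s weights fit comfortably on a single machine.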
For organizations concerned with sustainability and efficient stewardship, generative AI driven by SLMs simply costs less to operate and wastes fewer precious resources.
Their smaller size means that SLMs can be deployed in local environments, on premises, and even on devices. Most of the essential privacy and security protocols are within reach of an organization’s tech team rather than being outsourced to the cloud and subject to interference by unknown actors.
It also means that organizations can keep their assets (and those of their partners and customers) close, reducing the risk of accidentally sharing information that shouldn’t be shared.
While LLMs make generative AI powerful, they also make it unpredictable. As Colleen Jones observes in The Content Advantage:
[Generative AI] has potential to solve bigger problems at a wider scale with less human intervention over time, but it is less predictable, susceptible to biases and inaccuracies within its LLM, and requires large amounts of text and computing power.
Task-focused SLMs can be more accurate and relevant than LLMs because they don’t have to concern themselves with anything other than the domain of interest. An SLM that powers how-to guidance in setting up a smart TV, for example, doesn’t need to contemplate the history of Renaissance artists or other potentially confounding topics.
Smaller training data sets and smaller domains of concern mean that SLMs are easier to train and maintain. This is good news for organizations that need to keep up with content that changes often, quickly, or both. It also means fewer opportunities for the model to hallucinate because there are simply fewer concepts to understand.
Because SLMs analyze a smaller training corpus and have fewer parameters, their latency is lower than that of an LLM. Said another way, an SLM can be faster than an LLM because it doesn’t have to analyze everything under the sun.
Enterprises have been slow to adopt generative AI in general and SLMs in particular. But while SLMs may be small, they offer mighty potential to transform real-world applications.
Consider these articles, which delve into how organizations can start using SLMs to unlock progress in applying generative AI.
We also cover SLMs in our AI + Enterprise Content Certification with Content Science Academy.
And as SLMs become more widely used, you can be sure the Content Science team will share more developments in their application.