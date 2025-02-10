AI research
Jonathan Kemper

Hugging Face releases small language model that beats Qwen and Llama most of the time

Jonathan works as a freelance tech journalist for THE DECODER, focusing on AI tools and how GenAI can be used in everyday work.

The research team at Hugging Face has introduced SmolLM2, its latest language model. While it doesn't break new ground, it is a solid addition to the company's AI portfolio.

The model's effectiveness comes from carefully combining different sources into an 11-trillion-token training dataset and from a methodical, multi-stage training approach. The team started with a balanced mix of web content and programming examples, then added specialized datasets for mathematics and coding tasks.

The researchers evaluated the model's performance after each training phase to identify gaps, then adjusted the training data accordingly. They created custom datasets including FineMath for complex mathematical problems, Stack-Edu for well-documented code, and SmolTalk for conversation-related tasks.
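The article describes this staged data strategy only in prose. As a rough illustration of the general idea, here is a minimal Python sketch of multi-stage data mixing with evaluation-driven rebalancing. All stage names, data-source labels, weights, scores, and the `rebalance` rule are hypothetical assumptions made for illustration; this is not Hugging Face's SmolLM2 training code or its actual data mixture.

```python
import random
from dataclasses import dataclass


@dataclass
class Stage:
    """A training phase with hypothetical sampling weights per data source."""
    name: str
    weights: dict[str, float]


def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Scale the weights so they sum to 1."""
    total = sum(weights.values())
    return {source: w / total for source, w in weights.items()}


def rebalance(weights: dict[str, float], scores: dict[str, float],
              threshold: float = 0.5, boost: float = 1.5) -> dict[str, float]:
    """Upweight every source whose evaluation score falls below the threshold.

    Sources without a score keep their weight. This boost rule is an
    illustrative assumption, not the SmolLM2 team's actual procedure.
    """
    adjusted = {
        source: w * (boost if scores.get(source, 1.0) < threshold else 1.0)
        for source, w in weights.items()
    }
    return normalize(adjusted)


def sample_batch(weights: dict[str, float], batch_size: int,
                 rng: random.Random) -> list[str]:
    """Pick the data source for each example in a batch according to the weights."""
    sources = list(weights)
    return rng.choices(sources, weights=[weights[s] for s in sources], k=batch_size)


# Stage 1: mostly web text plus some code (hypothetical weights).
stage1 = Stage("web-and-code", normalize({"web": 0.9, "code": 0.1}))
# Stage 2: specialized math and code data are added (hypothetical weights).
stage2 = Stage("add-math", normalize({"web": 0.6, "code": 0.2, "math": 0.2}))
# Hypothetical evaluation scores after stage 2: math is the weak spot.
scores = {"web": 0.7, "code": 0.6, "math": 0.3}
stage3 = Stage("rebalanced", rebalance(stage2.weights, scores))

# Seeded RNG so the sampled batch is reproducible.
batch = sample_batch(stage3.weights, batch_size=8, rng=random.Random(0))
print(stage3.weights)
print(batch)
```

The rebalancing step mirrors, in simplified form, the idea of evaluating after each phase and shifting the data mix toward weak areas. A real pretraining pipeline would work with token budgets and full benchmark suites rather than a single score per data source.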

Flowchart: The SmolLM2 ecosystem, with data sources (Cosmopedia, FineWeb-Edu, etc.), model variants, and instruction-tuning paths.
Hugging Face developed its own datasets for the SmolLM2 models and releases them as open source. | Image: Hugging Face

After initial training, the team refined SmolLM2 through instruction fine-tuning and example-based learning to improve its task comprehension. They used reinforcement learning to help the model generate more user-aligned responses.

Competitive results show promise for specific use cases

In knowledge and comprehension benchmarks, SmolLM2 performs better than similar-sized models like Qwen2.5-1.5B and Llama3.2-1B in several areas, though not across the board.

Comparison table: Benchmark results for the SmolLM2, Llama3.2, and Qwen2.5 language models in the 1B to 2B parameter range.
In many benchmarks, the Hugging Face model outperforms its Meta and Qwen competitors, but it performs relatively poorly in some areas, such as mathematical problem solving. | Image: Hugging Face

Besides the main 1.7-billion-parameter version, the team developed two smaller variants with 360 million and 135 million parameters; both show solid results for their size.

Hugging Face has become essential to open-source AI development through its extensive repository of model weights. The company aims to actively advance research rather than just host data for others.

The Google-backed company recently released an AI agent library and built an open-source alternative to OpenAI's Deep Research.

SmolLM2 relies on proven approaches for efficient language models: a high-quality data mix and multi-stage training. While it is competitive with similar models from Meta and Qwen, its practical value likely lies in handling smaller tasks on devices with limited processing power, such as smartphones.

This development seems a natural step for Hugging Face as a major AI player. Unlike Meta and Qwen, which share only model weights, Hugging Face takes a fully open-source approach and makes its training data available to everyone.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • Hugging Face introduces SmolLM2, a new language model that combines carefully curated datasets and a methodical training approach to deliver solid performance in specific tasks.
  • The model's training data includes a balanced mix of web content, programming examples, and custom datasets for mathematics, coding, and conversation, with the team refining the model through instruction fine-tuning, example-based learning, and reinforcement learning.
  • SmolLM2 outperforms similar-sized models from Meta and Qwen in several knowledge and comprehension benchmarks, with smaller variants showing promising results for handling tasks on devices with limited processing power, reflecting Hugging Face's commitment to open-source AI development.
Sources
arXiv, Hugging Face