AI research

Apr 21, 2025Apr 21, 2025

Go read this to learn how reinforcement learning makes LLMs better at reasoning

Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.

Profile

E-Mail

AI researcher Sebastian Raschka has published a new analysis that looks at how reinforcement learning is used to improve reasoning in large language models (LRMs). In a blog post, he describes how algorithms are used in combination with training methods such as Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from Verifiable Rewards (RLVR). Raschka focuses on DeepSeek-R1, a model trained using verifiable rewards instead of human labels, to explain in detail how reinforcement learning can improve problem-solving performance.

While reasoning alone isn’t a silver bullet, it reliably improves model accuracy and problem-solving capabilities on challenging tasks (so far). And I expect reasoning-focused post-training to become standard practice in future LLM pipelines.

Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:

Bank transfer

Sources

Sebastian Raschka

Matthias Bastian

Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.

Profile

E-Mail

AI and society

Jul 18, 2025Jul 18, 2025

Meta refuses to sign EU's AI Code of Practice, citing legal uncertainty

News, tests and reports about VR, AR and MIXED Reality.

What happens next with MIXED My personal farewell to MIXED Meta and Anduril are now jointly developing XR headsets for the US military MIXED-NEWS.com

AI in practice

Jul 18, 2025Jul 18, 2025

Perplexity's valuation soared to $18 billion after its latest funding round

AI in practice

Jul 18, 2025Jul 18, 2025

OpenAI CEO Sam Altman warns users not to trust ChatGPT agent with sensitive or personal data

Google News

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Go read this to learn how reinforcement learning makes LLMs better at reasoning

Meta refuses to sign EU's AI Code of Practice, citing legal uncertainty

Perplexity's valuation soared to $18 billion after its latest funding round

OpenAI CEO Sam Altman warns users not to trust ChatGPT agent with sensitive or personal data

OpenAI launches new ChatGPT agent that automates complex tasks for Pro, Plus, and Team

Kimi-K2 is the next open-weight AI milestone from China after Deepseek

New Energy-Based Transformer architecture aims to bring better "System 2 thinking" to AI models

Go read this to learn how reinforcement learning makes LLMs better at reasoning

Meta refuses to sign EU's AI Code of Practice, citing legal uncertainty

Perplexity's valuation soared to $18 billion after its latest funding round

OpenAI CEO Sam Altman warns users not to trust ChatGPT agent with sensitive or personal data