AI researcher Sebastian Raschka has published a new analysis that looks at how reinforcement learning is used to improve reasoning in large language models (LLMs). In a blog post, he describes how reinforcement learning algorithms are combined with training approaches such as Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from Verifiable Rewards (RLVR). Raschka focuses on DeepSeek-R1, a model trained using verifiable rewards instead of human labels, to explain in detail how reinforcement learning can improve problem-solving performance.
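The core idea behind verifiable rewards is that, for tasks like math or coding, correctness can be checked programmatically rather than judged by humans. The sketch below illustrates this with a toy reward function; it is not DeepSeek's or Raschka's actual code, and the \boxed{} answer convention and the function name are assumptions made for the example.

```python
import re

def verifiable_reward(model_output: str, ground_truth: str) -> float:
    """Return 1.0 if the model's final answer matches the known solution, else 0.0.

    Illustrative only: real RLVR pipelines typically add further checks,
    such as format rewards or executing generated code against test cases.
    """
    # Assume the model was prompted to put its final answer in \boxed{...},
    # a common convention on math benchmarks.
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0  # no parseable answer -> no reward
    answer = match.group(1).strip()
    return 1.0 if answer == ground_truth.strip() else 0.0

# Example: a correct chain of reasoning ending in the right boxed answer
output = "We compute 12 * 7 = 84, so the answer is \\boxed{84}."
print(verifiable_reward(output, "84"))  # 1.0
```

In RLHF, the reward signal comes from a separate reward model trained on human preference data; with verifiable rewards, a programmatic check like the one above supplies the signal directly, which is why a model such as DeepSeek-R1 can be trained without per-example human labels.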

"While reasoning alone isn't a silver bullet, it reliably improves model accuracy and problem-solving capabilities on challenging tasks (so far)," Raschka writes. "And I expect reasoning-focused post-training to become standard practice in future LLM pipelines."
