Researchers improve AI logic by focusing on critical tokens

Midjourney prompted by THE DECODER

Researchers have developed a new method that improves the reasoning ability of AI language models. The key lies in identifying particularly important tokens.

Scientists at Tsinghua University and Tencent AI Lab have created a new method called "cDPO" (contrastive Direct Preference Optimization) that makes AI language models better at logical reasoning by focusing on specific words that affect their performance.

The research team found that certain words, which they call "critical tokens," have an outsized impact on how well AI systems can reason through problems. Their tests showed that changing these specific words can significantly increase the likelihood of getting correct answers. For example, when an AI encounters the word "owed," it often leads to wrong conclusions. But using alternative words like "paid" helps the AI reach more accurate results.

The token "owed" leads to incorrect conclusions and therefore incorrect answers. In contrast, alternative tokens such as "paid" significantly improve the accuracy of the conclusions. | Image: Lin, Liang, Xu et al.

cDPO automatically identifies critical tokens

The new method automatically spots these important words during the AI training process. It does this by training two different models - one that learns from correct reasoning examples and another that learns from incorrect ones. By comparing how these models handle different words, the system can identify which words are most likely to cause problems.

The difference between how these two models process specific words helps show which ones are most critical. The bigger the difference, the more likely that word is to cause reasoning errors.

cDPO beats other alignment methods

The research team tested cDPO using several AI models, including Llama-3 (8B and 70B) and deepseek-math (7B). They ran tests using the benchmarks GSM8K and MATH500. The results showed that cDPO performed better than existing methods for improving AI reasoning.

While the improvements were modest - just a few percentage points better than current best practices - the results show that identifying and managing these critical words can help make AI systems more reliable at reasoning tasks. However, this approach alone won't solve all the logical limitations that current AI language models face.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Researchers improve AI logic by focusing on critical tokens

cDPO automatically identifies critical tokens

cDPO beats other alignment methods

Apple will upgrade ChatGPT in Apple Intelligence to GPT-5 in the next software update

OpenAI launches GPT-5 as a unified system with adaptive reasoning for complex tasks

An invisible prompt in a Google Doc made ChatGPT access data from a victim’s Google Drive

OpenAI launches GPT-5 as a unified system with adaptive reasoning for complex tasks

Google Deepmind's Genie 3 creates interactive 3D worlds that stay consistent for "multiple minutes"

Google upgrades Gemini with Deep Think and flags early warning risks

Researchers improve AI logic by focusing on critical tokens

cDPO automatically identifies critical tokens

cDPO beats other alignment methods

Apple will upgrade ChatGPT in Apple Intelligence to GPT-5 in the next software update

OpenAI launches GPT-5 as a unified system with adaptive reasoning for complex tasks

An invisible prompt in a Google Doc made ChatGPT access data from a victim’s Google Drive