Open-weight reasoning models often use far more tokens than closed models, making them less efficient per query, according to Nous Research. Models like DeepSeek and Qwen use 1.5 to 4 times more tokens than OpenAI and Grok-4—and up to 10 times more for simple knowledge tasks. Mistral's Magistral models stand out for especially high token use.
Ad
Average tokens used per task by different AI models. | Image: Nous ResearchIn contrast, OpenAI's gpt-oss-120b, with very short reasoning paths, shows that open models can be efficient, especially for math problems. Token usage depends heavily on the type of task. Full details and charts are available at Nous Research.
Ad
Ad
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.