Google introduces "implicit caching" in Gemini 2.5, aiming to cut developer costs by as much as 75 percent. The new feature automatically detects and stores recurring content, so repeated prompts are only processed once. According to Google, this can lead to significant savings compared to the old explicit caching method, where users had to set up their own cache. To maximize the benefits, Google recommends putting the stable part of a prompt—like system instructions—at the start, and adding user-specific input, such as questions, afterwards. Implicit caching kicks in for Gemini 2.5 Flash starting at 1,024 tokens, and for Pro versions from 2,048 tokens onwards. More details and best practices are available in the Gemini API documentation.

Ad
Support our independent, free-access reporting. Any contribution helps and secures our future. Support now:
Bank transfer
Sources
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.
Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.