Math needs thinking time, everyday knowledge needs memory, and a new Transformer architecture aims to deliver both

A German research team lets Transformer models decide for themselves how many reasoning passes to spend on a problem. Combined with additional memory, the approach outperforms larger models on math problems.