
Meta developed a method for large language models to iteratively improve their ability to follow instructions, without relying on human annotation or distillation from more powerful models.

Meta's research proposes a new technique called "instruction backtranslation" that allows large language models like LLaMA to be fine-tuned to follow instructions without relying on expensive human annotations or on distillation from more powerful models like GPT-4.

Instruction backtranslation is the self-play of instruction tuning

Instruction backtranslation is a two-step process combining self-augmentation and self-curation. In the self-augmentation phase, the language model generates candidate instruction-response pairs from an unlabeled text corpus: for each unlabeled text, the model predicts the instruction that would elicit that text as a response. This results in a large set of synthesized examples.
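
A minimal sketch of the self-augmentation step might look like the following, assuming a hypothetical generate(prompt) helper that wraps a call to a base LLaMA-style model; the prompt wording and the data format are illustrative choices, not Meta's exact setup.

```python
# Self-augmentation sketch: treat each unlabeled web text as a response
# and ask the model which instruction could have produced it.
# `generate(prompt)` is an assumed helper wrapping an LLM call.

def self_augment(unlabeled_texts, generate):
    """Return candidate instruction-response pairs backtranslated from raw text."""
    candidates = []
    for text in unlabeled_texts:
        prompt = (
            "Below is a response. Write the instruction it answers.\n\n"
            f"Response:\n{text}\n\nInstruction:"
        )
        instruction = generate(prompt).strip()
        candidates.append({"instruction": instruction, "response": text})
    return candidates
```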

The self-curation phase then uses the model to score these candidate pairs and filter out low-quality ones. The model ranks the examples and keeps only the highest-scoring subset. These steps of generating candidates and curating the best data are repeated. Each iteration produces a better model that can in turn improve the quality of the data it selects for the next round.
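
Under the same assumptions, the curation and iteration steps could be sketched as below. The rate(model, pair) helper, the 1-to-5 scale and cut-off, and finetune(model, data) are placeholders for illustration; the article only states that pairs are scored, filtered, and that the process is repeated. The self_augment function is the one from the sketch above.

```python
# Self-curation sketch plus the outer iteration loop. `rate`, the 5-point
# threshold, and `finetune` are assumed stand-ins, not Meta's exact recipe.

def self_curate(model, candidates, rate, threshold=5):
    """Keep only the candidate pairs the current model rates as high quality."""
    return [pair for pair in candidates if rate(model, pair) >= threshold]

def iterate(model, unlabeled_texts, generate, rate, finetune, rounds=2):
    """Alternate augmentation, curation, and fine-tuning; each round's
    stronger model curates the data used in the next round."""
    for _ in range(rounds):
        candidates = self_augment(unlabeled_texts, generate)
        curated = self_curate(model, candidates, rate)
        model = finetune(model, curated)
    return model
```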


Through this iterative self-training process, the model learns to generate better instructions and also becomes better at discriminating high-quality demonstration examples.

Meta's Humpback model beats Anthropic's Claude in instruction-following benchmarks

Meta's researchers show that this approach leads to strong instruction-following performance, outperforming previous work that uses LLaMA models of the same scale. The resulting model, Humpback 65B, achieves state-of-the-art results among non-distilled LLaMA models on the Alpaca instruction-following benchmark, surpassing models such as Anthropic's Claude, Guanaco, LIMA, and Falcon-Instruct.

In future work, the team plans to further scale the method "by considering larger unlabeled corpora, which our analysis suggests should yield further gains."

Summary
  • Meta's researchers have developed instruction backtranslation, which allows large language models to iteratively improve their ability to follow instructions without relying on human annotation or more powerful models such as GPT-4.
  • Instruction backtranslation involves a two-step process of self-augmentation and self-curation, generating candidate instruction-response pairs, which are then ranked and filtered for quality to produce a better model with each iteration.
  • The resulting "Humpback" language model achieves state-of-the-art results on the Alpaca instruction-following benchmark, outperforming competing models such as Anthropic's Claude, Guanaco, LIMA, and Falcon-Instruct.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.