Meta developed a method for large language models to iteratively improve their ability to follow instructions, without relying on human annotation or distillation from more powerful models.
Meta's research proposes a new technique called "instruction backtranslation" that allows large language models like LLaMa to be fine-tuned to follow instructions without relying on expensive human annotations or distillation from more powerful models like GPT-4.
Instruction backtranslation is the self-play of instruction tuning
Instruction backtranslation is a two-step process combining self-augmentation and self-curation. In the self-augmentation phase, the language model is used to generate candidate instruction-response pairs from an unlabeled text corpus: for each unlabeled text, the model predicts an instruction to which that text would be a plausible response. This results in a large set of synthesized examples.
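To make the step concrete, here is a minimal Python sketch of self-augmentation. The `complete` helper, the prompt wording, and the function names are illustrative assumptions, not Meta's actual code.

```python
# Minimal sketch of the self-augmentation step, assuming a `complete` helper
# that wraps whatever "backward" model is available (hypothetical placeholder).

def complete(prompt: str) -> str:
    """Placeholder for a call to the backward model; swap in a real LLM client."""
    raise NotImplementedError

def self_augment(unlabeled_texts: list[str]) -> list[tuple[str, str]]:
    """For each unlabeled text, predict the instruction it would answer."""
    pairs: list[tuple[str, str]] = []
    for response in unlabeled_texts:
        prompt = (
            "Below is the output of an AI assistant. Write the user "
            "instruction that this output answers.\n\n"
            f"Output:\n{response}\n\nInstruction:"
        )
        instruction = complete(prompt).strip()
        # Each unlabeled text becomes the response half of a candidate pair.
        pairs.append((instruction, response))
    return pairs
```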
The self-curation phase then uses the model to score these candidate pairs and filter out low-quality ones. The model ranks the examples and keeps only the highest-scoring subset. These steps of generating candidates and curating the best data are repeated. Each iteration produces a better model that can in turn improve the quality of the data it selects for the next round.
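A matching sketch of self-curation and the outer self-training loop, under the same assumptions: the rating prompt, the score threshold, and the `finetune` placeholder are illustrative, though the paper does have the model score candidates on a 5-point scale and keep only the top-rated subset.

```python
import re

def complete(prompt: str) -> str:
    """Placeholder for a call to the current model; swap in a real LLM client."""
    raise NotImplementedError

def finetune(examples: list[tuple[str, str]]) -> None:
    """Placeholder: fine-tune the current model on (instruction, response) pairs."""
    raise NotImplementedError

def score_pair(instruction: str, response: str) -> int:
    """Ask the model itself to rate a candidate pair from 1 to 5."""
    prompt = (
        "Rate how well the response answers the instruction on a scale of "
        "1 to 5, where 5 is an excellent, complete answer.\n\n"
        f"Instruction: {instruction}\nResponse: {response}\n\nScore:"
    )
    match = re.search(r"[1-5]", complete(prompt))
    return int(match.group()) if match else 1  # default to the lowest score

def self_curate(pairs: list[tuple[str, str]], threshold: int = 5) -> list[tuple[str, str]]:
    """Keep only the highest-scoring candidate pairs."""
    return [(i, r) for i, r in pairs if score_pair(i, r) >= threshold]

def self_train(
    seed_data: list[tuple[str, str]],
    candidates: list[tuple[str, str]],
    rounds: int = 2,
) -> list[tuple[str, str]]:
    """Alternate fine-tuning and curation across rounds."""
    data = list(seed_data)
    for _ in range(rounds):
        finetune(data)
        # Because score_pair calls the current model, each fine-tuned model
        # becomes the curator that selects the next round's training data.
        data = list(seed_data) + self_curate(candidates)
    return data
```

Because scoring runs through the current model, each round of fine-tuning automatically improves the curator used in the next round, which is what drives the iterative gains described above.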
Through this iterative self-training process, the model learns to generate better instructions and also becomes better at identifying high-quality training examples.
Meta's Humpback model beats Anthropic's Claude in instruction-following benchmarks
Meta's researchers show that this approach leads to strong instruction-following performance, outperforming previous work that uses a LLaMa model of the same scale. The resulting model, Humpback 65B, achieves state-of-the-art results among non-distilled LLaMa models on the Alpaca instruction-following benchmark, surpassing models such as Anthropic's Claude, Guanaco, LIMA, and Falcon-Instruct.
In future work, the team plans to further scale this method "by considering larger unlabeled corpora, which our analysis suggests should yield further gains."