Researchers fine-tune a language model from Meta with text generated by OpenAI's GPT-3.5 for less than $600 - and achieve performance similar to the OpenAI model.
Training large language models is expensive, and powerful models remain the monopoly of large technology companies - right?
Perhaps not.
Researchers at Stanford used 52,000 instruction-following demonstrations generated by OpenAI's GPT-3.5 (text-davinci-003) to fine-tune a seven-billion-parameter variant of Meta's recently announced LLaMA model.
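The data-generation side of this pipeline is conceptually simple. As a rough sketch, assuming the 2023-era openai Python library and an illustrative seed prompt that is not taken from the Stanford code, collecting one demonstration from text-davinci-003 might look like this:

```python
import openai

openai.api_key = "sk-..."  # your OpenAI API key

# Illustrative seed prompt; the actual self-instruct prompts are more
# elaborate and include in-context examples of existing tasks.
SEED_PROMPT = (
    "Come up with a new task instruction, an optional input, and the "
    "expected output, formatted as JSON with the keys 'instruction', "
    "'input', and 'output'.\n"
)

def generate_demonstration() -> str:
    """Request one instruction-following demonstration from GPT-3.5."""
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=SEED_PROMPT,
        max_tokens=512,
        temperature=1.0,  # high temperature encourages task diversity
    )
    return response["choices"][0]["text"]
```

Repeating such calls, with filtering and deduplication, until 52,000 usable examples accumulate reportedly accounted for most of the project's budget.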
Instruction training is one of the key techniques that make GPT-3.5 superior to the original GPT-3 model, and the training data used is proprietary to OpenAI.
While reinforcement learning from human feedback (RLHF) is critical for tuning models like ChatGPT or even GPT-4, the essential capabilities of these models come from their original training - which includes instruction training.
Stanford's Alpaca trains with OpenAI output
In their work, the Stanford group used the AI-generated instructions to train Alpaca 7B, a language model that the researchers say exhibits many GPT-3.5-like behaviors. In a blind test using inputs from the Self-Instruct evaluation set, the two models performed comparably, the team says.
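That training step is plain supervised fine-tuning: each demonstration is rendered into a prompt-plus-response text, and the model learns to continue the prompt with the demonstrated response. The following is a minimal sketch in the Hugging Face ecosystem, not the team's released code; the prompt template is simplified (the Alpaca data has instruction/input/output fields, and the variant for a non-empty input is omitted here), and the hyperparameters are illustrative.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

MODEL_PATH = "path/to/llama-7b"  # placeholder: LLaMA weights are gated

# Simplified Alpaca-style prompt template (input-free variant only).
PROMPT = ("Below is an instruction that describes a task. "
          "Write a response that appropriately completes the request.\n\n"
          "### Instruction:\n{instruction}\n\n### Response:\n{output}")

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
tokenizer.pad_token = tokenizer.eos_token  # LLaMA defines no pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH)

def to_features(example):
    # Prompt and demonstrated response are concatenated; the model is
    # trained with the standard causal language-modeling objective.
    text = PROMPT.format(**example) + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=512)

data = load_dataset("json", data_files="alpaca_data.json")["train"]
data = data.map(to_features, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="alpaca-7b", num_train_epochs=3,
                           per_device_train_batch_size=4),
    train_dataset=data,
    # mlm=False makes the collator pad batches for next-token prediction
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```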
Alpaca has problems common to other language models, such as hallucinations, toxicity, and stereotyping. In particular, hallucinations occur more frequently than in the OpenAI model.
The team is releasing an interactive demo, the training dataset, and the training code. They have also asked Meta for permission to release the model. With the release, the team hopes to enable research on language models trained with instructions. To prevent misuse, the demo includes a content filter based on the OpenAI API and watermarks the model's output.
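The article gives no further detail on that filter, but a minimal sketch of such a gate, assuming OpenAI's moderation endpoint and again the 2023-era openai library, could sit in front of the demo like this:

```python
import openai

openai.api_key = "sk-..."  # your OpenAI API key

def is_allowed(text: str) -> bool:
    """Return False if OpenAI's moderation endpoint flags the text."""
    result = openai.Moderation.create(input=text)
    return not result["results"][0]["flagged"]

user_prompt = "Tell me about llamas."
if is_allowed(user_prompt):
    # Forward the prompt to Alpaca 7B; the model's answer would be
    # checked the same way before it is shown to the user.
    ...
```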
The model cannot be used for commercial purposes. In addition to safety concerns and the non-commercial license of Meta's LLaMA model, the team points to the OpenAI GPT-3.5 terms of use, which state that the model may not be used to develop AI models that compete with OpenAI.
Alpaca's training was so cheap that OpenAI has a problem
The last point is an indication that OpenAI is aware that the output of its own models can be used as a data source for potential replicas. With the leak of the larger LLaMA models with up to 65 billion parameters, it is conceivable that such projects are already in the works - and could also use the output of GPT-4.
In addition to its impressive performance for such a small model, Alpaca also shows how affordable AI training has become: the team fine-tuned Alpaca 7B for less than $600, including the cost of generating the training data. Larger models will be more expensive, but the expected cost should remain in a range that companies or crowdfunded projects can easily cover.
Alignment researcher Eliezer Yudkowsky summarizes the problem this poses for companies like OpenAI: "If you allow any sufficiently wide-ranging access to your AI model, even by paid API, you're giving away your business crown jewels to competitors that can then nearly-clone your model without all the hard work you did to build up your own fine-tuning dataset."
What can OpenAI do about that? Not much, says Yudkowsky: "If you successfully enforce a restriction against commercializing an imitation trained on your I/O - a legal prospect that's never been tested, at this point - that means the competing checkpoints go up on BitTorrent."
I don't think people realize what a big deal it is that Stanford retrained a LLaMA model, into an instruction-following form, by **cheaply** fine-tuning it on inputs and outputs **from text-davinci-003**.
It means: If you allow any sufficiently wide-ranging access to your AI... https://t.co/rr5zag6C8Z
- Eliezer Yudkowsky (@ESYudkowsky) March 14, 2023
You can try Stanford's Alpaca 7B for free.