A leaked document from an anonymous Google employee discusses the role of open-source AI projects and their potential impact on industry giants like Google and OpenAI.
Please note: This document likely represents a single opinion within Google. There is no indication that the content of the paper will influence Google's business strategy, and it is just one of many perspectives. Nevertheless, it is interesting to see what is being discussed within Google about open-source AI.
The document highlights the impressive achievements of the open-source community recently, and questions whether the major players can maintain a competitive edge in this fast-moving space. The document was leaked on Discord and verified by Semianalysis.
Open-source AI outpacing the big players
"We have no moat. And neither does OpenAI," is a key statement in the paper. Given the rapid progress of the open-source community, neither Google nor OpenAI could gain a sustainable competitive advantage.
The open-source AI community has made remarkable progress recently, the paper says. Large Language Models (LLMs) can now run on smartphones, and personalized AI systems can be adapted to laptops within hours or trained at impressive speeds.
These developments have been made possible by the ability of the open-source community to rapidly iterate and improve AI models, the paper says.
"The barrier to entry for training and experimentation has dropped from the total output of a major research organization to one person, an evening, and a beefy laptop."
The "Stable Diffusion Moment" of large language models
Google's models still have a small lead, the paper says, but that lead is rapidly closing. The open-source models are "pound-for-pound more capable," as well as faster, easier to customize, and more privacy-friendly.
According to the paper, an important factor contributing to this rapid innovation is low-rank adaptation (LoRA), a method for fine-tuning AI models quickly and cheaply. LoRA, combined with the LLaMA leak, has enabled a flood of ideas and iterations from individuals and institutions around the world that have quickly outpaced the major players in the field, according to the paper.
The paper stresses that large models that have to be built from scratch slow down the innovation process. Smaller models that can be iterated more quickly may ultimately be more beneficial. To improve model performance, high-quality, curated datasets are more essential than the sheer volume of data, the paper says.
Meta is the big winner of the recent open-source movement
The paper suggests that direct competition with open-source projects is not a viable long-term path for Google and OpenAI. Instead, it recommends that Google collaborates with the open-source community, learns from its innovations, and adapts its strategies.
The paper also points out that OpenAI's closed policy is at odds with the open-source movement, but that OpenAI ultimately "doesn't matter" because it's making the same mistakes as Google and will be overtaken by open-source solutions. Besides, Google would share information with OpenAI anyway due to personnel changes. Keeping secrets or competing with closed source is therefore moot.
The big winner of open-source development in recent times has been Meta, according to the paper. All the innovations of the open-source community are based on Meta's LLaMA architecture. There is nothing to stop Meta from incorporating these innovations into its products, so the company has, in effect, got a job done for free.
The paper's advice: Google should position itself as a pioneer of the open-source movement, similar to Android and Chrome, and thus take a leading position in the AI ecosystem. To achieve this, the company must be willing to relinquish control over its AI models.
"We cannot hope to both drive innovation and control it," the paper says.
From open to closed source: ChatGPT reportedly prompts strategy shift at Google
In addition to the leaked document, the Washington Post reports that Google's AI chief, Jeff Dean, announced internally in February that Google's AI researchers would no longer be allowed to share their work publicly as they had done in previous years. Research would only be shared if it was already part of a product. The reason for this change in strategy was reportedly the massive success of ChatGPT.
Since ChatGPT and GPT-4, OpenAI has published only rudimentary information about its models. Key information such as the size of the model, the exact training process, or the training data used has not been published by OpenAI for competitive reasons.
The leaked document refers specifically to progress in open-source AI since the leak of Meta's LLaMA, the first major foundational AI model. It could therefore be seen as a direct response to Dean's change in strategy. In any case, these two events illustrate the different currents within Google and also paint a picture of the complexity of the current AI landscape.