OpenAI bans TikTok company ByteDance from ChatGPT due to possible data theft
Key Points
- ByteDance, the parent company of TikTok, was suspended from ChatGPT by OpenAI for allegedly secretly using OpenAI technology to develop a competing AI model called Project Seed, in violation of its terms of service.
- According to internal documents, ByteDance used OpenAI's API at nearly every stage of Project Seed's development, and employees discussed how to obscure evidence through "data desensitization."
- OpenAI has suspended ByteDance's account and is investigating the allegations. ByteDance states that GPT-generated data was only used at the beginning of Project Seed's development and was later removed.
TikTok's parent company ByteDance has been suspended from ChatGPT by OpenAI after a report alleged that the company secretly used OpenAI technology to develop a competing AI model called Project Seed.
According to internal ByteDance documents leaked to The Verge editor Alex Heath, ByteDance used OpenAI's API at nearly every stage of Project Seed's development, including training and evaluating the model.
Employees were aware of the implications and discussed on Lark, ByteDance's internal communication platform, how they could obscure the evidence through "data desensitization."
Using output from OpenAI's models to train competing AI models is a direct violation of OpenAI's terms of service. ByteDance had access to GPT-4 through Microsoft's Azure service, which is subject to the same rules.
Such data sourcing could help competitors obtain high-quality data, and thus better AI models, much faster. But it also risks propagating the generating model's errors and biases to other AI models, degrading the quality of both generated output and training data overall.
OpenAI investigates possible terms of service violation by ByteDance
OpenAI spokesperson Niko Felix confirmed to Heath that ByteDance's account has been suspended and that the allegations are being investigated. ByteDance has made minimal use of the API to date, Felix said. If ByteDance's use of the API is found to be outside the rules, it will have to make changes or its account will be deleted.
ByteDance spokeswoman Jodi Seth told Heath that GPT-generated data was used to annotate the model early in Project Seed's development and that this data was removed from ByteDance's training data in the middle of the year. ByteDance is a licensed Microsoft partner and uses GPT models for products outside of China, she said.
In Project Seed, ByteDance is developing language models for the Doubao chatbot and a business chatbot that is to be commercialized as a cloud product.
The main goal of Project Seed is to become China's ChatGPT as soon as possible. The team has been tasked with achieving GPT-3.5 performance by the end of this year and GPT-4 performance by mid-2024, Heath reported.
The current Seed model reportedly has 200 billion parameters. GPT-3 had 175 billion parameters, while GPT-4 is estimated to have approximately 1.8 trillion parameters in total. However, parameter count alone has become a less meaningful indicator of a model's performance since the release of GPT-3.