TikTok's parent company ByteDance has had its account suspended by OpenAI after it was revealed that the company secretly used OpenAI's technology to develop a competing AI model called Project Seed.
According to internal ByteDance documents leaked to The Verge editor Alex Heath, ByteDance used OpenAI's API at nearly every stage of Project Seed's development, including training and evaluating the model.
Employees were aware of the implications and discussed on Lark, ByteDance's internal communication platform, how they could obscure the evidence through "data desensitization."
Using output from OpenAI's models to develop competing AI models is a direct violation of OpenAI's terms of service. ByteDance had access to GPT-4 through Microsoft's Azure OpenAI Service, which is subject to the same rules.
Such data sourcing could help competitors obtain high-quality training data, and thus better AI models, much faster. But it also risks passing the errors and biases of the generating model on to other models, degrading the quality of both the generated output and the training data built from it.
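To illustrate the general pattern at issue, here is a minimal sketch of how a competitor might use the OpenAI API to generate synthetic instruction/response pairs for training its own model. This is not ByteDance's actual pipeline, which has not been published; the prompts, file name, and choice of model here are assumptions for illustration only.

```python
# Illustrative only: generating synthetic training data from API responses.
# This is NOT ByteDance's code; prompts, file names, and the model choice
# are assumptions. OpenAI's terms prohibit using output obtained this way
# to develop competing models.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompts = [
    "Explain the difference between supervised and unsupervised learning.",
    "Summarize the plot of 'Journey to the West' in three sentences.",
]

with open("synthetic_training_data.jsonl", "w", encoding="utf-8") as f:
    for prompt in prompts:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        answer = response.choices[0].message.content
        # Each line becomes an instruction/response pair that could later be
        # used to fine-tune another model -- the practice at issue here.
        f.write(json.dumps({"prompt": prompt, "response": answer}) + "\n")
```

Any errors or biases in the responses written to such a file would be inherited by a model trained on it, which is the quality risk described above.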
OpenAI investigates possible terms of service violation by ByteDance
OpenAI spokesperson Niko Felix confirmed to Heath that ByteDance's account has been suspended and that the allegations are being investigated. ByteDance has made only minimal use of the API to date, Felix said. If ByteDance's use of the API is found to violate OpenAI's rules, the company will have to make changes or its account will be terminated.
ByteDance spokesperson Jodi Seth told Heath that GPT-generated data was used for annotation early in Project Seed's development and that this data was removed from ByteDance's training data in the middle of the year. ByteDance is a licensed Microsoft partner and uses GPT models in products outside of China, she said.
In Project Seed, ByteDance is developing language models for the Doubao chatbot and a business chatbot that is to be commercialized as a cloud product.
The main goal of Project Seed is to become China's answer to ChatGPT as soon as possible. The team has been tasked with reaching GPT-3.5 performance by the end of this year and GPT-4 performance by mid-2024, Heath reported.
The current Seed model reportedly has 200 billion parameters. GPT-3 had 175 billion parameters, while GPT-4 is estimated to have approximately 1.8 trillion parameters across its combined expert models. However, parameter count alone has become a less reliable indicator of a model's performance since the release of GPT-3.