
TikTok's parent company ByteDance has been suspended from ChatGPT by OpenAI after it was revealed that the company secretly used OpenAI technology to develop a competing AI model called Project Seed.

According to internal ByteDance documents leaked to The Verge editor Alex Heath, ByteDance used OpenAI's API at nearly every stage of Project Seed's development, including training and evaluating the model.

Employees were aware of the implications, and discussed on Lark, ByteDance's internal communication platform, how they could obfuscate the evidence through "data desensitization."

Using output from OpenAI's models to train competing AI models is a direct violation of OpenAI's terms of service. ByteDance had access to GPT-4 through Microsoft's Azure service, which is subject to the same rules.


Such data sourcing could help competitors obtain high-quality data, and thus better AI models, much faster. But it also risks spreading the errors and biases of the generating model to other AI models, degrading the quality of generated and training data overall.

OpenAI investigates possible terms of service violation by ByteDance

OpenAI spokesperson Niko Felix confirmed to Heath that ByteDance's account has been suspended and that the allegations are being investigated. ByteDance has made minimal use of the API to date, Felix said. If ByteDance's use of the API is found to be outside the rules, it will have to make changes or its account will be deleted.

ByteDance spokeswoman Jodi Seth told Heath that GPT-generated data was used to annotate the model early in Project Seed's development and that this data was removed from ByteDance's training data in the middle of the year. ByteDance is a licensed Microsoft partner and uses GPT models for products outside of China, she said.

In Project Seed, ByteDance is developing language models for the Doubao chatbot and a business chatbot that is to be commercialized as a cloud product.

The main goal of Project Seed is to become China's ChatGPT as soon as possible. The team has been tasked with achieving GPT-3.5 performance by the end of this year and GPT-4 performance by mid-2024, Heath reported.


The current Seed model reportedly has 200 billion parameters. GPT-3 had 175 billion parameters, while the combined GPT-4 model is estimated to have approximately 1.8 trillion parameters. However, parameter count alone has become a less reliable indicator of a model's performance since the release of GPT-3.

Summary
  • ByteDance, the parent company of TikTok, was suspended from ChatGPT by OpenAI for allegedly secretly using OpenAI technology to develop a competing AI model called Project Seed, in violation of its terms of service.
  • According to internal documents, ByteDance used OpenAI's API at nearly every stage of Project Seed's development, and employees discussed how to obscure evidence through "data desensitization."
  • OpenAI has suspended ByteDance's account and is investigating the allegations. ByteDance states that GPT-generated data was only used at the beginning of Project Seed's development and was later removed.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.