Update December 29, 2024:
Users have discovered a way to bypass Deepseek V3's content filters through prompt engineering. By asking the model to insert periods between letters, they can get it to provide more balanced or China-critical responses. For example, the model can generate a detailed Western view of the 1989 Tiananmen Square protests.
This simple hack highlights a major challenge for the Chinese government: how do you maintain the same level of control over probability-based, often unpredictable generative AI that you have over public communication in China?
The challenge becomes even greater when Chinese models are exposed to Western training data. Evidence suggests that Deepseek-V3 was likely pre-trained or fine-tuned using ChatGPT-generated data.
While the CCP is working to create its own dataset, it's unlikely that it will be able to collect enough data to train a foundational LLM from scratch. An initial dataset released in late 2023 had only 50 billion tokens; Deepseek-V3 was trained on 14.8 trillion tokens.
Original article from December 28, 2024:
While China's new Deepseek V3 model shows impressive technical capabilities and competitive pricing, it comes with the same strict censorship as other Chinese AI models - a potential dealbreaker for Western users.
Deepseek's latest model, V3, can go toe-to-toe with the most capable western models like GPT-4o and Claude 3.5, while costing significantly less to train and run. However, testing reveals a familiar pattern: like similar Chinese LLMs, Deepseek V3 operates under strict government censorship. Try asking about sensitive topics like the Chinese Communist Party, President Xi Jinping, or the events in Tiananmen Square, and you'll get generic propaganda in response.
The model's censorship strategy often follows a clear pattern. When faced with questions about Tiananmen Square, it first offers sanitized versions of history, then tries to change the subject to focus on achievements, and finally emphasizes "stability and harmony."
Ask about CCP criticism, and you'll get pure party talking points about economic success and "Chinese-style socialism." Questions about Xi Jinping trigger the strongest censorship - the system simply shuts down any meaningful discussion.
Interestingly, this censorship seems to be limited to China-related topics. The model has no problem criticizing North Korea, Russia's invasion of Ukraine, or expressing critical views of Vladimir Putin and Donald Trump.
Chinese AI models come with built-in government censorship
These examples illustrate how Chinese AI development operates under direct state oversight. Before any AI model can be released, it must be verified to align with "socialist values."
Take the recent case of e-book reader manufacturer Boox: after switching from Microsoft Azure OpenAI to a Chinese language model, their AI assistant now blocks even mentions of "Winnie the Pooh" - a censored reference to President Xi Jinping. The system also censors or distorts criticism of China's allies, like Russia.
Despite their technical capabilities, Chinese AI models might be a non-starter for Western applications. Using these models means automatically embedding Chinese propaganda and values into your AI systems.
While Western models have their own biases, the key difference lies in China's approach: the state explicitly intervenes in the development process and maintains direct control over what these models can and cannot say. This is a level of systematic government control that's way above any Western country.