OpenAI seeks large, socially relevant datasets for future AI models

OpenAI has launched a new initiative called OpenAI Data Partnerships. The goal is to build AI models that deeply understand all subjects, industries, cultures, and languages.

Large AI models learn skills and knowledge about the world from the data they are trained on. To create an AGI that is safe and useful for all of humanity, AI models need rich training data, OpenAI writes.

Training on diverse content could help AI models better understand specific domains, which is crucial for their practical applications.

Data diversity is crucial

OpenAI is already working with several partners, including the Icelandic government and the non-profit Free Law Project, that want data from their country or sector to be represented in the training data. The Free Law Project's goal is to improve access to legal knowledge.

OpenAI is particularly interested in large datasets that reflect human society and are not already easily accessible to the public. The data can be text, images, audio, or video. Of particular interest is data that expresses human intent, regardless of language, subject, or format.

There are currently two ways to work with OpenAI:

1. Open-source archive: The goal is to create a publicly available, open-source dataset for training language models. OpenAI will investigate how this dataset can be used to safely train other open-source models.

2. Private datasets: For organizations that want to keep their data private but still want AI models to better understand their domain, OpenAI prepares private datasets for training proprietary AI models, including base models and fine-tuned custom models. The company says it handles the data with the level of sensitivity and access controls desired by the partner.
