
Dataloop VP of Product Shlomi Avigdor explains how data engines help scale and deploy AI to tackle the content moderation problem in online social experiences.

Keeping discussions safe in online communities is a challenge, and that’s putting it mildly even for traditional text-based social media sites. On platforms that use live voice, like online games, the task can become nearly impossible. At least, for humans. Dataloop offers a solution that could help companies use AI to monitor and manage their online environments.

Data engines and AI models

Dataloop isn’t a complete AI product in itself. Rather, the company describes its product as a “data engine for AI”. It gives companies the tools to train AI models, including storage and annotation resources, along with automation pipelines that make it easier to integrate AI into existing processes.
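The article doesn’t show Dataloop’s own SDK, so the sketch below uses hypothetical classes rather than Dataloop’s actual API. It only illustrates the data-engine idea: storage, annotation, and review steps chained into an automated pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-ins for a data engine's building blocks;
# these are NOT Dataloop's actual SDK classes.
@dataclass
class DataItem:
    uri: str                                   # where the raw file lives
    annotations: List[str] = field(default_factory=list)

@dataclass
class Pipeline:
    """Chains steps so new data flows from storage through
    annotation toward model training automatically."""
    steps: List[Callable[[DataItem], DataItem]] = field(default_factory=list)

    def add_step(self, step: Callable[[DataItem], DataItem]) -> "Pipeline":
        self.steps.append(step)
        return self

    def run(self, item: DataItem) -> DataItem:
        for step in self.steps:
            item = step(item)
        return item

def auto_annotate(item: DataItem) -> DataItem:
    # A pre-trained model would add candidate labels here.
    item.annotations.append("candidate:toxicity")
    return item

def queue_for_review(item: DataItem) -> DataItem:
    # Annotated items would be handed to reviewers or a training set here.
    print(f"queued {item.uri} with {item.annotations}")
    return item

pipeline = Pipeline().add_step(auto_annotate).add_step(queue_for_review)
pipeline.run(DataItem(uri="s3://bucket/voice-chat/clip-001.wav"))
```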

“Companies are accumulating data for their industries, and they have trouble storing them while monitoring their context,” Dataloop VP of Product Shlomi Avigdor told THE DECODER. “In between arranging and searching their data through AI development, that’s where Dataloop comes in.”


According to Avigdor, AI alone can’t solve the moderation problem anyway, because most words or phrases aren’t problematic in themselves. They’re problematic in context. Dataloop’s tools can work alongside an AI that is learning to recognize problematic words or phrases, providing an annotated snippet to a human moderator, who then makes the final decision.

“The content that you actually need is a problem in itself,” said Avigdor. “If you have 50,000 audio files in one minute, can you really go through all of that? … Dataloop has that in its platform.”
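The article doesn’t show Dataloop’s implementation, but the triage step it describes can be sketched generically. Everything below is an assumption for illustration: the thresholds, the toy `score_snippet` heuristic standing in for a real model, and the labels. The point is that the model drains the firehose of audio so only ambiguous, annotated snippets reach a human.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical confidence thresholds; real values would be tuned.
AUTO_REMOVE = 0.95   # model is near-certain the snippet violates policy
AUTO_ALLOW = 0.05    # model is near-certain the snippet is fine

def score_snippet(transcript: str) -> float:
    # Stand-in for a real moderation model: returns the estimated
    # probability that a transcribed snippet violates policy.
    flagged_terms = ("slur", "threat")          # toy heuristic only
    return 0.99 if any(t in transcript.lower() for t in flagged_terms) else 0.5

def triage(snippets: Iterable[str]) -> Iterator[Tuple[str, str]]:
    # Act automatically on clear cases; route only ambiguous
    # snippets to the human review queue.
    for text in snippets:
        p = score_snippet(text)
        if p >= AUTO_REMOVE:
            yield ("remove", text)
        elif p <= AUTO_ALLOW:
            yield ("allow", text)
        else:
            yield ("human_review", text)

for action, text in triage(["nice shot!", "that sounded like a threat"]):
    print(action, "->", text)
```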

Where humans come into the loop

Moderation with AI alone might already be possible, but it isn’t yet good. We have already learned this from social media companies that moderate content with AI first and offer a human appeal process: a lot of content that should be flagged isn’t, and a lot of content that shouldn’t be flagged is.

Using a data engine that feeds into and is fed by the AI, while keeping the human middleman, not only prevents these issues but also helps improve the AI. Operators can tell the model when its contributions were helpful and when they weren’t.

“What happens in some cases is the public model doesn’t do so well at the beginning, but it gets better and better over time, and you can get to 90 percent accuracy very quickly,” said Avigdor. “Refining that to 99 percent, that’s the difficult part. That’s where the humans come into the loop.”
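Closing that loop is commonly done with an active-learning pattern; the sketch below is a generic illustration of it, not a description of Dataloop’s internals. `ModerationModel.fine_tune`, the label buffer, and the retraining threshold are all hypothetical.

```python
from typing import List, Tuple

class ModerationModel:
    # Hypothetical wrapper; fine_tune stands in for whatever
    # retraining mechanism a platform actually uses.
    def fine_tune(self, examples: List[Tuple[str, str]]) -> None:
        print(f"fine-tuning on {len(examples)} human-verified examples")

training_set: List[Tuple[str, str]] = []       # (snippet, human label)

def record_verdict(snippet: str, human_label: str) -> None:
    # Each moderator decision on an uncertain flag becomes
    # a fresh, high-value training example.
    training_set.append((snippet, human_label))

def maybe_retrain(model: ModerationModel, min_new_labels: int = 500) -> None:
    # Retrain once enough corrections accumulate; this loop is what
    # chips the error rate down from ~10 percent toward ~1 percent.
    if len(training_set) >= min_new_labels:
        model.fine_tune(training_set)
        training_set.clear()

model = ModerationModel()
record_verdict("borderline snippet", "allow")
maybe_retrain(model, min_new_labels=1)         # demo threshold
```

The design intuition is that human verdicts on exactly the cases the model found hardest are the most informative labels, which is why accuracy can keep climbing past a public model’s plateau.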


Building an AI moderation tool from scratch, or on top of a public model, can also help a company develop models better suited to its specific use case and geographic markets. The more localized the training data, the better the AI becomes at recognizing regional cues, which is particularly important when moderating audio content.

“You can develop a great model for content moderation in the U.S. Take it to Japan, and it’s useless,” said Avigdor. “What we designed is a solution to really speed up this process.”

A lot of the conversation in the AI space is about how AI will replace humans. For most tasks right now, the conversation should really be about how AI can help humans do a better job of work that needs to be done. Dataloop’s steps toward AI-powered community moderation are a prime example.

Summary
  • AI can help moderate online communities, even live ones like online games.
  • Dataloop provides a “data engine”: tools for storing, annotating, and managing the data needed to train AI models for community moderation.
  • Training a moderation model on localized data helps it recognize regional cues, which matters especially for audio content.
Jon Jaehnig is a freelance journalist focusing on emerging technologies.