For the past three years or so, we’ve seen remarkable advances in Artificial Intelligence for text and images. What is the next stage?
Alongside OpenAI’s stunning image AI DALL-E 2, Midjourney is also currently making a splash. The Discord-based AI system does not match the consistency and detail of DALL-E 2, and certainly not its photorealism. It can, however, produce appealing artistic motifs, which is what the model is optimized for. Midjourney has been available in open beta for a few days.
AI for everyone? “There aren’t enough computers”
David Holz (co-founder of Leap Motion, now Ultraleap) is CEO of Midjourney, which Holz says currently has several hundred thousand customers generating millions of images a day on about 10,000 servers. Despite the enormous scale of the project, Midjourney currently has only about ten employees.
About one million users are active on Midjourney’s Discord server alone. This community is part of the concept: people are more creative in a collective and can inspire one another, Holz explains.
According to Holz, each image requires “thousands of trillions of operations,” a computational effort unprecedented for an online service. A single training run for the image AI costs around $50,000, he says, and multiple (“3 to 20”) runs are needed per training process until the model is ready.
Despite these high computing costs, Midjourney is already profitable, according to Holz. The start-up is independently financed; investors are not involved.
If ten million people wanted to use a technology like Midjourney, however, “there actually aren’t enough computers,” Holz says. “There aren’t a million free servers to do AI in the world. I think the world will run out of computers before the technology actually gets to everybody who wants to use it.”
Real-time AI content is coming soon – but it will be expensive
Holz expects AI-generated media to continue making rapid progress. In two years, it should be possible to generate high-resolution content in real time at 30 frames per second.
“It’ll be expensive, but it’ll be possible,” Holz says. A first step in that direction is Apple’s recently unveiled GAUDI AI, which generates interactive 3D scenes from sentences. In ten years, there will be an Xbox with an AI processor that dreams all games in real time, Holz believes.
“From a raw technology standpoint, those are just kind of facts, and there’s no way to get around that. But from a human standpoint, what the hell does that mean? All the games are dreams, and everything is malleable, and we’re going to have AR headsets – what the hell does that mean? So the humanistic element of that is quite unfathomable,” Holz says.
The software for this AI future, which Holz says is still “completely off the map” today, is one of the main focuses of his startup.