ChatGPT programs AR app using only natural language - ChatARKit

Dec 31, 2022

Bart Trzynadlowski

Humanity has been exploring the depths of OpenAI's ChatGPT neural network since early December. One developer got the dialog AI to spit out working AR code.

OpenAI's ChatGPT dialog AI is optimized for generating text and answering questions. But initial tests from early December quickly showed that the system can produce more than a few neatly worded sentences: program code, for example.

ChatARKit - AR app generated by ChatGPT

Developer Bart Trzynadlowski wanted to find out if he could use ChatGPT to develop an AR app that autonomously places digital 3D objects in the environment using only voice commands. He transcribes the voice commands with another AI model, OpenAI's Whisper, and feeds them into the JavaScript environment of the ChatARKit app as a prompt.

As a result, ChatGPT selects 3D objects from Sketchfab that match the voice command and places them on a table or the floor as prompted. If asked, ChatGPT also scales and rotates the 3D models. The AI system generates the code for this on its own.
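The step from transcribed speech to a code-generating prompt can be pictured with a minimal sketch. It assumes the transcript arrives as plain text; the template wording and the function name `build_prompt` are hypothetical and are not taken from ChatARKit, whose actual prompt is not shown here.

```python
# Hypothetical prompt template; the wording is an assumption for illustration,
# not the prompt used by ChatARKit.
PROMPT_TEMPLATE = """\
You write JavaScript for an AR scene.
Respond with code only, no explanations.

User command: {command}
"""


def build_prompt(transcribed_command: str) -> str:
    """Wrap a transcribed voice command in a code-generation prompt."""
    # Collapse line breaks and repeated spaces left over from transcription.
    command = " ".join(transcribed_command.split())
    if not command:
        raise ValueError("empty voice command")
    return PROMPT_TEMPLATE.format(command=command)


if __name__ == "__main__":
    print(build_prompt("Place a sports car on the table and rotate it 90 degrees."))
```

The resulting string would be sent to the language model, and the returned JavaScript would then run inside the app.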

These are some working sample prompts according to Trzynadlowski:

"Place a cube on the nearest plane."
"Place a spinning cube on the floor."
"Place a sports car on the table and rotate it 90 degrees."
"Place a school bus on the nearest plane and make it drive back and forth along the surface."

According to Trzynadlowski, ChatGPT does not work reliably. For identical commands, the AI model generates very different output and sometimes inserts incorrect lines of JavaScript into the app. Occasionally, ChatGPT turns object descriptions into code identifiers, which means that the 3D models can no longer be retrieved from Sketchfab.
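The identifier problem can be illustrated with a small check over generated code. This is only a sketch: the JavaScript function name `createEntity` is a hypothetical stand-in for whatever call the app uses to request a model by description, and the regular expression handles only simple single-argument calls.

```python
import re

# Hypothetical call name; ChatARKit's real JavaScript API may differ.
CALL_PATTERN = re.compile(r"createEntity\(\s*([^)]*?)\s*\)")


def find_unquoted_descriptions(js_source: str) -> list[str]:
    """Return createEntity(...) arguments that are not string literals."""
    suspicious = []
    for match in CALL_PATTERN.finditer(js_source):
        arg = match.group(1)
        # A usable description is a quoted string such as "sports car".
        is_string_literal = (
            len(arg) >= 2 and arg[0] == arg[-1] and arg[0] in "\"'"
        )
        if not is_string_literal:
            suspicious.append(arg)
    return suspicious


# The description survives as a searchable string.
good = 'let car = createEntity("sports car");'
# The description was turned into an identifier and can no longer be searched.
bad = "let car = createEntity(sportsCar);"

if __name__ == "__main__":
    print(find_unquoted_descriptions(good))
    print(find_unquoted_descriptions(bad))
```

A check like this could flag output to regenerate before it runs, though it would not catch other kinds of incorrect code.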

Trzynadlowski makes his ChatGPT AR app available for free as open source on GitHub.

Generate 3D objects in VR with natural language

For VR, developer Jasmine Roberts recently demonstrated an implementation of OpenAI's new 3D AI Point-E. Like the image AI DALL-E 2, it can generate content based solely on text input. Instead of images, however, Point-E generates 3D point clouds that represent a 3D model. Point-E takes only about one to two minutes per generation on a single Nvidia V100 GPU. Roberts' demo runs in real time.

thrilled our preliminary research on #generativeai in #vr was accepted into #neurips2022

here is a bare-bones "holodeck" #3d implementation real-time in #unity. spoken user commands are compiled at runtime. 3d objects are created, loaded and manipulated via @openai's gpt-3. pic.twitter.com/4xg0FKYkEk

— jasmine roberts (@jasminezroberts) October 20, 2022

Point-E is OpenAI's starting point for further work on text-to-3D synthesis. Google with Dreamfusion and Nvidia with Magic3D also recently introduced text-to-3D systems. Such systems could play an important role in the further spread of 3D content, a fundamental assumption of the metaverse thesis.
