Project Ring combines language and image models in an AI wearable that looks at the world through a camera and comments on it with an AI-generated voice.

The simplest way to describe Project Ring is as a wearable Google Lens with voice controls. According to developer Mina Fahmi, the project aims to "demonstrate low-friction interactions which blend physical & digital information between humans & AI."

To that end, Fahmi built a wrist-worn minicomputer with a camera and joystick that can visually analyze the environment in real time using a Replicate image-to-text model, describe what it sees in text, and comment on it via ChatGPT.

The text is converted to speech using Eleven Labs' text-to-speech service and relayed to bone-conduction headphones via an Android smartphone. The headphones have a built-in microphone that lets the user talk back to the wearable, for example to ask questions about the environment. The user's voice is converted to text using OpenAI's Whisper so that ChatGPT can chime in with more or less intelligent remarks. All data is processed in Google Cloud.
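To make the flow concrete, here is a minimal sketch of such a pipeline in Python. The model name, voice ID, API key, and helper functions are illustrative assumptions (written against the pre-1.0 OpenAI Python SDK), not Fahmi's actual GPT-4-generated code:

# Illustrative sketch of the Project Ring pipeline.
# Model name, voice ID, and API key are placeholders, not Fahmi's code.
import openai          # ChatGPT and Whisper (pre-1.0 OpenAI Python SDK)
import replicate       # hosted image-to-text model
import requests        # Eleven Labs text-to-speech REST API

def describe_image(image_path: str) -> str:
    """Caption the camera frame with an image-to-text model on Replicate (model name assumed)."""
    with open(image_path, "rb") as f:
        return str(replicate.run("owner/image-captioning-model", input={"image": f}))

def comment_on(description: str, question: str = "") -> str:
    """Let ChatGPT comment on the scene, optionally answering a spoken question."""
    messages = [
        {"role": "system", "content": "You are a curious companion describing what the wearer sees."},
        {"role": "user", "content": description},
    ]
    if question:
        messages.append({"role": "user", "content": question})
    resp = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    return resp.choices[0].message.content

def speak(text: str, voice_id: str = "PLACEHOLDER_VOICE") -> bytes:
    """Convert the comment to audio with Eleven Labs; the phone relays it to the headphones."""
    r = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        headers={"xi-api-key": "YOUR_ELEVENLABS_KEY"},
        json={"text": text},
    )
    return r.content  # MP3 audio bytes

def transcribe(audio_path: str) -> str:
    """Turn the wearer's spoken reply into text with OpenAI's Whisper API."""
    with open(audio_path, "rb") as f:
        return openai.Audio.transcribe("whisper-1", f)["text"]

audio = speak(comment_on(describe_image("frame.jpg")))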


"Project Ring feels like having a curious friend on your shoulder - one who sees the world as you do and unobtrusively whispers thoughts in your ear," Fahmi writes.

GPT-4 writes code for the wearable, but "it wasn't easy"

Fahmi says he did all the code generation for Project Ring with GPT-4. In total, the language model generated about 750 lines of code. That includes a Python script for the Raspberry Pi, a cloud application, a website, and an Android application.
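For a sense of what the Raspberry Pi portion of such a project might involve, a sketch could look like the following: wait for a joystick press, capture a frame, and upload it to the cloud application. The GPIO pin, endpoint URL, and payload format are assumptions for illustration, not the code GPT-4 actually produced:

# Hypothetical sketch of the Pi-side client: joystick press -> capture -> upload.
# Pin number, endpoint, and payload format are illustrative assumptions.
import requests
from gpiozero import Button
from picamera import PiCamera
from signal import pause

JOYSTICK_PRESS = Button(17)                            # GPIO pin is an assumption
CLOUD_ENDPOINT = "https://example-cloud-app/analyze"   # placeholder URL

camera = PiCamera()

def capture_and_send() -> None:
    """Capture a photo and upload it to the cloud app for captioning and commentary."""
    camera.capture("/tmp/frame.jpg")
    with open("/tmp/frame.jpg", "rb") as f:
        requests.post(CLOUD_ENDPOINT, files={"image": f}, timeout=30)

JOYSTICK_PRESS.when_pressed = capture_and_send

if __name__ == "__main__":
    pause()   # keep the script alive, reacting to joystick presses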

Fahmi has a background in coding, but he says that he hasn't written any code in years. He believes his project shows that it is possible, though not easy, to use GPT-4 to program complete software prototypes.

His coding background helped him get GPT-4 to make corrections in the right places or to assemble the code correctly by copying and pasting. According to Fahmi, GPT-4 occasionally lost context and needed to be realigned. The code was also unstable and neither performant nor production-ready, he said.

Despite these shortcomings, AI "may be capable of automating a large majority of coding tasks in a relatively short time period," Fahmi speculates.


Fahmi works on AI and human-computer interfaces at Meta, and previously worked at CTRL-Labs, the startup Meta acquired in 2019. Meta is developing a wristband based on CTRL-Labs' technology, which can translate neural signals at the wrist into precise computer commands in real time.

Summary
  • GPT-4 helps program an AI wearable. It looks at the real world through a camera and talks about it using a language model and text-to-speech.
  • It can describe the visual content in the camera's image and discuss it with the user.
  • The application's developer, who says he hasn't written code in years, says that "AI may be capable of automating a large majority of coding tasks in a relatively short time period."