This is a multimodal AI wearable coded by GPT-4

Project Ring combines language and image models in an AI wearable that looks at the world through a camera and comments on it with an AI-generated voice.

The simplest way to describe Project Ring is as a wearable Google Lens with voice controls. According to developer Mina Fahmi, the project aims to "demonstrate low-friction interactions which blend physical & digital information between humans & AI."

To that end, Fahmi built a wrist-worn minicomputer with a camera and joystick that can visually analyze the environment in real-time using a Replicate image-to-text model, describe it in text, and comment on it via a ChatGPT.

The text is converted to speech using Eleven Labs' text-to-speech service, which is then transmitted to bone-conduction headphones via an Android smartphone. The headphones have a built-in microphone that allows the user to speak back to the wearable, for example, to ask questions about the environment. The user's voice is converted to text using OpenAI's Whisper so that ChatGPT can chime in with some more or less intelligent remarks. All data is processed in the Google Cloud.

Image: Midjourney prompted by THE DECODER

"Project Ring feels like having a curious friend on your shoulder - one who sees the world as you do and unobtrusively whispers thoughts in your ear," Fahmi writes.

GPT-4 writes code for the wearable, but "it wasn't easy"

Fahmi says he did all the code generation for Project Ring with GPT-4. In total, the language model generated about 750 lines of code. That includes a Python script for the Raspberry Pi, a cloud application, a website, and an Android application.

Fahmi has a background in coding, but he says that he hasn't written any code in years. He believes his project shows that it is possible, though not easy, to use GPT-4 to program complete software prototypes.

His coding background helped him get GPT-4 to make corrections in the right places or to assemble the code correctly by copying and pasting. According to Fahmi, GPT-4 occasionally lost context and needed to be realigned. The code was also unstable and neither performant nor production-ready, he said.

Despite these shortcomings, AI "may be capable of automating a large majority of coding tasks in a relatively short time period," Fahmi speculates.

Recommendation

AI in practice

Nvidia positions GR00T N1 to dominate robotics ecosystem

Fahmi works on AI and human-computer interfaces at Meta, and previously worked at CTRL-Labs, the startup Meta acquired in 2019. Meta is developing a wristband based on CTRL-Labs' technology, which can translate brainwaves into precise computer commands in real time.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

This is a multimodal AI wearable coded by GPT-4

GPT-4 writes code for the wearable, but "it wasn't easy"

Nvidia positions GR00T N1 to dominate robotics ecosystem

OpenAI's GPT-4 retires at the end of April

OpenAI's new "Orion" model reportedly shows small gains over GPT-4

Language models like GPT-4 memorize more than they reason, study finds

Cloudflare CEO Matthew Prince sees trouble ahead for the open web

New Othello experiment supports the world model hypothesis for large language models

ChatGPT might be draining your brain, MIT warns - what ‘cognitive debt’ means for you

This is a multimodal AI wearable coded by GPT-4

GPT-4 writes code for the wearable, but "it wasn't easy"

Share

Bank details