Matthias Bastian

SenseTime unveils SenseNova 5o, China's first real-time multimodal AI model to rival GPT-4o

Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.

Chinese AI company SenseTime introduced its new multimodal AI model SenseNova 5o and the improved language model SenseNova 5.5 at the World Artificial Intelligence Conference.

SenseTime claims that SenseNova 5o is China's first real-time multimodal model that provides multimodal AI interaction comparable to GPT-4o. It can process audio, text, image and video data, allowing users to interact with the model simply by talking to it.

SenseTime says the model is particularly well suited for real-time conversations. The company showed a demo reminiscent of OpenAI's GPT-4o demo from early May, which also showcased that model's vision capabilities. For example, SenseNova 5o can recognize and describe individual objects when a user points a smartphone camera at them while the AI app is running.

Video: via Sensetime

However, while OpenAI demonstrated many other multimodal capabilities beyond speech, particularly in image generation, SenseTime did not mention these for SenseNova.

SenseTime has also updated its SenseNova language model. According to the company, the new version 5.5 achieves a 30 percent increase in performance over version 5.0, which was released just two months ago. The training set comprised more than ten terabytes of high-quality data, including many synthetically generated reasoning chains meant to improve the model's reasoning capabilities.

SenseTime claims significant improvements in mathematical reasoning (+31.5%), English (+53.8%), and prompt following (+26.8%). According to the company, the model's interactivity and many of its core indicators are now on par with GPT-4o.

Image: via Sensetime

The SenseNova Large Model is currently being used by more than 3,000 government and corporate customers in industries such as technology, healthcare, finance, and programming.

SenseTime is also investing in edge-based language models that are fast and cost-effective. According to the company, SenseChat Lite-5.5 reduces inference time to 0.19 seconds, 40 percent faster than version 5.0, and increases inference speed by 15 percent to 90.2 words per second.
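
SenseTime did not publish the version 5.0 baselines behind these percentages, but they can be back-calculated. The following Python sketch does only that arithmetic. The function names are hypothetical, and the results are implied values under stated assumptions, not figures reported by SenseTime. Because "40 percent faster" is ambiguous, the sketch covers two readings: the time dropped by 40 percent, or the rate rose by 40 percent.

```python
def baseline_from_increase(new_value: float, percent_increase: float) -> float:
    """Return the earlier value implied by a reported percentage increase."""
    return new_value / (1 + percent_increase / 100)


def baseline_from_time_reduction(new_time: float, percent_reduction: float) -> float:
    """Reading A (assumption): the time itself shrank by the given percentage."""
    return new_time / (1 - percent_reduction / 100)


def baseline_from_rate_gain(new_time: float, percent_gain: float) -> float:
    """Reading B (assumption): the rate (1 / time) grew by the given percentage."""
    return new_time * (1 + percent_gain / 100)


if __name__ == "__main__":
    # Inputs are the figures quoted in the article for SenseChat Lite-5.5.
    speed = baseline_from_increase(90.2, 15)
    time_a = baseline_from_time_reduction(0.19, 40)
    time_b = baseline_from_rate_gain(0.19, 40)
    print(f"Implied 5.0 speed: {speed:.1f} words per second")
    print(f"Implied 5.0 inference time, reading A: {time_a:.3f} s")
    print(f"Implied 5.0 inference time, reading B: {time_b:.3f} s")
```

Under these assumptions, version 5.0 would have run at roughly 78 words per second, with an inference time of roughly 0.27 to 0.32 seconds depending on which reading SenseTime intended.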

The Vimi AI avatar video generator, part of SenseNova 5.5, can generate up to one-minute clips from a single photo while providing control over facial expressions, lighting and background.

Video: via Sensetime

Dr. Xu Li, CEO of SenseTime, believes 2024 will be a decisive year for large models, which will transition from unimodal to multimodal. SenseTime is focusing on increasing the interactivity of AI models. Xu Li promises "unprecedented transformations in human-AI interactions."

Founded in 2014, Hong Kong-based SenseTime is one of the best-funded Chinese AI companies. In the past, the company has made headlines primarily for its visual surveillance software that uses facial recognition.

Summary
  • At the World Artificial Intelligence Conference, Chinese AI company SenseTime introduced its new multimodal AI model SenseNova 5o, which the company claims is China's first GPT-4o-level multimodal real-time model.
  • It processes audio, text, image and video data to interact with users as if they were in a conversation. In addition, the LLM SenseNova 5.5 has been improved in key indicators such as mathematical reasoning, English and prompt following.
  • SenseTime is also investing in the development of edge-based LLMs such as SenseChat Lite-5.5 for fast and cost-effective inference, and the Vimi AI avatar video generator, which is designed to generate up to one-minute clips from a single photo with precise control.
Sources
Sensetime EN Sensetime CN Sensetime Nova