Researchers have unveiled Janus, a novel AI system that excels at both analyzing and creating images. The model uses an innovative architecture to handle multiple types of visual tasks.
A team of researchers has developed Janus, an AI model that combines multimodal understanding and visual generation in a single system. According to the developers, Janus is characterized by its flexibility and performance, which are based on a novel approach to processing visual information.
The main feature of Janus is the decoupling of visual coding for comprehension and generation tasks. The architecture of Janus is based on an autoregressive transformer model. However, unlike comparable models, Janus uses separate encoders for different input types such as text, images for comprehension and images for generation. These encoders convert the raw data into features, which are then processed by the transformer.
According to the researchers, Janus outperforms models of similar size in several benchmarks for multimodal understanding and visual generation. In multimodal comprehension tasks, Janus even outperforms some task-specific models with significantly more parameters, with only 1.3 billion parameters.
Janus also shows strong capabilities in image generation, surpassing well-known models like DALL-E 2. While its output quality falls short of cutting-edge models like FLUX, Janus is significantly smaller and could likely improve with further scaling.
Flexibility as a key feature
The researchers highlight Janus's adaptability as a key strength. By separating visual encoding, the model can use optimized encoders for both comprehension and generation without compromises.
Janus can also be readily expanded to work with additional data types like 3D point clouds, tactile information, or EEG signals. This flexibility gives Janus the potential to become an even more capable multimodal AI system, according to the development team.
The researchers believe Janus's combination of strong performance, high adaptability, and room for expansion makes it a promising candidate for next-generation unified multimodal AI models. The Janus model and additional details are available on GitHub.