Amazon's new Nova AI models for text, image, and video aim to balance cost and performance

Dec 3, 2024


Key Points

Amazon has introduced the Nova family of foundation models for text, image, and video, designed to optimize the price/performance ratio.
The most powerful model, Nova Pro, keeps up with the competition in benchmarks but does not set new standards. A more powerful model is planned for early 2025 and could compete with OpenAI's o1.
By developing its own models, Amazon aims to close the strategic gap with cloud competitors Microsoft and Google. The Nova models are initially available exclusively in three US regions via Amazon Web Services.

Amazon today announced its Nova family of AI models for generating and analyzing text, images, and videos. While benchmarks show the top-tier Nova Pro model keeping up with competitors in several areas, it doesn't push any new boundaries in what AI can do.

Amazon says its Nova models balance cost and performance as it competes with OpenAI and Google. The lineup includes two types of models: "understanding models" that process text, images, or videos to generate text responses, and creative models that turn text and image inputs into new images or videos.

From the basics to multimedia

The entry-level Nova Micro handles text only, focusing on speed and low cost. It processes up to 128,000 tokens at a time and can summarize text, translate languages, solve basic math problems, and generate code.

Nova Lite and Nova Pro can analyze text, images, and video (without audio). Nova Pro processes up to 300,000 tokens and integrates with APIs and external tools for complex tasks. According to Amazon's own benchmarks, these models perform similarly to competing systems in visual and agent-based tests.
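
To make the lineup easier to compare, here is a minimal Python sketch that encodes only the input types and token limits stated above and filters the models by what a request needs. The names `NovaModel`, `MODELS`, and `candidates` are hypothetical and not part of any AWS SDK, and Nova Lite's limit is left as `None` because the article gives no figure for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NovaModel:
    name: str
    modalities: frozenset      # input types the article says the model accepts
    max_tokens: Optional[int]  # None where the article states no limit


# Figures taken from the article; nothing here comes from an AWS API.
MODELS = [
    NovaModel("Nova Micro", frozenset({"text"}), 128_000),
    NovaModel("Nova Lite", frozenset({"text", "image", "video"}), None),
    NovaModel("Nova Pro", frozenset({"text", "image", "video"}), 300_000),
]


def candidates(inputs, token_count, include_unknown=False):
    """Return names of models that accept all input types and fit the token count.

    Models without a stated token limit are skipped unless include_unknown is True.
    """
    needed = set(inputs)
    result = []
    for model in MODELS:
        if not needed <= model.modalities:
            continue  # model does not accept one of the requested input types
        if model.max_tokens is None:
            if include_unknown:
                result.append(model.name)
            continue
        if token_count <= model.max_tokens:
            result.append(model.name)
    return result


if __name__ == "__main__":
    print(candidates({"text"}, 100_000))
    print(candidates({"text", "image"}, 200_000))
```

By default the sketch is conservative and leaves out any model whose limit is not stated; passing `include_unknown=True` lists such models as well, without checking their token limit.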

Comparison chart: six visual AI skills across five AI models, shown as percentages; Nova Pro consistently performs well. Nova Pro is on a par with the other large multimodal models. | Image: via Amazon

A fourth model, Nova Premier, is coming in early 2025. Amazon says it will handle complex reasoning and generate synthetic data for other AI systems, potentially competing with OpenAI's o1.

Users can fine-tune the Nova understanding models with their own text, image, and video data to better match specific industries and use cases. Amazon has published technical specifications for the models.

For creative generation tasks, Amazon offers Nova Canvas to generate images and Nova Reel to create videos. Both tools automatically add digital watermarks to everything they create.

Limited rollout

Amazon is starting small with Nova, offering the models in just three U.S. regions through AWS with pay-as-you-go pricing. Nova Pro is significantly cheaper than models with similar benchmark results, such as Anthropic's Claude 3.5 Sonnet. Time will tell how well the models perform in the real world.

Pricing tables for Amazon Nova services: text models (Micro/Lite/Pro), image generation (Canvas), and video generation (Reel) with detailed costs. Amazon's Nova pricing. | Image: via Amazon

The company says the models can handle over 200 languages but perform best with 15 major ones like English, German, Spanish, French, and Chinese. For now, the image and video models only work with English input.

The move comes as Amazon tries to catch up in the AI race. While its cloud platform offers many AI models, Microsoft has pulled ahead through its OpenAI partnership, and Google has gained ground with its own Gemini system.


Source: AWS