Amazon announced its Nova family of AI models today for generating and analyzing text, images, and videos. While testing shows the top-tier Nova Pro model keeps up with competitors in several areas, it doesn't push any new boundaries in what AI can do.
Amazon says its Nova models balance cost and performance as it competes with OpenAI and Google. The lineup includes two types of models: "understanding models" that process text, images, or videos to generate text responses, and creative models that turn text and image inputs into new images or videos.
From the basics to multimedia
The entry-level Nova Micro handles text only, focusing on speed and low cost. It processes up to 128,000 tokens at a time and can summarize text, translate languages, solve basic math problems, and generate code.
Nova Lite and Nova Pro can analyze text, images, and video, though they do not process a video's audio track. Nova Pro processes up to 300,000 tokens and integrates with APIs and external tools for complex tasks. According to Amazon's own benchmarks, these models perform similarly to competing systems in visual and agent-based testing.
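To make the tiering concrete, here is a minimal sketch of how an application might choose between the entry-level and top-tier models using only the limits quoted above. The `ModelSpec` class and `pick_model` function are hypothetical names invented for illustration and are not part of any Amazon SDK. Nova Lite is left out because its token limit is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    modalities: frozenset  # input types the model accepts
    max_tokens: int        # largest input it processes at once


# Figures as reported in this article, ordered from entry-level to top-tier.
# Nova Lite is omitted: its token limit is not given in the text.
MODELS = [
    ModelSpec("Nova Micro", frozenset({"text"}), 128_000),
    ModelSpec("Nova Pro", frozenset({"text", "image", "video"}), 300_000),
]


def pick_model(modalities, token_count):
    """Return the first (lowest-tier) model that fits, or None if none does."""
    needed = frozenset(modalities)
    for spec in MODELS:
        if needed <= spec.modalities and token_count <= spec.max_tokens:
            return spec.name
    return None


print(pick_model({"text"}, 50_000))           # text-only, small input
print(pick_model({"text", "image"}, 50_000))  # needs image support
print(pick_model({"text"}, 400_000))          # exceeds both limits
```

Under these assumptions, a short text-only request lands on Nova Micro, anything involving images or video moves up to Nova Pro, and inputs above 300,000 tokens fit neither model.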
A fourth model, Nova Premier, is coming in early 2025. Amazon says it will handle complex reasoning and generate synthetic data for other AI systems, potentially competing with OpenAI's o1.
Users can fine-tune the Nova understanding models with their own text, image, and video data to better match specific industries and use cases. Amazon has published technical specifications for the models.
For creative generation tasks, Amazon offers Nova Canvas to generate images and Nova Reel to create videos. Both tools automatically add digital watermarks to everything they create.
Limited rollout
Amazon is starting small with Nova, offering the models in just three U.S. regions through AWS with pay-as-you-go pricing. The Nova Pro model is significantly cheaper than similarly benchmarked models such as Anthropic's Claude 3.5 Sonnet. Time will tell how well the models perform in the real world.
The company says the models can handle over 200 languages but perform best with 15 major ones like English, German, Spanish, French, and Chinese. For now, the image and video models only work with English input.
The move comes as Amazon tries to catch up in the AI race. While its cloud platform offers many AI models, Microsoft has pulled ahead through its OpenAI partnership, and Google has gained ground with its own Gemini system.