Amazon has secured a non-exclusive license for Covariant's robotics models and hired about a quarter of the startup's workforce, an acqui-hire aimed at boosting AI-driven automation.
The deal gives Amazon access to Covariant's Robotics Foundation Models and brings on board key talent, including founders Pieter Abbeel, Peter Chen, and Rocky Duan, along with roughly a quarter of Covariant's engineers, who will join Amazon's AI and robotics team in Silicon Valley.
Amazon, which says it already uses 750,000 robots in its warehouses, aims to make that automation safer and more adaptable, and expects the deal to drive further innovation and attract top AI talent.
While Covariant will continue supporting its existing customers and developing its warehouse automation technology, the departure of its entire founding team suggests this is effectively an acquisition. Financial details were not disclosed. Covariant was valued at $625 million in its last funding round in April 2023.
Combining robotics with GenAI is the latest tech gamble
Amazon's move aligns with a broader industry trend, as other tech giants like Apple, OpenAI, and Google DeepMind increase their investments in robotics and AI.
Apple is reportedly developing a tabletop device with a robotic arm and a large, iPad-like display, slated for release in 2026 or 2027. The robotic arm aims to simplify everyday tasks, such as adjusting the screen for video calls or browsing recipes.
Robotics company Figure, in collaboration with OpenAI, recently unveiled its most advanced humanoid robot to date, Figure 02. Figure positions the new robot as a forerunner of humanoid robots in workplaces and homes, while OpenAI plans to increase its investment in robots powered by multimodal AI models.
Google DeepMind develops both robots and the models that control them. The company set benchmarks with its RT series of robot models and recently demonstrated how a robot can navigate complex, unfamiliar environments using Gemini 1.5 Pro and multimodal input: a simple smartphone video walkthrough is enough to give the robot an overview of its surroundings.
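DeepMind's demonstration pairs the model with a low-level navigation policy, but the core idea, turning a walkthrough video into something the robot can reason over, can be sketched with public tools. The snippet below is a minimal illustration, not DeepMind's pipeline: it samples frames from a phone video with OpenCV and asks Gemini 1.5 Pro, via the google-generativeai SDK, a navigation-style question about them. The file name, sampling interval, and prompt wording are assumptions.

```python
# Minimal sketch: treat a phone video tour as the robot's "map", then query it.
# Illustrative only; a real system would feed the answer to a navigation policy.
import cv2
import PIL.Image
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # assumption: key provisioned elsewhere
model = genai.GenerativeModel("gemini-1.5-pro")

def sample_frames(video_path: str, every_n_seconds: float = 2.0):
    """Pull one frame every few seconds from the walkthrough video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = int(fps * every_n_seconds)
    frames, i = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if i % step == 0:
            # OpenCV yields BGR arrays; convert to RGB for a standard image.
            frames.append(PIL.Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        i += 1
    cap.release()
    return frames

frames = sample_frames("office_tour.mp4")  # hypothetical file name
question = "In which frame did you last see a whiteboard, and what is near it?"
# Long videos may need fewer frames to stay within the request size limits.
response = model.generate_content([question, *frames])
print(response.text)  # a navigation planner would parse this answer
```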
General-purpose robots benefit greatly from the ability of multimodal models to process images, sound, text, and speech together and to draw simple logical inferences from that data. Technology companies hope this integration will improve efficiency and enable new applications in industry and everyday life, though it remains to be seen how reliably the combination works outside the lab.
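In practice, acting on those inferences usually means constraining the model's output to a small vocabulary of commands the robot's controller can actually execute. The sketch below shows that common pattern with a hard-coded model response; the action schema and the Robot class are hypothetical, not any vendor's API.

```python
# Sketch of one common integration pattern: a multimodal model proposes an
# action as JSON, and a thin validation layer sits between it and the robot.
import json

ALLOWED_ACTIONS = {"move_to", "pick", "place", "stop"}  # hypothetical vocabulary

def parse_action(model_output: str) -> dict:
    """Reject anything the controller can't execute before it reaches hardware."""
    action = json.loads(model_output)
    if action.get("name") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action.get('name')!r}")
    return action

class Robot:  # placeholder for a real driver (ROS node, vendor SDK, ...)
    def execute(self, action: dict) -> None:
        print(f"executing {action['name']} with args {action.get('args', {})}")

# In a real system this string would come from a multimodal model shown a
# camera image alongside an instruction like "put the cup on the shelf".
model_output = '{"name": "pick", "args": {"object": "cup"}}'

robot = Robot()
robot.execute(parse_action(model_output))
```

Funneling free-form model reasoning through a fixed schema is what makes the combination testable at all: the validation layer, not the model, decides what is allowed to move a physical arm.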