Stanford University has released a new robotics benchmark called BEHAVIOR-1K. The goal is to do for robotics what ImageNet did for computer vision and MMLU did for language models: give researchers a common baseline for measuring progress.


Until now, robotics has lacked that kind of shared standard. In areas like language modeling and computer vision, benchmarks such as MMLU and ImageNet spurred competition and breakthroughs. In robotics, however, nearly every research group has used its own test setup, which has made results difficult to compare.

The Stanford Vision and Learning Group hopes BEHAVIOR-1K will change that. The project includes AI researcher Fei-Fei Li, who is best known for her work on ImageNet. BEHAVIOR-1K defines 1,000 realistic household tasks based on survey data about where people most want help from robots. Many of these are long-horizon scenarios that require chaining together multiple steps, such as cooking or cleaning.


1,000 tasks across 50 environments

The benchmark simulates more than 50 interactive 3D environments, including homes, offices, and restaurants, and integrates over 10,000 objects. Each task is defined in the Behavior Domain Definition Language (BDDL), which specifies start and goal conditions using symbolic logic. Through a sampling process, tasks are placed into specific scenes with the right objects in their initial and target configurations.
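
As an illustration, a BDDL definition pairs a set of typed objects with initial and goal predicates, much like PDDL in classical planning. The sketch below is a simplified, hypothetical task written in the published BDDL style - not an actual entry from the benchmark:

    (define (problem putting_away_groceries-0)
        (:domain omnigibson)
        (:objects
            apple.n.01_1 - apple.n.01
            cabinet.n.01_1 - cabinet.n.01
            countertop.n.01_1 - countertop.n.01
        )
        ; Initial condition: the apple starts on the countertop.
        (:init
            (ontop apple.n.01_1 countertop.n.01_1)
        )
        ; Goal condition: the apple ends up inside the cabinet.
        (:goal
            (inside ?apple.n.01_1 ?cabinet.n.01_1)
        )
    )

The sampling step then grounds a definition like this in a concrete scene, choosing actual object models and placements that satisfy the initial conditions.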

Objects are organized using an extended synset hierarchy modeled on WordNet. This setup allows tasks to be instantiated flexibly: if a task calls for the fruit synset, it can be satisfied by any concrete fruit object, such as an apple or an orange.
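
A minimal Python sketch shows how such a lookup could resolve a category to a concrete object; the hierarchy and helper function here are hypothetical stand-ins, not the actual BEHAVIOR API:

    import random

    # Hypothetical slice of a WordNet-style synset hierarchy (illustrative only).
    SYNSET_CHILDREN = {
        "fruit.n.01": ["apple.n.01", "orange.n.01", "banana.n.01"],
    }

    def leaf_synsets(synset):
        """Return the concrete (leaf) synsets under a category."""
        children = SYNSET_CHILDREN.get(synset, [])
        if not children:
            return [synset]
        leaves = []
        for child in children:
            leaves.extend(leaf_synsets(child))
        return leaves

    # A task that asks for "fruit.n.01" can be filled by any concrete fruit.
    print(random.choice(leaf_synsets("fruit.n.01")))

Because the hierarchy separates categories from concrete objects, the same task definition can be instantiated with different object models in different scenes.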

Simulation built on Isaac Sim and OmniGibson

The technical foundation is Nvidia’s Isaac Sim, a simulator built on the Omniverse platform with the PhysX physics engine. On top of that runs OmniGibson, open-source simulation software developed at Stanford. OmniGibson supports realistic interactions with fluids, fabrics, heat, transparency, and both rigid and soft objects.

The benchmark also supports a wide range of robot platforms, including Franka, Fetch, and Tiago, which can carry out tasks in these interactive environments. The BEHAVIOR dataset provides all the objects, scenes, and particle systems needed to run the benchmark.
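
For orientation, a minimal OmniGibson session might look roughly like the sketch below. The configuration keys and scene name follow the project's documented examples, but exact APIs vary between versions, so treat this as an assumption-laden outline rather than verified usage:

    import omnigibson as og

    # Minimal setup: one interactive scene plus a Fetch robot with RGB observations.
    # Config keys mirror OmniGibson's example style but may differ by version.
    cfg = {
        "scene": {"type": "InteractiveTraversableScene", "scene_model": "Rs_int"},
        "robots": [{"type": "Fetch", "obs_modalities": ["rgb"]}],
    }

    env = og.Environment(configs=cfg)
    env.reset()

    # Drive the robot with random actions for a few steps.
    for _ in range(10):
        action = env.action_space.sample()
        env.step(action)

    og.shutdown()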

BEHAVIOR Challenge 2025

Alongside the benchmark, Stanford is launching the BEHAVIOR Challenge 2025, where researchers can test their methods against one another on identical tasks. For the first time, there will be an official leaderboard to make progress in robotics more directly comparable - much like ImageNet once did for computer vision.


Jim Fan, Nvidia’s Director of AI and a co-developer of robotics systems like GR00T, argues that BEHAVIOR could provide the "hill-climbing signal" robotics research has been missing. If widely adopted, it could become the basis for building practical, general-purpose robots capable of handling everyday tasks.

Summary
  • Stanford University has introduced BEHAVIOR-1K, a new robotics benchmark designed to give researchers a shared baseline for progress, similar to the role ImageNet played for computer vision and MMLU for language models.
  • The benchmark defines 1,000 everyday household tasks across more than 50 simulated 3D environments, built on Nvidia’s Isaac Sim and Stanford’s OmniGibson. It incorporates over 10,000 objects and supports multiple robot platforms like Franka, Fetch, and Tiago.
  • Alongside the release, Stanford announced the BEHAVIOR Challenge 2025, which will feature an official leaderboard to make results directly comparable and encourage competition, with the aim of accelerating progress toward practical, general-purpose robots.