Intelligence Domains Comparison
From the ACE-Brain-0 paper, Tables 2–5: four domains, six bar charts per domain
Overview
ACE-Brain-0 is a generalist multimodal foundation model designed to unify perception, reasoning, and decision-making across diverse embodied domains: spatial cognition, autonomous driving, low-altitude sensing, and embodied interaction. Built upon a unified multimodal large language model (MLLM) architecture, ACE-Brain-0 learns a shared spatial reasoning substrate that enables generalization across heterogeneous physical environments and agent embodiments.
Extensive evaluation across 24 benchmarks demonstrates that ACE-Brain-0 achieves state-of-the-art or competitive performance across multiple domains, validating its effectiveness as a unified embodied intelligence model.
Key Features
Spatial Intelligence as Scaffold
Built around a unified spatial representation that bridges perception and thinking across heterogeneous embodiments, enabling robust 3D scene understanding as the core cognitive backbone.
Scaffold-Specialize-Reconcile
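The Scaffold-Specialize-Reconcile (SSR) training paradigm proceeds in three stages: it first establishes a shared spatial foundation, then trains domain-specific experts in isolation, and finally harmonizes them through data-free model merging. Training the experts separately avoids gradient interference between domains, and merging them without further training avoids catastrophic forgetting.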
Universal Embodiment
A single foundation brain that generalizes across spatial cognition, autonomous driving, low-altitude sensing, and embodied interaction, covering four distinct intelligence domains within one unified architecture and training paradigm.
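To make the "reconcile" stage of Scaffold-Specialize-Reconcile more concrete, here is a minimal sketch of one common data-free merging scheme: averaging each expert's weight delta relative to the shared scaffold and adding it back. This is an assumption for illustration only. This section does not say which merging rule ACE-Brain-0 uses, and the names `merge_experts`, `scaffold`, `driving_expert`, and `aerial_expert` are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def merge_experts(base: Weights, experts: List[Weights]) -> Weights:
    """Add the mean expert delta (expert - base) to the base weights.

    Assumed merging rule for illustration; not taken from the paper.
    Only weights are used, so no training data is needed.
    """
    if not experts:
        raise ValueError("need at least one expert")
    merged: Weights = {}
    for name, base_vals in base.items():
        merged_vals = []
        for i, b in enumerate(base_vals):
            # Average how far each expert moved this parameter from the base.
            mean_delta = sum(e[name][i] - b for e in experts) / len(experts)
            merged_vals.append(b + mean_delta)
        merged[name] = merged_vals
    return merged


# Toy example: one shared scaffold and two experts fine-tuned from it.
scaffold = {"proj": [1.0, 2.0]}
driving_expert = {"proj": [2.0, 2.0]}
aerial_expert = {"proj": [1.0, 4.0]}

merged = merge_experts(scaffold, [driving_expert, aerial_expert])
print(merged)
```

Because the experts are combined purely in weight space, no gradients flow between domains at merge time, which is the property the SSR description relies on.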
Performance Highlights
Comprehensive comparison of ACE-Brain-0 against state-of-the-art models across 24 benchmarks spanning four intelligence domains. Bold values denote the best result per benchmark; ↓ denotes lower-is-better.
| Benchmark | VeBrain | Pelican-VL | MiMo-Embodied | RoboBrain2.5 | Vlaser | ACE-Brain-0 |
|---|---|---|---|---|---|---|
| Spatial Cognition | | | | | | |
| VSIBench | 39.9 | 52.8 | 48.5 | 41.0 | 60.3 | **63.3** |
| MMSI-Bench | 27.3 | 26.0 | 31.7 | 29.3 | 27.2 | **32.2** |
| BLINK | 79.7 | 56.8 | 0.0 | 84.3 | **84.9** | 83.9 |
| SITE | 51.4 | 52.3 | 44.8 | 52.6 | 47.5 | **53.1** |
| SAT | 73.3 | 67.3 | 78.7 | 63.3 | 66.7 | **92.0** |
| MindCube | 30.1 | 31.0 | 32.3 | 28.1 | 34.6 | **82.1** |
| Multi3DRef | **67.8** | 7.9 | 8.2 | 8.2 | 8.2 | 59.6 |
| Autonomous Driving | | | | | | |
| MME-RealWorld | 60.1 | 57.9 | 60.3 | 60.0 | 41.6 | **71.2** |
| MAPLM | 22.9 | 24.9 | 74.5 | 22.5 | 29.1 | **77.8** |
| DriveAction | 78.3 | 77.2 | 81.0 | 80.5 | 78.1 | **81.3** |
| NuscenesQA | 29.3 | 14.8 | 56.7 | 33.2 | 33.1 | **58.8** |
| NuPlanQA | 82.9 | 83.4 | 73.7 | 79.3 | 78.3 | **91.7** |
| LingoQA | 55.0 | 56.0 | **69.9** | 48.0 | 59.6 | 65.8 |
| Low-Altitude Sensing | | | | | | |
| UrbanVideo-Bench | 36.5 | 37.1 | 26.0 | 37.5 | 30.4 | **56.9** |
| AirCop | 51.9 | 50.8 | 50.2 | 49.9 | 25.3 | **70.3** |
| AVI-Math | 25.4 | 22.5 | 33.7 | 26.1 | 19.3 | **35.0** |
| Airspatial ↓ | 1583.4 | 1586.6 | 289.4 | 1509.3 | 1597.7 | **258.0** |
| HRVQA | 37.9 | 38.6 | 22.2 | 13.4 | 27.0 | **61.2** |
| Embodied Interaction | | | | | | |
| ERQA | 40.3 | 39.8 | **46.8** | 44.3 | 41.0 | 41.5 |
| RoboVQA | 29.2 | 28.1 | 0.9 | 32.9 | 7.9 | **64.6** |
| OpenEQA | 63.8 | 63.3 | **74.1** | 62.6 | 56.3 | 70.0 |
| EgoPlan2 | 27.3 | 39.4 | 43.0 | 44.9 | 53.4 | **55.3** |
| EmbSpatial | 70.5 | 73.2 | 76.2 | 75.6 | 75.3 | **77.3** |
| EB-Habitat | 15.0 | 16.3 | 16.7 | 26.3 | 40.0 | **42.3** |
Notes:
- Bold values indicate the best result in each row.
- ↓ (Airspatial) is a lower-is-better metric; ACE-Brain-0 achieves the lowest error.
- Results are sourced from Tables 2–5 of the ACE-Brain-0 paper.
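The best-per-row rule in the notes above, including the lower-is-better exception for Airspatial, can be checked mechanically. The sketch below copies three rows from the table and picks the winner for each. The names `MODELS`, `ROWS`, `LOWER_IS_BETTER`, and `best_model` are hypothetical helpers and not part of any released ACE-Brain-0 tooling.

```python
# Column order matches the table header.
MODELS = ["VeBrain", "Pelican-VL", "MiMo-Embodied",
          "RoboBrain2.5", "Vlaser", "ACE-Brain-0"]

# Three rows copied from the table; extend with the remaining rows as needed.
ROWS = {
    "VSIBench": [39.9, 52.8, 48.5, 41.0, 60.3, 63.3],
    "BLINK": [79.7, 56.8, 0.0, 84.3, 84.9, 83.9],
    "Airspatial": [1583.4, 1586.6, 289.4, 1509.3, 1597.7, 258.0],
}

# Benchmarks marked with a down arrow in the table.
LOWER_IS_BETTER = {"Airspatial"}


def best_model(benchmark: str, scores: list) -> str:
    """Return the model with the best score for one benchmark row."""
    pick = min if benchmark in LOWER_IS_BETTER else max
    return MODELS[scores.index(pick(scores))]


winners = {name: best_model(name, scores) for name, scores in ROWS.items()}
for name, model in winners.items():
    print(f"{name}: {model}")
```

Keeping the lower-is-better benchmarks in an explicit set means error-style metrics are not ranked with `max` by accident.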
BibTeX
@misc{gong2026acebrain0spatialintelligenceshared,
title={ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments},
author={Ziyang Gong and Zehang Luo and Anke Tang and Zhe Liu and Shi Fu and Zhi Hou and Ganlin Yang and Weiyun Wang and Xiaofeng Wang and Jianbo Liu and Gen Luo and Haolan Kang and Shuang Luo and Yue Zhou and Yong Luo and Li Shen and Xiaosong Jia and Yao Mu and Xue Yang and Chunxiao Liu and Junchi Yan and Hengshuang Zhao and Dacheng Tao and Xiaogang Wang},
year={2026},
eprint={2603.03198},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.03198},
}