Abstract

As artificial intelligence (AI) rapidly advances, especially in multimodal large language models (MLLMs), research focus is shifting from single-modality text processing to the more complex domains of multimodal and embodied AI. Embodied intelligence focuses on training agents within realistic simulated environments, leveraging physical interaction and action feedback rather than conventionally labeled datasets. Yet, most existing simulation platforms remain narrowly designed, each tailored to specific tasks. A versatile, general-purpose training environment that can support everything from low-level embodied navigation to high-level composite activities, such as multi-agent social simulation and human-AI collaboration, remains largely unavailable. To bridge this gap, we introduce TongSIM, a high-fidelity, general-purpose platform for training and evaluating embodied agents. TongSIM offers practical advantages by providing over 100 diverse, multi-room indoor scenarios as well as an open-ended, interaction-rich outdoor town simulation, ensuring broad applicability across research needs. Its comprehensive evaluation framework and benchmarks enable precise assessment of agent capabilities, such as perception, cognition, decision-making, human-robot cooperation, and spatial and social reasoning. With features like customized scenes, task-adaptive fidelity, diverse agent types, and dynamic environmental simulation, TongSIM delivers flexibility and scalability for researchers, serving as a unified platform that accelerates training, evaluation, and advancement toward general embodied intelligence.

System Architecture

Figure 1: Overview of the TongSIM System Architecture. The platform consists of a UE5-based simulator and a Python controller. It supports multimodal data sensors, high-fidelity simulation, large-scale NPC systems, and parallel training, integrated with a robust evaluation system.

Platform Capabilities

| Category | Feature | TongSIM (Ours) | GRUtopia | OmniGibson | Habitat | VirtualHome | Virtual Community |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Core Engine | Base | UE 5 | Isaac Sim | Isaac Sim | Custom (C++) | Unity3D | Genesis |
| Environment & Scenes | Scene Categories | 115 | ~100 (Annotated) | 51 (8 Types) | 211 | 6 Homes | 35 Urban Areas |

Further comparison criteria, by category:

  • Environment & Scenes: Indoor Scope, Outdoor Scope, City-level Interaction
  • Platform Features: Parallel Training, Task-oriented fidelity, NPC Control, Sim-to-Real Support
  • Supported Tasks: Single-Agent, Multi-Agent, Human-Robot Teaming

Scalable & Interactive Environments

  • 115+ Diverse Scenes
  • Procedural Generation
  • Asset Import

Universal Agent & Crowd Simulation

  • Logic & LLM-Driven Agents
  • Text-Driven Motion Generation
  • Large-Scale Crowd Dynamics

Efficient Training & Sim2Real

  • Multi-Env Parallel Training
  • Near-Linear SPS Scaling
  • Sim2Real (MuJoCo/Isaac Sim)
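"Near-linear SPS scaling" can be read as: steps per second (SPS) grows almost in proportion to the number of parallel environments. The sketch below shows one way to quantify that, as the ratio of measured SPS to ideal linear SPS. The function name `scaling_efficiency` and the numbers in `measured` are hypothetical illustrations, not TongSIM measurements or part of its API.

```python
def scaling_efficiency(sps_by_envs):
    """Return {num_envs: efficiency}, relative to the smallest measured env count.

    Efficiency is measured SPS divided by the SPS that perfectly linear
    scaling from the baseline would give; 1.0 means perfectly linear.
    """
    base_envs = min(sps_by_envs)
    base_sps = sps_by_envs[base_envs]
    return {
        n: sps / (base_sps * n / base_envs)
        for n, sps in sorted(sps_by_envs.items())
    }


# Hypothetical throughput figures for illustration only (not TongSIM results).
measured = {1: 100.0, 2: 190.0, 4: 360.0, 8: 680.0}
efficiency = scaling_efficiency(measured)

for num_envs, eff in efficiency.items():
    print(f"{num_envs} envs: efficiency {eff:.2f}")
```

An efficiency that stays close to 1.0 as the environment count grows is what "near-linear" would mean under this definition.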

Benchmark Gallery

Spatial Exploration Task

Spatial Exploration and Navigation

A single-agent benchmark requiring autonomous navigation and obstacle avoidance to clean up scattered debris in complex multi-room indoor environments.

MACS Task

Multi-Agent Cooperative Search (MACS)

A multi-agent benchmark in which agents cooperate to collect supplies while dodging dynamic hazards in a partially observable post-flood environment.

Robot Social Navigation

Human-Robot Hybrid Scenario

A social navigation benchmark testing a robot's ability to move safely and in a socially compliant manner through dense, dynamic human crowds in urban settings.
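The page does not say how social compliance is scored. As a rough illustration only, one commonly used proxy is the fraction of timesteps in which the robot enters a pedestrian's personal space. The function `personal_space_violation_rate`, the 0.5 m radius, and the trajectories below are all assumptions made for this sketch, not the benchmark's actual metric or data.

```python
import math


def personal_space_violation_rate(robot_path, human_paths, radius=0.5):
    """Fraction of timesteps where the robot is closer than `radius` to any human.

    robot_path:  list of (x, y) positions, one per timestep.
    human_paths: list of paths, each with the same length as robot_path.
    radius:      assumed personal-space radius in metres (hypothetical value).
    """
    violations = 0
    for t, (rx, ry) in enumerate(robot_path):
        humans_now = (path[t] for path in human_paths)
        if any(math.hypot(rx - hx, ry - hy) < radius for hx, hy in humans_now):
            violations += 1
    return violations / len(robot_path)


# Toy trajectories for illustration only.
robot = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
humans = [
    [(0.0, 2.0), (1.0, 0.3), (2.0, 1.0), (3.0, 0.4)],  # comes close at t=1 and t=3
    [(5.0, 5.0)] * 4,                                   # stays far away
]
rate = personal_space_violation_rate(robot, humans)
print(f"violation rate: {rate:.2f}")
```

A lower rate would indicate more socially compliant motion under this assumed definition; a real evaluation would likely combine it with safety and task-completion measures.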

Household Tasks

Primary Family Composite Tasks

A comprehensive evaluation of MLLM-driven agents on diverse household activities spanning object understanding, spatial reasoning, and social interaction.

S3IT Task

Spatially Situated Social Intelligence Test (S³IT)

A benchmark designed to evaluate embodied social intelligence. Agents in 3D environments must engage in proactive dialogue with NPCs, who have complex preferences and social relationships, while exploring autonomously to complete seating-arrangement tasks under multi-objective constraints.
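To make "seating arrangement under multi-objective constraints" concrete, the toy sketch below scores a round-table seating by rewarding adjacent guests who like each other and penalising adjacent guests who do not. Everything here is an assumption for illustration: the round-table layout, the scoring rule, the names `seating_score` and `best_seating`, and the guest data. In the benchmark itself, agents would first have to discover such preferences through dialogue and exploration rather than receive them up front.

```python
from itertools import permutations


def seating_score(order, likes, dislikes):
    """Score a round-table seating: +1 per adjacent liked pair, -1 per disliked pair."""
    score = 0
    n = len(order)
    for i in range(n):
        # Neighbours wrap around because the table is assumed to be round.
        pair = frozenset((order[i], order[(i + 1) % n]))
        if pair in likes:
            score += 1
        if pair in dislikes:
            score -= 1
    return score


def best_seating(guests, likes, dislikes):
    """Brute-force the highest-scoring seating (fine for a handful of guests)."""
    first, rest = guests[0], guests[1:]
    # Fix the first guest's seat so rotations of the same seating are not repeated.
    candidates = ((first,) + p for p in permutations(rest))
    return max(candidates, key=lambda order: seating_score(order, likes, dislikes))


# Hypothetical guests and relationships.
guests = ["Ann", "Bo", "Cy", "Di"]
likes = {frozenset(("Ann", "Bo")), frozenset(("Cy", "Di"))}
dislikes = {frozenset(("Ann", "Cy"))}

best = best_seating(guests, likes, dislikes)
print(best, seating_score(best, likes, dislikes))
```

Brute force is used only to keep the example short; with more guests or additional objectives, a search or optimisation method would be needed.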

BibTeX

@article{sun2025tongsim,
  title={TongSIM: A General Platform for Simulating Intelligent Machines},
  author={Sun, Zhe and Wu, Kunlun and Fu, Chuanjian and Song, Zeming and Shi, Langyong and Xue, Zihe and Jing, Bohan and Yang, Ying and Gao, Xiaomeng and Li, Aijia and others},
  journal={arXiv preprint arXiv:2512.20206},
  year={2025}
}