Ag2x2: Robust Agent-Agnostic Visual Representations for Zero-Shot Bimanual Manipulation

Ziyin Xiong*1,2,3,4,5, Yinghan Chen*1,2,3,4,5,6, Puhao Li1, Yixin Zhu2,3,4,5,7, Tengyu Liu1†, Siyuan Huang1†
1National Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence (BIGAI), 2School of Psychological and Cognitive Sciences, Peking University, 3Institute for Artificial Intelligence, Peking University, 4Beijing Key Laboratory of Behavior and Mental Health, Peking University, 5Yuanpei College, Peking University, 6Department of Computer Science and Technology, University of Cambridge, 7Embodied Intelligence Lab, PKU-Wuhan Institute for Artificial Intelligence.

Ag2x2 learns bimanual manipulation without task-specific knowledge.

Abstract

Bimanual manipulation, fundamental to human daily activities, remains challenging due to the inherent complexity of coordinated control. Recent advances have enabled zero-shot learning of single-arm manipulation skills through agent-agnostic visual representations derived from human videos; however, these methods overlook agent-specific information that is crucial for bimanual coordination, such as end-effector positions. We propose Ag2x2, a computational framework for bimanual manipulation built on coordination-aware visual representations that jointly encode object states and hand motion patterns while remaining agent-agnostic. In extensive experiments, Ag2x2 achieves a 73.5% success rate across 13 diverse bimanual tasks from Bi-DexHands and PerAct2, including challenging scenarios with deformable objects such as ropes. It outperforms all baseline methods and even surpasses policies trained with expert-engineered rewards. Furthermore, representations learned through Ag2x2 can be effectively leveraged for imitation learning, establishing a scalable pipeline for skill acquisition without expert supervision. By maintaining robust performance across diverse tasks without human demonstrations or engineered rewards, Ag2x2 takes a step toward scalable learning of complex bimanual robotic skills.
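As a rough illustration of how an agent-agnostic representation can replace an engineered reward, the sketch below embeds a goal image once and rewards the policy for reducing embedding-space distance to it. The encoder interface, the hand-keypoint input, and the potential-based shaping are illustrative assumptions, not Ag2x2's actual design.

import numpy as np

class EmbeddingGoalReward:
    """Hypothetical sketch: dense reward from a coordination-aware
    visual embedding that jointly encodes the scene and both hands.
    `encoder(image, hand_keypoints) -> np.ndarray` is an assumed
    interface, not Ag2x2's actual API."""

    def __init__(self, encoder, goal_image, goal_hand_keypoints):
        self.encoder = encoder
        # Embed the goal once; task progress is distance to this point.
        self.z_goal = encoder(goal_image, goal_hand_keypoints)
        self.prev_dist = None

    def __call__(self, image, hand_keypoints):
        # Agent-agnostic input: keypoints can come from human hands
        # (training videos) or robot end-effectors (deployment).
        z = self.encoder(image, hand_keypoints)
        dist = float(np.linalg.norm(z - self.z_goal))
        # Potential-based shaping: reward the decrease in goal distance.
        reward = 0.0 if self.prev_dist is None else self.prev_dist - dist
        self.prev_dist = dist
        return reward

A standard policy optimizer (e.g., PPO) would then maximize this reward directly, with no task-specific engineering.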

Pipeline of Ag2x2

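The abstract describes representations that jointly encode object states and hand motion patterns while remaining agent-agnostic. A minimal sketch of one such two-stream encoder follows; all layer sizes, the keypoint count, and the fusion scheme are hypothetical choices for illustration, not the paper's architecture.

import torch
import torch.nn as nn

class CoordinationAwareEncoder(nn.Module):
    """Hypothetical two-stream encoder: visual features capture object
    state; an MLP over hand/end-effector keypoints captures motion.
    Sizes and fusion are illustrative assumptions, not the paper's."""

    def __init__(self, visual_dim=512, num_keypoints=42, embed_dim=256):
        super().__init__()
        # Keypoint stream: e.g., 21 keypoints per hand x 2 hands x (x, y, z).
        self.hand_mlp = nn.Sequential(
            nn.Linear(num_keypoints * 3, 128), nn.ReLU(),
            nn.Linear(128, 128),
        )
        # Fuse the two streams into one coordination-aware embedding.
        self.fuse = nn.Sequential(
            nn.Linear(visual_dim + 128, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, visual_feat, hand_xyz):
        # visual_feat: (B, visual_dim) from any frozen image backbone.
        # hand_xyz:    (B, num_keypoints, 3), agent-agnostic keypoints.
        h = self.hand_mlp(hand_xyz.flatten(1))
        return self.fuse(torch.cat([visual_feat, h], dim=-1))

Keeping robot-specific state out of the inputs is what preserves agent-agnosticism: the same encoder can consume human hand keypoints during training and end-effector keypoints at deployment.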

Simulation Results

Comparison and Ablation Results

Success counts per task (each out of 9 rollouts) with average success rates. Tasks (a)-(f) come from Bi-DexHands; tasks (g)-(m) come from PerAct2.

                 |          Bi-DexHands          |              PerAct2              |
Method           | (a) (b) (c) (d) (e) (f)  Avg. | (g) (h) (i) (j) (k) (l) (m)  Avg. | Overall
-----------------+-------------------------------+-----------------------------------+--------
Eureka           |  0   0   0   2   1   5  14.8% |  0   1   0   0   7   2   0  15.9% |  15.4%
R3M              |  0   0   3   0   1   0   7.4% |  2   0   4   2   3   3   0  22.2% |  15.4%
VIP              |  1   3   1   7   2   0  25.9% |  0   0   4   5   5   3   0  27.0% |  26.5%
Ag2Manip         |  6   9   7   4   3   7  66.7% |  2   3   3   3   9   6   4  47.6% |  56.4%
Expert Reward    |  8   9   6   6   8   9  85.2% |  5   0   6   3   5   3   6  44.4% |  63.2%
Ours (w/o hands) |  7   4   7   7   4   9  70.4% |  5   4   3   5   8   3   3  46.0% |  57.3%
Ours (full)      |  7   6   9   8   7   9  85.2% |  6   5   2   7   9   6   5  63.5% |  73.5%
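The printed averages are consistent with each table cell being a success count out of 9 rollouts; the snippet below reproduces the "Ours (full)" row under that reading.

# Reading each cell as successes out of 9 rollouts reproduces the
# reported averages for the "Ours (full)" row.
bidex  = [7, 6, 9, 8, 7, 9]        # tasks (a)-(f)
peract = [6, 5, 2, 7, 9, 6, 5]     # tasks (g)-(m)

avg_bidex  = sum(bidex)  / (9 * len(bidex))    # 46/54
avg_peract = sum(peract) / (9 * len(peract))   # 40/63
overall    = (sum(bidex) + sum(peract)) / (9 * 13)

print(f"{avg_bidex:.1%}  {avg_peract:.1%}  {overall:.1%}")
# -> 85.2%  63.5%  73.5%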

Imitation Learning Results
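The abstract notes that Ag2x2's representations can be leveraged for imitation learning. One plausible instantiation, sketched below, behavior-clones a policy on (embedding, action) pairs collected by rolling out the RL policy trained with the learned reward; the network sizes and data format are assumptions, not the paper's setup.

import torch
import torch.nn as nn

class BCPolicy(nn.Module):
    """Hypothetical behavior-cloning head over Ag2x2-style embeddings."""

    def __init__(self, embed_dim=256, action_dim=24):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
            nn.Linear(512, action_dim),
        )

    def forward(self, z):
        return self.net(z)

def bc_step(policy, optimizer, z_batch, a_batch):
    """One cloning step: regress demonstrated actions from embeddings.
    Demonstrations here would be rollouts of the reward-trained RL
    policy, so no human teleoperation is needed."""
    loss = nn.functional.mse_loss(policy(z_batch), a_batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()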

BibTeX

@article{xiong2025ag2x2,
  author  = {Xiong, Ziyin and Chen, Yinghan and Li, Puhao and Zhu, Yixin and Liu, Tengyu and Huang, Siyuan},
  title   = {Ag2x2: Robust Agent-Agnostic Visual Representations for Zero-Shot Bimanual Manipulation},
  journal = {TODO: Conference or Journal},
  year    = {2025},
}