Bimanual manipulation, fundamental to human daily activities, remains challenging because of the inherent complexity of coordinated control. Recent advances have enabled zero-shot learning of single-arm manipulation skills through agent-agnostic visual representations derived from human videos; however, these methods overlook information that is crucial for bimanual coordination, such as end-effector positions. We propose Ag2x2, a computational framework for bimanual manipulation built on coordination-aware visual representations that jointly encode object states and hand motion patterns while remaining agent-agnostic. Extensive experiments demonstrate that Ag2x2 achieves a 73.5% success rate across 13 diverse bimanual tasks from Bi-DexHands and PerAct2, including challenging scenarios with deformable objects such as ropes. Ag2x2 outperforms the baseline methods and even surpasses the overall success rate of policies trained with expert-engineered rewards. Furthermore, we show that representations learned through Ag2x2 can be effectively leveraged for imitation learning, establishing a scalable pipeline for skill acquisition without expert supervision. By maintaining robust performance across diverse tasks without human demonstrations or engineered rewards, Ag2x2 represents a step toward scalable learning of complex bimanual robotic skills.
Comparison and Ablation Results
Method | (a) | (b) | (c) | (d) | (e) | (f) | Bi-DexHands Avg. | (g) | (h) | (i) | (j) | (k) | (l) | (m) | PerAct2 Avg. | Overall
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Eureka | 0 | 0 | 0 | 2 | 1 | 5 | 14.8% | 0 | 1 | 0 | 0 | 7 | 2 | 0 | 15.9% | 15.4%
R3M | 0 | 0 | 3 | 0 | 1 | 0 | 7.4% | 2 | 0 | 4 | 2 | 3 | 3 | 0 | 22.2% | 15.4%
VIP | 1 | 3 | 1 | 7 | 2 | 0 | 25.9% | 0 | 0 | 4 | 5 | 5 | 3 | 0 | 27.0% | 26.5%
Ag2Manip | 6 | 9 | 7 | 4 | 3 | 7 | 66.7% | 2 | 3 | 3 | 3 | 9 | 6 | 4 | 47.6% | 56.4%
Expert Reward | 8 | 9 | 6 | 6 | 8 | 9 | 85.2% | 5 | 0 | 6 | 3 | 5 | 3 | 6 | 44.4% | 63.2%
Ours (w/o hands) | 7 | 4 | 7 | 7 | 4 | 9 | 70.4% | 5 | 4 | 3 | 5 | 8 | 3 | 3 | 46.0% | 57.3%
Ours (full) | 7 | 6 | 9 | 8 | 7 | 9 | 85.2% | 6 | 5 | 2 | 7 | 9 | 6 | 5 | 63.5% | 73.5%
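
The averages in the table can be reproduced from the per-task counts. The sketch below does this for two rows. It assumes each task was evaluated over 9 trials; the page does not state this, but it is the denominator that matches the reported percentages. The names `TRIALS_PER_TASK`, `RESULTS`, `success_rate`, and `summarize` are illustrative only and not part of any released code.

```python
# Per-task success counts copied from the table rows above.
# ASSUMPTION: every task was run for 9 trials. The source does not state this;
# it is inferred because it reproduces the reported averages.
TRIALS_PER_TASK = 9

RESULTS = {
    "Expert Reward": {
        "Bi-DexHands": [8, 9, 6, 6, 8, 9],    # tasks (a)-(f)
        "PerAct2": [5, 0, 6, 3, 5, 3, 6],     # tasks (g)-(m)
    },
    "Ours (full)": {
        "Bi-DexHands": [7, 6, 9, 8, 7, 9],
        "PerAct2": [6, 5, 2, 7, 9, 6, 5],
    },
}


def success_rate(counts, trials_per_task=TRIALS_PER_TASK):
    """Percentage of successful trials pooled over the given tasks."""
    return 100.0 * sum(counts) / (trials_per_task * len(counts))


def summarize(method):
    """Return (Bi-DexHands avg, PerAct2 avg, overall), each rounded to 0.1."""
    bench = RESULTS[method]
    # The overall rate pools all 13 tasks rather than averaging the two
    # benchmark averages, so the benchmark with more tasks weighs more.
    all_counts = bench["Bi-DexHands"] + bench["PerAct2"]
    return (
        round(success_rate(bench["Bi-DexHands"]), 1),
        round(success_rate(bench["PerAct2"]), 1),
        round(success_rate(all_counts), 1),
    )


if __name__ == "__main__":
    for method in RESULTS:
        print(method, summarize(method))
```

Under the 9-trial assumption, the Overall column is a pooled rate over all 117 trials (13 tasks times 9), which is why it is not the simple mean of the two benchmark averages.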