Ziyin Xiong

Ziyin Xiong | 熊梓因

I am a first-year MS Research student at the Robotics Institute, Carnegie Mellon University, advised by Prof. Katerina Fragkiadaki. I received my Bachelor's degree in Artificial Intelligence from Tong Class (led by Prof. Song-Chun Zhu) in Yuanpei College, Peking University (2025).

At Peking University, I was advised by Prof. Yixin Zhu. I also collaborated with Dr. Siyuan Huang and Dr. Tengyu Liu at BIGAI. In fall 2023, I visited UC Berkeley, where I was advised by Prof. Masayoshi Tomizuka. My research interests lie at the intersection of robot learning and robot vision.

Email / Google Scholar / Github / X / LinkedIn

Research Interest

My research goal is to develop generalizable robot skill learning and equip robots with human-like abilities to reason and solve complex tasks. A popular view holds that generalization can be achieved by learning at scale, but this view ignores resource cost. To address this challenge, I am exploring three directions:

Learning from off-domain data sources. Human data from the internet provides both high-level insights into task completion and a broad task space, while implicitly revealing low-level robotic skills. Existing human data is sufficient to build representations that embed human knowledge of the world's structure and laws, enabling robots to infer spatial and physical dynamics. Although this data lacks annotations, principles such as consistency can be employed for effective self-supervision.

Building reward models with general priors. RL rewards serve as a pivotal signal to the agent, guiding it towards desirable behaviors. However, manually crafted rewards are labor-intensive and challenging to scale to unstructured real-world settings. Reward models learned from appropriate human or robot priors could specify tasks and generalize across embodiments, enabling efficient trial-and-error learning.

Acquiring common sense from foundation models. Foundation models could enhance various components of complex long-horizon robotic tasks, including perception, decision-making and control. They offer robots rich, unstructured prior knowledge, bringing my vision of imbuing robots with human-like cognitive abilities one step closer to reality.

Selected Papers

Ag2x2: Robust Agent-Agnostic Visual Representations for Zero-Shot Bimanual Manipulation
Ziyin Xiong*, Yinghan Chen*, Puhao Li, Yixin Zhu, Tengyu Liu, Siyuan Huang
IROS, 2025
Paper / Project / Code

We present Ag2x2, a framework that advances the autonomous acquisition of bimanual manipulation skills through agent-agnostic and coordination-aware visual representations that jointly encode object and hand motion patterns.

Selected Projects

AnyManip: Learning Generalizable Open-vocabulary Manipulation through Dense Optical Flows
Beijing Institute for General Artificial Intelligence
In progress, 2024

Built on the idea that optical flow, which reveals the motion dynamics of the end effector and objects, can guide robots in performing novel tasks in unfamiliar environments. Developed a two-stage model: an optical flow prediction module leveraging diffusion models, trained on diverse datasets, and a motion prediction module integrating RGB observations, optical flow, and proprioception. Achieved accurate flow prediction, action planning, and object manipulation.

Symmetry Regularization for Quadruped Locomotion
University of California, Berkeley
Code

In nature, quadruped animals achieve high-speed locomotion with stable, inertial, and energy-efficient postures. Inspired by these principles, this work aimed to increase running velocity while keeping robot locomotion stable by leveraging motion dynamics such as the diagonal symmetry of limb movement and time-reversal symmetry. Built on Legged Gym, my implementation contributed to a significant improvement in maximum tracking velocity, surpassing the reported state-of-the-art speed of the Go1 robot in simulation.

Yuanpei Intelligence Campus
Yuanpei College, Peking University
Website / Code

An online college service system developed by Yuanpei College students, offering a convenient platform for activity room reservations, interest groups, courses, and library services. I am responsible for migrating the standalone activity room reservation system into the college's YPPF website and for enhancing the system’s messaging functionality, using Python.

Miscellaneous

I believe sound is the medium closest to the soul. Honest and meaningful communication can bring the world closer together.

I’m enthusiastic about podcasts for their efficient and intimate way of sharing knowledge and ideas. I'm preparing my personal podcast channel, exploring topics including geopolitical history, traditional customs, and popular culture. Contact me if you're also passionate about these subjects or interested in starting a podcast!

Design and source code from Jon Barron's website.