Jiyao Zhang
I am a second-year Ph.D. candidate in the Center on Frontiers of Computing Studies (CFCS) at the School of Computer Science, Peking University, China, advised by Prof. Hao Dong.
Before this, I was a research assistant at the EPIC Lab, advised by Prof. He Wang.
My research focuses on 3D computer vision and robotics, with an emphasis on embodied perception and manipulation. I aim to enable robots to autonomously perceive, understand, and interact with the world.
We are looking for interns in robot perception, synthetic data generation, and robot manipulation. If you are interested in our research, please feel free to contact me.
Email: jiyaozhang@stu.pku.edu.cn /
Google Scholar
📣 News
[2025/02] 🎉 We release AgiBot Digital World, a simulation framework that uses OmniManip as its embodied data engine.
[2025/02] 🎉 Two papers get accepted to CVPR 2025, including OmniManip, a training-free embodied agent for open-vocabulary manipulation.
[2025/01] 🎉 Two papers get accepted to ICRA 2025.
[2024/07] 🎉 Omni6DPose, the largest and most diverse universal 6D object pose estimation benchmark, gets accepted to ECCV 2024. Omni6DPose makes 6D pose estimation practical for a wide range of downstream tasks.
[2024/07] 🎉 One paper gets accepted to RAL.
[2024/04] 🎉 One paper gets accepted to RAL.
[2024/02] 🎉 RoboKeyGen gets accepted to ICRA 2024.
[2023/09] 🎉 GenPose gets accepted to NeurIPS 2023. We introduce an innovative category-level object pose estimation paradigm, leveraging generative modeling to effectively address the multi-hypothesis issue.
[2023/09] 🎉 GraspGF gets accepted to NeurIPS 2023.
[2023/02] 🎉 SGTAPose gets accepted to CVPR 2023, enabling online hand-eye calibration.
[2022/07] 🎉 DREDS gets accepted to ECCV 2022, closing the depth sim-to-real gap via physics-based depth sensor simulation.
📕 Publications (*: equal contribution)
Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
Jiyao Zhang*, Weiyao Huang*, Bo Peng*, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong
[ECCV 2024] European Conference on Computer Vision, 2024
Paper /
Project Page /
Bibtex /
Code
@inproceedings{zhang2024omni6dpose,
  title={Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking},
  author={Zhang, Jiyao and Huang, Weiyao and Peng, Bo and Wu, Mingdong and Hu, Fei and Chen, Zijian and Zhao, Bo and Dong, Hao},
  booktitle={European Conference on Computer Vision},
  year={2024},
  organization={Springer}
}
We introduce Omni6DPose, a large-scale dataset notable for its diversity in object categories and materials. To address its substantial variations and ambiguities, we introduce GenPose++, a state-of-the-art category-level pose estimation framework.
RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation
Yang Tian*, Jiyao Zhang*, Guowei Huang, Bin Wang, Ping Wang, Jiangmiao Pang, Hao Dong
[ICRA 2024] IEEE International Conference on Robotics and Automation, 2024
Paper /
Project Page /
Bibtex /
Code
@inproceedings{tian2024robokeygen,
  title={RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation},
  author={Tian, Yang and Zhang, Jiyao and Huang, Guowei and Wang, Bin and Wang, Ping and Pang, Jiangmiao and Dong, Hao},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2024}
}
We present a novel framework to predict robot pose and joint angles, decomposing the high-dimensional prediction task into two manageable subtasks: 2D keypoint detection and lifting the 2D keypoints to 3D.
RGBGrasp: Image-based Object Grasping by Capturing Multiple Views during Robot Arm Movement with Neural Radiance Fields
Chang Liu*, Kejian Shi*, Kaichen Zhou*, Haoxiao Wang, Jiyao Zhang, Hao Dong
[RAL 2024] IEEE Robotics and Automation Letters, 2024
Paper /
Project Page /
Bibtex /
Code
@article{liu2024rgbgrasp,
  title={RGBGrasp: Image-based Object Grasping by Capturing Multiple Views during Robot Arm Movement with Neural Radiance Fields},
  author={Liu, Chang and Shi, Kejian and Zhou, Kaichen and Wang, Haoxiao and Zhang, Jiyao and Dong, Hao},
  journal={IEEE Robotics and Automation Letters},
  year={2024},
  publisher={IEEE}
}
We introduce RGBGrasp, a method that relies on a limited set of RGB views to perceive 3D surroundings containing transparent and specular objects and to achieve accurate grasping.
LVDiffusor: Distilling Functional Rearrangement Priors from Large Models into Diffusor
Yiming Zeng*, Mingdong Wu*, Long Yang, Jiyao Zhang, Hao Ding, Hui Cheng, Hao Dong
[RAL 2024] IEEE Robotics and Automation Letters, 2024
Paper /
Project Page /
Bibtex /
Code
@article{zeng2023distilling,
  title={Distilling Functional Rearrangement Priors from Large Models},
  author={Zeng, Yiming and Wu, Mingdong and Yang, Long and Zhang, Jiyao and Ding, Hao and Cheng, Hui and Dong, Hao},
  journal={IEEE Robotics and Automation Letters},
  year={2024},
  publisher={IEEE}
}
We propose a novel approach that leverages large models to distill functional rearrangement priors.
GenPose: Generative Category-level Object Pose Estimation via Diffusion Models
Jiyao Zhang*, Mingdong Wu*, Hao Dong
[NeurIPS 2023] Advances in Neural Information Processing Systems, 2023
Paper /
Project Page /
Bibtex /
Code
@article{zhang2023genpose,
  title={GenPose: Generative Category-level Object Pose Estimation via Diffusion Models},
  author={Zhang, Jiyao and Wu, Mingdong and Dong, Hao},
  journal={Advances in Neural Information Processing Systems},
  year={2023}
}
We explore a purely generative approach to tackle the multi-hypothesis issue in category-level 6D object pose estimation. The key idea is to generate pose candidates with a score-based diffusion model, filter them with an energy-based diffusion model, and aggregate the remaining candidates into a robust, high-quality output pose.
Learning Score-based Grasping Primitive for Human-assisting Dexterous Grasping
Tianhao Wu*, Mingdong Wu*, Jiyao Zhang, Yunchong Gan, Hao Dong
[NeurIPS 2023] Advances in Neural Information Processing Systems, 2023
Paper /
Project Page /
Bibtex /
Code
@article{wu2023learning,
  title={Learning Score-based Grasping Primitive for Human-assisting Dexterous Grasping},
  author={Wu, Tianhao and Wu, Mingdong and Zhang, Jiyao and Gan, Yunchong and Dong, Hao},
  journal={Advances in Neural Information Processing Systems},
  year={2023}
}
We propose a novel task called human-assisting dexterous grasping that aims to train a policy for controlling a robotic hand's fingers to assist users in grasping objects.
SGTAPose: Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence
Yang Tian*, Jiyao Zhang*, Zekai Yin*, Hao Dong
[CVPR 2023] Conference on Computer Vision and Pattern Recognition, 2023
Paper /
Project Page /
Bibtex /
Code
@inproceedings{tian2023robot,
  title={Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation From Image Sequence},
  author={Tian, Yang and Zhang, Jiyao and Yin, Zekai and Dong, Hao},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={8917--8926},
  year={2023}
}
We propose Structure Prior Guided Temporal Attention for online Camera-to-Robot Pose estimation (SGTAPose) from successive frames of an image sequence.
Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects
Qiyu Dai*, Jiyao Zhang*, Qiwei Li, Tianhao Wu, Hao Dong, Ziyuan Liu, Ping Tan, He Wang
[ECCV 2022] European Conference on Computer Vision, 2022
Paper /
Project Page /
Bibtex /
Code
@inproceedings{dai2022domain,
  title={Domain randomization-enhanced depth simulation and restoration for perceiving and grasping specular and transparent objects},
  author={Dai, Qiyu and Zhang, Jiyao and Li, Qiwei and Wu, Tianhao and Dong, Hao and Liu, Ziyuan and Tan, Ping and Wang, He},
  booktitle={European Conference on Computer Vision},
  pages={374--391},
  year={2022},
  organization={Springer}
}
We propose the Domain Randomization-Enhanced Depth Simulation (DREDS) approach, which simulates an active stereo depth system with physically based rendering, and demonstrate that DREDS bridges the sim-to-real domain gap.
🏅 Honors
Outstanding Student of Center on Frontiers of Computing Studies (CFCS), Peking University, 2023
National Scholarship, 2021
Merit Student, Xi'an Jiaotong University, 2021