Jiyao Zhang

I am a second-year Ph.D. candidate in the Center on Frontiers of Computing Studies (CFCS) at the School of Computer Science, Peking University, China, advised by Prof. Hao Dong. My research focuses on 3D computer vision and robotics, with an emphasis on embodied perception and manipulation. I aim to enable robots to autonomously perceive, understand, and interact with the world.

Email  /  Google Scholar

profile photo

📣 News

  • [2024/07] 🎉 Omni6DPose, the largest and most diverse universal 6D object pose estimation benchmark, was accepted to ECCV 2024. Omni6DPose makes 6D pose estimation practical for a wide range of downstream tasks.
  • [2024/07] 🎉 One paper was accepted to RAL.
  • [2024/04] 🎉 One paper was accepted to RAL.
  • [2024/02] 🎉 RoboKeyGen was accepted to ICRA 2024.
  • [2023/09] 🎉 GenPose was accepted to NeurIPS 2023. We introduce a generative paradigm for category-level object pose estimation that effectively addresses the multi-hypothesis issue.
  • [2023/09] 🎉 GraspGF was accepted to NeurIPS 2023.
  • [2023/02] 🎉 SGTAPose was accepted to CVPR 2023, enabling online hand-eye calibration.
  • [2022/07] 🎉 DREDS was accepted to ECCV 2022, closing the depth sim-to-real gap via physics-based depth sensor simulation.
📕 Publications (*: equal contribution)

    Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
    Jiyao Zhang*, Weiyao Huang*, Bo Peng*, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong
    [ECCV 2024] European Conference on Computer Vision, 2024
    Paper / Project Page / Bibtex / Code

    We introduce Omni6DPose, a large-scale dataset characterized by its diversity in object categories and variety of object materials. To address the substantial variations and ambiguities in Omni6DPose, we also introduce GenPose++, a state-of-the-art category-level pose estimation framework.

    RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation
    Yang Tian*, Jiyao Zhang*, Guowei Huang, Bin Wang, Ping Wang, Jiangmiao Pang, Hao Dong
    [ICRA 2024] IEEE International Conference on Robotics and Automation, 2024
    Paper / Project Page / Bibtex / Code

    We present a novel framework to predict robot pose and joint angles, bifurcating the high-dimensional prediction task into two manageable subtasks: 2D keypoint detection and lifting the 2D keypoints to 3D.

    RGBGrasp: Image-based Object Grasping by Capturing Multiple Views during Robot Arm Movement with Neural Radiance Fields
    Chang Liu*, Kejian Shi*, Kaichen Zhou*, Haoxiao Wang, Jiyao Zhang, Hao Dong
    [RAL 2024] IEEE Robotics and Automation Letters, 2024
    Paper / Project Page / Bibtex / Code

    We introduce RGBGrasp, a method that relies on a limited set of RGB views to perceive 3D surroundings containing transparent and specular objects and achieves accurate grasping.

    LVDiffusor: Distilling Functional Rearrangement Priors from Large Models into Diffusor
    Yiming Zeng*, Mingdong Wu*, Long Yang, Jiyao Zhang, Hao Ding, Hui Cheng, Hao Dong
    [RAL 2024] IEEE Robotics and Automation Letters, 2024
    Paper / Project Page / Bibtex / Code

    We propose a novel approach that distills functional rearrangement priors from large models into a diffusion model.

    GenPose: Generative Category-level Object Pose Estimation via Diffusion Models
    Jiyao Zhang*, Mingdong Wu*, Hao Dong
    [NeurIPS 2023] Advances in Neural Information Processing Systems, 2023
    Paper / Project Page / Bibtex / Code

    We explore a purely generative approach to tackle the multi-hypothesis issue in category-level 6D object pose estimation. The key idea is to generate pose candidates with a score-based diffusion model, rank and filter them with an energy-based diffusion model, and aggregate the remaining candidates into a robust, high-quality output pose.
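    The generate-rank-aggregate idea can be sketched in a few lines. Note this is only an illustrative toy, not the actual GenPose implementation: the function name, the energy values, and the restriction to translation vectors are all assumptions for the sake of a minimal example; in the real method, candidates come from a learned score-based diffusion model and energies from a learned energy-based model.

    ```python
    import numpy as np

    def aggregate_pose_candidates(candidates, energies, keep_ratio=0.6):
        """Rank pose candidates by energy and average the survivors.

        candidates: (N, 3) array of candidate translation vectors
                    (a stand-in for full 6D pose hypotheses).
        energies:   (N,) array; lower energy = more plausible pose.
        """
        n_keep = max(1, int(len(candidates) * keep_ratio))
        order = np.argsort(energies)          # best (lowest energy) first
        kept = candidates[order[:n_keep]]     # drop low-ranked hypotheses
        return kept.mean(axis=0)              # mean-pool remaining candidates

    # Toy usage: three candidates near the true pose plus one outlier.
    cands = np.array([[0.10, 0.00, 0.50],
                      [0.12, 0.01, 0.49],
                      [0.11, -0.01, 0.51],
                      [2.00, 2.00, 2.00]])    # outlier hypothesis
    en = np.array([0.10, 0.20, 0.15, 5.00])   # outlier gets high energy
    pose = aggregate_pose_candidates(cands, en, keep_ratio=0.75)
    # The outlier is filtered out and the three consistent candidates
    # are averaged into a single robust estimate.
    ```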

    Learning Score-based Grasping Primitive for Human-assisting Dexterous Grasping
    Tianhao Wu*, Mingdong Wu*, Jiyao Zhang, Yunchong Gan, Hao Dong
    [NeurIPS 2023] Advances in Neural Information Processing Systems, 2023
    Paper / Project Page / Bibtex / Code

    We propose a novel task called human-assisting dexterous grasping that aims to train a policy for controlling a robotic hand's fingers to assist users in grasping objects.

    SGTAPose: Robot Structure Prior Guided Temporal Attention for Camera-to-Robot Pose Estimation from Image Sequence
    Yang Tian*, Jiyao Zhang*, Zekai Yin*, Hao Dong
    [CVPR 2023] Conference on Computer Vision and Pattern Recognition, 2023
    Paper / Project Page / Bibtex / Code

    We propose Structure Prior Guided Temporal Attention for online Camera-to-Robot Pose estimation (SGTAPose) from successive frames of an image sequence.

    Domain Randomization-Enhanced Depth Simulation and Restoration for Perceiving and Grasping Specular and Transparent Objects
    Qiyu Dai*, Jiyao Zhang*, Qiwei Li, Tianhao Wu, Hao Dong, Ziyuan Liu, Ping Tan, He Wang
    [ECCV 2022] European Conference on Computer Vision, 2022
    Paper / Project Page / Bibtex / Code

    We propose the Domain Randomization-Enhanced Depth Simulation (DREDS) approach to simulate an active stereo depth system using physically based rendering, and demonstrate that DREDS bridges the sim-to-real domain gap.

    🏅 Honors

  • Outstanding Student of Center on Frontiers of Computing Studies (CFCS), Peking University, 2023
  • National Scholarship, 2021
  • Merit Student, Xi'an Jiaotong University, 2021

  • Template adapted from Jon Barron