Pengyuan Wang

I am a PhD student in the TUM CAMP computer vision group, supervised by Prof. Dr. Nassir Navab. My research focuses on object pose estimation and robotic manipulation.

Email  /  Scholar  /  LinkedIn  /  GitHub

profile photo

Research

My research interests lie in pose estimation for category-level and novel objects, which is important for robotic applications. I am also interested in vision-language-action models for general robotic manipulation.

GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence
Pengyuan Wang, Takuya Ikeda, Robert Lee, Koichi Nishiwaki
ECCV, 2024
project page (soon) / code (soon) / arXiv

GS-Pose enables training category-level object pose estimation from only a few synthetic CAD models, and achieves results comparable to baselines trained on real pose datasets.

HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios
HyunJun Jung, Shun-Cheng Wu, Patrick Ruhkamp, Guangyao Zhai, Hannah Schieber, Giulia Rizzoli, Pengyuan Wang, Hongcheng Zhao, Lorenzo Garattoni, Sven Meier, Daniel Roth, Nassir Navab, Benjamin Busam
CVPR, 2024   (Highlight)
project page / dataset / arXiv

A large-scale category-level pose dataset with accurate ground-truth pose annotations. The dataset features photometrically challenging categories, including transparent and metallic objects, and more diverse object poses in real scenes. Feel free to test your pose estimation methods on our benchmark.

Improving Self-Supervised Learning of Transparent Category Poses with Language Guidance and Implicit Physical Constraints
Pengyuan Wang, Lorenzo Garattoni, Sven Meier, Nassir Navab, Benjamin Busam
RA-L, 2024
arXiv

We propose novel self-supervision losses for category-level pose estimation of transparent objects. Only RGB-D and polarization images, without any real pose annotations, are needed in the self-supervision stage.

CCD-3DR: Consistent Conditioning in Diffusion for Single-Image 3D Reconstruction
Yan Di, Chenyangguang Zhang, Pengyuan Wang, Guangyao Zhai, Ruida Zhang, Fabian Manhardt, Benjamin Busam, Xiangyang Ji, Federico Tombari
arXiv, 2023
arXiv

CCD-3DR uses generative networks to reconstruct object point clouds from a single RGB image. Accurate point clouds are generated by a diffusion network with consistent object-center conditioning.

CroCPS: Addressing Photometric Challenges in Self-Supervised Category-Level 6D Object Poses with Cross-Modal Learning
Pengyuan Wang, Lorenzo Garattoni, Sven Meier, Nassir Navab, Benjamin Busam
BMVC, 2022
paper

How can the pose of metallic objects be estimated from RGB-D inputs in the presence of depth artifacts? CroCPS proposes a multi-modal approach to improve pose estimation accuracy for metallic categories in real scenes.

Polarimetric Pose Prediction
Daoyi Gao, Yitong Li, Patrick Ruhkamp, Iuliia Skobleva, Magdalena Wysock, HyunJun Jung, Pengyuan Wang, Arturo Guridi, Benjamin Busam
ECCV, 2022
project page / paper / code

PPP-Net proposes a hybrid model that combines polarization information and physical priors in a data-driven learning strategy to improve pose prediction accuracy for photometrically challenging objects.

PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation with Photometrically Challenging Objects
Pengyuan Wang*, HyunJun Jung*, Yitong Li, Siyuan Shen, Rahul Parthasarathy Srikanth, Lorenzo Garattoni, Sven Meier, Nassir Navab, Benjamin Busam
CVPR, 2022
project page / dataset / arXiv

We propose a novel method that uses a robot arm to annotate object poses accurately; in particular, we design a pipeline for hand-eye calibration of the cameras. The method produces accurate pose annotations even for photometrically challenging objects such as glasses and cutlery; a generic calibration sketch follows below.
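
Below is a minimal, generic hand-eye calibration sketch using OpenCV's calibrateHandEye on synthetic, noise-free poses. It illustrates the kind of calibration involved, not the actual PhoCaL pipeline; all poses here are randomly generated placeholders.

```python
# Generic hand-eye calibration sketch (not the PhoCaL pipeline): recover the
# camera-to-gripper transform from robot poses and camera target observations.
import cv2
import numpy as np

rng = np.random.default_rng(0)

def random_pose():
    # Random rotation via Rodrigues on a random axis-angle vector.
    R, _ = cv2.Rodrigues(rng.uniform(-1.0, 1.0, (3, 1)))
    t = rng.uniform(-0.5, 0.5, (3, 1))
    return R, t

R_x, t_x = random_pose()      # ground-truth cam2gripper transform to recover
R_t2b, t_t2b = random_pose()  # fixed calibration target pose in the base frame

R_g2b, t_g2b, R_t2c, t_t2c = [], [], [], []
for _ in range(10):
    Rg, tg = random_pose()    # gripper pose in base frame (forward kinematics)
    # target2cam = inv(cam2gripper) @ inv(gripper2base) @ target2base
    R_t2c.append(R_x.T @ Rg.T @ R_t2b)
    t_t2c.append(R_x.T @ (Rg.T @ (t_t2b - tg) - t_x))
    R_g2b.append(Rg)
    t_g2b.append(tg)

R_est, t_est = cv2.calibrateHandEye(R_g2b, t_g2b, R_t2c, t_t2c)
print(np.allclose(R_est, R_x, atol=1e-6), np.allclose(t_est, t_x, atol=1e-6))
```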

DemoGrasp: Few-Shot Learning for Robotic Grasping with Human Demonstration
Pengyuan Wang*, Fabian Manhardt*, Luca Minciullo, Lorenzo Garattoni, Sven Meier, Nassir Navab, Benjamin Busam
IROS, 2021
arXiv

DemoGrasp learns robotic grasping from human demonstrations. The method jointly learns the grasping pose and object shape from a demonstration sequence; a novel object pose estimator then recovers object poses and grasping points in real scenes.

Side Projects

Unofficial Implementation of CLIP Visual Prompting
GitHub Code

Visual prompting localizes keypoints in an image by specifying keypoint names with CLIP: drawing a circle around a candidate location steers CLIP's attention to that region, so candidates can be scored against the keypoint name. We provide an unofficial implementation using CLIP models and optimal transport for matching, together with a testing example on GitHub; a minimal sketch follows below.
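
A minimal sketch of the circle-based prompting step, assuming OpenAI's CLIP package; the candidate locations, circle radius, and prompt template are placeholders, and the optimal-transport matching from the repo is omitted.

```python
# Minimal sketch of CLIP visual prompting with red circles (assumes the
# openai/CLIP package; candidates and prompt template are placeholders).
import torch
import clip
from PIL import Image, ImageDraw

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def locate_keypoint(image, candidates, keypoint_name, radius=12):
    """Draw a red circle at each candidate (x, y), score the prompted images
    against the keypoint name with CLIP, and return the best candidate."""
    text = clip.tokenize([f"a photo of a {keypoint_name}"]).to(device)
    prompted = []
    for x, y in candidates:
        img = image.copy()
        ImageDraw.Draw(img).ellipse(
            [x - radius, y - radius, x + radius, y + radius],
            outline="red", width=3)
        prompted.append(preprocess(img))
    with torch.no_grad():
        img_feat = model.encode_image(torch.stack(prompted).to(device))
        txt_feat = model.encode_text(text)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
        sims = (img_feat @ txt_feat.T).squeeze(1)
    return candidates[sims.argmax().item()]

# Example: locate_keypoint(Image.open("cat.jpg").convert("RGB"),
#                          [(40, 60), (120, 80)], "left ear of a cat")
```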


Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website — use the github code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.