Object pose estimation is crucial for robotic applications and augmented reality. To provide the community with a benchmark that has high-quality ground-truth annotations, we introduce PhoCaL, a multimodal dataset for category-level pose estimation of photometrically challenging objects. PhoCaL comprises 60 high-quality 3D models of household objects across 8 categories, including highly reflective, transparent and symmetric objects. We developed a novel robot-supported multimodal (RGB, depth, polarisation) data acquisition and annotation process. It ensures sub-millimeter pose accuracy for opaque textured, shiny and transparent objects, avoids motion blur and provides precise camera synchronisation.
The dataset contains 24 sequences together with the object models. Each sequence provides multimodal inputs from a time-of-flight (ToF) camera and a polarisation camera. The rgb and depth folders hold the ToF camera captures; the corresponding object masks and NOCS maps are rendered into folders alongside them. The polarization folder contains the polarisation camera images, resized to 640x480. Ground-truth object annotations for both cameras are stored in BOP format in rgb_scene_gt.json and pol_scene_gt.json. The object class ids and instance ids are documented in the class_obj_taxonomy.json file.