Talk Title: Connecting Visual Representation and Robot Manipulation
Visual pre-training with large-scale real-world data has made great progress in recent years. However, recipes for visual pre-training tailored to robot manipulation have yet to be established. In this talk, I present two works that contribute to this topic. First, I present iBOT, a self-supervised visual representation learning method. iBOT performs masked image modeling via self-distillation and achieves state-of-the-art results on most downstream tasks involving semantic reasoning. Second, I present Vi-PRoM, a visual pre-training scheme for robot manipulation. In Vi-PRoM, we investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives: datasets, model architectures, and training methods. Vi-PRoM employs contrastive learning, visual semantics learning, and temporal dynamics learning to facilitate robot manipulation tasks in the real world.
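To make the iBOT objective concrete, below is a minimal sketch of masked image modeling via self-distillation: a student network sees a masked view of the input while an exponential-moving-average (EMA) teacher sees the full view, and the student is trained to match the teacher's per-patch token distributions at the masked positions. The toy encoder, tensor shapes, temperatures, and EMA rate are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch of masked image modeling via self-distillation (iBOT-style).
# All sizes and hyperparameters here are arbitrary illustrative choices.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCHES, DIM, VOCAB = 16, 64, 128  # toy patch count, feature dim, token vocab

class ToyEncoder(nn.Module):
    """Stand-in for a ViT backbone plus a per-patch projection head."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(DIM, DIM)
        self.head = nn.Linear(DIM, VOCAB)   # per-patch "token" logits
    def forward(self, x):                   # x: (batch, PATCHES, DIM)
        return self.head(torch.relu(self.backbone(x)))

student = ToyEncoder()
teacher = copy.deepcopy(student)            # teacher starts as a copy
for p in teacher.parameters():
    p.requires_grad_(False)                 # teacher is updated by EMA only

opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
x = torch.randn(8, PATCHES, DIM)            # a batch of patch embeddings
mask = torch.rand(8, PATCHES) < 0.4         # randomly mask ~40% of patches

x_masked = x.clone()
x_masked[mask] = 0.0                        # zero out masked patches

# Student predicts token distributions from the masked view;
# the teacher produces targets from the full (unmasked) view.
s_logits = student(x_masked)
with torch.no_grad():
    t_probs = F.softmax(teacher(x) / 0.04, dim=-1)  # sharpened targets

# Cross-entropy on masked positions only: the MIM objective.
opt.zero_grad()
loss = -(t_probs * F.log_softmax(s_logits / 0.1, dim=-1)).sum(-1)
loss = loss[mask].mean()
loss.backward()
opt.step()

# EMA update of the teacher: the self-distillation step.
with torch.no_grad():
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(0.996).add_(ps, alpha=1 - 0.996)
```

Because the teacher's targets come from the online model itself rather than a fixed pre-trained tokenizer, the target vocabulary improves jointly with the representation as training proceeds.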
Tao Kong (孔涛) is a Senior Researcher at ByteDance AI Lab. He received his Ph.D. from Tsinghua University, advised by Fuchun Sun, and was a visiting scholar at the University of Pennsylvania, working with Jianbo Shi. His research mission is to develop robotic techniques and systems for intelligent perception and interaction in the real world. Dr. Kong has published over 30 papers at top-tier AI and robotics conferences and journals, which have received over 6,000 citations to date. He received the CAAI Excellent Doctoral Dissertation Nomination Award (2020) and won the IROS Robotic Grasping and Manipulation Competition (2016) and the Habitat ObjectNav Challenge (2022).