

Academic Frontier Lecture No. 23: Connecting Visual Representation and Robot Manipulation


Gaoling School of Artificial Intelligence, Academic Frontier Lecture Series: No. 42 overall, No. 23 of 2022


Talk title: Connecting Visual Representation and Robot Manipulation


Venue: Offline: Room 2101, Wenhua Building (文化大厦); Online: Tencent Meeting 779-366-651


Visual pre-training with large-scale real-world data has made great progress in recent years. However, recipes for visual pre-training for robot manipulation have yet to be established. In this talk, I present two works contributing to this topic. First, I present iBOT, a self-supervised visual representation learning method. iBOT performs masked image modeling via self-distillation, achieving state-of-the-art results on most downstream tasks related to semantic reasoning. Second, I present a visual pre-training scheme for robot manipulation (Vi-PRoM). In Vi-PRoM, we investigate the effects of visual pre-training strategies on robot manipulation tasks from three fundamental perspectives: datasets, model architectures, and training methods. Vi-PRoM employs contrastive learning, visual semantics learning, and temporal dynamics learning to facilitate robot manipulation tasks in the real world.
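To give a flavor of the "masked image modeling via self-distillation" idea mentioned above, here is a minimal NumPy sketch. It is not the iBOT implementation: the linear "backbone", the dimensions, the temperatures, and the masking ratio are all illustrative assumptions. The key structure it shows is a teacher network that sees the full image and a student that sees masked patches, with a cross-entropy loss on the masked positions and an exponential-moving-average (EMA) teacher update.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def forward(patches, W):
    # Toy "backbone": a linear map from patch features to token logits.
    return patches @ W  # shape (num_patches, vocab)

num_patches, feat_dim, vocab = 16, 8, 32
patches = rng.normal(size=(num_patches, feat_dim))

W_student = rng.normal(size=(feat_dim, vocab)) * 0.1
W_teacher = W_student.copy()  # teacher starts as a copy of the student

# Mask a random subset of patches; the student sees zeros there.
mask = rng.random(num_patches) < 0.4
student_in = patches.copy()
student_in[mask] = 0.0

# Teacher sees the full image; its sharpened (low-temperature) token
# distribution serves as the distillation target.
teacher_probs = softmax(forward(patches, W_teacher) / 0.04)
student_logp = np.log(softmax(forward(student_in, W_student) / 0.1))

# Self-distillation loss: cross-entropy on the masked positions only.
loss = -(teacher_probs[mask] * student_logp[mask]).sum(axis=-1).mean()

# After a gradient step on the student (omitted here), the teacher is
# updated as an exponential moving average of the student weights.
momentum = 0.996
W_teacher = momentum * W_teacher + (1 - momentum) * W_student
```

The design choice worth noting is that no labels appear anywhere: the teacher's own predictions on unmasked patches supervise the student, which is what makes the scheme self-supervised.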


Tao Kong (孔涛) is a Senior Researcher at ByteDance AI Lab. He received his Ph.D. degree from Tsinghua University, advised by Fuchun Sun, and was a visiting scholar at the University of Pennsylvania, working with Jianbo Shi. His research mission is to develop robotic techniques and systems for intelligent perception and interaction in the real world. Dr. Kong has published over 30 papers at top-tier AI/robotics conferences and journals, receiving over 6,000 citations to date. He is the recipient of the CAAI Excellent Doctoral Dissertation Nomination Award (2020), the IROS Robotic Grasping and Manipulation Competition Winner Award (2016), and the Habitat ObjectNav Challenge Winner Award (2022).