Multimodal Decision-Making


A typical robot is equipped with a variety of sensors that provide visual, haptic, or auditory cues about its environment. While controllers are traditionally designed around explicit state representations, such state is often hard to infer from high-dimensional, noisy, and heterogeneous sensory data. We work on fusing these different sensor modalities to learn better representations that improve decision-making. Check out some of our papers below!
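To make the fusion idea concrete, here is a minimal sketch of combining modality-specific encodings into a single multimodal representation that a downstream policy could consume. All dimensions, weights, and names are illustrative stand-ins; the papers below learn such encoders with self-supervised deep networks rather than the fixed linear projections used here.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Project a raw sensor reading into a low-dimensional feature
    (a fixed linear map standing in for a learned encoder)."""
    return np.tanh(W @ x)

# Hypothetical sensor dimensions: a 64-d visual feature vector
# and a 6-d force-torque (haptic) reading.
W_vision = rng.standard_normal((8, 64)) * 0.1
W_touch = rng.standard_normal((8, 6)) * 0.1

vision = rng.standard_normal(64)  # stand-in for camera features
touch = rng.standard_normal(6)    # stand-in for a force-torque sensor

# Fuse by concatenating the per-modality embeddings into one representation;
# a controller would act on `z` instead of the raw, heterogeneous sensor streams.
z = np.concatenate([encode(vision, W_vision), encode(touch, W_touch)])
print(z.shape)
```

Concatenation is only the simplest fusion scheme; the point is that decisions are made from a learned joint representation rather than from each raw sensor stream separately.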

Exploration and Exploitation with Vision and Touch

Lee, M., Zhu, Y., Srinivasan, K., Shah, P., Savarese, S., Fei-Fei, L., Garg, A., Bohg, J. Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks. Accepted at ICRA 2019. Nominated for the Best Paper Award and the Best Paper Award on Cognitive Robotics.

Bohg, J., Johnson-Roberson, M., Björkman, M., Kragic, D. Strategies for Multi-Modal Scene Exploration. In Intelligent Robots and Systems (IROS), 2010 IEEE/RSJ International Conference on, pp. 4509–4515, October 2010.

Active Information Gathering

Toussaint, M., Ratliff, N., Bohg, J., Righetti, L., Englert, P., Schaal, S. Dual Execution of Optimized Contact Interaction Trajectories. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), Chicago, IL, October 2014.

Johnson-Roberson, M., Bohg, J., Skantze, G., Gustafson, J., Carlson, R., Rasolzadeh, B., Kragic, D. Enhanced Visual Scene Understanding through Human-Robot Dialog. In Intelligent Robots and Systems (IROS), 2011 IEEE/RSJ International Conference on, pp. 3342–3348, 2011.