Welcome to iCubWorld
How many objects can iCub recognize? This question led us in 2013 to start the iCubWorld project, with the main goal of benchmarking the development of the visual recognition capabilities of the iCub robot. The project is the result of a collaboration between the Istituto Italiano di Tecnologia (IIT) - iCub Facility, the University of Genoa - SlipGURU research group, and the Laboratory for Computational and Statistical Learning.
iCubWorld datasets are collections of images recording the visual experience of iCub while it observes objects in its typical environment, a laboratory or an office. The acquisition setting is designed to allow natural human-robot interaction: a teacher verbally provides the label of the object of interest and shows it to the robot while holding it in the hand; the iCub can then either track the object as the teacher moves it, or take it in its own hand.
Since 2013, we have published four iCubWorld releases of increasing size (described in detail below), aimed at investigating complementary aspects of robotic visual recognition. These image collections allow for extensive analysis of the behaviour of recognition systems trained under different conditions, offering a faithful and reproducible benchmark for the performance that we can expect from the real system.
Images in iCubWorld datasets are annotated with the label of the object represented and a bounding box around it. We developed a Human-Robot-Interaction application to acquire annotated images by exploiting the real-world context and the interaction with the robot. This setup makes it possible to build large annotated datasets in a fast and natural way.
The only external supervision comes from a human teacher, who verbally provides the label of the object that is about to be acquired. The teacher approaches the robot and shows the object in his/her hand; during the acquisition, the object is localized in the robot's visual field by exploiting self-supervision techniques.
Two acquisition modalities are possible: human mode and robot mode.
- Human mode: the teacher moves the object while holding it in the hand, and the robot tracks it by exploiting either motion or depth cues.
- Robot mode: the robot takes the object in its hand and focuses on it by using knowledge of its own kinematics.
We record incoming frames from the robot cameras, together with the corresponding bounding box information.
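To make the recorded information concrete, here is a minimal Python sketch of what one annotation record per frame could look like. It is only an illustration: the field names, the pixel-coordinate convention, the file name, and the JSON serialization are all assumptions made for this example, not the format used by the iCubWorld acquisition application.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical convention)."""
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    def clamp(self, width: int, height: int) -> "BoundingBox":
        """Return a copy clipped to an image of the given size."""
        return BoundingBox(
            max(0, min(self.x_min, width)),
            max(0, min(self.y_min, height)),
            max(0, min(self.x_max, width)),
            max(0, min(self.y_max, height)),
        )


@dataclass(frozen=True)
class FrameAnnotation:
    """One recorded frame plus its annotation (all field names are assumed)."""
    frame_file: str   # image file saved from a robot camera
    label: str        # object label given verbally by the teacher
    mode: str         # acquisition modality: "human" or "robot"
    box: BoundingBox  # localization of the object in the frame


def to_json_line(annotation: FrameAnnotation) -> str:
    """Serialize one annotation as a single JSON line."""
    return json.dumps(asdict(annotation), sort_keys=True)


# Hypothetical example record.
example = FrameAnnotation("00000001.ppm", "mug", "human", BoundingBox(40, 30, 200, 180))
print(to_json_line(example))
```

Keeping the label, the modality, and the box next to the frame file name means that each record can be interpreted on its own, which is convenient when selecting subsets of a dataset later.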
Datasets
iCubWorld is an ongoing project. The latest release is iCubWorld Transformations.
Code
This section is under construction.
We are working to provide documentation and support for the following code as soon as possible:
- iCub application to acquire iCubWorld releases
- MATLAB code to automatically format the acquired images in a directory tree (similar to the one we released for iCubWorld Transformations)
- MATLAB functions providing utilities to train Caffe deep networks on arbitrary subsets of the acquired dataset, by setting model and back-propagation hyperparameters (e.g. layer learning rates, solver type) programmatically through configuration files
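As a rough illustration of the directory-tree formatting step listed above, the following Python sketch computes a destination path for each acquired frame. The `root/label/mode/file` layout and every name in it are hypothetical choices for this example; they are not the layout of iCubWorld Transformations, nor a port of the MATLAB code mentioned in the list.

```python
from pathlib import PurePosixPath

# Acquisition modalities named on this page; the layout below is an assumption.
VALID_MODES = ("human", "robot")


def target_path(root: str, label: str, mode: str, frame_file: str) -> PurePosixPath:
    """Build a destination path of the (hypothetical) form root/label/mode/file."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown acquisition mode: {mode!r}")
    if not label:
        raise ValueError("label must be a non-empty string")
    # Keep only the file name, dropping any source directories.
    return PurePosixPath(root) / label / mode / PurePosixPath(frame_file).name


def plan_moves(root: str, frames):
    """Map each source file to its destination.

    `frames` is an iterable of (frame_file, label, mode) tuples.
    Nothing is copied or moved; the function only computes paths.
    """
    return {src: target_path(root, label, mode, src) for src, label, mode in frames}


# Hypothetical acquired frames.
plan = plan_moves("dataset", [
    ("raw/0001.ppm", "mug", "human"),
    ("raw/0002.ppm", "mug", "robot"),
])
for src, dst in plan.items():
    print(src, "->", dst)
```

Separating the path computation from the actual file operations makes the layout easy to inspect before any file is touched.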