When learning models for real-world robot spatial perception tasks, one might have access only to partial labels. This occurs, for example, in semi-supervised scenarios (in which labels are unavailable for a subset of the training instances) or in some types of self-supervised robot learning (in which the robot autonomously acquires a labeled training set, but obtains labels for only a subset of the output variables in each instance). We formalize a general approach to this class of problems based on an auxiliary loss that enforces the expectation that the perceived environment state should not change abruptly. We then instantiate the approach on two realistic visual perception problems: a simulated ground robot learning long-range obstacle mapping, cast as a 400-binary-label classification task, in a self-supervised way in a static environment; and a real nano-quadrotor learning human pose estimation, cast as a 3-variable regression task, in a semi-supervised way in a dynamic environment. In both cases, our approach yields significant quantitative performance improvements over baselines (an average increase of 6 AUC percentage points in the former; a relative improvement of the R^2 metric ranging from 7% to 33% in the latter). Additional real-world tests on the nano-quadrotor show improved human-tracking behavior.
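To make the idea concrete, the auxiliary term can be sketched as a penalty on the difference between the states perceived at two consecutive time steps, added to whatever task loss is computable from the available partial labels. This is a minimal illustrative sketch, not the paper's implementation: the function names, the squared-difference formulation, and the `weight` hyperparameter are assumptions.

```python
import numpy as np

def state_consistency_loss(pred_t, pred_tp1):
    """Auxiliary loss penalizing abrupt changes between the environment
    states predicted at two temporally adjacent observations.

    Illustrative sketch: mean squared difference between the model's
    predictions (e.g. per-cell obstacle probabilities, or pose
    variables) at consecutive time steps.
    """
    pred_t = np.asarray(pred_t, dtype=float)
    pred_tp1 = np.asarray(pred_tp1, dtype=float)
    return float(np.mean((pred_tp1 - pred_t) ** 2))

def total_loss(supervised_loss, pred_t, pred_tp1, weight=0.1):
    """Combine the task loss (computed only on the labeled outputs)
    with the state-consistency term; `weight` is a hypothetical
    trade-off hyperparameter, not a value from the paper."""
    return supervised_loss + weight * state_consistency_loss(pred_t, pred_tp1)
```

For example, two identical consecutive predictions contribute zero auxiliary loss, so the term only discourages temporally abrupt estimates without constraining the absolute values.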
State-Consistency Loss for Learning Spatial Perception Tasks from Partial Labels
Published in: IEEE Robotics and Automation Letters
Version: Early Access
Access: Open Access