Authors: D. Gehrig, M. Rüegg, M. Gehrig, J. Hidalgo-Carrió, D. Scaramuzza
Event cameras are novel vision sensors that report per-pixel brightness changes as a stream of asynchronous "events". They offer significant advantages compared to standard cameras due to their high temporal resolution, high dynamic range and lack of motion blur. However, events only measure the varying component of the visual signal, which limits their ability to encode scene context. By contrast, standard cameras measure absolute intensity frames, which capture a much richer representation of the scene. Both sensors are thus complementary. However, due to the asynchronous nature of events, combining them with synchronous images remains challenging, especially for learning-based methods. This is because traditional recurrent neural networks (RNNs) are not designed for asynchronous and irregular data from additional sensors. To address this challenge, we introduce Recurrent Asynchronous Multimodal (RAM) networks, which generalize traditional RNNs to handle asynchronous and irregular data from multiple sensors. Inspired by traditional RNNs, RAM networks maintain a hidden state that is updated asynchronously and can be queried at any time to generate a prediction. We apply this novel architecture to monocular depth estimation with events and frames where we show an improvement over state-of-the-art methods by up to 30% in terms of mean absolute depth error. To enable further research on multimodal learning with events, we release EventScape, a new dataset with events, intensity frames, semantic labels, and depth maps recorded in the CARLA simulator.
- Published in: IEEE Robotics and Automation Letters (RA-L)
- DOI: —
- Date: 2021
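
The abstract describes a hidden state that is updated whenever any sensor delivers data and that can be queried at any time. The toy sketch below illustrates only that update-and-query pattern; it is not the RAM architecture from the paper. The class `ToyAsyncRecurrentState`, the scalar per-sensor gains, the tanh recurrence, the mean readout, and the sensor labels `"events"` and `"frame"` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Measurement:
    t: float       # timestamp in seconds
    sensor: str    # hypothetical sensor label, e.g. "events" or "frame"
    value: list    # feature vector, assumed to be extracted already


class ToyAsyncRecurrentState:
    """Toy stand-in (not the paper's model): one shared hidden state,
    one hypothetical scalar gain per sensor acting as its encoder."""

    def __init__(self, size, sensor_gains):
        self.h = [0.0] * size
        self.sensor_gains = sensor_gains
        self.last_t = None

    def update(self, m):
        # Measurements may come from any sensor at irregular intervals,
        # but this sketch requires them in non-decreasing time order.
        if self.last_t is not None and m.t < self.last_t:
            raise ValueError("measurements must arrive in time order")
        gain = self.sensor_gains[m.sensor]
        # Assumed recurrence: blend the old state with the encoded input.
        self.h = [math.tanh(0.5 * h_i + gain * x_i)
                  for h_i, x_i in zip(self.h, m.value)]
        self.last_t = m.t

    def query(self):
        # Assumed readout: mean of the hidden state. It can be called
        # at any moment, independent of which sensor updated last.
        return sum(self.h) / len(self.h)


# Interleaved stream: frequent event features, occasional frame features.
stream = [
    Measurement(0.033, "frame", [0.5, 0.4]),
    Measurement(0.001, "events", [0.2, -0.1]),
    Measurement(0.004, "events", [0.1, 0.0]),
    Measurement(0.040, "events", [-0.3, 0.2]),
]

model = ToyAsyncRecurrentState(size=2, sensor_gains={"events": 1.0, "frame": 0.5})
predictions = []
for m in sorted(stream, key=lambda m: m.t):
    model.update(m)
    predictions.append((m.t, model.query()))
```

A single shared state with per-sensor encoders keeps the example short; in a learned model both the encoders and the readout would be trained networks rather than fixed scalars.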