Audio-visual sensing from a quadcopter: dataset and baselines for source localization and sound enhancement

Published in Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Macau, China, Nov 4-8, 2019

Recommended citation: L. Wang, R. Sanchez-Matilla and A. Cavallaro, "Audio-visual sensing from a quadcopter: dataset and baselines for source localization and sound enhancement," in Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Macau, China, November 2019.

Abstract

We present an audio-visual dataset recorded outdoors from a quadcopter and discuss baseline results for multiple applications. The dataset includes a scenario for source localization and sound enhancement with up to two static sources, and a scenario for source localization and tracking with a moving sound source. These sensing tasks are made challenging by the strong and time-varying ego-noise generated by the rotating motors and propellers. The dataset was collected using a small circular array with 8 microphones and a camera mounted on the quadcopter. The camera view was used to facilitate the annotation of the sound-source positions and can also be used for multi-modal sensing tasks. We discuss the audio-visual calibration procedure that is needed to generate the annotation for the dataset, which we make available to the research community.
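
The baseline methods themselves are detailed in the paper, not on this page. As an illustrative sketch only, and not the authors' implementation, the snippet below estimates the azimuth of a sound source from a planar microphone array (such as the 8-microphone circular array mentioned above) using SRP-PHAT steered-response power; the function names, the microphone geometry `mic_xy`, and all parameters are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at roughly 20 degrees C

def gcc_phat_spectrum(x, y, nfft):
    """PHAT-weighted cross-spectrum of two equal-length signals."""
    X = np.fft.rfft(x, n=nfft)
    Y = np.fft.rfft(y, n=nfft)
    R = X * np.conj(Y)
    return R / (np.abs(R) + 1e-12)  # whiten: keep phase, discard magnitude

def srp_phat_azimuth(frames, mic_xy, fs, n_angles=360):
    """Estimate the source azimuth (degrees) for a planar microphone array.

    frames : (n_mics, n_samples) time-aligned audio frame
    mic_xy : (n_mics, 2) microphone positions in metres
    fs     : sampling rate in Hz
    """
    n_mics, n_samples = frames.shape
    nfft = 2 * n_samples
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)                  # (n_freqs,)
    angles = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    look = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit look vectors
    power = np.zeros(n_angles)
    for i in range(n_mics):
        for j in range(i + 1, n_mics):
            R = gcc_phat_spectrum(frames[i], frames[j], nfft)  # (n_freqs,)
            # Plane-wave TDOA between mics i and j for each candidate direction.
            tau = (mic_xy[i] - mic_xy[j]) @ look.T / SPEED_OF_SOUND
            # Steer the whitened cross-spectrum to each angle and accumulate.
            steering = np.exp(-2j * np.pi * np.outer(tau, freqs))
            power += np.real(steering @ R)
    return np.degrees(angles[np.argmax(power)])
```

For the circular geometry mentioned in the abstract, `mic_xy` could be built as eight points on a circle, e.g. `0.1 * np.column_stack([np.cos(a), np.sin(a)])` with `a = np.linspace(0, 2 * np.pi, 8, endpoint=False)`; the array radius is a placeholder, and quadcopter recordings would additionally require suppression of the ego-noise discussed above.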


Links

- Dataset website
- Paper
- Presentation

Video