Audio-visual sensing from a quadcopter: dataset and baselines for source localization and sound enhancement
Published in Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Macau, China, Nov 4-8, 2019
Recommended citation: L. Wang, R. Sanchez-Matilla and A. Cavallaro, "Audio-visual sensing from a quadcopter: dataset and baselines for source localization and sound enhancement," in Proc. of IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), Macau, China, Nov. 2019.
Abstract: We present an audio-visual dataset recorded outdoors from a quadcopter and discuss baseline results for multiple applications. The dataset includes a scenario for source localization and sound enhancement with up to two static sources, and a scenario for source localization and tracking with a moving sound source. These sensing tasks are made challenging by the strong and time-varying ego-noise generated by the rotating motors and propellers. The dataset was collected using a small circular array with 8 microphones and a camera mounted on the quadcopter. The camera view was used to facilitate the annotation of the sound-source positions and can also be used for multi-modal sensing tasks. We discuss the audio-visual calibration procedure that is needed to generate the annotation for the dataset, which we make available to the research community (see the links below).
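The paper's baselines are not reproduced here, but as a generic illustration of how a small circular microphone array supports sound-source localization, the sketch below estimates the azimuth of a source by fitting pairwise GCC-PHAT time delays over a grid of candidate directions. The array radius (0.05 m), the sampling rate, and all function names are assumptions chosen for illustration, not values taken from the dataset or the paper.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the delay of x relative to y (seconds) via GCC-PHAT."""
    n = x.size + y.size
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    R = X * np.conj(Y)
    R /= np.abs(R) + 1e-12            # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

# Hypothetical geometry: 8 microphones on a circle of radius 0.05 m.
fs = 16000                            # assumed sampling rate (Hz)
c = 343.0                             # speed of sound (m/s)
radius = 0.05
angles = np.arange(8) * 2 * np.pi / 8
mics = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (8, 2)

def estimate_azimuth(frames, fs, mics, c=343.0, n_angles=360):
    """Grid-search the azimuth that best explains pairwise GCC-PHAT delays.

    frames: (n_mics, n_samples) array of synchronized channel samples.
    """
    n_mics = len(mics)
    pairs = [(i, j) for i in range(n_mics) for j in range(i + 1, n_mics)]
    # Measured delay for each microphone pair (channel i relative to j).
    taus = {(i, j): gcc_phat(frames[i], frames[j], fs,
                             max_tau=2 * radius / c) for i, j in pairs}
    candidates = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)
    scores = np.zeros(n_angles)
    for k, theta in enumerate(candidates):
        u = np.array([np.cos(theta), np.sin(theta)])   # far-field direction
        for (i, j), tau in taus.items():
            expected = (mics[j] - mics[i]) @ u / c     # predicted delay
            scores[k] -= (tau - expected) ** 2         # least-squares fit
    return candidates[np.argmax(scores)]

# Usage: azimuth = estimate_azimuth(frames, fs, mics)
# where frames is an (8, N) array of one audio frame per channel.
```

In practice, the strong ego-noise described in the abstract would corrupt the pairwise delay estimates, which is precisely what makes this dataset a challenging benchmark for localization methods.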
Links: Dataset website · Paper · Presentation · Video