Abstract We present an audio-visual dataset recorded outdoors from a quadcopter and discuss baseline results for multiple applications. The dataset includes a scenario for source localization and sound enhancement with up to two static sources, and a scenario for source localization and tracking with a moving sound source. These sensing tasks are made challenging by the strong and time-varying ego-noise generated by the rotating motors and propellers. The dataset was collected using a small circular array with 8 microphones and a camera mounted on the quadcopter. The camera view was used to facilitate the annotation of the sound-source positions and can also be used for multi-modal sensing tasks. We discuss the audio-visual calibration procedure that is needed to generate the annotation for the dataset, which we make available to the research community here.