Datasets

AmbiSep Spherical RIR Dataset

A script for generating the spherical room impulse response (SRIR) dataset that was used in [1] to train and test a deep learning-based Ambisonics-domain source separation model. The script uses the SMIR Generator to compute SRIRs and converts them to first-order AmbiX format. The generation pipeline is parameterised via YAML configuration files and supports parallel execution on multicore CPUs.
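The overall pattern of such a pipeline can be sketched as follows. This is a hypothetical illustration only: the actual script, its YAML configuration schema, and the SMIR Generator interface are not reproduced here, and `simulate_srir` is a placeholder name.

```python
# Hypothetical sketch of a config-driven, parallel SRIR generation pipeline;
# the real script's structure and the SMIR Generator calls differ.
from concurrent.futures import ProcessPoolExecutor

# Stand-in for a parsed YAML configuration (e.g. yaml.safe_load(f)).
CONFIG = {
    "rooms": [
        {"dimensions": [6.0, 5.0, 3.0], "rt60": 0.4},
        {"dimensions": [8.0, 6.0, 3.5], "rt60": 0.6},
    ],
    "num_workers": 2,
}

def simulate_srir(room: dict) -> str:
    """Placeholder for one SMIR-Generator call plus AmbiX conversion."""
    # A real implementation would compute the SRIR here and write a
    # first-order AmbiX (ACN/SN3D) WAV file to disk.
    return f"srir_rt60_{room['rt60']:.1f}.wav"

def generate_all(config: dict) -> list[str]:
    # Rooms are independent, so they can be simulated in parallel
    # across CPU cores with a process pool.
    with ProcessPoolExecutor(max_workers=config["num_workers"]) as pool:
        return list(pool.map(simulate_srir, config["rooms"]))
```

A process pool (rather than threads) is the natural choice here because the per-room simulation is CPU-bound.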

The code is available here.

  1. A. Herzog, S.R. Chetupalli and E.A.P. Habets, AmbiSep: Joint Ambisonic-to-Ambisonic speech separation and noise reduction, IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 31, pp. 3081–3094, 2023.

Anechoic Interferer Dataset

A dataset of anechoic, non-stationary noise signals intended for use as interferer signals in speech enhancement and noise suppression experiments. The accompanying code repository provides a Python utility for generating random mixtures from the dataset as described in [2].
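The repository's actual mixing utility is not shown here, but the generic recipe it follows can be sketched as below: draw a random interferer segment and scale it so the mixture attains a target signal-to-interference ratio (SIR). The function name and signature are illustrative assumptions.

```python
# Illustrative sketch only, not the AID repository's API: scale a randomly
# chosen interferer segment to a target SIR before summing with the speech.
import numpy as np

def mix_at_sir(speech: np.ndarray, interferer: np.ndarray, sir_db: float,
               rng: np.random.Generator) -> np.ndarray:
    """Mix speech with a random interferer segment at sir_db dB SIR."""
    # Random segment of the interferer with the same length as the speech.
    start = rng.integers(0, len(interferer) - len(speech) + 1)
    segment = interferer[start:start + len(speech)]
    # Choose the gain so that 10*log10(P_speech / P_interferer) == sir_db.
    p_speech = np.mean(speech ** 2)
    p_interf = np.mean(segment ** 2)
    gain = np.sqrt(p_speech / (p_interf * 10.0 ** (sir_db / 10.0)))
    return speech + gain * segment
```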

The dataset is available on Zenodo and the code is available here.

  2. P. Goetz, C. Tuna, A. Walther and E.A.P. Habets, AID: Open-source anechoic interferer dataset, Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), 2022.


Multi-Room Transition Dataset

A multimodal dataset of room impulse responses and 360° photographs of each measurement position, acquired across multiple rooms and transition zones between them [3]. It is designed for blind estimation of energy decay functions in scenarios where the acoustic environment changes over time. The accompanying code repository provides the blind energy decay estimation model and training routines.

The dataset is available on Zenodo and the code is available here.

  3. P. Götz, G. Götz, N. Meyer-Kahlen, K.Y. Lee, K. Prawda, E.A.P. Habets and S.J. Schlecht, A multi-room transition dataset for blind estimation of energy decay, Proc. International Workshop on Acoustic Signal Enhancement (IWAENC), 2024.

QoEVAVE — 360° Scene Dataset

A dataset of 360° audiovisual recordings for perceptual, cognitive, and behavioral research, consisting of 26 sequences across 12 different scenes with an average duration of 60 seconds. Each sequence is provided as an audio-only .wav file, a video-only .mkv file, or a muxed .mp4 file. Video is available at up to 8K resolution, and audio is provided at up to 4th-order Ambisonics in AmbiX format (ACN channel ordering, SN3D normalisation). Spatial and temporal video information is available for all sequences, and previews with 1st-order Ambisonics audio are available on YouTube.
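For readers unfamiliar with the AmbiX convention, the channel layout is fully determined by two rules: a spherical harmonic of degree n and order m (with -n ≤ m ≤ n) sits at ACN channel index n² + n + m, and an N-th order signal has (N + 1)² channels, so the 4th-order audio here has 25. A minimal helper sketch (not part of the dataset's tooling):

```python
# Small helpers for the AmbiX/ACN channel convention; illustrative only.

def acn_index(n: int, m: int) -> int:
    """ACN channel index for spherical-harmonic degree n and order m."""
    if abs(m) > n:
        raise ValueError("order m must satisfy -n <= m <= n")
    return n * n + n + m

def num_channels(order: int) -> int:
    """Channel count of an Ambisonics signal of the given order: (order+1)^2."""
    return (order + 1) ** 2

def degree_order(acn: int) -> tuple[int, int]:
    """Invert an ACN index back to (degree n, order m)."""
    n = int(acn ** 0.5)
    m = acn - n * n - n
    return n, m
```

For example, `num_channels(4)` gives the 25 channels of the 4th-order files, and ACN 0 is the omnidirectional W channel.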

The database is available here.

QoEVAVE — CGI Scene Dataset

Three interactive CGI scenes for 6-degrees-of-freedom (6-DoF) VR, designed for task-based exploration and behavior analysis. The scenes Cave, Cinema, and Mansion are built in Unity with the High-Definition Rendering Pipeline (HDRP), with plug-and-play audio implementation using the Meta XR Audio SDK. Acoustic geometry models are included for higher-fidelity audio rendering.

The scenes are available as individual Unity packages or as a complete Unity project here.

QoEVAVE — Saliency Dataset

Head-tracking and scene description data collected during 360° video viewing under three audio conditions (no audio, mono audio, and 4th-order Ambisonics). Head-tracking data was captured at 50 Hz. Cybersickness scores were assessed via the Simulator Sickness Questionnaire, and scene descriptions were collected via a post-sequence verbalization task.

The dataset is available here.