In the ISAD2 project, we develop model-based and data-driven techniques for learning and detecting characteristic sound events in acoustic data, including music recordings and environmental sounds. The project is funded by the German Research Foundation (DFG). On this website, we summarize the project's main objectives and provide links to project-related resources (data, demonstrators, websites) and publications.
Informed Sound Activity Detection in Music and Audio Signals
In music information retrieval (MIR), the development of computational methods for analyzing, segmenting, and classifying music signals is of fundamental importance. In the project's first phase (2017-2020), we explored techniques for detecting characteristic sound events in a given music recording. Our focus was on informed approaches that exploit musical knowledge in the form of score information, instrument samples, or musically salient sections. We considered concrete tasks such as locating audio sections with a specific timbre or instrument, identifying monophonic themes in complex polyphonic music recordings, and classifying music genres or playing styles based on melodic contours. We tested our approaches in complex music scenarios, including instrumental Western classical music, jazz, and opera recordings. In the project's second phase, we significantly extend these goals. First, we go beyond the music scenario by considering environmental sounds as a second challenging audio domain. Second, as a central methodology, we explore and combine the benefits of model-based and data-driven techniques to learn task-specific sound event representations. Furthermore, we investigate hierarchical approaches for capturing and analyzing sound events that manifest on different temporal scales and belong to hierarchically ordered categories. An overarching goal of the project's second phase is to develop explainable deep learning models that provide a better understanding of the structural and acoustic properties of sound events.
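To make the hierarchical classification idea more concrete, the following minimal sketch shows a tagger with a shared encoder and two prediction heads, one for coarse categories (such as instrument families) and one for fine categories (such as individual instruments). It is an illustration only, not the project's actual model; the layer sizes, the two-level taxonomy, and all names below are hypothetical.

```python
# Minimal, illustrative sketch of a hierarchical sound event tagger:
# a shared encoder feeds two heads that predict labels on two levels
# of a (hypothetical) label hierarchy. Not the project's actual model.
import torch
import torch.nn as nn

class HierarchicalTagger(nn.Module):
    def __init__(self, n_features=128, n_coarse=4, n_fine=12):
        super().__init__()
        # Shared frame-level encoder over input features (e.g., log-mel frames).
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
        )
        # Separate heads for the two levels of the label hierarchy.
        self.coarse_head = nn.Linear(128, n_coarse)  # e.g., instrument families
        self.fine_head = nn.Linear(128, n_fine)      # e.g., individual instruments

    def forward(self, x):
        h = self.encoder(x)
        # Sigmoid outputs: several sound events may be active at the same time.
        return torch.sigmoid(self.coarse_head(h)), torch.sigmoid(self.fine_head(h))

model = HierarchicalTagger()
frames = torch.randn(8, 128)                       # batch of 8 feature frames
coarse_true = torch.randint(0, 2, (8, 4)).float()  # multi-label targets
fine_true = torch.randint(0, 2, (8, 12)).float()
coarse_pred, fine_pred = model(frames)
# Training both heads jointly lets the coarse level regularize the fine task.
bce = nn.BCELoss()
loss = bce(coarse_pred, coarse_true) + bce(fine_pred, fine_true)
loss.backward()
```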
The following list provides an overview of the most important publicly accessible resources created in the ISAD2 project:
The following publications reflect the main scientific contributions of the work carried out in the ISAD2 project:
@article{Krause23_HierarchicalClassificationInstrument_IEEE-TASLP, author = {Michael Krause and Meinard M{\"u}ller}, title = {Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings}, journal = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing}, year = {2023}, volume = {31}, pages = {2567--2578}, doi = {10.1109/TASLP.2023.3291506}, url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2023-TASLP-HierarchicalInstrumentClass/}, url-pdf = {https://ieeexplore.ieee.org/abstract/document/10171391} }
@article{AbesserGM23_PolyphonicSound_TASLP, author = {Jakob Abe{\ss}er and Sascha Grollmisch and Meinard M{\"u}ller}, title = {How Robust are Audio Embeddings for Polyphonic Event Tagging?}, journal = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing}, volume = {31}, pages = {2658--2667}, year = {2023}, doi = {10.1109/TASLP.2023.3293032}, url-pdf = {https://ieeexplore.ieee.org/document/10178070}, url-demo = {https://zenodo.org/record/7912746} }
@inproceedings{KrauseWM23_CrossVersionRep_ISMIR, author = {Michael Krause and Christof Wei{\ss} and Meinard M{\"u}ller}, title = {A Cross-Version Approach to Audio Representation Learning for Orchestral Music}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})}, address = {Milano, Italy}, year = {2023} }
@inproceedings{KrauseSM23_WeakPitchCrossVersion_ISMIR, author = {Michael Krause and Sebastian Strahl and Meinard M{\"u}ller}, title = {Weakly Supervised Multi-Pitch Estimation Using Cross-Version Alignment}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})}, address = {Milano, Italy}, year = {2023} }
@article{BalkeRWAM22_JSD_TISMIR, author = {Stefan Balke and Julian Reck and Christof Wei{\ss} and Jakob Abe{\ss}er and Meinard M{\"u}ller}, title = {{JSD}: {A} Dataset for Structure Analysis in Jazz Music}, journal = {Transactions of the International Society for Music Information Retrieval ({TISMIR})}, volume = {5}, number = {1}, pages = {156--172}, year = {2022}, publisher = {Ubiquity Press}, doi = {10.5334/tismir.131}, url = {https://doi.org/10.5334/tismir.131}, url-pdf = {2022_BalkeRWAM_JSD_TISMIR_ePrint.pdf}, url-demo = {https://github.com/stefan-balke/jsd} }
@inproceedings{KrauseM22_HierarchyClass_ICASSP, author = {Michael Krause and Meinard M{\"u}ller}, title = {Hierarchical Classification for Singing Activity, Gender, and Type in Complex Music Recordings}, booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})}, pages = {406--410}, address = {Singapore}, year = {2022}, doi = {10.1109/ICASSP43922.2022.9747690} }
@article{AbesserM21_JazzBassTranscription_Electronics, author = {Jakob Abe{\ss}er and Meinard M{\"u}ller}, title = {Jazz Bass Transcription Using a {U}-Net Architecture}, journal = {Electronics}, volume = {10}, number = {6}, pages = {670:1--11}, year = {2021}, doi = {10.3390/electronics10060670}, url-pdf = {2021_AbesserM_JazzBassTranscription_Electronics.pdf} }
@article{KrauseMW21_OperaSingingActivity_Electronics, author = {Michael Krause and Meinard M{\"u}ller and Christof Wei{\ss}}, title = {Singing Voice Detection in Opera Recordings: A Case Study on Robustness and Generalization}, journal = {Electronics}, volume = {10}, number = {10}, pages = {1214:1--14}, year = {2021}, doi = {10.3390/electronics10101214}, url-pdf = {2021_KrauseMW_OperaSingingActivity_Electronics.pdf} }
@phdthesis{Krause23_ActivityDetectionMusic_PhD, author = {Michael Krause}, year = {2023}, title = {Activity Detection for Sound Events in Orchestral Music Recordings}, school = {Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg} }