Informed Sound Activity Detection in Music and Audio Signals (ISAD2)


In the ISAD2 project, we develop model-based and data-driven techniques for learning and detecting characteristic sound events in acoustic data including music recordings and environmental sounds. The project is funded by the German Research Foundation. On this website, we summarize the project's main objectives and provide links to project-related resources (data, demonstrators, websites) and publications.

Project Description

Informed Sound Activity Detection in Music and Audio Signals

In music information retrieval (MIR), the development of computational methods for analyzing, segmenting, and classifying music signals is of fundamental importance. In the project's first phase (2017–2020), we explored fundamental techniques for detecting characteristic sound events in a given music recording. Our focus was on informed approaches that exploit musical knowledge in the form of score information, instrument samples, or musically salient sections. We considered concrete tasks such as locating audio sections with a specific timbre or instrument, identifying monophonic themes in complex polyphonic music recordings, and classifying music genres or playing styles based on melodic contours. We tested our approaches within complex music scenarios, including instrumental Western classical music, jazz, and opera recordings. In the project's second phase, we significantly extend these goals. First, we go beyond the music scenario by considering environmental sounds as a second challenging audio domain. Second, as a central methodology, we explore and combine the benefits of model-based and data-driven techniques to learn task-specific sound event representations. Furthermore, we investigate hierarchical approaches that capture and exploit sound events manifesting on different temporal scales and belonging to hierarchically ordered categories. An overarching goal of the project's second phase is to develop explainable deep learning models that provide a better understanding of the structural and acoustic properties of sound events.


Project-Related Activities

Project-Related Resources and Demonstrators

The following list provides an overview of the most important publicly accessible resources created in the ISAD2 project:

Project-Related Publications

The following publications reflect the main scientific contributions of the work carried out in the ISAD2 project.

  1. Michael Krause and Meinard Müller
    Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31: 2567–2578, 2023. PDF Details DOI
    @article{Krause23_HierarchicalClassificationInstrument_IEEE-TASLP,
    author = {Michael Krause and Meinard M{\"u}ller},
    title = {Hierarchical Classification for Instrument Activity Detection in Orchestral Music Recordings},
    journal = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing},
    year={2023},
    volume={31},
    pages={2567--2578},
    doi = {10.1109/TASLP.2023.3291506},
    url-details = {https://www.audiolabs-erlangen.de/resources/MIR/2023-TASLP-HierarchicalInstrumentClass/},
    url-pdf = {https://ieeexplore.ieee.org/abstract/document/10171391}
    }
  2. Jakob Abeßer, Sascha Grollmisch, and Meinard Müller
    How Robust are Audio Embeddings for Polyphonic Event Tagging?
    IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31: 2658–2667, 2023. PDF Demo DOI
    @article{AbesserGM23_PolyphonicSound_TASLP,
    author      = {Jakob Abe{\ss}er and Sascha Grollmisch and Meinard M{\"u}ller},
    title       = {How Robust are Audio Embeddings for Polyphonic Event Tagging?},
    journal     = {{IEEE}/{ACM} Transactions on Audio, Speech, and Language Processing},
    volume      = {31},
    pages       = {2658--2667},
    year        = {2023},
    doi         = {10.1109/TASLP.2023.3293032},
    url-pdf     = {https://ieeexplore.ieee.org/document/10178070},
    url-demo    = {https://zenodo.org/record/7912746}
    }
  3. Michael Krause, Christof Weiß, and Meinard Müller
    A Cross-Version Approach to Audio Representation Learning for Orchestral Music
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2023. DOI
    @inproceedings{KrauseWM23_CrossVersionRep_ISMIR,
    author    = {Michael Krause and Christof Wei{\ss} and Meinard M{\"u}ller},
    title     = {A Cross-Version Approach to Audio Representation Learning for Orchestral Music},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
    address   = {Milano, Italy},
    year      = {2023},
    pages     = {},
    doi       = {},
    }
  4. Michael Krause, Sebastian Strahl, and Meinard Müller
    Weakly Supervised Multi-Pitch Estimation Using Cross-Version Alignment
    In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2023.
    @inproceedings{KrauseSM23_WeakPitchCrossVersion_ISMIR,
    author    = {Michael Krause and Sebastian Strahl and Meinard M{\"u}ller},
    title     = {Weakly Supervised Multi-Pitch Estimation Using Cross-Version Alignment},
    booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
    address   = {Milano, Italy},
    year      = {2023},
    pages     = {},
    }
  5. Stefan Balke, Julian Reck, Christof Weiß, Jakob Abeßer, and Meinard Müller
    JSD: A Dataset for Structure Analysis in Jazz Music
    Transactions of the International Society for Music Information Retrieval (TISMIR), 5(1): 156–172, 2022. PDF Demo DOI
    @article{BalkeRWAM22_JSD_TISMIR,
    author = {Stefan Balke and Julian Reck and Christof Wei{\ss} and Jakob Abe{\ss}er and Meinard M{\"u}ller},
    title = {{JSD}: {A} Dataset for Structure Analysis in Jazz Music},
    journal = {Transactions of the International Society for Music Information Retrieval ({TISMIR})},
    volume = {5},
    number = {1},
    pages = {156--172},
    year = {2022},
    publisher = {Ubiquity Press},
    doi = {10.5334/tismir.131},
    url       = {https://doi.org/10.5334/tismir.131},
    url-pdf   = {2022_BalkeRWAM_JSD_TISMIR_ePrint.pdf},
    url-demo = {https://github.com/stefan-balke/jsd}
    }
  6. Michael Krause and Meinard Müller
    Hierarchical Classification for Singing Activity, Gender, and Type in Complex Music Recordings
    In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP): 406–410, 2022. DOI
    @inproceedings{KrauseM22_HierarchyClass_ICASSP,
    author    = {Michael Krause and Meinard M{\"u}ller},
    title     = {Hierarchical Classification for Singing Activity, Gender, and Type in Complex Music Recordings},
    booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})},
    pages     = {406--410},
    address   = {Singapore},
    year      = {2022},
    doi       = {10.1109/ICASSP43922.2022.9747690}
    }
  7. Jakob Abeßer and Meinard Müller
    Jazz Bass Transcription Using a U-Net Architecture
    Electronics, 10(6): 1–11, 2021. PDF DOI
    @article{AbesserM21_JazzBassTranscription_Electronics,
    author    = {Jakob Abe{\ss}er and Meinard M{\"u}ller},
    title     = {Jazz Bass Transcription Using a {U}-Net Architecture},
    journal   = {Electronics},
    volume    = {10},
    number     = {6},
    pages     = {670:1--11},
    year      = {2021},
    doi       = {10.3390/electronics10060670},
    url-pdf   = {2021_AbesserM_JazzBassTranscription_Electronics.pdf}
    }
  8. Michael Krause, Meinard Müller, and Christof Weiß
    Singing Voice Detection in Opera Recordings: A Case Study on Robustness and Generalization
    Electronics, 10(10): 1–14, 2021. PDF DOI
    @article{KrauseMW21_OperaSingingActivity_Electronics,
    author    = {Michael Krause and Meinard M{\"u}ller and Christof Wei{\ss}},
    title     = {Singing Voice Detection in Opera Recordings: A Case Study on Robustness and Generalization},
    journal   = {Electronics},
    volume    = {10},
    number    = {10},
    pages     = {1214:1--14},
    year      = {2021},
    doi       = {10.3390/electronics10101214},
    url-pdf   = {2021_KrauseMW_OperaSingingActivity_Electronics.pdf}
    }

Project-Related Ph.D. Theses

  1. Michael Krause
    Activity Detection for Sound Events in Orchestral Music Recordings
    PhD Thesis, Friedrich-Alexander-Universität Erlangen-Nürnberg, 2023. Details
    @phdthesis{Krause23_ActivityDetectionMusic_PhD,
    author      = {Michael Krause},
    year        = {2023},
    title       = {Activity Detection for Sound Events in Orchestral Music Recordings},
    school      = {Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg},
    url-details = {},
    url-pdf     = {}
    }