ISMIR 2025 Tutorial: Differentiable Alignment Techniques for Music Processing

This is the accompanying website for the tutorial "Differentiable Alignment Techniques for Music Processing: Techniques and Applications," offered at the International Society for Music Information Retrieval Conference (ISMIR) 2025.

Content

A core strategy in Music Information Retrieval (MIR) is to use mid-level representations to connect and analyze music-related information across different domains. For example, these representations help link audio recordings to symbolic data such as pitches, chords, and lyrics. While traditional MIR approaches relied on expert knowledge to design these representations, recent advances in deep learning have made it possible to learn them directly from annotated data. This shift has led to major progress in tasks such as music transcription, chord recognition, pitch tracking, version identification, and lyrics alignment.

A key challenge in training deep learning models for these tasks is the limited availability of strongly aligned datasets, which provide detailed frame-level annotations but are costly and time-consuming to produce. In contrast, weakly aligned data offers only coarse segment-level correspondences, making it easier to collect but harder to use with standard training methods.

This tutorial addresses the problem by introducing differentiable alignment techniques, which enable models to learn from weakly aligned data using alignment-aware and fully differentiable loss functions. We begin with an intuitive overview of classical methods such as Dynamic Time Warping (DTW), followed by differentiable alternatives like Soft-DTW and Connectionist Temporal Classification (CTC) loss. The tutorial also introduces key concepts such as convex optimization and gradient computation, which are essential for integrating these methods into end-to-end learning systems. Applications in MIR are illustrated through case studies including multi-pitch estimation, transcription, score-audio alignment, and cross-version retrieval.

This tutorial is intended for a broad audience and emphasizes both conceptual clarity and practical relevance. It equips participants with a solid understanding of how differentiable alignment techniques enable the training of deep models using weakly or partially aligned data. These methods are becoming increasingly important in MIR and other domains involving time-based multimedia data.
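To make the step from DTW to Soft-DTW concrete, the following is a minimal sketch and not the tutorial's own code. It assumes one-dimensional sequences, a squared-difference local cost, and the standard step sizes (1,0), (0,1), (1,1); the function names `soft_min` and `dtw_cost` and the parameter `gamma` are illustrative choices. The only change between the two variants is that the hard minimum in the DTW recursion is replaced by a smooth minimum, which is what makes the accumulated cost differentiable with respect to the inputs.

```python
import math


def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))).

    With gamma == 0 this falls back to the ordinary (hard) minimum.
    """
    if gamma == 0:
        return min(values)
    m = min(values)
    # Subtract the minimum before exponentiating for numerical stability.
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def dtw_cost(x, y, gamma=0.0):
    """Accumulated alignment cost between sequences x and y.

    gamma == 0 gives classical DTW; gamma > 0 gives a Soft-DTW-style
    smoothed cost (assumed squared-difference local cost).
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] holds the cost of aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = (x[i - 1] - y[j - 1]) ** 2
            D[i][j] = local + soft_min(
                [D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]], gamma
            )
    return D[n][m]


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0]
    y = [0.0, 2.0, 2.0]
    print("DTW:     ", dtw_cost(x, y))
    print("Soft-DTW:", dtw_cost(x, y, gamma=1.0))
```

Because the smooth minimum never exceeds the hard minimum, the smoothed cost is a lower bound on the classical DTW cost and approaches it as `gamma` goes to zero. In a training setup one would typically implement the same recursion in an automatic differentiation framework so that gradients flow back to the model producing `x`.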

Topics and Slides

Resources