AudioLabs - Ferienakademie 2024, Sarntal (22.09.

Ferienakademie 2024, Sarntal (22.09. - 04.10.2024)

Course 8: Learning with Music Signals

Main Tutor/Lecturer: Simon Schwär, Prof. Dr. Meinard Müller

Group: Fundamental Frequency (F0) Estimation

The extraction of fundamental frequency (F0) information from music recordings is a crucial task in the field of music information retrieval. The sequence of F0 estimates over successive time frames (also called F0 trajectory) often corresponds to a melodic phrase and serves as a representation for downstream tasks such as automatic music transcription and performance analysis. A large number of algorithms and tools for F0 estimation have been proposed in the literature. In this group, we delve into this important topic, studying various approaches to F0 estimation, including traditional signal processing methods such as YIN and SWIPE, and data-driven methods using deep learning, including CREPE, SPICE, and PESTO. In doing so, we also gain a deeper understanding of the acoustic properties of music signals.

Literature

Alain de Cheveigné and Hideki Kawahara
YIN, a fundamental frequency estimator for speech and music.
Journal of the Acoustical Society of America (JASA), 111(4): 1917–1930, 2002. PDF

@article{CheveigneK02_YIN_JASA,
author    = {Alain de Cheveigné and Hideki Kawahara},
title     = {{YIN}, a fundamental frequency estimator for speech and music.},
journal   = {Journal of the Acoustical Society of America (JASA)},
year      = {2002},
volume    = {111},
pages     = {1917--1930},
number    = {4},
url-pdf   = {2002_CheveigneK_YIN_JASA.pdf}
}

Arturo Camacho and John G. Harris
A sawtooth waveform inspired pitch estimator for speech and music
The Journal of the Acoustical Society of America, 124(3): 1638–1652, 2008. PDF

@article{CamachoH08_SawtoothWaveform_JASA,
author    = {Arturo Camacho and John G. Harris},
title     = {A sawtooth waveform inspired pitch estimator for speech and music},
publisher = {ASA},
year      = {2008},
journal   = {The Journal of the Acoustical Society of America},
volume    = {124},
number    = {3},
pages     = {1638--1652},
url-pdf   = {2008_CamachoH_SWIPE_JASA.pdf}
}

Justin Salamon and Emilia Gómez
Melody Extraction from Polyphonic Music Signals using Pitch Contour Characteristics
IEEE Transactions on Audio, Speech, and Language Processing, 20(6): 1759–1770, 2012. PDF DOI

@article{SalamonG12_MelodyExtraction_TASLP,
Author = {Justin Salamon and Emilia G{\'o}mez},
Title = {Melody Extraction from Polyphonic Music Signals using Pitch Contour Characteristics},
Journal = {IEEE Transactions on Audio, Speech, and Language Processing},
Number = {6},
Volume = {20},
Pages = {1759--1770},
Year = {2012},
doi  = {10.1109/TASL.2012.2188515},
url-pdf   = {2012_SalamonG_Melodia_TALSP.pdf}
}

Matthias Mauch and Simon Dixon
pYIN: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions
In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): 659–663, 2014. PDF

@inproceedings{MauchD14_pYIN_ICASSP,
author    = {Matthias Mauch and Simon Dixon},
title     = {{pYIN}: A Fundamental Frequency Estimator Using Probabilistic Threshold Distributions},
booktitle = {{IEEE} International Conference on Acoustics, Speech and Signal Processing ({ICASSP})},
year      = {2014},
address  = {Florence, Italy},
pages     = {659--663},
url-pdf   = {2014_MauchD_pYIN_ICASSP.pdf}
}

Jong Wook Kim, Justin Salamon, Peter Li, and Juan Pablo Bello
CREPE: A Convolutional Representation for Pitch Estimation
In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP): 161–165, 2018. PDF DOI

@inproceedings{KimSLB18_CREPE_ICASSP,
author    = {Jong Wook Kim and Justin Salamon and Peter Li and Juan Pablo Bello},
title     = {{CREPE}: {A} Convolutional Representation for Pitch Estimation},
booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech and Signal Processing ({ICASSP})},
address   = {Calgary, Canada},
pages     = {161--165},
year      = {2018},
doi       = {10.1109/ICASSP.2018.8461329},
url-pdf   = {2018_KimSLB_PtichEstCREPE_ICASSP_arXiv}
}

Beat Gfeller, Christian Frank, Dominik Roblek, Matthew Sharifi, Marco Tagliasacchi, and Mihajlo Velimirovic
SPICE: Self-supervised Pitch Estimation
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28: 1118–1128, 2020. PDF

@article{GfellerFRSTV20_SPICE_IEEE,
author    = {Beat Gfeller and Christian Frank and Dominik Roblek and Matthew Sharifi and Marco Tagliasacchi and Mihajlo Velimirovic},
title     = {{SPICE:} {S}elf-supervised Pitch Estimation},
journal   = {{IEEE/ACM} Transactions on Audio, Speech, and Language Processing},
volume    = {28},
pages     = {1118--1128},
year      = {2020},
url-pdf   = {2020_GfellerEtAl_SPICE_TASLP.pdf}
}

Riou, Alain, Lattner, Stefan, Hadjeres, Gaëtan, and Peeters, Geoffroy
PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective
In Proceedings of the International Society for Music Information Retrieval Conference (ISMIR): 535–544, 2023. PDF

@inproceedings{RiouLHP23_PESTO_ISMIR,
author    = {Riou, Alain and Lattner, Stefan and Hadjeres, Gaëtan and Peeters, Geoffroy},
title     = {{PESTO}: Pitch Estimation with Self-supervised Transposition-equivariant Objective},
booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})},
pages     = {535--544},
year      = {2023},
address   = {Milano, Italy},
url-pdf   = {2023_RiouLHP_PESTO_ISMIR.pdf}
}

Further Links

Section 8.2 of Fundamentals of Music Processing
Cool interactive analysis of American National Anthem performances
libf0 – Python library for fundamental frequency estimation
Python Notebooks for Fundamentals of Music Processing: FMP Notebooks

International Audio Laboratories Erlangen

Ferienakademie 2024, Sarntal (22.09. - 04.10.2024)

Group: Fundamental Frequency (F0) Estimation

Literature

Further Links