This is the accompanying website for the paper "NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization" by Patricio López-Serrano, Christian Dittmar, Yiğitcan Özer and Meinard Müller.
Nonnegative matrix factorization (NMF) is a family of methods widely used for information retrieval across domains including text, images, and audio. Within music processing, NMF has been used for tasks such as transcription, source separation, and structure analysis. Prior work has shown that initialization and constrained update rules can drastically improve the chances of NMF converging to a musically meaningful solution. Along these lines, we present the NMF toolbox, containing MATLAB and Python implementations of conceptually distinct NMF variants; in particular, this paper gives an overview of two algorithms. The first variant, called nonnegative matrix factor deconvolution (NMFD), extends the original NMF algorithm to the convolutive case, enforcing the temporal order of spectral templates. The second variant, called diagonal NMF, supports the development of sparse diagonal structures in the activation matrix. Our toolbox contains several demo applications and code examples to illustrate its potential and functionality. By providing MATLAB and Python code on a documentation website under a GNU-GPL license, as well as including illustrative examples, our aim is to foster research and education in the field of music processing.
The toolbox is available for both MATLAB and Python. We additionally implemented unit tests to ensure that the results in both programming languages are consistent. A small dataset of example audio files is provided for demonstration purposes.
Here is the link to the toolbox: NMFtoolbox.zip
The folder structure is as follows:
Folder Name | Description |
---|---|
data | Dataset directory |
matlab | MATLAB implementation |
python | Python implementation |
unit_tests | Unit tests that ensure the results in both programming languages are consistent. |
The MATLAB implementation requires MATLAB version 2016a. Please note the folder structure when using it.
demoAudioMosaicingContinuityNMF
demoDrumSoundSeparationNMF
demoEDMDecompositionFourComp
demoDrumExtractionKAM_NMF_percThreshold
demoDrumExtractionKAM_NMF_scoreInformed
Filename | Description and main parameters |
---|---|
NMFD.m | Nonnegative matrix factor deconvolution with KLD and fixable components [2]. Parameters: V, numComp, numIter, numTemplateFrames, initW, initH, paramConstr, fixH |
NMF.m | Nonnegative matrix factorization with KLD as default cost function [3], [4]. Parameters: V, costFunc, numIter, numComp |
NMFdiag.m | Nonnegative matrix factorization with enhanced diagonal continuity constraints [5]. Parameters: V, W0, H0, distmeas, numOfIter, fixW, continuity.length, continuity.grid, continuity.sparsen, continuity.polyphony |
NMFconv.m | Convolutive NMF with beta-divergence [6]. Parameters: V, numComp, numIter, numTemplateFrames, initW, initH, beta, sparsityWeight, uncorrWeight |
convModel.m | Convolutive NMF model implementing Eq. (4) from [7]. Note that it can also be used to compute the standard NMF model in case the number of time frames of the templates equals one. Parameters: W, H |
shiftOperator.m | Shift operator as described in Eq. (5) from [7]. It shifts the columns of a matrix to the left or the right and fills undefined elements with zeros. Parameters: A, shiftAmount |
initActivations.m | Initialization strategies for NMF activations, including random and uniform. The pitched strategy places gate-like activations at the frames where certain notes are active in the ground truth [8]. The drums strategy uses decaying impulses at these positions [7]. Parameters: numComp, numFrames, deltaT, pitches, onsets, durations, drums, decay, onsetOffsetTol, tolerance, strategy |
initTemplates.m | NMF template initialization strategies, including random and uniform. The pitched strategy uses comb-filter templates [8]. The drums strategy uses pre-extracted averaged spectra of typical drum types. Parameters: numComp, numBins, numTemplateFrames, pitches, drumTypes, strategy |
NEMA.m | Row-wise nonlinear exponential moving average. Used to introduce exponentially decaying slopes according to Eq. (3) from [9]. Parameters: lambda |
midi2freq.m, freq2midi.m, logFreqLogMag.m | Helper functions to convert between MIDI pitches and frequencies in Hz, as well as log-frequency and log-magnitude representations for visualization. Parameters: midi, freq, A, deltaF, binsPerOctave, upperFreq, lowerFreq |
LSEE_MSTFTM_GriffinLim.m, forwardSTFT.m, inverseSTFT.m | Reconstruct the time-domain signal by means of the frame-wise inverse FFT and overlap-add method described as least squares error estimation from the modified STFT magnitude (LSEE-MSTFT) in [10]. Parameters: blockSize, hopSize, anaWinFunc, synWinFunc, reconstMirror, appendFrame, analyticSig, numSamples |
alphaWienerFilter.m | Alpha-related soft masks for extracting sources from a mixture. Details in [11] and experiments in [12]. Parameters: alpha, binarize |
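Independent of the implementation language, the idea behind NMF with a KLD cost function can be sketched in a few lines of NumPy. The following is a minimal illustration of the standard multiplicative updates for the Kullback-Leibler divergence, not the toolbox code: the function names `nmf_kl` and `kl_divergence`, the random initialization, and the default values are assumptions made for this example.

```python
import numpy as np

EPS = np.finfo(float).eps  # small constant to avoid division by zero


def kl_divergence(V, Lambda):
    """Generalized Kullback-Leibler divergence between V and its model Lambda."""
    return float(np.sum(V * np.log((V + EPS) / (Lambda + EPS)) - V + Lambda))


def nmf_kl(V, num_comp, num_iter=50, seed=0):
    """Illustrative KL-divergence NMF with multiplicative updates.

    V is a nonnegative matrix (e.g. a magnitude spectrogram) of shape
    (num_bins, num_frames). Returns templates W and activations H.
    """
    rng = np.random.default_rng(seed)
    num_bins, num_frames = V.shape
    # Hypothetical random initialization; strictly positive entries
    W = rng.random((num_bins, num_comp)) + EPS
    H = rng.random((num_comp, num_frames)) + EPS
    ones = np.ones((num_bins, num_frames))
    for _ in range(num_iter):
        # Update activations with templates held fixed
        Lambda = W @ H + EPS
        H *= (W.T @ (V / Lambda)) / (W.T @ ones + EPS)
        # Update templates with activations held fixed
        Lambda = W @ H + EPS
        W *= ((V / Lambda) @ H.T) / (ones @ H.T + EPS)
    return W, H
```

Because the updates are purely multiplicative, entries of W and H that start nonnegative stay nonnegative, which is one reason why the initialization strategies listed above have such a strong influence on the result.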
The Python implementation requires Python 3.6.
On Linux or macOS:
virtualenv env -p python3.6
source env/bin/activate
cd python
python setup.py develop
jupyter notebook
On Windows:
virtualenv env
env\Scripts\activate
cd python
python setup.py develop
jupyter notebook
demoAudioMosaicingContinuityNMF
demoDrumSoundSeparationNMF
demoEDMDecompositionFourComp
demoDrumExtractionKAM_NMF_percThreshold
demoDrumExtractionKAM_NMF_scoreInformed
We implemented NMFtoolbox as a Python library that contains the MATLAB scripts listed above, with exactly the same names and parameters. Note that some data structures are implemented in Python as dictionaries. We also wrote an additional utils.py helper script to provide functionality that is less straightforward in Python than in MATLAB.
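As a small illustration of the dictionary convention, the continuity parameters of NMFdiag, which MATLAB accesses as struct fields such as continuity.length, become dictionary entries in Python. The values below are hypothetical placeholders chosen for this sketch, not the toolbox defaults.

```python
# MATLAB (struct fields):   continuity.length = 7;
# Python (dictionary keys): continuity['length'] = 7

# Placeholder values for illustration only; not the toolbox defaults.
continuity = {
    'length': 7,
    'grid': 5,
    'polyphony': 10,
}

# Entries are read and updated by key instead of by field name
continuity['length'] = 9
print(continuity['length'], continuity['grid'], continuity['polyphony'])
```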
Filename | Description and main parameters |
---|---|
NMFD.py | Nonnegative matrix factor deconvolution with KLD and fixable components [2]. Parameters: V, numComp, numIter, numTemplateFrames, initW, initH, paramConstr, fixH |
NMF.py | Nonnegative matrix factorization with KLD as default cost function [3], [4]. Parameters: V, costFunc, numIter, numComp |
NMFdiag.py | Nonnegative matrix factorization with enhanced diagonal continuity constraints [5]. Parameters: V, W0, H0, distmeas, numOfIter, fixW, continuity['length'], continuity['grid'], continuity['sparsen'], continuity['polyphony'] |
NMFconv.py | Convolutive NMF with beta-divergence [6]. Parameters: V, numComp, numIter, numTemplateFrames, initW, initH, beta, sparsityWeight, uncorrWeight |
convModel.py | Convolutive NMF model implementing Eq. (4) from [7]. Note that it can also be used to compute the standard NMF model in case the number of time frames of the templates equals one. Parameters: W, H |
shiftOperator.py | Shift operator as described in Eq. (5) from [7]. It shifts the columns of a matrix to the left or the right and fills undefined elements with zeros. Parameters: A, shiftAmount |
initActivations.py | Initialization strategies for NMF activations, including random and uniform. The pitched strategy places gate-like activations at the frames where certain notes are active in the ground truth [8]. The drums strategy uses decaying impulses at these positions [7]. Parameters: numComp, numFrames, deltaT, pitches, onsets, durations, drums, decay, onsetOffsetTol, tolerance, strategy |
initTemplates.py | NMF template initialization strategies, including random and uniform. The pitched strategy uses comb-filter templates [8]. The drums strategy uses pre-extracted averaged spectra of typical drum types. Parameters: numComp, numBins, numTemplateFrames, pitches, drumTypes, strategy |
NEMA.py | Row-wise nonlinear exponential moving average. Used to introduce exponentially decaying slopes according to Eq. (3) from [9]. Parameters: lambda |
midi2freq.py, freq2midi.py, logFreqLogMag.py | Helper functions to convert between MIDI pitches and frequencies in Hz, as well as log-frequency and log-magnitude representations for visualization. Parameters: midi, freq, A, deltaF, binsPerOctave, upperFreq, lowerFreq |
LSEE_MSTFTM_GriffinLim.py, forwardSTFT.py, inverseSTFT.py | Reconstruct the time-domain signal by means of the frame-wise inverse FFT and overlap-add method described as least squares error estimation from the modified STFT magnitude (LSEE-MSTFT) in [10]. Parameters: blockSize, hopSize, anaWinFunc, synWinFunc, reconstMirror, appendFrame, analyticSig, numSamples |
alphaWienerFilter.py | Alpha-related soft masks for extracting sources from a mixture. Details in [11] and experiments in [12]. Parameters: alpha, binarize |
utils.py | Additional helper functions for Python. |
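The descriptions of the shift operator and the convolutive model can be made concrete with a short NumPy sketch. It assumes the common formulation in which the model is the sum over template frames t of W_t multiplied by the activations shifted t columns to the right. The names `shift_columns` and `conv_model`, as well as the 3-D array layout chosen for W, are assumptions for this example and need not match the signatures of shiftOperator.py and convModel.py.

```python
import numpy as np


def shift_columns(A, shift_amount):
    """Shift the columns of A right (positive) or left (negative).

    Columns shifted out are dropped; vacated columns are filled with zeros.
    """
    num_cols = A.shape[1]
    shifted = np.zeros_like(A)
    if shift_amount == 0:
        shifted[:] = A
    elif abs(shift_amount) >= num_cols:
        pass  # everything is shifted out, result stays all zeros
    elif shift_amount > 0:
        shifted[:, shift_amount:] = A[:, :-shift_amount]
    else:
        shifted[:, :shift_amount] = A[:, -shift_amount:]
    return shifted


def conv_model(W, H):
    """Illustrative convolutive NMF model.

    W: templates, assumed shape (num_bins, num_comp, num_template_frames)
    H: activations, shape (num_comp, num_frames)
    """
    num_bins, _, num_template_frames = W.shape
    Lambda = np.zeros((num_bins, H.shape[1]))
    for t in range(num_template_frames):
        # Frame t of every template is triggered t frames after each activation
        Lambda += W[:, :, t] @ shift_columns(H, t)
    return Lambda
```

With a single template frame the loop runs once with a shift of zero, so the result reduces to the ordinary matrix product of templates and activations, which matches the remark in the table that the convolutive model covers standard NMF as a special case.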
The test script run_test.py serves as the main workhorse for testing individual functions. Note that the unit tests require MATLAB to generate the reference files.
usage: run_test.py [-h] [-f <FunctionName>] [-m <MatlabPath>]
optional arguments:
-h, --help show this help message and exit
-f <FunctionName>, --function_name <FunctionName>
Function name to test
-m <MatlabPath>, --matlab_path <MatlabPath>
Path to matlab binary file
Here is an example call that shows how to run the unit test for the core NMFD function.
python run_test.py -f NMFD -m /usr/local/MATLAB/R2019a/bin/matlab
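This page does not describe how run_test.py compares the Python output with the MATLAB reference files. One plausible way to check consistency between two numerical implementations is an element-wise comparison within a tolerance, sketched below. The function name `results_match` and the tolerance values are assumptions for this example, not part of the toolbox.

```python
import numpy as np


def results_match(python_result, matlab_reference, rtol=1e-5, atol=1e-8):
    """Return True if two arrays agree in shape and, within tolerance, in value.

    Hypothetical helper for illustration; tolerances are placeholder choices.
    """
    python_result = np.asarray(python_result, dtype=float)
    matlab_reference = np.asarray(matlab_reference, dtype=float)
    if python_result.shape != matlab_reference.shape:
        return False
    return bool(np.allclose(python_result, matlab_reference, rtol=rtol, atol=atol))
```

A tolerance-based comparison is used in this sketch because floating-point results from two languages rarely agree bit for bit, even when the algorithms are identical.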
This is the accompanying website for [1], where further details on the toolbox, dataset, and the applications are discussed.
@inproceedings{LopezSerranoDOEM19_NMFToolbox_DAFx, author = {Patricio L{\'o}pez-Serrano and Christian Dittmar and Yi{ğ}itcan {\"O}zer and Meinard M{\"u}ller}, booktitle = {Proceedings of the International Conference on Digital Audio Effects ({DAFx})}, title = {NMF Toolbox: Music Processing Applications of Nonnegative Matrix Factorization}, year = {2019}, month = {September}, address = {Birmingham, UK}, pages = {}, }
@inproceedings{Smaragdis04_NMD, author = {Paris Smaragdis}, title = {Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs}, booktitle = {Proceedings of the International Conference on Independent Component Analysis and Blind Signal Separation {(ICA)}}, pages = {494--499}, address = {Granada, Spain}, year = {2004}, month = {September}, }
@book{Mueller15_FMP_SPRINGER, author = {Meinard M{\"u}ller}, title = {Fundamentals of Music Processing}, type = {Monograph}, year = {2015}, isbn = {978-3-319-21944-8}, publisher = {Springer Verlag} }
@article{LeeS99_LearningPartsNMF_Nature, author={Daniel D. Lee and H. Sebastian Seung}, title={Learning the parts of objects by non-negative matrix factorization}, volume={401}, number={6755}, journal={Nature}, year={1999}, pages={788--791} }
@inproceedings{DriedgerPM15_AudioMosaicingNMF_ISMIR, author = {Jonathan Driedger and Thomas Pr{\"a}tzlich and Meinard M{\"u}ller}, title = {{L}et {I}t {B}ee -- {T}owards {NMF}-Inspired Audio Mosaicing}, booktitle = {Proceedings of the International Society for Music Information Retrieval Conference ({ISMIR})}, address = {M\'{a}laga, Spain}, year = {2015}, pages = {350--356}, }
@Book{CichockiZP_AlternateAlgorithmsNmf_Book, author = {Andrzej Cichocki and Rafal Zdunek and Anh Huy Phan and {Shun-ichi} Amari}, title = {Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-Way Data Analysis and Blind Source Separation}, publisher = {John Wiley and Sons}, year = {2009} }
@article{DittmarMueller16_DrumSep_IEEE-ACM-TASLP, author = {Christian Dittmar and Meinard M{\"u}ller}, title = {Reverse Engineering the {A}men Break -- Score-Informed Separation and Restoration Applied to Drum Recordings}, journal = {{IEEE/ACM} Transactions on Audio, Speech, and Language Processing}, volume = {24}, number = {9}, pages = {1531--1543}, year = {2016}, doi = {10.1109/TASLP.2016.2567645}, }
@inproceedings{DriedgerGPEM13_AudioDecomposition_ACM-MM, author = {Jonathan Driedger and Harald Grohganz and Thomas Pr{\"a}tzlich and Sebastian Ewert and Meinard M{\"u}ller}, title = {Score-Informed Audio Decomposition and Applications}, booktitle = {Proceedings of the {ACM} International Conference on Multimedia ({ACM-MM})}, address = {Barcelona, Spain}, year = {2013}, pages = {541--544}, url-pdf = {2013_DriedgerGPEM_SourceSeparationInterface_ACM.pdf}, url-details = {https://www.audiolabs-erlangen.de/resources/2013-ACMMM-AudioDecomp/} }
@inproceedings{DittmarLM18_HPSS_KAM_NMF_ICASSP, author = {Christian Dittmar and Patricio L{\'o}pez-Serrano and Meinard M{\"u}ller}, title = {Unifying Local and Global Methods for Harmonic-Percussive Source Separation}, booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech, and Signal Processing ({ICASSP})}, address = {Calgary, Canada}, month = {April}, year = {2018}, pages = {176--180}, url-demo={https://www.audiolabs-erlangen.de/resources/MIR/2018-ICASSP-HPSS_KAM_NMF}, }
@article{GriffinL84_SpecgramInversion_TASSP, author={Daniel W. Griffin and Jae S. Lim}, title={Signal estimation from modified short-time {F}ourier transform}, journal={{IEEE} Transactions on Acoustics, Speech, and Signal Processing}, year={1984}, volume={32}, number={2}, pages={236--243} }
@inproceedings{LiutkusB15_WienerFilter_ICASSP, author = {Antoine Liutkus and Roland Badeau}, booktitle = {Proceedings of the {IEEE} International Conference on Acoustics, Speech and Signal Processing ({ICASSP})}, title = {Generalized {W}iener filtering with fractional power spectrograms}, year = {2015}, month = {April}, pages = {266--270}, address = {Brisbane, Australia}, }
@inproceedings{DittmarDMP16_WienerFiltering_EUSIPCO, author = {Christian Dittmar and Jonathan Driedger and Meinard M{\"u}ller and Jouni Paulus}, title = {An Experimental Approach to Generalized {W}iener Filtering in Music Source Separation}, booktitle = {Proceedings of the European Signal Processing Conference ({EUSIPCO})}, address = {Budapest, Hungary}, year = {2016}, pages = {}, month = {August}, url-pdf = {} }