Estimating the fundamental frequency (F0) of a signal is a well-studied task in audio signal processing with many applications. If the F0 varies over time, the complexity increases, and it also becomes more difficult to provide ground truth data for evaluation. In this project we present a dataset of cello recordings that addresses the lack of reference annotations for musical instruments. Besides audio data, we include sensor recordings capturing the finger position on the fingerboard, which is converted into an instantaneous frequency estimate. This is similar to speech processing, where the electroglottograph (EGG) is able to capture the excitation signal of the vocal tract, which is then used to generate a reference instantaneous F0. Inspired by this approach, we included high-speed video camera recordings to extract the excitation signal originating from the moving string. The derived data can be used to analyze vibrato, a very commonly used playing style.
High-quality audio serves as the basis for this test set. The recordings took place in a professional recording studio. We used an AKG C414 condenser microphone placed at a distance of approx. 30 cm from the cello bridge. The audio was sampled by an RME MADIface A/D converter. The sample rate is 48 kHz at 24 bit (for the raw dataset). All files are available as uncompressed PCM files.
To measure the finger position on the fingerboard, we used a linear membrane potentiometer. The base layer consists of material with length-dependent resistance. The middle layer is made of highly conductive material. The sensor therefore acts as a resistor whose value depends linearly on the point where the circuit is shorted. The benefits of this sensor type are described in. We chose sensors of length 100 mm as a trade-off between the sensor's linearity and the convenience of capturing multiple notes from a single finger position. The membrane sensor is thin enough to be attached to the fingerboard of the cello. Additionally, the motion of the finger on the fingerboard was recorded using 3-axis accelerometers (Texas Instruments LM335) taped to the player's fingers. All sensor data was sampled by an Arduino Due microcontroller at 12 bit resolution.
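The conversion from a raw sensor reading to a frequency estimate can be sketched as follows. This is a minimal illustration, not the method used to produce the dataset. It assumes an ideal string (frequency inversely proportional to the vibrating length), a perfectly linear sensor, and readings that increase toward the bridge. The function names, the sensor offset from the nut, the 690 mm scale length, and the open-string frequency of 146.83 Hz (D3) are hypothetical values chosen for the example. Only the 12 bit resolution and the 100 mm sensor length come from the description above.

```python
ADC_BITS = 12              # ADC resolution stated in the text
SENSOR_LENGTH_MM = 100.0   # sensor length stated in the text


def adc_to_position_mm(adc_value, sensor_offset_mm):
    """Map a raw ADC reading to a distance from the nut.

    Assumes an ideal linear sensor whose reading grows toward the bridge.
    sensor_offset_mm (hypothetical) is where the sensor starts on the fingerboard.
    """
    full_scale = (1 << ADC_BITS) - 1
    if not 0 <= adc_value <= full_scale:
        raise ValueError("ADC value outside the 12 bit range")
    return sensor_offset_mm + SENSOR_LENGTH_MM * adc_value / full_scale


def position_to_frequency_hz(position_mm, open_string_hz, scale_length_mm):
    """Ideal-string model: frequency is inversely proportional to vibrating length."""
    vibrating_length_mm = scale_length_mm - position_mm
    if vibrating_length_mm <= 0:
        raise ValueError("finger position lies beyond the bridge")
    return open_string_hz * scale_length_mm / vibrating_length_mm


# Hypothetical example values: open D string, 690 mm scale, sensor starting 50 mm from the nut.
position = adc_to_position_mm(2048, sensor_offset_mm=50.0)
frequency = position_to_frequency_hz(position, open_string_hz=146.83, scale_length_mm=690.0)
```

In practice a per-sensor calibration against known notes would likely replace the fixed offset and scale length, since real sensors and strings deviate from this ideal model.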
The cello string is excited by the bow, and its vibration is then transferred to the cello body by the bridge. By using a high-speed camera focusing on the string, we are able to extract the movement pattern from the video signal. We used a professional high-speed camera (made by Fraunhofer IIS) and bright light sources specifically designed to minimize the flicker present in the image when standard bulbs are used. Camera data is available as uncompressed raw video. To also be able to extract the slowly varying finger motion, we ensured the camera captured the finger movements as well as the moving string. One close-up video was made of each of the two players playing the D♯3 note. We decided not to include high-speed camera videos for the complete test set because of the loud noise emitted by the camera and the bright set light, which needs constant fan cooling.
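One possible way to extract a movement pattern from such video is sketched below. It is an assumption-laden illustration, not the processing used for the dataset. Each frame is reduced to a single line of pixel intensities crossing the string, the string position is taken as the intensity-weighted centroid of that line, and the oscillation frequency is read from the autocorrelation peak of the resulting displacement signal. All function names, the frame rate, and the synthetic sinusoidal test data are hypothetical. A bowed string does not move sinusoidally, so the synthetic input only serves to exercise the code.

```python
import math


def string_centroid(profile):
    """Intensity-weighted centroid (in pixels) of one pixel line crossing the string."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile contains no bright pixels")
    return sum(i * v for i, v in enumerate(profile)) / total


def displacement_signal(profiles):
    """One centroid per frame, with the mean (rest position) removed."""
    centroids = [string_centroid(p) for p in profiles]
    mean = sum(centroids) / len(centroids)
    return [c - mean for c in centroids]


def estimate_f0_hz(signal, frame_rate_hz, f_min_hz, f_max_hz):
    """Pick the autocorrelation peak among lags corresponding to [f_min_hz, f_max_hz]."""
    lag_min = max(1, int(frame_rate_hz / f_max_hz))
    lag_max = min(len(signal) - 1, int(frame_rate_hz / f_min_hz))
    best_lag, best_score = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        score = sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return frame_rate_hz / best_lag


def synthetic_profiles(n_frames, frame_rate_hz, f0_hz, n_pixels=64, amplitude_px=5.0):
    """Hypothetical test data: a bright Gaussian 'string' oscillating sinusoidally."""
    profiles = []
    for k in range(n_frames):
        phase = 2 * math.pi * f0_hz * k / frame_rate_hz
        center = n_pixels / 2 + amplitude_px * math.sin(phase)
        profiles.append(
            [math.exp(-0.5 * ((i - center) / 2.0) ** 2) for i in range(n_pixels)]
        )
    return profiles


# Illustrative values only: 2000 frames per second and a 200 Hz oscillation.
frames = synthetic_profiles(n_frames=200, frame_rate_hz=2000.0, f0_hz=200.0)
f0 = estimate_f0_hz(displacement_signal(frames), 2000.0, 100.0, 400.0)
```

Because only integer lags are tested, the frequency resolution of this sketch is coarse. Real recordings would likely need interpolation around the peak, or a different estimator altogether.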
To pre-process the MUSERC RAW datasets, several choices had to be made. To allow researchers to reproduce and understand our choices, the pre-processing methods are publicly available in our GitHub repository.
MUSERC is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. If you want to use this dataset in your academic research, please cite our paper:
@inproceedings{Stoter:2015:MCR:2733373.2806384,
  author = {St\"{o}ter, Fabian-Robert and M\"{u}ller, Michael and Edler, Bernd},
  title = {Multi-Sensor Cello Recordings for Instantaneous Frequency Estimation},
  booktitle = {Proceedings of the 23rd Annual ACM Conference on Multimedia Conference},
  series = {MM '15},
  year = {2015},
  isbn = {978-1-4503-3459-4},
  location = {Brisbane, Australia},
  pages = {995--998},
  numpages = {4},
  url = {http://doi.acm.org/10.1145/2733373.2806384},
  doi = {10.1145/2733373.2806384},
  acmid = {2806384},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {cello, dataset, fundamental frequency estimation, sensors, visual acoustics},
}
The dataset is available for download in several variants. Choose your files here.
| Dataset | Description | Domains | Number of Items | Download Size | Download |
|---|---|---|---|---|---|
| MUSERC SA RAW | Raw Recordings | Sensor, Audio | 13 (continuous recordings) | 339 MB | |
| MUSERC SAV RAW | Raw Recordings | Sensor, Audio, Video | 2 | 3.5 GB | |
| MUSERC SA PRE | Pre-Processed Recordings | Sensor, Audio | 140 | 50 MB | t.b.a |
| MUSERC SAV PRE | Pre-Processed Recordings, Compressed Video | Sensor, Audio, Video | 2 | 1.2 GB | t.b.a |
| MUSERC SA | Pre-Processed Recordings + derived data | Sensor, Audio | 140 | 50 MB | t.b.a |
| MUSERC SAV | Pre-Processed Recordings + derived data | Sensor, Audio, Video | 2 | 1.2 GB | t.b.a |
We thank Karlheinz Busch for his professional cello playing on these recordings.