A. Adami, L. Brand, S. Disch and J. Herre
Applause is the result of many individuals rhythmically clapping their hands. Applause recordings exhibit a certain temporal, timbral and spatial structure: claps originating from a distinct direction (i.e., from a particular person) usually have a similar timbre and occur in quasi-periodic repetition. Traditional approaches to blind mono-to-stereo upmixing do not consider these properties and may therefore produce output of suboptimal perceptual quality, attributable to a lack of plausibility. In this paper, we propose a blind upmixing approach for applause-like signals which aims at preserving the natural structure of applause by incorporating the periodicity and timbral similarity of claps into the upmix process, thereby supporting the plausibility of the artificially generated spatial scene [1].
The applause decomposition proposed in this paper is a modified version of the approaches used in [2,3].
The figure shows a block diagram of the basic structure of the applause decomposition processing.
Within the energy extraction stage, an instantaneous energy estimate and an average energy estimate are derived from the input applause signal, and the ratio of the two is computed.
This ratio is gated (thresholded) to obtain a separation gain. Applying this gain to the input applause signal yields a foreground signal containing the individually perceivable claps and a background signal containing the more noise-like remainder.
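The decomposition steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `decompose_applause`, the moving-average energy estimators, the window lengths, the threshold value and the hard binary gate are all assumptions made for illustration only.

```python
import numpy as np


def decompose_applause(x, fs, inst_ms=2.0, avg_ms=100.0, threshold=3.0):
    """Split a mono applause signal into foreground claps and background.

    Window lengths and threshold are illustrative assumptions,
    not values taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    power = x ** 2

    def smooth(p, ms):
        # Moving-average energy estimate over a window of `ms` milliseconds.
        n = max(1, int(round(fs * ms / 1000.0)))
        kernel = np.ones(n) / n
        return np.convolve(p, kernel, mode="same")

    inst = smooth(power, inst_ms)   # instantaneous (short-term) energy
    avg = smooth(power, avg_ms)     # average (long-term) energy
    ratio = inst / (avg + 1e-12)    # small epsilon avoids division by zero

    # Gate the ratio: a hard threshold gives a binary separation gain (assumed).
    gain = (ratio > threshold).astype(float)

    foreground = gain * x           # individually perceivable claps
    background = (1.0 - gain) * x   # noise-like remainder
    return foreground, background, gain


# Usage: constant-level "background" with one short, loud burst as a clap.
fs = 8000
x = 0.01 * np.where(np.arange(fs) % 2 == 0, 1.0, -1.0)
x[4000:4016] = 1.0
foreground, background, gain = decompose_applause(x, fs)
```

Because the two gains sum to one, foreground and background add up to the input signal again. A smoother gain curve would likely reduce switching artifacts compared with the hard gate used here.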
Below, some sound examples of the decomposition are presented.
The upmix of the foreground signals was based on perceptual plausibility criteria: it exploited the assumptions that claps originating from a particular person exhibit a similar timbre and occur in quasi-periodic repetition.
The background signal was decorrelated using a modified version of the method proposed in [4] to yield a wider stereo background. Below, some sound examples of the resulting blindly upmixed signals are presented.
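To illustrate what decorrelating a mono background into two channels means, the following sketch uses simple random-phase filtering in the frequency domain. This is a swapped-in generic technique, not the method of [4] or the authors' modification of it. The function name `decorrelate_to_stereo` and the `seed` parameter are hypothetical.

```python
import numpy as np


def decorrelate_to_stereo(background, seed=0):
    """Create two mutually decorrelated channels from a mono signal.

    Generic random-phase decorrelator used purely for illustration;
    it is NOT the method referenced as [4].
    """
    x = np.asarray(background, dtype=float)
    n = len(x)
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(x)

    channels = []
    for _ in range(2):
        # Independent random phase per frequency bin, magnitude unchanged.
        phase = rng.uniform(-np.pi, np.pi, size=spectrum.shape)
        phase[0] = 0.0            # keep the DC bin real
        if n % 2 == 0:
            phase[-1] = 0.0       # keep the Nyquist bin real
        channels.append(np.fft.irfft(spectrum * np.exp(1j * phase), n=n))
    return np.stack(channels)     # shape (2, n): left, right


# Usage: decorrelate a noise-like test signal.
noise = np.random.default_rng(1).standard_normal(4096)
left, right = decorrelate_to_stereo(noise)
```

Each output channel keeps the magnitude spectrum, and hence the energy, of the input while the inter-channel correlation becomes small. Random phase over the whole signal smears transients in time, so a scheme like this would only be considered for the noise-like background, not for the foreground claps.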