Blind Upmix for Applause-Like Signals Based on Perceptual Plausibility Criteria

A. Adami, L. Brand, S. Disch and J. Herre

Abstract

Applause is the result of many individuals rhythmically clapping their hands. Applause recordings exhibit a certain temporal, timbral and spatial structure: claps originating from a distinct direction (i.e., from a particular person) usually have a similar timbre and recur quasi-periodically. Traditional approaches to blind mono-to-stereo upmixing do not consider these properties and may therefore produce output of suboptimal perceptual quality, attributable to a lack of plausibility. In this paper, we propose a blind upmixing approach for applause-like signals that aims to preserve the natural structure of applause by incorporating the periodicity and timbral similarity of claps into the upmix process, thereby supporting the plausibility of the artificially generated spatial scene [1].

Applause Separation Demo

The applause decomposition proposed in this paper is a modified version of the approaches used in [2,3]. The figure shows a block diagram of its basic structure. In the energy extraction stage, an instantaneous energy estimate and an average energy estimate are derived from the input applause signal, and the ratio of the two is computed. This ratio is gated (thresholded) to obtain a separation gain, which is applied to the input applause signal to yield a foreground signal containing the individually perceivable claps and a background signal containing the more noise-like remainder. Below, some sound examples of the decomposition are presented.

2 people

4 people

8 people

16 people

32 people

64 people

128 people
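
The energy-ratio gating described in this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (moving_average, separate_applause), the window lengths, the threshold value and the hard 0/1 gain are all hypothetical choices made for readability.

```python
def moving_average(values, length):
    """Centered moving average over roughly `length` samples (via prefix sums)."""
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    half = length // 2
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def separate_applause(x, short_len=32, long_len=1024, threshold=2.5):
    """Split a mono signal into foreground claps and noise-like background.

    short_len, long_len and threshold are assumed values, not taken from the paper.
    """
    energy = [s * s for s in x]
    inst = moving_average(energy, short_len)   # instantaneous energy estimate
    avg = moving_average(energy, long_len)     # average energy estimate
    foreground, background = [], []
    for s, e_inst, e_avg in zip(x, inst, avg):
        ratio = e_inst / (e_avg + 1e-12)       # small constant avoids division by zero
        gain = 1.0 if ratio > threshold else 0.0   # gated ratio = separation gain
        foreground.append(gain * s)
        background.append((1.0 - gain) * s)
    return foreground, background
```

Because the two gains in this sketch are complementary, the foreground and background signals sum back to the input. A smoothed or soft gain would likely reduce switching artifacts, at the cost of a less strict separation.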

Blind Upmix Demo

The upmix of the foreground signals was based on perceptual plausibility criteria: it exploited the assumptions that claps originating from a particular person exhibit

similar spectral envelopes and
some form of temporal periodicity.
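
One way to picture how these two assumptions could be combined is a simple score that rates how plausibly a detected clap belongs to an already known clapper. The sketch below is purely illustrative and is not the procedure used in [1]: the data layout (dictionaries with envelope, onset, last_onset and period fields), the cosine similarity measure, the linear periodicity score, the tolerance and the equal weighting are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two coarse spectral envelopes (lists of band energies)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b) if norm_a > 0.0 and norm_b > 0.0 else 0.0


def periodicity_score(onset, last_onset, period, tolerance=0.25):
    """1.0 if the clap arrives exactly one period after the last one,
    falling linearly to 0.0 at a relative deviation of `tolerance` (assumed)."""
    deviation = abs((onset - last_onset) - period) / period
    return max(0.0, 1.0 - deviation / tolerance)


def assign_clap(clap, sources, weight=0.5):
    """Return (index, score) of the most plausible source for `clap`.

    `weight` balances timbral similarity against temporal periodicity.
    """
    best_index, best_score = None, -1.0
    for index, source in enumerate(sources):
        timbre = cosine_similarity(clap["envelope"], source["envelope"])
        rhythm = periodicity_score(clap["onset"], source["last_onset"], source["period"])
        score = weight * timbre + (1.0 - weight) * rhythm
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score
```

In such a scheme, each source would keep a fixed position in the stereo image, so that claps attributed to the same person are always rendered from the same direction.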

The background signal was decorrelated using a modified version of the method proposed in [4] to yield a wider stereo background. Below, some sound examples of the resulting blindly upmixed signals are presented.

2 people

4 people

8 people

16 people

32 people

64 people

128 people
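
To illustrate what decorrelating a mono background into two channels means, the following sketch uses a classic complementary comb-filter scheme, in which a delayed copy of the signal is added for one channel and subtracted for the other. This is a deliberately simple stand-in and not the method of [4] or its modified version used for the examples above; the function names, the delay and the gain are assumptions.

```python
import math


def comb_decorrelate(mono, delay=200, gain=1.0):
    """Create a stereo pair from a mono signal with complementary comb filters.

    left  = x[n] + gain * x[n - delay]
    right = x[n] - gain * x[n - delay]
    `delay` (in samples) and `gain` are assumed illustrative values.
    """
    left, right = [], []
    for n, sample in enumerate(mono):
        delayed = mono[n - delay] if n >= delay else 0.0
        left.append(sample + gain * delayed)
        right.append(sample - gain * delayed)
    return left, right


def normalized_correlation(a, b):
    """Normalized cross-correlation at lag zero (1.0 means identical up to scale)."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0
```

For a noise-like input, the inter-channel correlation of this scheme is expected to be roughly (1 - gain^2) / (1 + gain^2), so a gain of 1.0 gives nearly uncorrelated channels, while the average of the two channels still reproduces the mono input.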

References

[1] Adami, A., Brand, L., Disch, S. and Herre, J., "Blind Upmix for Applause-Like Signals Based on Perceptual Plausibility Criteria", In Proceedings of the 20th International Conference on Digital Audio Effects (DAFx-17), pages 496–501, Edinburgh, UK, 2017.
[2] Adami, A. and Herre, J., "Perception and Measurement of Applause Characteristics: Wahrnehmung und Messung von Applauseigenschaften", In Proceedings of the 29th Tonmeistertagung (TMT29), pages 199–206, Cologne, Germany, 2016.
[3] Adami, A., Brand, L. and Herre, J., "Investigations Towards Plausible Blind Upmixing of Applause Signals", In 142nd International Convention of the AES, Berlin, Germany, 2017.
[4] Hotho, G., van de Par, S. and Breebaart, J., "Multichannel Coding of Applause Signals", EURASIP Journal on Advances in Signal Processing, vol. 2008, 2008.

International Audio Laboratories Erlangen
