Speech Dereverberation via Coherent to Diffuse Power Ratio Estimators (CDR)

Room Recordings and Acoustic Effects: When conversations are recorded in a room, various acoustic effects emerge, such as background noise and echo-like reflections of speech off room surfaces. These reflections are referred to as reverberation. As sound encounters and is absorbed by room...

A novel approach to dereverberation, particularly suitable for applications with small microphone arrays requiring low computational resources, has been developed using Coherent to Diffuse Power Ratio (CDR) estimators. This technique aims to enhance the direct sound component and reduce reverberation, thereby improving the performance of Automatic Speech Recognition (ASR) systems.

CDR Estimation

The method leverages the spatial characteristics of the sound field captured by a pair of microphones. The direct speech is modeled as a coherent wave arriving at both microphones with a known phase difference, while reverberation is modeled as a diffuse sound field, which has low spatial coherence across the microphone pair. The CDR estimator quantifies the ratio of coherent speech power to diffuse reverberation power in each time-frequency bin.
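The signal model described above can be written compactly. The following is a sketch in commonly used notation, not necessarily the exact formulation behind this article: the symbols (\Gamma_x for the measured coherence, \Gamma_s for the direct-path coherence, \Gamma_n for the diffuse-field coherence, d for microphone spacing, c for the speed of sound, \Delta t for the inter-microphone delay) are assumptions introduced for illustration, and the sign of the phase term depends on the chosen convention.

```latex
% Measured coherence as a CDR-weighted mixture of coherent and diffuse parts
\Gamma_x(k,f) = \frac{\mathrm{CDR}(k,f)\,\Gamma_s(f) + \Gamma_n(f)}{\mathrm{CDR}(k,f) + 1}

% Coherent (direct) component: pure phase shift; diffuse component: sinc-shaped
\Gamma_s(f) = e^{j 2\pi f \Delta t}, \qquad
\Gamma_n(f) = \frac{\sin(2\pi f d / c)}{2\pi f d / c}

% Solving the mixture model for the CDR
\mathrm{CDR}(k,f) = \frac{\Gamma_n(f) - \Gamma_x(k,f)}{\Gamma_x(k,f) - \Gamma_s(f)}
```

With noisy coherence estimates the last expression is generally complex-valued, so practical estimators take its real part (or use a more robust variant) and clip negative values to zero.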

Dereverberation Process

By applying the CDR estimate to the multichannel microphone signals, the system filters or weights the frequency components, emphasizing those with higher coherent (direct) power and suppressing those dominated by diffuse (reverberant) power. This selective enhancement reduces reverberation effects.

Integration with ASR

The dereverberated signal, with clearer speech and less reverberant distortion, is then passed to the ASR system, improving its robustness and accuracy.

The underlying assumption enabling CDR-based dereverberation is that the direct speech and reverberation differ in spatial coherence, which is measurable from just two microphones by analyzing their auto- and cross-spectral densities. This approach is practical for small microphone arrays and computationally efficient compared to full spatial filtering or blind source separation methods.
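As a rough illustration of how coherence can be measured from two microphones, the sketch below estimates the auto- and cross-spectral densities by recursive averaging over STFT frames and then derives a CDR value per bin. It is a minimal sketch under stated assumptions, not the implementation used for the results in this article: the function names, the smoothing constant `alpha`, the speed of sound, and the choice of the simple estimator CDR = (Γn − Γx) / (Γx − Γs), which requires a known direct-path coherence, are all illustrative.

```python
import numpy as np

def estimate_coherence(X1, X2, alpha=0.68):
    """Complex coherence of two STFTs (shape: frames x bins).

    Spectral densities are tracked with first-order recursive averaging;
    alpha is an assumed smoothing constant, not a value from the article.
    """
    eps = 1e-12
    n_frames, n_bins = X1.shape
    phi11 = np.zeros(n_bins)
    phi22 = np.zeros(n_bins)
    phi12 = np.zeros(n_bins, dtype=complex)
    coh = np.empty((n_frames, n_bins), dtype=complex)
    for t in range(n_frames):
        phi11 = alpha * phi11 + (1 - alpha) * np.abs(X1[t]) ** 2
        phi22 = alpha * phi22 + (1 - alpha) * np.abs(X2[t]) ** 2
        phi12 = alpha * phi12 + (1 - alpha) * X1[t] * np.conj(X2[t])
        coh[t] = phi12 / (np.sqrt(phi11 * phi22) + eps)
    return coh

def diffuse_coherence(freqs_hz, mic_distance_m, c=343.0):
    """Coherence of an ideal spherically diffuse field: sin(x)/x, x = 2*pi*f*d/c."""
    # np.sinc(y) computes sin(pi*y)/(pi*y), hence the factor 2 instead of 2*pi.
    return np.sinc(2.0 * np.asarray(freqs_hz) * mic_distance_m / c)

def cdr_from_coherence(coh_x, coh_s, coh_n):
    """Simple CDR estimate from the mixture model, clipped to be non-negative.

    coh_x: measured coherence, coh_s: assumed direct-path coherence,
    coh_n: assumed diffuse-field coherence.
    """
    eps = 1e-12
    num = coh_n - coh_x
    den = coh_x - coh_s
    mag2 = np.abs(den) ** 2
    # Real part of num/den, computed without dividing by zero.
    ratio = np.real(num * np.conj(den)) / np.maximum(mag2, eps)
    # Measured coherence equal to the direct-path coherence: purely coherent.
    cdr = np.where(mag2 < eps, np.inf, ratio)
    return np.maximum(cdr, 0.0)
```

In use, `coh_x` would come from `estimate_coherence`, `coh_n` from `diffuse_coherence` evaluated at the STFT bin frequencies, and `coh_s` from the expected direction of arrival. Estimators that do not need the direction of arrival exist as well, but they are not sketched here.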

CDR-based Postfilter

The CDR-based postfilter attenuates reverberation in each time-frequency bin where the CDR is low. When the CDR estimate is infinite, the postfilter gain is 1; when the CDR estimate is zero, the gain is the maximum of the minimum gain and one minus the square root of the oversubtraction factor.
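One gain rule that reproduces both limiting cases is a spectral-subtraction-style gain, G = max(G_min, 1 − sqrt(μ / (CDR + 1))). The sketch below assumes this form; the article only states the behavior at infinite and zero CDR, so the interpolation between them, the function name, and the default values for the oversubtraction factor `mu` and the gain floor `g_min` are assumptions for illustration.

```python
import numpy as np

def cdr_postfilter_gain(cdr, mu=1.3, g_min=0.1):
    """Per-bin postfilter gain from a CDR estimate (assumed gain rule).

    cdr:   non-negative CDR estimates (scalar or array, may contain inf)
    mu:    oversubtraction factor (assumed default)
    g_min: minimum gain, limits the maximum attenuation (assumed default)
    """
    cdr = np.asarray(cdr, dtype=float)
    # CDR -> inf gives gain 1; CDR = 0 gives 1 - sqrt(mu) before flooring.
    gain = 1.0 - np.sqrt(mu / (cdr + 1.0))
    return np.maximum(gain, g_min)

# The real-valued gain is applied by multiplying it with the STFT of a
# reference channel, e.g. Y = cdr_postfilter_gain(cdr) * X_ref.
```

The gain floor keeps strongly reverberant bins from being zeroed out completely, which trades some residual reverberation for fewer audible artifacts.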

Performance Improvement

Using CDR estimators for speech dereverberation reduces the Word Error Rate (WER) of the ASR system by nearly 30%, as suggested by Figure 3. Furthermore, studies have shown that CDR estimators are competitive with traditional dereverberation methods in ASR systems.

Listening to the Recordings

The original reverberant recording and its dereverberated version, shown in Figure 2, are available for listening in the repositories [3].

In conclusion, the CDR-based dereverberation technique offers a high-quality result without the need for a trained model, making it an attractive solution for various applications.
