Speech Dereverberation via Coherent to Diffuse Power Ratio Estimators (CDR)
Coherent to Diffuse Power Ratio (CDR) estimators provide a novel approach to dereverberation that is particularly suitable for small microphone arrays and low computational budgets. The technique enhances the direct sound component and attenuates reverberation, which in turn improves the performance of Automatic Speech Recognition (ASR) systems.
CDR Estimation
The method leverages the spatial characteristics of the sound field captured by a pair of microphones. The direct speech is modeled as a coherent wave arriving at both microphones with a known phase difference, while reverberation is modeled as a diffuse sound field, which shows little spatial coherence across the microphone pair. The CDR estimator quantifies, in each time and frequency bin, the ratio of coherent speech power to diffuse reverberation power.
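The model above can be made concrete with a short sketch. It assumes that the observed coherence between the microphones is a CDR-weighted mixture of the direct-path coherence and the diffuse-field coherence, and solves that mixture for the CDR. The function names, the ideal spherically diffuse (sinc) coherence, the phase sign convention, and the default speed of sound are illustrative assumptions, not details taken from the method's reference implementation.

```python
import numpy as np

def diffuse_coherence(freqs_hz, mic_distance_m, c=343.0):
    """Coherence of an ideal spherically diffuse field between two omni mics (assumed model)."""
    # np.sinc(x) is sin(pi*x)/(pi*x), so this equals sin(2*pi*f*d/c) / (2*pi*f*d/c).
    return np.sinc(2.0 * freqs_hz * mic_distance_m / c)

def direct_coherence(freqs_hz, tdoa_s):
    """Coherence of a plane wave with a known time difference of arrival (sign is a convention)."""
    return np.exp(1j * 2.0 * np.pi * freqs_hz * tdoa_s)

def estimate_cdr(coh_x, coh_s, coh_n, eps=1e-12):
    """Solve coh_x = (cdr * coh_s + coh_n) / (cdr + 1) for cdr in each bin."""
    denom = coh_x - coh_s
    # Guard against division by zero when the observation equals the direct-path coherence.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    cdr = np.real((coh_n - coh_x) / denom)
    # A power ratio cannot be negative; estimation noise can push it below zero.
    return np.maximum(cdr, 0.0)

# Synthetic check: build the observed coherence from a known CDR and recover it.
freqs = np.array([500.0, 1000.0, 2000.0])
coh_s = direct_coherence(freqs, tdoa_s=1e-4)
coh_n = diffuse_coherence(freqs, mic_distance_m=0.08)
true_cdr = np.array([0.5, 2.0, 10.0])
coh_x = (true_cdr * coh_s + coh_n) / (true_cdr + 1.0)
recovered_cdr = estimate_cdr(coh_x, coh_s, coh_n)
```

With measured rather than synthetic coherence, the ratio is generally complex and noisy, which is why the sketch keeps only the real part and clips it at zero. More robust estimators exist, and this one is only the direct inversion of the mixing model.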
Dereverberation Process
By applying the CDR estimate to the microphone signals, the system weights the individual frequency components, emphasizing those with higher coherent (direct) power and suppressing those dominated by diffuse (reverberant) power. This selective enhancement reduces the effects of reverberation.
Integration with ASR
The dereverberated signal, now with improved speech clarity and less reverberation distortion, improves the robustness and accuracy of the ASR system.
The underlying assumption enabling CDR-based dereverberation is that the direct speech and the reverberation differ in spatial coherence, which is measurable from just two microphones by analyzing their auto- and cross-spectral densities. This approach is practical for small microphone arrays and computationally efficient compared to full spatial filtering or blind source separation methods.
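As an illustration of how the coherence between two microphones might be measured, the sketch below estimates the auto- and cross-spectral densities by recursive averaging over short-time spectra and normalizes the cross term. The function name, the array layout (frames by frequency bins), and the smoothing constant are assumptions chosen for the example.

```python
import numpy as np

def estimate_coherence(X1, X2, alpha=0.68, eps=1e-12):
    """Complex coherence per frame and bin from two STFTs of shape (frames, bins).

    alpha is a hypothetical smoothing constant for the recursive averages.
    """
    n_frames, n_bins = X1.shape
    phi11 = np.zeros(n_bins)                 # auto-spectral density, mic 1
    phi22 = np.zeros(n_bins)                 # auto-spectral density, mic 2
    phi12 = np.zeros(n_bins, dtype=complex)  # cross-spectral density
    coh = np.empty((n_frames, n_bins), dtype=complex)
    for t in range(n_frames):
        phi11 = alpha * phi11 + (1.0 - alpha) * np.abs(X1[t]) ** 2
        phi22 = alpha * phi22 + (1.0 - alpha) * np.abs(X2[t]) ** 2
        phi12 = alpha * phi12 + (1.0 - alpha) * X1[t] * np.conj(X2[t])
        # Normalizing by the auto-spectra bounds the magnitude by 1.
        coh[t] = phi12 / np.sqrt(phi11 * phi22 + eps)
    return coh

# Synthetic check: a fully coherent pair that differs only by a phase shift.
rng = np.random.default_rng(0)
X1 = rng.standard_normal((20, 8)) + 1j * rng.standard_normal((20, 8))
theta = 0.3
X2 = X1 * np.exp(-1j * theta)
coh = estimate_coherence(X1, X2)
```

For the fully coherent pair, the estimated coherence has magnitude close to 1 and a phase equal to the imposed shift. A diffuse field would instead yield a magnitude well below 1 at most frequencies, and that difference is what a CDR estimator exploits.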
CDR-based Postfilter
The CDR-based postfilter acts as a reverberation attenuator: in each time and frequency bin, it lowers the gain where the estimated CDR is low. When the CDR estimate tends to infinity, the postfilter gain is 1; when the CDR estimate is zero, the gain is the maximum of the minimum gain and one minus the square root of the oversubtraction factor.
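One gain rule that is consistent with both limiting cases is a spectral-subtraction-style gain, max(G_min, 1 - sqrt(mu / (CDR + 1))). The sketch below implements that rule as an assumption. The exact postfilter used by the method may differ, and the function name and default parameter values are hypothetical.

```python
import numpy as np

def cdr_postfilter_gain(cdr, mu=1.3, g_min=0.1):
    """Per-bin gain from a CDR estimate (assumed rule, not a confirmed implementation).

    mu is the oversubtraction factor and g_min the minimum gain.
    """
    cdr = np.maximum(np.asarray(cdr, dtype=float), 0.0)
    gain = 1.0 - np.sqrt(mu / (cdr + 1.0))
    # The floor prevents the gain from going negative and limits the attenuation.
    return np.maximum(gain, g_min)

# Limiting cases: zero CDR (fully diffuse) and infinite CDR (fully coherent).
gains = cdr_postfilter_gain(np.array([0.0, np.inf]), mu=0.25, g_min=0.1)
floored = cdr_postfilter_gain(0.0, mu=1.3, g_min=0.1)
```

The resulting gain would then be multiplied, bin by bin, with the short-time spectrum of a reference signal before resynthesis. Which signal serves as that reference is a design choice that the description above leaves open.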
Performance Improvement
Using CDR estimators for speech dereverberation reduces the Word Error Rate (WER) of the ASR system by nearly 30%, as suggested by figure 3. Furthermore, studies have shown that CDR estimators are competitive with traditional dereverberation methods in ASR systems.
Listening to the Recordings
The original recording with reverberation and the dereverberated version from figure 2 are available for listening in the repositories [3].
In conclusion, the CDR-based dereverberation technique offers a high-quality result without the need for a trained model, making it an attractive solution for various applications.