Assessing Speech Quality and Noise Reduction

By Jeffrey Leahy
2023-07-17

Evaluating Speech Quality with Background Noise Removal

Have you ever wondered how companies assess the speech quality of their microphones in noisy environments, and how those tests hold up once background noise removal is applied? Evaluating speech quality in the presence of background noise is more complex than it appears. Audio processing algorithms such as beamforming, compression, noise reduction, and automatic gain control are used to suppress unwanted noise and clean up the captured signal. However, these methods can also introduce artifacts and distortions into recordings. In this article, we look at how speech quality is evaluated and why these tests matter for microphones.

When speech is accompanied by background noise, it can be hard to understand. Sometimes the noise is so overwhelming that the message is lost altogether. An overly aggressive background noise removal algorithm, on the other hand, can unintentionally alter the speech and give it a robotic tone. Separating speech from noise becomes even more demanding when the background noise is itself speech, such as conversations in a busy restaurant.

Using background noise removal in a coffee shop.

Furthermore, our brains are accustomed to communicating in noisy environments, which means we are more tolerant of speech artifacts when the noise level is high. When the noise level is very low, however, we naturally expect higher speech quality and clarity. In essence, the goal is to capture the intended speech clearly and accurately while minimizing the impact of background noise.

Rating Audio Quality

Traditionally, evaluating the quality of audio or speech involves an expert panel listening to recordings and rating them on a scale from 1 to 5, with 1 representing "Bad" and 5 representing "Excellent." The Mean Opinion Score (MOS) is then calculated as the average of the scores given by the individual experts. However, this method is costly and time-consuming to set up. It requires selecting and coordinating a panel of highly trained experts who must be available for multiple sessions, since each prototype iteration or algorithm version has to be rated and compared with existing products before the best solution can be chosen.
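To make the calculation concrete, here is a minimal Python sketch of how a MOS could be computed from one panel's ratings. The function name and the example ratings are hypothetical and chosen only for illustration. The intermediate labels "Poor", "Fair", and "Good" are taken from the common five-point listening-test scale and are an assumption here, since only the two end points are named above.

```python
from statistics import mean

# Five-point rating scale. Only "Bad" (1) and "Excellent" (5) are named in
# the article; the middle labels are assumed from the common listening-test scale.
RATING_LABELS = {1: "Bad", 2: "Poor", 3: "Fair", 4: "Good", 5: "Excellent"}


def mean_opinion_score(ratings):
    """Return the MOS: the arithmetic mean of the panel's 1-to-5 ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for rating in ratings:
        if rating not in RATING_LABELS:
            raise ValueError(f"rating {rating!r} is outside the 1-5 scale")
    return mean(ratings)


# Hypothetical ratings from a five-person panel for a single recording.
panel_ratings = [4, 3, 4, 5, 3]
print(f"MOS = {mean_opinion_score(panel_ratings):.2f}")
```

In practice, each recording, noise condition, and algorithm version needs its own set of ratings, which is where the cost of a human panel adds up.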

An audio quality rating after background noise removal.

To streamline this process and make it more accessible, several models have been developed to predict the MOS ratings an expert panel would give. One such model, 3QUEST by HEAD acoustics, focuses on rating speech quality in noisy environments. The setup involves recording the device under test while speech is played back through a head and torso simulator (HATS) or a mouth simulator. At the same time, a noise simulation is played back through four to eight loudspeakers positioned around the device. The noise scenario depends on the product category being tested: for example, call center noise to evaluate a headset, or car engine noise to assess a car's hands-free microphone.

Figure 1: Measurement setup for 3QUEST.

The recorded audio is then assessed using HEAD acoustics' 3QUEST software, which compares it to the clean original speech file. The software evaluates various characteristics of both the noise and the speech, including levels and distortions. It generates three scores: N-MOS for noise quality, S-MOS for speech quality, and G-MOS as the global score. As mentioned earlier, we expect higher speech quality when there is less background noise. Therefore, the S-MOS incorporates the N-MOS as one of its input variables. The G-MOS is a weighted combination of the N-MOS and S-MOS. Some background noise removal services and options significantly degrade the overall speech quality, which leads to a lower G-MOS score. With advances in technology, such as Soundskrit's directional MEMS microphones, newer products will be able to remove background noise without significantly affecting overall sound quality.
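To illustrate how a weighted combination behaves, here is a small Python sketch. It is not the 3QUEST algorithm: the function name, the linear form, the weights of 0.4 and 0.6, and the example scores are all hypothetical, picked only to show why aggressive noise removal can lower the global score even when the noise score improves.

```python
def illustrative_global_mos(n_mos, s_mos, noise_weight=0.4, speech_weight=0.6):
    """Combine a noise score and a speech score into one global score.

    Assumption: a simple linear weighting with made-up weights. The real
    3QUEST model is not reproduced here.
    """
    if abs(noise_weight + speech_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    combined = noise_weight * n_mos + speech_weight * s_mos
    # Keep the result on the same 1-to-5 scale as the input scores.
    return min(5.0, max(1.0, combined))


# Hypothetical scores for two noise-removal settings on the same recording.
aggressive = illustrative_global_mos(n_mos=4.2, s_mos=2.5)  # quiet, but robotic speech
gentle = illustrative_global_mos(n_mos=3.4, s_mos=4.1)      # more noise, natural speech

print(f"aggressive: {aggressive:.2f}, gentle: {gentle:.2f}")
```

With these made-up numbers, the gentler setting ends up with the higher global score because the speech score carries more weight than the noise score.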

With a tool like 3QUEST, it becomes easier to compare the performance of different audio solutions for speech in noisy environments. Moreover, when the tool is used correctly, results from different labs can be compared with one another.

In conclusion, companies rely on sophisticated tools like the 3QUEST software to assess speech quality in noisy environments and gauge the performance of microphones. The test provides three scores, N-MOS, S-MOS, and G-MOS, that guide the development of microphones for noisy environments. With advances in background noise removal, sound quality no longer needs to suffer. For more details on audio, please visit our page at AudioHub.

References:

HEAD acoustics 3QUEST: https://cdn.head-acoustics.com/fileadmin/data/global/Application-Notes/Telecom/3QUEST-Application-Note.pdf