Semantic Hearing: An Interview with Shyam Gollakota
By Brad Diamond
2024-01-15
Peace and Calm
I love my pair of noise-cancelling headphones. They're comfortable, stylish, and an excellent way to enjoy some music. Most importantly, they're noise-cancelling! When I am walking down a busy street or find myself in a crowded mall, I can separate myself from a noisy, chaotic world simply by putting my headphones on. I am then transported by whatever beautiful music I choose, letting the stressful world around me fade away.
Isolation, though sometimes comforting, is a tradeoff. My noise-cancelling headphones help me find comfort in a cacophonous world, but they do so by cutting off an entire sense, one generally considered our second most important. Maybe I feel calmer on a busy street or in a crowded mall, but maybe I also lose out on vital information, such as an approaching emergency vehicle or a friend calling out my name.
I also lose out on the more beautiful aspects of life. I can tune out people talking at the park with my headphones, but then I also lose the sound of birds singing or wind whistling through the grass. That awful thunderstorm overhead will disappear with the click of a button, but so will the soothing sound of rain easing me to sleep. It is simply impossible to have the best of both worlds.
Understanding a World of Sound
Until now. Professor Shyam Gollakota and his lab have made impressive advances in semantic hearing for binaural headphones, advances poised to transform the headphones market. Using deep learning, the Mobile Intelligence Lab has created a neural network that generalizes to real-world environments, turning noise-cancelling headphones into intelligent devices that can choose which sounds they let through.
To turn headphones into intelligent devices, Shyam and his team first created machine learning models. These models, which can run on a device as commonplace as a smartphone, identify the different sounds around the user. Once a sound is identified, the models can manipulate the acoustic scene as the user desires.
The headphones first cancel out all the noise surrounding the user, as noise-cancelling headphones typically do. Then, using machine learning models that only get better the more they are used, the headphones play back just the sounds the user has chosen to hear. The user experiences only the bliss of wind blowing through leaves, without the annoying chatter around them.
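For the technically curious, here is a minimal sketch, in PyTorch, of what a target-sound extraction model of this kind might look like. To be clear, this is not the lab's published architecture: the class count, layer sizes, and the "birdsong" class index are all illustrative assumptions.

    import torch
    import torch.nn as nn

    class TargetSoundExtractor(nn.Module):
        def __init__(self, num_classes: int = 20, channels: int = 64):
            super().__init__()
            # Encode the raw stereo waveform into learned features.
            self.encoder = nn.Conv1d(2, channels, kernel_size=16, stride=8)
            # Embed the sound class the user wants to keep.
            self.class_embed = nn.Embedding(num_classes, channels)
            # Predict a mask that passes only the target sound's features.
            self.mask_net = nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv1d(channels, channels, kernel_size=3, padding=1),
                nn.Sigmoid(),
            )
            # Decode the masked features back into a stereo waveform.
            self.decoder = nn.ConvTranspose1d(channels, 2, kernel_size=16, stride=8)

        def forward(self, mixture, target_class):
            feats = self.encoder(mixture)                        # (B, C, T)
            cond = self.class_embed(target_class).unsqueeze(-1)  # (B, C, 1)
            mask = self.mask_net(feats * cond)                   # class-conditioned mask
            return self.decoder(feats * mask)                    # isolated target sound

    # Example: keep only the hypothetical class index 3 ("birdsong") in a
    # 10 ms chunk of 44.1 kHz stereo audio.
    model = TargetSoundExtractor()
    chunk = torch.randn(1, 2, 441)   # (batch, stereo channels, samples)
    isolated = model(chunk, torch.tensor([3]))

The key idea in a design like this is that one network handles every sound class, with the user's choice supplied as a conditioning label rather than baked into the weights.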
Timing and Directionality
For this experience to feel truly natural, the whole process takes place in about a hundredth of a second (roughly 10 milliseconds), so that what the user hears stays in sync with what they see. This latency budget was one of the biggest challenges facing Professor Gollakota and his team: whereas other machine learning approaches can work with audio that is a few seconds long, their models had to achieve good results using only a few milliseconds.
Ultimately, achieving this quick reaction time was quite a challenge. There was no single big break that opened this technology up to the world; instead, there was the hard work and dedication of Shyam and his lab, who developed processing that produces natural-sounding playback for listeners.
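To get a feel for how tight that budget is, the sketch below, which reuses the hypothetical model from earlier, times the processing of a single short chunk. The 8 ms chunk length and 44.1 kHz sample rate here are assumptions, not figures from the lab.

    import time
    import torch

    SAMPLE_RATE = 44_100
    CHUNK_MS = 8                                    # an assumed chunk length
    CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_MS // 1000  # 352 samples

    model = TargetSoundExtractor()   # the sketch defined earlier
    model.eval()

    with torch.no_grad():
        chunk = torch.randn(1, 2, CHUNK_SAMPLES)
        start = time.perf_counter()
        model(chunk, torch.tensor([3]))
        elapsed_ms = (time.perf_counter() - start) * 1000.0

    # To stay within a ~10 ms end-to-end budget, the chunk duration plus
    # the compute time must add up to about a hundredth of a second.
    print(f"processed an {CHUNK_MS} ms chunk in {elapsed_ms:.2f} ms")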
Another challenge the lab ran into was directionality. For the experience to feel fully immersive, the headset preserves the direction each sound arrives from. Their neural network effectively turns a regular pair of headphones into cybernetic ears, able to understand the acoustic scene around them.
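One common trick for preserving direction, shown in the sketch below, is to estimate a single time-frequency mask and apply it identically to both ears. This is a standard technique in binaural audio processing, not necessarily the lab's exact method, and the spectrogram shapes are placeholders.

    import numpy as np

    def apply_binaural_mask(left, right, mask):
        """Apply one time-frequency mask to both ears.

        left, right: complex STFTs of each ear's signal, shape (freq, frames)
        mask: real values in [0, 1], same shape
        """
        # Scaling both ears by the same amount leaves the interaural level
        # and phase differences of the kept sound untouched, so the brain
        # still hears it coming from the correct direction.
        return left * mask, right * mask

    # Tiny usage example with random placeholder spectrograms.
    rng = np.random.default_rng(0)
    L = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
    R = rng.standard_normal((257, 100)) + 1j * rng.standard_normal((257, 100))
    M = rng.random((257, 100))
    out_left, out_right = apply_binaural_mask(L, R, M)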
As Professor Gollakota discusses in the following clip, their success with timing and directionality has created a futuristic technology that he expects to appear in audio products within the next few years:
Most impressively, semantic hearing can work with most existing noise-cancelling headphones today. The software runs on a connected smartphone, without any integration into the headset's hardware. While this does drive up the phone's battery usage, Shyam expects that problem to be solved.
An Intelligent Design
Professor Gollakota is hopeful that this technology will garner interest, and it seems unlikely to sit idle for long. This development in semantic hearing holds huge promise for the future of audio devices: more intelligent devices mean better audio for the consumer. Semantic hearing feels like a leap forward into the realm of science fiction, where technology can change the way we perceive the world around us.
The advancement of semantic hearing promises a whole new audio experience. Given the option to choose exactly what they hear, people are going to want it. Semantic hearing headphones give people the opportunity to improve their daily lives in the moment by letting them choose what they want to experience.
Advancements like this unlock further progress as new possibilities are discovered and explored. Shyam and his lab have taken the first steps in semantic hearing, but there is still more ground to cover. Here at Soundskrit, we are very excited to see what they do next.
Interested in seeing more of Shyam's work for yourself? Shyam's webpage contains a plethora of cutting-edge research.