Audio DSP System Overview
By Brad Diamond
2024-06-05
Digital Audio Processing
Understanding the digital audio path and the concept of audio callbacks in digital signal processors (DSPs) forms the backbone of real-time digital audio processing. While this article is not exhaustive, it aims to build an intuition for how digital audio processing works and for some of the key considerations involved.
Below is a block diagram of an example audio system that captures and processes audio from two microphones and outputs it over USB.
Components of an Audio Capture System
In a typical audio capture system, the main components are the microphone(s), an audio codec, a SoC/DSP, and a reference clock. The microphones measure the external sound, while the audio codec converts the microphone output into a digital format that typical computer processors can easily read and transfer. The system on chip (SoC) handles many functions, including transferring audio to a digital signal processor (DSP) for processing and sending the processed output to the outside world. Finally, a reference clock keeps the data transferred among these components synchronized.
The main parameters of a digital audio flow are the sample rate, bit depth, and audio block size, which together determine the fidelity and efficiency of a digital audio system.
The sample rate and bit depth define how the microphone output is converted into a computer-friendly format. The sample rate determines how often "snapshots" of the sound wave are taken, while the bit depth determines the level of detail within each snapshot. Higher sample rates and bit depths yield more accuracy but demand more CPU power.
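To make these trade-offs concrete, the short C sketch below computes the raw data rate implied by a given sample rate, bit depth, and channel count. The specific values (48 kHz, 24-bit, two channels) are assumed for illustration, not taken from the system above.

    #include <stdio.h>

    int main(void) {
        /* Assumed example parameters for illustration. */
        unsigned sample_rate = 48000; /* snapshots per second */
        unsigned bit_depth   = 24;    /* bits of detail per snapshot */
        unsigned channels    = 2;     /* e.g., a two-microphone capture */

        /* Raw (uncompressed) data rate in bits per second. */
        unsigned long bits_per_second =
            (unsigned long)sample_rate * bit_depth * channels;

        printf("Raw data rate: %lu bits/s (%.0f kB/s)\n",
               bits_per_second, bits_per_second / 8.0 / 1000.0);
        return 0;
    }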
In DSP systems, sound snapshots are grouped into audio blocks instead of being handled individually. The audio block size is the number of samples processed at once. The SoC collects incoming audio samples in a direct memory access (DMA) buffer; once the buffer holds a full audio block, the DMA controller raises an interrupt, which invokes the audio callback: a function that runs the desired processing on the newly available block. Larger audio blocks lead to fewer interruptions but take longer to process, whereas smaller blocks offer lower latency but can increase processing overhead due to higher callback frequency.
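As a concrete illustration, here is a minimal C sketch of this pattern. The names (audio_callback, dma_buffer, process_block) and the 256-sample block size are hypothetical placeholders rather than a specific vendor API:

    #include <stdint.h>

    #define BLOCK_SIZE 256  /* audio block size in samples (assumed value) */

    /* DMA target buffer that the hardware fills with incoming samples. */
    static int32_t dma_buffer[BLOCK_SIZE];

    /* Stub for the desired processing (filtering, beamforming, etc.). */
    static void process_block(int32_t *samples, unsigned count) {
        (void)samples;
        (void)count;
    }

    /* Hypothetical interrupt handler: the DMA controller raises an interrupt
     * once BLOCK_SIZE samples have been collected, and the callback runs the
     * processing on the completed block. A real system would typically
     * double-buffer ("ping-pong") so the DMA can fill one half of the buffer
     * while the other half is being processed. */
    void audio_callback(void) {
        process_block(dma_buffer, BLOCK_SIZE);
    }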
Digital Audio Interfaces
Digital audio interfaces like I2S, USB, and PDM carry audio data between DSPs and input/output ports. Handling multiple audio interfaces introduces challenges due to differences in hardware, protocol, or clocking. For instance, PDM inputs capturing data from microphones might operate on their own clock, while a USB output to an external device uses a different timing mechanism. Synchronizing these separate data streams is crucial: without it, timing or data-alignment mismatches can cause buffer underflows or overflows, leading to glitches, distortion, or audio loss.
One synchronization solution incorporates dedicated hardware components like clock synchronizers. The hardware synchronizer accepts a single source as a reference clock and generates synchronous clocks for each separate audio interface (e.g., I2S, USB).
Latency
As audio data is captured, it is stored in a buffer until enough samples have been collected to trigger the processing. Buffering audio data can introduce latency. The formula to calculate the latency introduced by buffering is:
Latency (seconds) = Buffer Size (samples) / Sample Rate (samples/second)
The buffer size is determined by the audio block size, since the block essentially functions as a buffer; the block size, measured in samples, can therefore be treated as the buffer size in this example. Larger block sizes mean more samples are stored before processing, resulting in increased latency. Increasing the sample rate fills the buffer more quickly, reducing the latency it introduces. In some cases, the buffer may store more than one audio block.
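For instance, plugging an assumed 256-sample block at 48 kHz into the formula above:

    #include <stdio.h>

    int main(void) {
        double buffer_size = 256.0;   /* samples (assumed block size) */
        double sample_rate = 48000.0; /* samples per second */

        /* Latency (seconds) = Buffer Size (samples) / Sample Rate (samples/second) */
        double latency_ms = (buffer_size / sample_rate) * 1000.0;
        printf("Buffering latency: %.2f ms\n", latency_ms); /* prints 5.33 ms */
        return 0;
    }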
The total latency of an audio system from input to output will depend on the buffer latency in addition to the latency of the algorithm. Reducing total latency involves optimizing block size and enhancing CPU processing efficiency. Balancing these factors is crucial for minimizing latency while ensuring efficient audio processing.
Synchronization
The ideal synchronization solution involves a combination of software and hardware approaches tuned to the system's specific requirements. Leveraging hardware-based synchronization where available and complementing it with software-level buffering and resampling methods ensures synchronized operation among different audio interfaces.
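On the software side, one common building block is a ring buffer that decouples an input interface from an output interface running on a different clock, absorbing small rate differences. The sketch below is a minimal single-producer/single-consumer version; all names and the capacity are illustrative assumptions:

    #include <stdbool.h>
    #include <stdint.h>

    #define RING_SIZE 1024  /* capacity in samples (assumed) */

    typedef struct {
        int32_t data[RING_SIZE];
        volatile unsigned head; /* written by the producer (e.g., PDM input) */
        volatile unsigned tail; /* read by the consumer (e.g., USB output) */
    } ring_buffer;

    /* Producer side: returns false on overflow (consumer falling behind). */
    bool ring_push(ring_buffer *rb, int32_t sample) {
        unsigned next = (rb->head + 1) % RING_SIZE;
        if (next == rb->tail)
            return false;
        rb->data[rb->head] = sample;
        rb->head = next;
        return true;
    }

    /* Consumer side: returns false on underflow (producer falling behind). */
    bool ring_pop(ring_buffer *rb, int32_t *sample) {
        if (rb->tail == rb->head)
            return false;
        *sample = rb->data[rb->tail];
        rb->tail = (rb->tail + 1) % RING_SIZE;
        return true;
    }

Persistent overflow or underflow signals a genuine clock-rate mismatch that buffering alone cannot fix; that is where resampling or hardware synchronization comes in.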
Consider an array of microphones implementing a beamforming algorithm in the digital audio path. Each microphone sends data to the DSP via PDM. Within the audio callback, the DSP receives the multiple data streams, and the beamforming algorithm creates directional audio by adjusting phases and amplitudes while minimizing interference. Once the streams are synchronized and data readiness is signaled (typically through an interrupt), the processed audio block is sent to the USB audio driver as output.
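As a sketch of what such processing might look like inside the callback, here is a simple delay-and-sum beamformer, one common approach, reduced to whole-sample delays for clarity; the mic count, block size, and delay values are illustrative assumptions:

    #include <stdint.h>

    #define NUM_MICS   2
    #define BLOCK_SIZE 256

    /* Per-microphone steering delays in whole samples, chosen (in a real
     * system) from the microphone geometry and the desired look direction. */
    static const unsigned delays[NUM_MICS] = { 0, 3 };

    /* Delay-and-sum beamformer: align each microphone stream by its steering
     * delay, then average. Sound arriving from the target direction adds
     * coherently, while off-axis sound partially cancels. */
    void beamform_block(const int32_t in[NUM_MICS][BLOCK_SIZE],
                        int32_t out[BLOCK_SIZE]) {
        for (unsigned n = 0; n < BLOCK_SIZE; n++) {
            int64_t acc = 0;
            for (unsigned m = 0; m < NUM_MICS; m++) {
                /* Treat samples before the delayed start as silence. */
                acc += (n >= delays[m]) ? in[m][n - delays[m]] : 0;
            }
            out[n] = (int32_t)(acc / NUM_MICS);
        }
    }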
This example showcases how the audio callback integrates with processing techniques like beamforming to produce directional audio.
Learn more about our SimplyClear Software!