From DSD to Download: A deep dive into sample rate conversion
Posted by Thijs Muijs
At TRPTK, we generally record our artists’ performances in DSD256 for its natural sound quality, specifically with our Merging Technologies Hapi MkII A/D-converters. Since processing audio directly in DSD is difficult and impractical, the DSD signal is then converted into PCM, more specifically 352.8kHz 64-bit floating point. Once in the territory of PCM, we can immerse ourselves in the art of mixing, which may involve techniques such as panning and volume balancing, as well as more subtle manoeuvres like gentle limiting and minor EQ adjustments. This step forms the core of the mixing and mastering process.
What happens after mastering
After fine-tuning these details, the last step involves the final transformation of the audio signal and our current topic: converting our mastering resolution (again, 352.8kHz 64bit floating point) to the downloadable formats you see in our web store as well as on streaming services.
The need for this conversion arises because different playback devices and distribution platforms require different resolutions. But if we don’t perform this conversion with care, we risk losing valuable parts of the sound, or even worse, introducing new artifacts that negatively affect the intended sonic experience. By approaching this part of the production process with care, we can retain a significant portion — if not the majority — of the original quality, providing you, the listener, with an exceptional sonic experience, preserved even at lower resolutions.
Two types of conversions come into play in this final step: bit depth conversion and sample rate conversion. In this week’s blog, we’ll focus on the latter.
Sample rate conversion
Sample rate conversion is the process of altering the number of digital audio samples per second (expressed in hertz, Hz) to match a different desired rate. In other words, sample rate conversion alters the resolution of a digital signal in the time domain. For instance, while we work with a time resolution of 352,800 samples per second (352.8kHz), we could end up with a time resolution of 88,200 samples per second (88.2kHz). You can see this process in the graphs below, with the bottom graph showing a downsampled version of the top graph.
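To make this a little more tangible: going from 352.8kHz to 88.2kHz is an exact 4:1 ratio, so for every four samples in the original, one remains in the converted file. Below is a minimal sketch of such a conversion in Python using SciPy’s resample_poly; this is purely illustrative and not the converter we actually use in production.

```python
# Illustrative 4:1 sample rate conversion: 352.8 kHz down to 88.2 kHz.
# Not our production chain, just a minimal SciPy example.
import numpy as np
from scipy.signal import resample_poly

fs_in = 352_800                      # original sample rate in Hz
t = np.arange(fs_in) / fs_in         # one second of audio
x = np.sin(2 * np.pi * 1_000 * t)    # a 1 kHz test tone

# resample_poly handles both the low-pass filtering and the decimation.
y = resample_poly(x, up=1, down=4)

print(len(x), len(y))                # 352800 samples in, 88200 samples out
```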
The two stages of downsampling
The first step, low-pass filtering, is crucial as it removes high-frequency components that would otherwise distort the signal through aliasing (more on that later). In essence, low-pass filtering is simply using a filter (like those in an equalizer, for example) that only lets frequencies below a certain threshold pass through.
The second step is the downsampling itself, also known as decimation. After low-pass filtering, samples are simply discarded to achieve the desired lower sample rate. This step has some interesting intricacies about how to reach the desired sample rate and which samples to discard. If not done well, distortions can arise. The details of this process are a little bit beyond the scope of this blog, but if you’re interested, you can find some more information here and here.
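For the curious, here is a rough sketch of those two stages written out by hand in Python, again assuming a simple 4:1 ratio. The filter length and cutoff used here are arbitrary illustrations, not our actual settings.

```python
# A simplified two-stage downsampler (low-pass filter, then decimate),
# assuming an integer ratio of 4: 352.8 kHz -> 88.2 kHz.
import numpy as np
from scipy.signal import firwin, lfilter

def downsample_by_4(x, fs_in=352_800):
    factor = 4
    fs_out = fs_in // factor
    new_nyquist = fs_out / 2                                  # 44.1 kHz here

    # Stage 1: anti-aliasing low-pass, cutting just below the new Nyquist.
    taps = firwin(numtaps=255, cutoff=new_nyquist * 0.95, fs=fs_in)
    filtered = lfilter(taps, 1.0, x)

    # Stage 2: decimation, keep every 4th sample and discard the rest.
    return filtered[::factor], fs_out

# Example: a one-second 1 kHz tone
t = np.arange(352_800) / 352_800
y, fs_out = downsample_by_4(np.sin(2 * np.pi * 1_000 * t))
print(len(y), fs_out)                                         # 88200, 88200
```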
Aliasing and the Nyquist frequency
Let’s dive a bit deeper into the concept of aliasing. Aliasing is a phenomenon that can cause significant distortions in our audio signals. It refers to the misrepresentation of high-frequency content when a signal is downsampled.
To understand this concept a bit better, have a look at the animation below, depicting a sine wave sampled at a high sample rate (in grey). As the frequency of this sine wave increases, you’ll notice something interesting happening to its downsampled counterpart (in blue). High frequencies appear to mirror or fold back, transforming into lower frequencies in the downsampled signal. This effect is called aliasing, and we listeners perceive it as distortion.
The point beyond which the downsampled signal begins to produce the aliasing effect is called the Nyquist frequency, named after Harry Nyquist. This frequency is exactly half of the sampling rate. For example, when working at 352.8kHz, the Nyquist frequency would be exactly 176.4kHz. Simply put, the Nyquist frequency is the highest frequency you can accurately represent at a specific sample rate.
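If you like, you can reason about this folding with a tiny bit of arithmetic. The sketch below assumes ideal sampling with no anti-aliasing filter, and simply computes where a given tone would end up; a 50kHz tone sampled at 88.2kHz, for example, reappears mirrored around Nyquist at 38.2kHz.

```python
# Where does a tone end up after sampling at fs, with no anti-aliasing filter?
def alias_frequency(f, fs):
    f = f % fs                          # the spectrum repeats every fs
    return fs - f if f > fs / 2 else f  # above Nyquist: fold back (mirror)

fs = 88_200                             # Nyquist = 44.1 kHz
print(alias_frequency(30_000, fs))      # 30000 -> below Nyquist, unchanged
print(alias_frequency(50_000, fs))      # 38200 -> folded back below Nyquist
```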
Understanding aliasing and the Nyquist frequency is crucial for high-quality sample rate conversion. By ensuring that our original signal’s frequencies don’t exceed Nyquist, we can prevent aliasing and maintain the integrity of the audio, even when downsampling. This is why it’s crucial to apply low-pass filtering before downsampling too; it removes these high-frequency components before converting to the lower resolution, hopefully avoiding aliasing altogether.
Low-pass and anti-aliasing filters
We’ve now arrived at the most crucial segment of this blog: the selection and fine-tuning of the anti-aliasing low-pass filter. This stage will decide the lion’s share of how we perceive and maintain the quality and sonic signature of our high-resolution master.
A filter, in essence, can attenuate or boost specific frequency bands within a signal. Filtering may introduce a degree of delay or phase-rotation, effectively altering the timing of various frequency components. Of course, not all filters are equal; for example, there are differences in how they handle the phase of the input signal. In this regard, there are two main types of low-pass filters we use for sample rate conversion: linear-phase and minimum-phase filters. Each of these comes with its own unique characteristics and mix of upsides and downsides.
Let’s start with linear-phase filters. These filters are designed to maintain a consistent phase relationship across all frequencies, ensuring that every element of the signal is delayed by exactly the same amount. This dedication to phase linearity effectively eliminates phase distortion, but may introduce an issue known as transient smearing, where rapid pulses may become somewhat blurred.
On the other side of the coin we have minimum-phase filters, which do introduce phase distortions and, consequently, exhibit more (and different) transient smearing. However, some audio professionals prefer minimum-phase filters, for example due to their more “analogue” feel.
Transient smearing is a phenomenon where rapid, high-energy events in the signal, such as the attack of a drum hit or the plucking of a string on a harpsichord, become a bit blurred or smeared. This smearing can be divided into two key contributors: pre-ringing and post-ringing. The amount of pre- and post-ringing depends on the type of low-pass filter used and on its steepness. Check out the graphs below.
Pre-ringing and post-ringing
Pre-ringing and post-ringing are closely related to the behavior of the filter. Pre-ringing is the ripple effect you see before a transient, and can be perceived as a tiny, faint echo just before the initial impact of the transient. Pre-ringing is part of the behavior of linear-phase filters; their quest for phase linearity ensures a consistent phase relationship across the entire bandwidth of the signal, but results in both pre-ringing and post-ringing, which may compromise the clarity of the audio.
On the other hand, minimum-phase filters have no pre-ringing at all. They do however suffer from heavier post-ringing, which sounds a bit like a lingering resonance after plucking a string, for example. The idea behind minimum-phase filters is that transients should be instantaneous, and decay after this transient is considered a natural process.
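One way to see this difference for yourself is to look at the impulse responses of the two filter types. The sketch below uses SciPy’s firwin and minimum_phase functions to derive a rough minimum-phase counterpart of a linear-phase low-pass, then measures how much of each impulse response’s energy sits before and after its main peak. The derived filter is only an approximation, with a different length and a somewhat different magnitude response, but the point here is simply where the ringing ends up.

```python
# Comparing pre-ringing and post-ringing of a linear-phase FIR low-pass
# and an (approximate) minimum-phase version of it.
import numpy as np
from scipy.signal import firwin, minimum_phase

linear = firwin(255, cutoff=0.4)       # linear-phase low-pass, cutoff at 0.4x Nyquist
minimal = minimum_phase(linear)        # approximate minimum-phase counterpart

for name, h in (("linear-phase", linear), ("minimum-phase", minimal)):
    peak = np.argmax(np.abs(h))        # the main "transient" of the impulse response
    pre = np.sum(h[:peak] ** 2)        # energy before the peak = pre-ringing
    post = np.sum(h[peak + 1:] ** 2)   # energy after the peak = post-ringing
    print(f"{name}: pre {pre:.5f}, post {post:.5f}")
```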
So, all of this ringing and smearing sounds like a sonic horror show, right? And what can we do about it, you might be wondering. And lastly, which filter type should you choose, the one with eerie pre-ringing and haunting post-ringing, or the one without pre-ringing but even scarier post-ringing? Well, you don’t have to choose just yet — let’s first discuss another twist in this audio tale that influences the ringing: the steepness of the filter.
Filter steepness
Think of the filter as an orchestra conductor, and the steepness as his baton directing the performance. A gentler conductor might initiate the attenuation early on, delicately fading out our sound, while a stricter conductor commands a sharp and abrupt conclusion, where the ‘ringing’ of the concert hall becomes more pronounced. It’s like choosing between a symphony that’s missing a few well-defined last notes or one with lingering echoes that follow an abrupt ending; each has its own unique character.
In an ideal world, we would use a filter with an infinitely steep transition, where frequencies below Nyquist all pass through completely unharmed, and those above it are unceremoniously silenced. However, reality has its constraints; filter steepness and ringing are bound inevitably — you can’t have one without the other. The steeper the filter, the more ringing, and infinite steepness would mean infinite ringing.
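You can get a feel for this trade-off with a standard filter-design rule of thumb. The sketch below uses SciPy’s kaiserord to estimate how many filter taps, and therefore how long an impulse response and how much ringing, a given transition width requires; the attenuation figure and the widths are arbitrary examples.

```python
# Rule of thumb: the narrower the transition band (the steeper the filter),
# the more taps you need, and the longer the impulse response rings.
from scipy.signal import kaiserord

ripple_db = 90.0                      # desired stopband attenuation
for width in (0.20, 0.05, 0.01):      # transition width as a fraction of Nyquist
    numtaps, beta = kaiserord(ripple_db, width)
    print(f"transition width {width}: about {numtaps} taps")
# Narrowing the transition band by a factor of 20 makes the filter,
# and its ringing, roughly 20 times longer.
```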
This connection between filter steepness and ringing confronts us with a pivotal design choice: do we aim to preserve as much high-frequency content as we can, or do we strive to keep ringing at a minimum? Let’s check out a few graphs.
In these spectrograms, the left side plots a linear-phase filter, the right one its minimum-phase counterpart. Both filters have a very high steepness, to illustrate their respective effects: you can clearly see the effects of pre-ringing and post-ringing, and the difference between the two filters. You can also see that the high frequencies extend all the way to the top of the spectrum.
Now, compare this to the following two graphs:
In these spectrograms, it’s evident that we’re looking at filters with a much gentler roll-off (or, in other words, with a lower steepness). You can see that high-frequency content is somewhat diminished, but also that pre-ringing and post-ringing are much less pronounced than in the previous two examples.
Linear-phase or minimum-phase?
So which filter, and which kind of sample rate conversion should you use? Well, that all boils down to how you want to shape the audio, and the technical and artistic considerations involved.
Linear-phase filters, with their commitment to phase consistency, deliver a precise stereo image and maintain transient behavior consistently across the entire frequency spectrum. This precision is vital in preserving spatial characteristics, a crucial element since even the average listener is sensitive to the tiniest timing differences. However, the pursuit of phase-alignment in linear-phase filters can introduce pre-ringing, an artifact that can be particularly noticeable in music with rapid transients.
Conversely, minimum-phase filters have a different approach, introducing subtle phase distortions and more pronounced post-ringing. They don’t provide the same level of phase-alignment as linear-phase filters, but they do often find favor in scenarios with very transient-rich content.
Given our focus on acoustic music and our view that spatial accuracy is paramount, linear-phase filters usually take the lead here at TRPTK. But there is no one-size-fits-all solution: the choice between linear-phase and minimum-phase filters, as well as the steepness of those filters, is a delicate balance, depending on the sonic characteristics of the material you’re dealing with.
For those of you who, after reading this article, would like to try some different options for yourself, we’re releasing another blog article with a test you can do at home in the upcoming couple of days! So stay tuned, and for now, thank you so much for reading. We’ll see you in the next one.