You see us talking a lot about DXD, DSD, things like 352.8 kHz or 32 bits and what have you. But what do all these things mean, and more practically, what do they mean for you as a listener? In this article, I hope to give you a bit more background on what sample rates are, what they do, and when you should or shouldn’t choose a higher one. So let’s dive into this!

The wheels of the bus go round and round… except when they don’t

You’ve probably seen this effect a million times in movies. A car (or any vehicle for that matter) accelerates from a standstill. The wheels begin to turn forward, then all of a sudden seem to stop and go backward, until they go forward again, and this process repeats and repeats. If you haven’t, go watch some of the older spaghetti western movies and look at the spokes of the horse-drawn carriages. The effect you see is called aliasing, and it happens when the frequency of something you want to capture exceeds half the frequency of what you’re capturing it with.

This happens with audio as well. For example, check out the image below. Let’s say the red line is the analog wave you want to capture. Every black dot represents a sample your analog-to-digital converter (A/D converter) takes of this analog waveform. So at regular intervals along this waveform (every 2/3s here), a sample is taken and its amplitude is stored in the digital domain. All is well and good, until you have to reproduce this signal again.

[Figure: an analog wave (the red line) sampled at a certain frequency (the black dots). When reconstructed, the result is a wave of a much lower frequency (the striped black lines). This is called aliasing.]

What happens next, as you can see from the striped black lines, is that a waveform is reconstructed from these samples, but with a different frequency than the original signal. That’s not good! It means that different audio came out than what went in.
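To make the wagon-wheel effect a bit more concrete, here is a minimal Python sketch of what happens to a pure tone that is sampled too slowly, with no filtering at all. The function names and the numbers (a 900 Hz tone sampled 1,000 times per second) are purely illustrative assumptions and do not come from any real converter.

```python
import math


def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a pure tone after sampling without any filtering."""
    nyquist_hz = sample_rate_hz / 2.0
    # Sampling cannot tell a frequency f apart from f plus any multiple of the sample rate.
    folded_hz = signal_hz % sample_rate_hz
    # Anything above half the sample rate is mirrored back down below it.
    return folded_hz if folded_hz <= nyquist_hz else sample_rate_hz - folded_hz


def sample_tone(freq_hz, sample_rate_hz, count):
    """Take `count` samples of a cosine wave at the given frequency."""
    return [
        math.cos(2.0 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(count)
    ]


if __name__ == "__main__":
    rate = 1000.0  # illustrative sample rate in Hz (an assumption)
    tone = 900.0   # illustrative tone, well above half the sample rate
    alias = alias_frequency(tone, rate)
    print(f"A {tone:.0f} Hz tone sampled at {rate:.0f} Hz shows up at {alias:.0f} Hz")

    # The two tones produce the same samples, so nothing downstream can tell them apart.
    original = sample_tone(tone, rate, 8)
    impostor = sample_tone(alias, rate, 8)
    worst = max(abs(a - b) for a, b in zip(original, impostor))
    print(f"Largest difference between the two sets of samples: {worst:.2e}")
```

The second half of the sketch is the important bit: once the samples are taken, the high tone and its low-frequency impostor are indistinguishable, just like the spokes of the wheel on film.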
So what frequencies can you and can’t you capture? Well, that’s what the so-called Nyquist-Shannon sampling theorem is for, named after Harry Nyquist and Claude Shannon. No wait, the information theorist and radar astronomer Vladimir Kotelnikov actually formulated it before Shannon did. The theorem itself is pretty simple: you can only capture frequencies below half of your sampling frequency. This means that if your sampling frequency is 100 Hz (Hertz here meaning samples per second), you can capture anything up to just under 50 Hz. Great stuff, if you only record sub-bass music using nothing but the lowest registers of a pipe organ. But for music you’ll probably want to be able to record more than that. This is why the brilliant people over at Philips and Sony thought 44,100 Hz (44.1 kHz) would be a great sample rate.

CD, a format for humans

We humans are not as sophisticated as we think we are. Our minds can only process so much, and, even more importantly for our dear audiophiles, our ears can only hear so much. So how much is so much? The answer to that is: not a whole lot. A healthy, young person can hear frequencies from around 15-20 Hz (the frequency of the lowest pipes of a cathedral organ) to around 20 kHz, a little above the high-pitched whine a CRT television (remember these?) emits when in use. For comparison, a bat can usually hear up to 110 kHz, and a beluga whale can still hear 150 kHz, making it a perfect candidate for some good audiophile listening.

So, in short, our hearing is flawed, to say the least. The range of frequencies you’re able to hear decreases drastically as you get older, especially if you tend to listen to music at high volumes. You can sometimes actually hear this effect when comparing the early work of a mastering engineer with work from when they’re, say, middle-aged or older; they tend to compensate for their own hearing by adding more high frequencies to the master.
Furthermore, not only is the frequency range we can hear limited, but we’re also much more sensitive to certain frequencies (like 4 to 5 kHz, the presence area of speech) than we are to others. Our hearing is definitely not as flat a curve as those beautiful speakers or headphones your system’s rocking, unfortunately.

Because of all this, the people at Sony and Philips decided on 44.1 kHz as a good sampling frequency for their latest-and-greatest new format, Compact Disc Digital Audio (or as you and I know it, CD). This sampling frequency meant that, theoretically (and this gets important later), you could capture sounds up to a little over 22 kHz, so just over our hearing limit. Perfect, right? Well… Not always, and that has to do with filtering.

Keep the good stuff, get rid of the bad stuff

As much as I’d like to dive deep into the subject of anti-aliasing filters, reconstruction filters and the like, I have to acknowledge that a) there are people with far more knowledge on the subject than I have, and b) this blog post would easily get ten times as big as it already is. But here’s the gist of it: when converting analog signals into digital ones or vice versa, you need filters to take out the nasty stuff whilst trying to preserve the stuff you actually want.

So what nasty stuff exactly? Well, let’s go back to our example of the humble CD format. Its sample rate of 44.1 kHz tells us we can record any frequency up to 22.05 kHz (let’s just round it off to 22 kHz to keep things simple). Since our hearing can ideally go up to 20 kHz, we want to keep everything up to 20 kHz and remove everything above 22 kHz that could cause aliasing to occur, and this is exactly what you use a filter for.

You’re probably familiar with filters: they, well, filter things. Audio things, in this context. Some streaming devices, for example, have a built-in equalizer, which is essentially a bunch of filters put together.
For example, a so-called low-pass or high-cut filter is able to remove frequencies above its cut-off frequency, while retaining everything below it. In theory, that is. The thing is, because of the way they work, filters also add something called temporal distortion, meaning that they distort the waveform in the temporal (time) domain. This creates what is known as pre-ringing and post-ringing, meaning that if you have an impulse in the analog domain, you’d get some wobbling or ringing before and after it. So what does that look like? I’ve drawn up an example below.

[Figure: Left: the original (analog) signal. Right: the digital signal after filtering. You can see the effects of pre-ringing and post-ringing here, distorting the original waveform.]

In the figure above, you can see that because of pre-ringing, an impulse gets some noise right before it hits, and the decay of the sound is distorted and elongated. This can make the recorded audio sound less tight in its rhythm, something a lot of listeners and especially musicians are sensitive to. This, I theorize, has to do with the fact that we evolved to be more temporally sensitive than tonally sensitive. This makes sense; when our ancestors were out in the jungle, it was way more important to know where a possible predator was, how far away and how big, than exactly what timbre its call had.

In any case, for critical music listening, pre- and post-ringing can be a major issue. Luckily, there are a number of ways to mitigate some of this. One is that there are now many great filtering techniques that are able to do their work without causing too much ringing, or that at least get rid of pre-ringing at the cost of a bit more post-ringing. But there’s another solution. You can also just filter very gradually.

Brickwall vs. slow roll-off

In the figure I sketched above, you can see what a so-called brickwall filter does. But there are more elegant solutions to it.
What if you just make a less aggressive filter that has less pre- and post-ringing? That’s definitely possible, but then you run into a little dilemma… In our CD example (we’ll get to higher resolutions in a bit, don’t worry, we’re almost there!), you could either roll off the filter more slowly (let’s say you still leave in a little bit of 23 kHz) and accept some aliasing or distortion, or begin rolling off earlier, such as at 16 kHz, and compromise a bit on your high-frequency content. But what if you just don’t want to compromise?

High sample rates and filtering

This is where the advantages of high sample rates come in. We here at TRPTK record at a staggering 352.8 kHz sample rate (a format called DXD, more on that in a later blog). The Nyquist-Shannon theorem tells us that we can record anything up to 176.4 kHz, making it the perfect candidate for slow filtering.

Let’s get one thing out of the way: there are many, many people who claim there’s no use in recording at sample rates higher than 44.1 kHz, because “humans can’t hear anything above 20 kHz either way”. This is a perfectly understandable line of reasoning, except that it’s flawed in a very big way: it doesn’t account for all the nastiness filtering creates.

So let’s go back to our CD example and compare it to DXD. At CD sample rates, we’d need a pretty aggressive filter that leaves everything below 20 kHz as-is, yet completely removes anything above 22 kHz. With DXD and its Nyquist frequency of 176.4 kHz (that’s half of 352.8 kHz), you could start filtering very gently from, say, 25 kHz, and just ever so slowly roll off until it’s dead silent around 176.4 kHz. Gone is your pre-ringing, gone is your post-ringing, and you’re left with an almost exact copy of the original analog waveform, yet in the digital domain.
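To put some rough numbers on the comparison above, here is a small back-of-the-envelope sketch in Python. It uses a well-known rule of thumb from filter design (usually attributed to fred harris) that estimates how many taps a linear-phase FIR low-pass filter needs for a given transition band. The 100 dB stopband attenuation and the function name are my own assumptions for illustration, and real converters use all kinds of filter designs, so treat the results as orders of magnitude and not as measurements of any actual device.

```python
def estimated_filter(sample_rate_hz, passband_edge_hz, stopband_edge_hz,
                     attenuation_db=100.0):
    """Rough FIR length estimate: taps ~ (fs / transition width) * (attenuation in dB / 22).

    Returns (taps, duration_s), where duration_s is the length of the filter's
    impulse response in seconds. That is the time window over which the
    filter can ring, before and after an impulse combined.
    """
    transition_hz = stopband_edge_hz - passband_edge_hz
    taps = (sample_rate_hz / transition_hz) * (attenuation_db / 22.0)
    duration_s = taps / sample_rate_hz
    return taps, duration_s


if __name__ == "__main__":
    # CD: keep everything up to 20 kHz, be silent by 22.05 kHz.
    cd_taps, cd_duration = estimated_filter(44_100.0, 20_000.0, 22_050.0)
    # DXD: start rolling off at 25 kHz, be silent by 176.4 kHz.
    dxd_taps, dxd_duration = estimated_filter(352_800.0, 25_000.0, 176_400.0)

    print(f"CD : about {cd_taps:.0f} taps, spanning {cd_duration * 1e6:.0f} microseconds")
    print(f"DXD: about {dxd_taps:.0f} taps, spanning {dxd_duration * 1e6:.0f} microseconds")
    print(f"The CD filter rings roughly {cd_duration / dxd_duration:.0f} times longer")
```

Note that, under this estimate, the time span of the filter depends only on the width of the transition band and the attenuation you ask for, not on the sample rate itself. The high sample rate matters because it gives you the room to make that transition band so much wider.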
When high sample rates matter (and when they don’t)

So, the reason we record in the DXD format at a 352.8 kHz sample rate is not that we tailor our music to the audiophile beluga whale or bat. No, it simply has to do with filtering and trying to keep the original waveform intact. We humans are very sensitive to timing and rhythm, and a lot of us, especially musicians, easily hear the difference between a recording with nasty pre- or post-ringing and one without.

So should you delete all your non-DXD music from your NAS, set your CD collection on fire and throw your iPod into the river? Well, no! I certainly believe that, once recorded at a high sample rate, a lower-resolution master isn’t half bad. A CD made from a DXD recording, to me at least, sounds much better than one recorded at CD resolution. Of course, listening to audio at its exact recording sample rate is always preferable because you don’t have to deal with sample rate conversion and its nasty side effects, but there’s no shame in listening to 44.1 kHz or even 96 kHz for that matter. What’s more, at high sample rates your DAC may even perform worse than it would at lower sample rates, which is true for some of the cheaper models out there. This has to do with something called jitter, which we’ll talk about later.

For now, I hope you’ve become a bit wiser as to what all these numbers and formats mean, and why they’re so important. And also, why they’re sometimes not so important. Should you have any questions, feel free to write them below or send me a message!

Posted by Brendon Heinst. Brendon is the founder and senior recording & mastering engineer at TRPTK. He gained a Bachelor's and Master's Degree in audio engineering at the University of the Arts Utrecht. Brendon has been involved in more than 200 recordings to date, focusing heavily on ultra-high-resolution and multichannel immersive recordings.