What you can't hear is still there, until it isn't

It's pretty safe, I think, to say most people have never seen audio. Sure, they've heard it, argued about it (in the case of audiophiles, probably a lot), maybe even lost some sleep over it... But never seen it. Which is a shame, since once you look at what happens to a recording when it's compressed into an MP3, the debate about lossless vs. lossy stops being abstract pretty quickly. All the more reason to dive further into the subject.

Sometimes a picture's worth a thousand words

The images in this post are so-called spectrograms. They're pretty easy to read once you know how they work: the horizontal axis represents time, the vertical axis represents frequency, and the brightness of each point shows how much acoustic energy is present at that frequency at that specific moment. That's all. It's like a heat map for sound.
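If you'd like to see the idea in code, here's a minimal sketch of how a spectrogram can be computed with nothing but Python's standard library. It is not the tool used to make the images in this post; the 8 kHz sample rate, the frame size and the 1 kHz test tone are all made up for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000   # assumed for illustration, not the 44.1 kHz of the real file
FRAME_SIZE = 64      # samples per analysis frame (hypothetical choice)
HOP = 32             # how far the frame slides each step (hypothetical choice)

def spectrogram(samples, frame_size=FRAME_SIZE, hop=HOP):
    """Return one list of magnitudes per time frame (rows = time, columns = frequency)."""
    # A Hann window reduces leakage between neighbouring frequency bins
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):  # bins from 0 Hz up to the Nyquist limit
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))  # this is the "brightness" of one point
        frames.append(magnitudes)
    return frames

# A pure 1 kHz test tone, so we know where the bright line should appear
signal = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]
spec = spectrogram(signal)

loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
loudest_hz = loudest_bin * SAMPLE_RATE / FRAME_SIZE
print(f"Brightest frequency in the first frame: {loudest_hz:.0f} Hz")
```

Each inner list is one vertical slice of the picture. Plot the slices side by side, with brighter colours for larger values, and you get the kind of image shown in this post.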

This first image is a spectrogram of an original 44.1 kHz 16-bit WAV file (in this case the first minute of track #3 of In Motu by Intercontinental Ensemble). It's a pretty direct representation of the audio as it was recorded and mastered (albeit converted back from DXD, of course). As you can see, the energy extends all the way to roughly 22 kHz, the Nyquist limit (half the sample rate) for a 44.1 kHz audio file. Notice the sharp transients, rich detail in the midrange, and the complex structure and texture throughout. This is what the original recording also contained.
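The Nyquist limit is simple arithmetic: a sampled signal can only represent frequencies up to half its sample rate. Here's a tiny sketch; the 352.8 kHz figure is the usual DXD sample rate, which we're assuming here rather than quoting from the session.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a sampled signal can represent: half the sample rate."""
    return sample_rate_hz / 2

cd_limit = nyquist_hz(44_100)     # the WAV file in the spectrogram
dxd_limit = nyquist_hz(352_800)   # assumed DXD sample rate

print(f"44.1 kHz file: content up to {cd_limit / 1000:.2f} kHz")
print(f"DXD master:    content up to {dxd_limit / 1000:.1f} kHz")
```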

Now, have a look at what happens the moment you convert it to MP3.

A 320 kb/s MP3 converted from the same WAV file (above), its 96 kb/s counterpart (below)

The obvious

At 320 kb/s (considered the "gold standard" of MP3 quality) you can already see a noticeable drop in high-frequency information from about 16 kHz. At 256 kb/s it drops even further, and by the time you reach 96 kb/s, you're looking at a file that contains almost nothing above this limit. What remains is a thin, noisy smear of what was once detailed high-frequency information.
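To put those bitrates in perspective, it helps to work out how many bits the original WAV uses per second. This is a back-of-the-envelope sketch that assumes a stereo file (two channels) and 1 kb = 1000 bits.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Bitrate of uncompressed PCM audio in kb/s (1 kb = 1000 bits)."""
    return sample_rate_hz * bit_depth * channels / 1000

# Assumption: stereo, i.e. two channels
wav_kbps = pcm_bitrate_kbps(44_100, 16, 2)

# How many times smaller each MP3 bitrate is than the WAV
ratios = {mp3_kbps: wav_kbps / mp3_kbps for mp3_kbps in (320, 256, 96)}

print(f"WAV: {wav_kbps:.1f} kb/s")
for mp3_kbps, ratio in ratios.items():
    print(f"{mp3_kbps} kb/s MP3 uses about {ratio:.1f}x fewer bits")
```

Under these assumptions, even a 320 kb/s MP3 has less than a quarter of the WAV's bits to work with, and a 96 kb/s file less than 7%. Fewer bits doesn't translate one-to-one into less audible information, but it does show how much the encoder has to throw away or approximate.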

But this matters more than you might think. These high frequencies carry a disproportionate share of the spatial information in a recording (the cues your brain uses to "visualise" where instruments are placed, both between your speakers and in terms of perceived depth). This spatial information is what lets you feel the difference between a concert hall and a living room, between instruments sitting 20° or 25° to your left, and between them sitting right in front of you or 10 metres away.

Strip these high frequencies away, and the music doesn't just sound "less bright"; it sounds smaller, more two-dimensional rather than three-dimensional.

The less obvious

Here's what the standard MP3 debate almost always misses: the damage isn't only at the top of the spectrum. Look carefully at the mid-range in the lower-bitrate MP3s. The crisp, detailed texture of the original has been replaced by something much blurrier; information has been averaged and approximated rather than preserved. Another thing you might've noticed is that a lot of information is simply gone, especially on the lower-bitrate MP3.

This is the psychoacoustic model of the MP3 format at work. The encoder doesn't simply record what's there; it makes decisions about what you "probably won't notice", based on a mathematical model of human hearing. Loud sounds tend to mask quiet ones nearby in frequency. Fast transients mask what comes just before and after them. The MP3 codec exploits these attributes aggressively, discarding anything it predicts will fall behind the mask.
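To make the "loud masks quiet" idea a bit more concrete, here's a deliberately crude toy. It is not the actual MP3 psychoacoustic model: the function name, the 15 dB offset and the 25 dB-per-octave slope are all invented for illustration, and it only covers masking in frequency, not in time.

```python
import math

def survives_masking(components, offset_db=15.0, slope_db_per_octave=25.0):
    """Keep only the (frequency_hz, level_db) components a toy masking rule lets through.

    Hypothetical rule: every component casts a masking threshold that sits
    offset_db below its own level and falls off with distance in octaves.
    """
    kept = []
    for i, (freq, level) in enumerate(components):
        masked = False
        for j, (masker_freq, masker_level) in enumerate(components):
            if i == j:
                continue  # a component doesn't mask itself
            octaves_apart = abs(math.log2(freq / masker_freq))
            threshold = masker_level - offset_db - slope_db_per_octave * octaves_apart
            if level < threshold:
                masked = True
                break
        if not masked:
            kept.append((freq, level))
    return kept

# One loud tone and two equally quiet ones: one close by, one two octaves away
components = [(1000, 80), (1100, 40), (4000, 40)]
kept = survives_masking(components)
print(kept)
```

The two quiet tones have exactly the same level, but only the one sitting right next to the loud tone gets thrown away. A real encoder makes this kind of decision many times per second across the whole spectrum, and that's where the model's assumptions start to matter.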

The issue, though, is that this model isn't perfect. And it was never even designed with classical or acoustic music in mind. It's tuned largely to rock and pop music, with sustained energy, heavy compression, and a relatively consistent spectral shape. Acoustic music is, from the perspective of your MP3 encoder, pretty much the hardest possible test case: sparse textures, long reverb tails, complex transients, and harmonically rich instruments that extend into exactly the frequency ranges the codec is most aggressive about cutting.

A note on pre-echo

There's another artifact worth understanding on its own terms, simply because it's such a musically-damaging characteristic of the MP3 format.

This artifact is called pre-echo, and this is how it works: MP3 processes audio in fixed-length frames. When a sharp transient, like a staccato note on the piano or pizzicato on a cello or violin, falls near the end of one of those frames, the encoder has to distribute the quantization noise (don't worry too much about this term, we might get into it a bit later... or not) across the entire frame. That noise then also bleeds backwards in time, meaning you hear a faint smear of sound before the attack that caused it.
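Here's a small toy that shows the mechanism. To be clear about the assumptions: MP3 itself uses a different transform (a filterbank followed by an MDCT) and much longer frames, so the plain DFT, the 32-sample frame and the very coarse quantizer step below are all simplifications chosen to make the effect easy to see.

```python
import cmath
import math

FRAME = 32    # toy frame length; real MP3 frames are far longer
ATTACK = 28   # the click sits near the end of the frame
STEP = 0.5    # deliberately coarse quantizer step (hypothetical value)

def dft(x):
    """Plain discrete Fourier transform of one frame."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform, returning real-valued samples."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def quantize(value, step):
    """Round the real and imaginary parts to the nearest multiple of step."""
    return complex(round(value.real / step) * step, round(value.imag / step) * step)

# Total silence, then a single click at sample 28
original = [0.0] * FRAME
original[ATTACK] = 1.0

# "Encode" by quantizing in the frequency domain, then "decode" again
decoded = idft([quantize(c, STEP) for c in dft(original)])

original_pre_energy = sum(original[t] ** 2 for t in range(ATTACK))
pre_echo_energy = sum(decoded[t] ** 2 for t in range(ATTACK))
print(f"Energy before the click, original: {original_pre_energy}")
print(f"Energy before the click, decoded:  {pre_echo_energy:.4f}")
```

The original is dead silent before the click, but the decoded frame isn't. Because each frequency coefficient describes the whole frame at once, rounding errors in those coefficients can't be pinned to the moment of the attack, so some of the error ends up ahead of it.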

Your hearing is extremely sensitive to this kind of temporal disturbance, and the precise moment of a transient is fundamental to how you perceive not just rhythm and articulation, but localisation, depth, and so much more. Pre-echo doesn't just add noise; it erodes the sense of physical presence that makes a recording feel "real".

"But 320 kb/s is transparent!"

Well, yeah. It can be. On some material. On some systems. In some conditions. To some listeners. This is worth acknowledging honestly. Modern high-bitrate MP3s, and even more so modern codecs like AAC (Apple Music) and Ogg Vorbis (Spotify), are genuinely impressive engineering achievements. The average listener on a typical system will probably not hear a major difference, if they hear one at all.

But there's a major difference between "nothing obviously wrong" and "nothing lost". If you're listening on a good system in a quiet room, to music that rewards close attention, the difference isn't subtle at all. More importantly, if you care about your music-listening experience at all, is "probably fine enough" really going to cut it?

Lastly, there's also the question of what happens downstream. Let's say you're listening to an MP3 file through your wireless headphones. Now the losses compound: the original WAV file is converted to an MP3, which is then re-encoded to, for example, a 256 kb/s AAC stream for the Bluetooth link to your headphones.

A final thought

Look, we're not here to win an argument. We're here to make something visible that usually isn't, trying to give you a way of seeing the difference rather than having to trust some kind of vague, subjective description of it. Look at the original WAV image again, and then at the MP3 version. What you see is real. It's in the music. This is why we deliver all our music in FLAC: nothing is approximated or discarded. The codec doesn't decide what you "probably won't hear"; it just stores what we recorded.

We're not dogmatic about file formats by any stretch of the imagination. But when you've invested in great microphones, great musicians, and a great acoustic space, delivering the result as a lossless file is simply the only thing that makes sense.

We hope you've enjoyed this bit of tech background. We'll try to update the blog with more information, behind-the-scenes photos and videos, and much more, more frequently in the future.

A visual exploration of MP3 as a format