One WAV or the other (WAV formats explained)

There’s no audio format simpler than good old WAV. Right? Well, it’s more complicated than you think, and today we want to include you in our journey of discovering what exactly WAV entails, how two WAV files can be vastly different, and even why some WAV files just don’t get recognised by your playback software.

It’s going to be a long read, but I hope you like it! But first, let’s start at the beginning, where the format comes from, who came up with it, and more.

A little history lesson

WAV, or Waveform Audio File Format, was developed by IBM and Microsoft and introduced 32 years ago, in 1991, ahead of its inclusion in Windows 3.1. WAV is an application of the Resource Interchange File Format (RIFF), a generic file container format for multimedia such as audio and video, in which all data is stored in chunks. We’ll tell you more about those chunks a little later, but what’s important to note is that before WAV or RIFF, there were many proprietary and competing audio formats in use by all kinds of different systems and applications. With WAV/RIFF, IBM and Microsoft sought to standardize this.

Generally speaking, a WAV file is used to store uncompressed Linear PCM (pulse-code modulation) audio, such as 44.1kHz 16bit. It can however also contain compressed audio using the Audio Compression Manager by Windows, but this is definitely a rare use case. Uncompressed audio has many benefits; although file sizes are a bit larger than with compressed audio, uncompressed LPCM can be easily edited and manipulated by most devices. This is why WAV is such a popular format in the recording and broadcasting world, enjoying compatibility with nearly every device on the planet.

Another pretty well-known audio format, AIFF (Audio Interchange File Format), released about three years before RIFF/WAV and used in Amiga and Macintosh systems, also employs a chunk-based structure, with each chunk containing a header identifying the chunk type, and a data section containing the chunk’s content. The biggest difference between the formats is their endianness, or in other terms, the order in which the bytes of a number are stored. AIFF uses big-endian order (most significant byte first), whereas WAV uses little-endian order (least significant byte first). I’m not going to go into the intricacies of it, but it’s a bit like how you can write the same date in multiple different ways: DD/MM/YYYY, or MM/DD/YYYY like many people in the U.S. do, for example.
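
To make the byte-order difference a little more tangible, here is a minimal Python sketch. It packs the same number both ways using the standard struct module; the value 44100 is just an example number picked for illustration, not something either format prescribes.

```python
import struct

value = 44100  # example number only (0xAC44 in hexadecimal)

little = struct.pack("<I", value)  # little-endian, the order WAV uses
big = struct.pack(">I", value)     # big-endian, the order AIFF uses

print(little.hex(" "))  # 44 ac 00 00
print(big.hex(" "))     # 00 00 ac 44

# Reading bytes with the wrong byte order gives a very different number,
# which is why a reader has to know which convention a file follows.
misread = struct.unpack("<I", big)[0]
print(misread)
```

Same four bytes, just in mirrored order: the information is identical, only the convention differs.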

So now that we’ve had a little intro into the history of WAV, let’s get down to business and explore how it’s built up.

The chunky stuff

As explained before, as a RIFF file, a WAV file is stored in chunks, each chunk telling the system a little bit about what to do with it. The most important chunks you might find in a WAV file are the RIFF chunk, the fmt chunk, and the data chunk. There are also some other optional chunks, such as the LIST and fact chunks, but we won’t go into those just yet.

The RIFF chunk is identified by 4 bytes saying “RIFF”, and then proceeds to tell the size of the rest of the file (the overall file size minus the 8 bytes already read), also using four bytes. It also uses 4 bytes to describe the file being “WAVE”. This chunk itself is the container for the entire file, and it informs the reader that this file is indeed a WAV file.

This chunk is then followed by the fmt (Format) chunk. It’s identified by 4 bytes saying “fmt ” (note the space), then a four-byte field for the size of the format data, followed by the compression code (2 bytes), number of channels (2 bytes), sample rate (4 bytes), byte rate (4 bytes), block align (2 bytes; the bytes per sample including all channels), and the bit depth (2 bytes). In other words, this chunk describes the WAV file format, telling the system whether it’s compressed or not, the number of channels, resolution, etcetera.

The next chunk, the data chunk, contains the actual sound information. Its 4-byte identifier says “data”, which is followed by the size field (yet again, 4 bytes of it), and, last but not least, the data field which stores the audio.
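
The three chunks described above can be poked at with a few lines of Python. What follows is a rough sketch rather than a robust parser: it writes one second of silence with Python’s built-in wave module and then walks the chunks by hand. The helper names (make_test_wav, read_chunks, parse_fmt) are made up for this example.

```python
import io
import struct
import wave

def make_test_wav(seconds=1, rate=44100, channels=2, sample_width=2):
    """Write a small, silent PCM WAV file into memory and return its bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(rate)
        w.writeframes(b"\x00" * (seconds * rate * channels * sample_width))
    return buf.getvalue()

def read_chunks(data):
    """Return the RIFF header fields and a map of chunk id -> (offset, size)."""
    riff_id, riff_size, form = struct.unpack_from("<4sI4s", data, 0)
    chunks = {}
    pos = 12  # the first sub-chunk starts right after "RIFF", size, "WAVE"
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack_from("<4sI", data, pos)
        chunks[chunk_id] = (pos + 8, size)
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    return riff_id, riff_size, form, chunks

def parse_fmt(data, offset):
    """Decode the first 16 bytes of the fmt chunk (2, 2, 4, 4, 2, 2 bytes)."""
    names = ("format_code", "channels", "sample_rate",
             "byte_rate", "block_align", "bits_per_sample")
    return dict(zip(names, struct.unpack_from("<HHIIHH", data, offset)))

wav = make_test_wav()
riff_id, riff_size, form, chunks = read_chunks(wav)
fmt = parse_fmt(wav, chunks[b"fmt "][0])

print(riff_id, riff_size, form)          # container id, size field, form type
print(fmt)                               # the format description
print("audio bytes:", chunks[b"data"][1])
```

Note how every size is read with the "I" format code, a 4-byte unsigned integer. That detail becomes important in a moment.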

As I mentioned before, you can also store more optional stuff in the WAV container, but let’s keep it relatively simple for now; I don’t want to force you to read a 500-page thesis on the material, just give you a good basis to understand the format.

So now that we understand a little bit about how a WAV file works, let’s get to the problem at hand, shall we?

Back to bits

So, there’s a reason we’re giving you all this information. Over the past 10 years, we’ve received messages from some of our dear listeners (a tiny percentage, but still) saying they cannot play back some of the WAV files we’ve produced. We were always confused; wasn’t WAV this super simple, always compatible powerhouse of a format? Well, not always, and it gets complex pretty fast. Because, well, one WAV isn’t always the same as the other. And here’s why.

Did you notice I wrote down the byte sizes in my chunk explanation back there? Well, there’s a reason for that. You see, in 1991, back when a 40-megabyte hard drive was considered average and a 500-megabyte one humongous, the RIFF/WAV format was designed to use 4 bytes to describe the size of the file. And as many of you know, one byte equals 8 bits, thus 4 bytes equals 32 bits. And herein lies the problem.

In binary, a 32-bit integer can represent 2^32 different values, ranging from 0 to 2^32 – 1. That maximum, 2^32 – 1, equals 4,294,967,295 bytes, or one byte short of 4 GB (using the computer definition of a GB as 1,024 MB). This means that when your file size can only be stored in 32 bits, the file can be 4 GB at most. Now, at CD quality (2 channels of 44.1kHz 16bit audio), this is roughly 6 hours and 45 minutes, plenty of time for a good music album, which is generally about an hour long. But, when we’re talking crazy TRPTK levels of resolution, such as DXD 32bit, suddenly the file needs to be much, much shorter: around 25 minutes in stereo or 10 minutes in 5-channel surround. And how about immersive audio? A 5.1.4-channel (so 10 channels in total), 352.8kHz 32bit DXD file could only be about 5 minutes long, much too short for a nice listening session, right?
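
If you’d like to run these numbers yourself, the arithmetic is simple enough to sketch in Python. This is only a back-of-the-envelope helper: it assumes the full 2^32 – 1 bytes are available for audio and ignores the few dozen bytes of headers. The max_bytes parameter is there because the result shifts noticeably depending on whether you count 4 GB as 2^32 bytes or as a round 4,000,000,000 bytes.

```python
RIFF_MAX_BYTES = 2**32 - 1  # largest value a 4-byte size field can hold

def max_seconds(sample_rate, bits, channels, max_bytes=RIFF_MAX_BYTES):
    """Longest recording that fits in max_bytes of raw PCM audio."""
    bytes_per_second = sample_rate * (bits // 8) * channels
    return max_bytes / bytes_per_second

def as_clock(seconds):
    """Format a duration as hours, minutes and (truncated) seconds."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"

examples = [
    ("CD quality, stereo", 44100, 16, 2),
    ("DXD 32bit, stereo", 352800, 32, 2),
    ("DXD 32bit, 5 channels", 352800, 32, 5),
    ("DXD 32bit, 5.1.4 (10 channels)", 352800, 32, 10),
]

for label, rate, bits, channels in examples:
    print(f"{label}: {as_clock(max_seconds(rate, bits, channels))}")

# The same CD-quality limit, counting 4 GB as a round 4,000,000,000 bytes
print(as_clock(max_seconds(44100, 16, 2, max_bytes=4 * 10**9)))
```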

A solution needed to be found, and thankfully, more than one was!

Throwing more bits at the problem

As you can imagine, throwing more bits at the problem usually solves it, and it does here as well. If we were to double the length of the size field to 64 bits (8 bytes), we’re suddenly able to have files of up to 2^64 bytes, or 16 exabytes! At CD quality, that’s a little over three million years of music. That should be plenty, right?
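
For the curious, the same kind of napkin math works for the 64-bit case. This little sketch assumes a year of 365.25 days and, again, ignores header overhead.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

cd_bytes_per_second = 44100 * 2 * 2  # sample rate * bytes per sample * channels
max_bytes_64 = 2**64                 # what an 8-byte size field can address

years = max_bytes_64 / cd_bytes_per_second / SECONDS_PER_YEAR
print(f"{max_bytes_64 / 2**60:.0f} exabytes, or about {years:,.0f} years of CD-quality audio")
```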

So how do we implement this? Well, there are two main formats that aim to extend the WAV format to the 64-bit age, namely RF64 and W64. Both are quite similar in most ways, but aren’t always supported by the same systems.

Let’s start with RF64, or RIFF 64, the more popular of the two. Recognising the need for a solution to the file size issue, the European Broadcasting Union (EBU) played a key role in the development of RF64. The format was outlined in the EBU Technical Document 3306 and released in 2006, and it allowed for an extension of the standard WAV file to accommodate larger file sizes.

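The format works as follows. First, the first four bytes are changed from “RIFF” to “RF64” (or to “BW64” in its successor, described in ITU-R BS.2088). The old 32-bit size fields are filled with their maximum value as a marker, and a new ds64 chunk (data size 64) holds the real sizes as 64-bit numbers. Recording software can also start out writing a standard WAV file with a JUNK chunk as a placeholder; readers simply skip it, and if the recording grows beyond the 4GB mark, the placeholder is overwritten with a ds64 chunk, turning the file into RF64 on the fly. And lastly, because RF64 is adapted from the Broadcast Wave Format, which also uses some chunks to define metadata, generally for recording or broadcast purposes, the RF64/BW64 family employs some metadata chunks such as axml, bxml, sxml, and chna for channel information. Other than these things mentioned, RF64 WAV is pretty much exactly the same as standard WAV.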
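
Here is a small Python sketch of what that looks like on the byte level. It assumes the ds64 layout as I understand it from EBU Tech 3306 (64-bit RIFF size, data size and sample count, followed by a table length), with ds64 as the first chunk after “WAVE”. The functions build_rf64_header and read_rf64_sizes are invented for this example; the first fakes the header of a large file so no actual 6 GB file is needed.

```python
import struct

MARKER_32 = 0xFFFFFFFF  # placed in the old 32-bit size fields

def build_rf64_header(data_size, channels=2, rate=44100, bits=16):
    """Build only the header bytes of a hypothetical RF64 file."""
    block_align = channels * bits // 8
    fmt = struct.pack("<4sIHHIIHH", b"fmt ", 16, 1, channels, rate,
                      rate * block_align, block_align, bits)
    # "WAVE" + ds64 chunk (8 + 28) + fmt chunk + data chunk header + audio
    riff_size = 4 + 36 + len(fmt) + 8 + data_size
    ds64 = struct.pack("<4sIQQQI", b"ds64", 28, riff_size, data_size,
                       data_size // block_align, 0)
    return (struct.pack("<4sI4s", b"RF64", MARKER_32, b"WAVE")
            + ds64 + fmt + struct.pack("<4sI", b"data", MARKER_32))

def read_rf64_sizes(header):
    """Return (riff_size, data_size, sample_count) from the ds64 chunk."""
    magic, _, form = struct.unpack_from("<4sI4s", header, 0)
    if magic not in (b"RF64", b"BW64") or form != b"WAVE":
        raise ValueError("not an RF64/BW64 file")
    chunk_id, _ = struct.unpack_from("<4sI", header, 12)
    if chunk_id != b"ds64":
        raise ValueError("expected ds64 as the first chunk")
    return struct.unpack_from("<QQQ", header, 20)  # three 8-byte integers

header = build_rf64_header(6 * 2**30)  # pretend there are 6 GB of audio
riff_size, data_size, sample_count = read_rf64_sizes(header)
print(riff_size, data_size, sample_count)
```

The "Q" format code is an 8-byte unsigned integer, which is exactly the extra room the plain WAV header lacks.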

W64, also known as Sony Wave64, was designed to address the same file size issue as RF64, but with much less focus on backwards compatibility. It was developed by Sonic Foundry, creators of the popular audio editing software Sound Forge, specifically for high-resolution audio. Unlike RF64, which carefully maintains a structure close to the original WAV to ensure compatibility, W64 took a different approach. It replaced the four-character chunk identifiers with 128-bit GUIDs and extended the size fields to 64 bits, providing more flexibility in defining chunks and allowing for much larger file sizes. As a result, its structure and headers differ more significantly from the WAV format, making it a more distinct and independent format that standard WAV readers cannot open.
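
Since all three flavours can show up with a .wav extension, the only reliable way to tell them apart is to look at the first bytes of the file. Below is a hedged sketch of such a check in Python. The function name sniff_wav_flavour is made up, and the W64 GUID is the value I know from open-source readers, so treat it as an assumption rather than gospel.

```python
import uuid

# W64 starts with a 16-byte GUID whose first four bytes spell "riff"
# in lowercase (assumed value, as used by common open-source readers).
W64_RIFF_GUID = uuid.UUID("66666972-912E-11CF-A5D6-28DB04C10000").bytes_le

def sniff_wav_flavour(first_bytes):
    """Guess the container type from the first 16 bytes of a file."""
    if first_bytes[:16] == W64_RIFF_GUID:
        return "W64"
    magic, form = first_bytes[:4], first_bytes[8:12]
    if form == b"WAVE":
        if magic == b"RIFF":
            return "WAV"
        if magic in (b"RF64", b"BW64"):
            return "RF64"
    return "unknown"

print(sniff_wav_flavour(b"RIFF\x24\x00\x00\x00WAVEfmt "))  # WAV
print(sniff_wav_flavour(b"RF64\xff\xff\xff\xffWAVEds64"))  # RF64
print(sniff_wav_flavour(W64_RIFF_GUID))                    # W64
```

To try it on a real file, read the first 16 bytes in binary mode, for instance with open(path, "rb").read(16), and pass them in.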

So which format to choose?

Originally, we had always offered our 32-bit downloads as WAV files, simply because FLAC only supports 24-bit audio (yes, we know, technically FLAC supports 32bit, but the official libraries for it unfortunately do not). Unbeknownst to us, these were mostly RF64, but the software we used at the time created standard WAV files for any file under 4GB. This led to some issues where some of our listeners reported that they could not play back certain files on their system, even though all the files showed up as WAV. Interestingly, we asked a number of these listeners to try out some different files we sent them, and some systems don’t play back RF64 but play W64 perfectly, and some the other way around.

For now, we’re sticking to the RF64 format for everything, to ensure the best quality. That being said, we’re currently experimenting with Apple’s ALAC format, which also supports up to 8 channels of 32-bit audio, even at DXD. ALAC files are also about 30% smaller than their WAV counterparts, which improves downloading times (and our Amazon AWS bill) by a fair bit! It will need a bit more time, though, to ensure maximum compatibility for every system, or at least for most.

I hope you’ve enjoyed this little journey we took, exploring the different WAV file formats. Let us know if you have any more questions!
