100 Days Of ML Code — Day 097

Recap from day 096

In days 095 and 096 we talked about how we hear sound in space: interaural time delay, the head-related transfer function, and binaural recording and processing, which are very effective if we are working only with headphones. We then talked about the different speaker configurations available to us for diffusing sound in space through speakers.

You can catch up by reading the Day 096 and Day 095 posts of 100 Days Of ML Code on medium.com.

Today, we’ll talk about how we actually take audio data and store it: how much space it takes up on disk, and the different file formats available to us to manage that.

Digital Audio Storage

So, we’ve decided on our sampling rate, our bit-width, and how many channels we need to represent audio for a particular situation. Now, how do we figure out how much space it takes up? That’s what we’re going to cover today.


We’re going to look at how to calculate storage space, and then at the different file formats we can use: lossless file formats that preserve all of our amplitude values perfectly, and lossy file formats that can save us a lot of disk space but lose data in the process.

So, before we look through the file formats, let’s go through some very simple calculations. Let’s assume that we have 1 minute of audio at 16 bits, 44,100 Hz, and stereo, so two channels. How much disk space would this actually take up to store?

So we’ve got 60 seconds, multiplied by 44,100 samples per second, multiplied by 16 bits per sample, multiplied by two channels (two samples per moment in time). This comes out to 84,672,000 bits, or about 84 million bits.

Now, before you start freaking out, that’s bits, which is not usually how we talk about digital data. To convert 84 million bits to bytes we divide by 8; to go to kilobytes we divide by 1,024; and to go to megabytes we divide by 1,024 again. That number ends up coming out to about 10 MB.
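The arithmetic above can be sketched in a few lines of Python. The function names here are hypothetical helpers chosen for illustration, and the conversion uses 1,024 as the divisor, matching the text.

```python
def audio_storage_bits(seconds, sample_rate_hz, bit_depth, channels):
    """Raw (uncompressed) size in bits: duration x rate x bits per sample x channels."""
    return seconds * sample_rate_hz * bit_depth * channels


def bits_to_megabytes(bits):
    """Convert bits to megabytes using 1,024-based units, as in the text."""
    num_bytes = bits / 8
    kilobytes = num_bytes / 1024
    return kilobytes / 1024


# One minute of 16-bit, 44,100 Hz stereo audio.
bits = audio_storage_bits(seconds=60, sample_rate_hz=44_100, bit_depth=16, channels=2)
print(bits)                               # 84672000
print(round(bits_to_megabytes(bits), 2))  # 10.09
```

Changing any one argument scales the result linearly, so mono audio at the same settings would need half the space.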

So, in order to store 1 minute of 16-bit, 44,100 Hz stereo sound, we need about 10 MB of disk space. So, how are we going to store this? Let’s assume we’ve got plenty of space, so that’s not an issue; we just want to store it on disk.

The easiest thing we can do is to use a standard file format. Basically, it takes all the amplitude values, all the binary digits, and just writes them onto disk in a structured format. The two most popular formats for doing that these days are WAVE files and AIFF files.
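As a rough illustration of what "amplitude values in a structured format" means, here is a minimal sketch that uses the `wave` module from Python's standard library to write one second of a 440 Hz test tone as 16-bit stereo. The tone, the in-memory buffer, and the helper name `make_wav_bytes` are assumptions made for the example; they are not from the original discussion.

```python
import io
import math
import struct
import wave


def make_wav_bytes(seconds=1.0, sample_rate_hz=44_100, channels=2, freq_hz=440.0):
    """Build a 16-bit PCM WAVE file in memory and return its bytes."""
    n_frames = int(seconds * sample_rate_hz)
    frames = bytearray()
    for n in range(n_frames):
        # One amplitude value at half of full scale, stored as a signed 16-bit integer.
        value = int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        # The same sample is repeated once per channel (interleaved).
        frames += struct.pack("<h", value) * channels

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(bytes(frames))
    return buffer.getvalue()


wav_bytes = make_wav_bytes()
payload_bytes = 44_100 * 2 * 2  # samples x channels x bytes per sample
print("total file size:", len(wav_bytes))
print("header overhead:", len(wav_bytes) - payload_bytes)
```

The file is almost entirely raw sample data; the small remainder is the header that records the sampling rate, bit-width, and channel count.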

There was a time long ago when WAVE was the Windows format and AIFF was the Apple format. Any music technology program we’d encounter these days supports both formats equally well. There are also a lot of more obscure formats that aren’t used nearly as much.

WAVE files and AIFF files are supported by just about every audio program out there. If we wanted to save some space, we could try to compress this data using a lossless compression format.

What a lossless compression format does is similar to what a ZIP archive does for other types of files. It goes through and tries to re-encode all our amplitude values in a way that represents the most commonly used ones a little more efficiently, at the expense of representing some of the less frequently used ones less efficiently.
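To make the ZIP analogy concrete, here is a small sketch using Python's standard `zlib` module (the DEFLATE algorithm commonly used in ZIP archives) as a stand-in. This is not how a dedicated audio codec works internally; it only shows the defining property of lossless compression, which is that decompressing gives back exactly the original bytes. The test tone and the helper name `sine_pcm16` are assumptions for the example.

```python
import math
import struct
import zlib


def sine_pcm16(seconds=1.0, sample_rate_hz=44_100, freq_hz=440.0):
    """Return mono 16-bit little-endian PCM bytes for a sine test tone."""
    n = int(seconds * sample_rate_hz)
    samples = [
        int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz))
        for i in range(n)
    ]
    return struct.pack(f"<{n}h", *samples)


raw = sine_pcm16()
compressed = zlib.compress(raw, 9)      # 9 = highest compression level
restored = zlib.decompress(compressed)

print("raw bytes:       ", len(raw))
print("compressed bytes:", len(compressed))
print("bit-exact round trip:", restored == raw)  # True
```

A pure tone is highly repetitive, so it shrinks far more than real recordings would; the ratio printed here should not be read as typical for music.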

Using a technique like this, we could usually save about 50%. So, instead of our 10 MB per minute of CD-quality sound, we’d need about 5 MB to represent that same minute. The most popular format here is FLAC, which stands for Free Lossless Audio Codec.

That’s all for day 097. I hope you found this informative. Thank you for taking time out of your schedule and allowing me to be your guide on this journey. And until next time, be legendary.