This website is designed to teach enough digital signal processing to start your journey into programming audio software. Whether you want to create VST plugins for digital audio workstations, add complex audio to your website/application/game, or just want to pick up a new skill, this free course will give you a great foundation in DSP.
For those interested in what technology powers this site, we are using GitHub Pages to host the website, Tailwind CSS to style the pages, Desmos' API to create the interactive visualizations, and a slightly modified JS environment for the editor. The public GitHub repository can be found here.
This website is organized into a tree of dropdown menus, each expanding on subjects in higher level dropdowns (such as this "How to use this website" dropdown).
To get the most out of this site, it's recommended to start by reading everything (with "Show Everything" if you prefer). Once you have gone through the information once, specific information is easily accessible by navigating the dropdowns.
It seems pretty reasonable that to understand how to process digital audio, you must first understand what digital audio is. As you'll soon find out, digital audio is merely a category of a larger kind of digital data called signals. Signals are that new beat you spent all night working on, the readings from an EKG sensor, seismic data from geophones, and even images of cats on the internet! The point is that while this course is specifically geared towards digital audio, that is far from the only application of this course's material. If you want more information on DSP's applications in image processing and data compression, I'd recommend looking through Chapters 23 to 27 of Dr. Steven W. Smith's “The Scientist and Engineer's Guide to Digital Signal Processing”
What even is analog? Why does that older fellow in the music store keep insisting that his dusty old analog synthesizers sound so much warmer and more natural than the modern digital trash? Does he even know what the word “analog” means? Since the D in DSP stands for “digital,” it only makes sense to start by explaining the difference. Before computers became the standard for audio processing, the main option for synthesizing and processing audio came in the form of dedicated electronic circuits. The sound waves in these circuits were represented through fluctuations of voltage over time. In other words, the fluctuations in voltage were an analog of the actual signal they represent, and that's why it's called "analog" (the older meaning of the word "analog" is a noun for something comparable to, or a representation of, something else). Since analog signals are represented by changes in energy, and physical levels of energy can't jump from one value to another in 0 units of time, the resulting signal is continuous. In other words, if you attached a polygraph machine to the source of an analog signal, and traced your finger across the resulting curve, you would never have to pick up your finger; the polygraph's pen won't teleport from one position to another. To answer the question about the older fellow in the music store, there is some truth to his claim that analog synths can sound more natural. Electronic components like resistors and capacitors are prone to minor fluctuations in their current, voltage, and resistance. The result of this interference is noise, which gives analog signals their characteristic warmth.
In comparison, digital signals are discrete. Rather than a continuous curve, the polygraph would show a set of evenly spaced points for a digital sound source. This method works well for computers as you can store a digital signal pretty efficiently on a hard drive, while you would need an infinite amount of storage space to store an analog one. Since computers store signals using lists of numbers that represent sound rather than the continuous voltage signals in analog circuits, digital signals don't suffer from the same noise interference that analog signals do. Since our application is audio programming, we will only focus on digital signals from now on.
So, what is this signal? A signal is a list of numbers spaced at a fixed rate over time. In the case of audio, the numbers are known as “samples,” and each one measures the “amplitude” at its point in time. Typically, amplitude is denoted as a decimal value between -1 and 1. The way I like to think about amplitude is by imagining the cone of a speaker. An amplitude of 1 means the cone is pushed all the way out, and -1 is pulled all the way in. From the rapid movement of the cone's position over time, you get sound. The fixed rate is known as the “sample rate”, which is commonly measured in samples per second. The most common sample rate in music is 44.1 kHz, or 44,100 samples per second.
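To make the "list of numbers" idea concrete, here is a minimal Python sketch (Python is our choice for illustration, not the language of the site's editor). The function name `sine_samples` and its parameters are made up for this example.

```python
import math

SAMPLE_RATE = 44_100  # samples per second (44.1 kHz)

def sine_samples(frequency_hz, duration_s, amplitude=1.0, sample_rate=SAMPLE_RATE):
    """Return a sine wave as a plain list of samples, one every 1/sample_rate seconds."""
    count = round(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(count)]

# Ten milliseconds of a 440 Hz tone: nothing more than a list of amplitudes.
tone = sine_samples(frequency_hz=440, duration_s=0.01)
print(len(tone), "samples, first five:", tone[:5])
```

Sample number n sits at time n / sample_rate seconds, which is why the sample rate has to travel with the list for the numbers to mean anything.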
Without further ado, here's your first exercise. You might have noticed that for the examples, we have been using a sine wave. The filled-in code produces a sine wave, but a very slow one. You first want to speed that wave up so that it's audible (You can peek ahead to the frequency section if you get stuck). Next, try adjusting the amplitude of the wave (without adjusting the volume slider). Finally, try experimenting with adding sine waves of different speeds and amplitudes together. You can start the exercise here.
Computers, and especially older ones, have a hard time accurately storing non-whole numbers. Since a digital audio signal's values are non-whole numbers by nature, that creates a limitation on how precise each value can be. In DSP, this precision is known as the “Bit Depth” of a digital signal. The bit depth is merely the number of bits used to hold each sample (Here is a good resource for understanding bits and binary numbers). Most modern digital audio workstations use a standard bit depth of 32 bits, meaning that there are 2^32 possible values a sample can have between -1 and 1. Converting to lower bit depths forces samples' amplitudes to be quantized (snapped to the closest available value). The effect of this is a noisier audio signal that sounds characteristically "retro" (since old hardware had lower bit depths).
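Quantization can be sketched in a few lines of Python. This is only one reasonable layout of the levels (2^bits values spread evenly from -1 to 1, endpoints included); other layouts are possible, and the name `quantize` is made up for this example.

```python
def quantize(sample, bits):
    """Snap a sample in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1                 # number of gaps between neighbouring levels
    normalized = (sample + 1.0) / 2.0     # move the range from [-1, 1] to [0, 1]
    index = round(normalized * steps)     # pick the closest level number
    return index / steps * 2.0 - 1.0      # move back to the [-1, 1] range

# With 2 bits there are only four levels, so nearby amplitudes collapse together.
print([round(quantize(x, 2), 3) for x in (-0.9, -0.2, 0.1, 0.4, 0.95)])
```

The fewer the bits, the larger the gap between levels and the larger the rounding error added to every sample, which is the noise described above.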
Downsampler Starter Code
Downsampler Solution Code
If you're up for a bit of a harder challenge, you can try building a bitcrusher. A bitcrusher is a common effect that takes a signal with a high bit depth and “crushes” it down to a lower bit depth by rounding the samples to the nearest evenly spaced value. For example, if the bit depth was set to 2, the values that the wave will snap to are -1, -0.5, 0.5, and 1. For a bit depth of 3, it would be -1, -0.75, -0.5, -0.25, …, 0.75, 1. In the starter code, you are given two variables: a sine wave signal (wave) and a bit depth (b). Your goal is to quantize the signal into 2^b evenly spaced values between 0 and 1. (Hint: using the round() function could be useful if you could change the range of the wave to make it work).
Bitcrusher Starter Code
Bitcrusher Solution Code
Until now, we have only been looking at digital audio in the time domain (where time is the horizontal axis of the graph), but a great deal of information can be dissected from the frequency domain as well. To introduce frequency, let's think about this cat spinning on a record player.
We can measure the “frequency” by finding how many times the cat makes a full rotation per unit time. The standard unit for frequency is the hertz (Hz), or cycles per second. Since the cat is spinning on a 33 rpm (revolutions per minute) record, we can find the revolutions per second by dividing 33 by 60. Therefore, we can find that the cat is spinning at a frequency of 0.55 Hz, or roughly one full rotation every 1.8 seconds.
The best equivalent to this spinning cat in the world of DSP is the sine wave. Much like the GIF of the cat, sine waves are periodic, or repeat infinitely in a very predictable manner. Because of this property, we can measure a sine wave's frequency in a similar way. The frequency of a sine wave is 1 divided by the “period” of the wave, a.k.a. the time it takes to complete one cycle.
While frequency is the number of cycles per second, phase is the starting position of the cycle. Phase is measured in either degrees or radians depending on the context (in this section, we use degrees). Yet again, let's think about these two gifs of cats spinning on a record player.
Both cats are spinning at the same frequency, and both GIFs are 28 frames long, but what differentiates the two felines is their starting position on the first frame. Cat number one starts facing the camera, while cat number two starts by facing away from the camera. In DSP terms, we would say that these cats are “out of phase” with each other. Now, compare that to these cats below.
This time, the cats are “in phase” with each other because they start at the same point in their cycle around the record player. In the context of DSP, signals that are added together can either interfere constructively or destructively depending on each signal's respective phases. If you play around with the graph below, you'll see how the individual sine waves constructively interfere when they are at the same phase, and weaken the signal the more they differ. If you put one wave at 0 degrees, and the other at 180 degrees, the two waves cancel each other out completely (this phenomenon has the convenient name of “phase cancellation”).
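If you'd rather see the interference in numbers than on a graph, here is a small Python sketch. It adds two sine waves of the same frequency and reports the peak of the sum over one cycle; the helper names are made up for this example.

```python
import math

def sine(frequency_hz, phase_deg, t):
    """One sample of a unit-amplitude sine with a starting phase given in degrees."""
    return math.sin(2 * math.pi * frequency_hz * t + math.radians(phase_deg))

def peak_of_sum(phase_a_deg, phase_b_deg, frequency_hz=1.0, steps=1000):
    """Largest absolute value of the sum of two equal-frequency sines over one cycle."""
    return max(abs(sine(frequency_hz, phase_a_deg, k / steps) +
                   sine(frequency_hz, phase_b_deg, k / steps))
               for k in range(steps))

for phase in (0, 90, 180):
    print(f"0 and {phase} degrees: peak of the sum = {peak_of_sum(0, phase):.3f}")
```

In phase, the peak doubles; at 90 degrees apart it is only about 1.41; at 180 degrees the two waves cancel and nothing is left.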
While a sine wave's signal can be described as having a single frequency, most audio signals are not simple sine waves. For instance, take a look at this signal, which is the sum of two sine waves where the second has a higher frequency and lower amplitude than the first:
Instead of assigning this complicated signal a single frequency value, it can be thought of as having two coexisting frequencies. If you graph this signal in the frequency domain instead of the time domain, you can see two distinct peaks which represent the amplitude of each sine wave in the original signal.
The horizontal position of each peak represents the frequency of each sine wave (higher frequencies are further right than lower ones), and the height of each peak represents the amplitude of each sine wave (amplitude in this case is the value which each sine function is multiplied by). The various smaller peaks are just debris from the method used to convert this signal from the time domain to the frequency domain: the Fourier Transform! In the 1800s, French mathematician Joseph Fourier discovered that ANY signal can be decomposed into a sum of many (or few) sine waves with different amplitudes, frequencies, and phases. The first wave is known as the “DC component," and is a flat line that represents the average amplitude of the signal. Each succeeding sinusoidal wave has a frequency that is an integer multiple of the first wave after the DC component (DC can be thought of as having a frequency of 0 for convenience). In some cases, this sum can be infinite, but for digital signals, there is a finite limit (this idea will be explored when we talk about the Nyquist Sampling Theorem). As this is an introductory exploration into DSP, we aren't going to go into the details of how the Fourier Transform works specifically. If you are interested in that, 3blue1brown has a great video on the how of Fourier.
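For the curious, the idea of "two peaks in the frequency domain" can be reproduced with a deliberately naive transform in Python. This is a slow textbook-style discrete Fourier transform written for clarity, not the method the site's graphs use; the signal length and the frequencies (4 and 10 cycles per 64 samples) are arbitrary choices for this example.

```python
import cmath
import math

def dft_amplitudes(signal):
    """Amplitude of the sinusoid at each whole number of cycles per signal length."""
    n = len(signal)
    amplitudes = []
    for k in range(n // 2):
        total = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        # Every bin except 0 (the DC component) is mirrored at a negative frequency,
        # so doubling it recovers the amplitude of the real sine wave.
        scale = 1 if k == 0 else 2
        amplitudes.append(scale * abs(total) / n)
    return amplitudes

N = 64
signal = [math.sin(2 * math.pi * 4 * t / N) + 0.5 * math.sin(2 * math.pi * 10 * t / N)
          for t in range(N)]
spectrum = dft_amplitudes(signal)
peaks = [k for k, a in enumerate(spectrum) if a > 0.1]
print(peaks, [round(spectrum[k], 3) for k in peaks])
```

Because both sine waves complete a whole number of cycles inside the 64 samples, the two peaks come out clean; frequencies that fall between bins are what smear into the smaller peaks mentioned above.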
As we briefly touched on in the previous section, a lot of language surrounding frequency and the frequency domain is abstracted from the actual concepts. The ideas of harmonics, pitch, and timbre are intimately tied with the idea of frequency, even if the hobbyist musician doesn't necessarily think of them in this way. For example, if you ever hear a musician talking about “harmonics”, they are merely talking about what you know as the frequency components of a sound. For any given sound, there is a “fundamental” harmonic/frequency, which is the lowest frequency of a sound, and then its “overtones”, or subsequent harmonics above the fundamental. The “pitch” of a sound is determined by the frequency of the fundamental, while the timbre (a fancy French word for the sound of a sound) is determined by the overtones. In the example below, you can add overtones to a fundamental sine wave and watch the signal morph into what is called a “square wave”, which is a signal that periodically alternates between a positive and negative value, such that the signal is positive 50% of the time and negative the other 50% of the time.
Below is a similar example, but this time, the harmonics added turn the fundamental sine wave into a “sawtooth wave”, which is given its name by how it looks like the teeth of a saw blade.
So what makes these signals look different? In the case of the sawtooth wave, the overtones are added at every integer multiple of the fundamental frequency, at an amplitude of 1 divided by the number of the harmonic (the first overtone will be half as strong as the fundamental, the second will be one third, and so on). For the square wave, only the harmonics whose frequency is an odd multiple of the fundamental are added. For more complicated sounds, such as guitars and pianos, these harmonics will be much less orderly than basic shapes such as square and sawtooth waves, and are subject to change over time. For example, the start of a piano note has a lot of high-frequency content as the hammer hits the string, but over time, those higher harmonics are dampened out.
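The two recipes above can be written as one short Python sketch. It assumes the square wave's odd harmonics use the same 1-over-harmonic-number amplitudes as the sawtooth; the function name `additive` and its parameters are made up for this example.

```python
import math

def additive(t, fundamental_hz, harmonics, odd_only=False):
    """Sum sine harmonics at amplitude 1/h; odd_only=True keeps only odd multiples."""
    total = 0.0
    for h in range(1, harmonics + 1):
        if odd_only and h % 2 == 0:
            continue  # a square wave skips the even harmonics
        total += math.sin(2 * math.pi * fundamental_hz * h * t) / h
    return total

# One cycle of a 1 Hz wave, sampled at 8 points, built from 200 harmonics.
times = [k / 8 for k in range(8)]
print("saw:   ", [round(additive(t, 1.0, 200), 2) for t in times])
print("square:", [round(additive(t, 1.0, 200, odd_only=True), 2) for t in times])
```

The sawtooth values ramp steadily downward across the cycle, while the square values sit near one level for the first half and flip sign for the second half.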
The Nyquist Sampling Theorem connects two of the concepts that we touched on previously: sample rate, and frequency. One of the main limitations of digital audio is the fact that it can only be precise to a finite amount. We explored this idea in the section about bit depth by talking about how lower bit depths quantize a digital signal, but there exists another form of digital information loss known as aliasing. When digitally sampling an analog signal, the highest frequency that a system can accurately store is equal to one half of the sampling rate of the system. This frequency is known as the “Nyquist frequency”. This theorem is one of the main reasons why the most common sampling rate for digitally storing music is 44.1 kHz. Humans can only hear sounds up to about 20 kHz, which is close to half of 44.1 kHz (with a bit more room for good measure). When a wave containing frequencies higher than a sampler's Nyquist frequency is sampled, aliasing occurs, where extra frequency content is introduced. By interacting with the graph below, you can see the effects that sampling a sine wave at different frequencies and sample rates has. The two sliders are scaled such that when they are lined up, the sine wave will be at the sampling rate's corresponding Nyquist frequency. Pushing the frequency slider to the right of the sample rate slider shows the effects of aliasing, where the samples collected incorrectly introduce a sine wave that didn't exist in the source signal.
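Aliasing is easy to reproduce in a few lines of Python. The sketch below samples a 7 Hz sine at only 10 samples per second (Nyquist frequency 5 Hz) and shows that the samples are indistinguishable from those of a 3 Hz sine with its sign flipped. The function names and the tiny sample rate are made up for this example.

```python
import math

def sample_sine(frequency_hz, sample_rate_hz, count):
    """Sample a unit-amplitude sine wave `count` times at the given sample rate."""
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
            for n in range(count)]

def alias_frequency(frequency_hz, sample_rate_hz):
    """Frequency between 0 and Nyquist that the sampled sine will appear to have."""
    folded = frequency_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)

too_high = sample_sine(7, 10, 20)   # above the 5 Hz Nyquist frequency
impostor = sample_sine(3, 10, 20)   # the frequency it gets mistaken for
print(alias_frequency(7, 10))
print(all(abs(a + b) < 1e-9 for a, b in zip(too_high, impostor)))
```

Once the samples are taken there is no way to tell which of the two sine waves was the original, which is why the too-high frequencies have to be removed before sampling.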
Now that we understand how frequencies work, we can now make basic shapes other than sine waves using a process known as Additive Synthesis. Additive Synthesis is merely the process of creating sounds using the sum of sine waves at different frequencies, phases, and amplitudes. In the starter code, we are using a for loop to create a sawtooth wave, where each harmonic is at an amplitude of 1 divided by the harmonic number. Your first challenge is to make this code produce a square wave. If you remember from the “Vocabulary of Frequency” section, a square wave contains only odd-numbered harmonics. If you feel good about that, try producing a triangle wave. Like square waves, triangle waves only contain odd harmonics. What differs is that the phase of each harmonic is a 180-degree turn from the previous harmonic (which is equivalent to multiplying the wave by -1). Additionally, the amplitude of each harmonic dampens by 1 over the harmonic number squared (or the harmonic number to the -2nd power). In this exercise, we have also included possible solutions for each challenge with comments explaining how they work.
Filters are any system that takes a single input signal (commonly denoted as x[n] in DSP-world), and returns a single output signal (denoted as y[n]).
Before we jump into filters themselves, it would be best to explain a concept that dictates a lot of the things we will be able to do later. Pretty much everything that we assume in this section hinges on the condition that the filter is LTI. LTI stands for Linear (and) Time-Invariant, and is a set of rules about how a system's output can depend on its input (a system is a more general word for this concept, but we'll be looking at filters specifically). There are three rules in total: Additivity, Homogeneity (these two being bundled under the term “Linearity”), and Time Invariance. The additive property states that passing two signals through a filter and then adding the outputs together is equivalent to adding the two signals together first and then passing the sum through the filter. If a filter does not have this property, it is not additive. The homogeneous property states that a change in an input signal's amplitude results in a corresponding change in amplitude in the output signal. For example, if you change the amplitude of an input signal into a homogeneous filter to twice what it was before, the output will also be twice the amplitude. The last property, time invariance, states that if a system receives an input shifted in time by k samples, then the only change in the output will be a shift by k samples. If a filter (or any one-input, one-output system for that matter) has all 3 of these properties, it is LTI. Below are some examples of different filters, and whether they are LTI or not.
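The three rules can also be spot-checked numerically. The Python sketch below tries each rule on three toy systems chosen for this example (they are not the site's own examples): a 3-point moving average, a system that squares each sample, and a system that multiplies each sample by its index. A check on one test signal can only catch a violation; passing it does not prove the property in general.

```python
def moving_average_3(x):
    """3-point moving average; samples before the start count as 0."""
    padded = [0.0, 0.0] + list(x)
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(x))]

def square_each(x):
    """Squares every sample (not linear)."""
    return [s * s for s in x]

def ramp_gain(x):
    """Multiplies each sample by its index (linear, but not time-invariant)."""
    return [n * s for n, s in enumerate(x)]

def close(a, b, tol=1e-9):
    return len(a) == len(b) and all(abs(p - q) <= tol for p, q in zip(a, b))

def is_additive(system, a, b):
    summed_first = system([p + q for p, q in zip(a, b)])
    summed_after = [p + q for p, q in zip(system(a), system(b))]
    return close(summed_first, summed_after)

def is_homogeneous(system, a, gain=2.0):
    return close(system([gain * s for s in a]), [gain * s for s in system(a)])

def is_time_invariant(system, a, shift=2):
    delayed_input = [0.0] * shift + list(a)
    delayed_output = [0.0] * shift + system(a)
    return close(system(delayed_input), delayed_output)

signal_a = [1.0, 2.0, 3.0, 4.0]
signal_b = [0.5, -1.0, 0.0, 2.0]
for system in (moving_average_3, square_each, ramp_gain):
    print(system.__name__,
          is_additive(system, signal_a, signal_b),
          is_homogeneous(system, signal_a),
          is_time_invariant(system, signal_a))
```

Only the moving average passes all three checks, so it is the only one of the three that could be LTI.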
To start our exploration of filters, we'll be studying the moving average filter. What a moving average filter does is store the past n samples of the input in an array, and then return the average of them. If you run the example here, you'll see that this has the effect of smoothing out the input sound and taming the high-frequency content. If you play around with the value of n, you'll see that higher values of n result in less high-end, and more little notches in magnitude. Why is that?
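Here is a minimal Python sketch of that description, assuming the buffer starts out filled with silence (zeros). It is an illustration only, not the code behind the linked example.

```python
def moving_average(signal, n):
    """Replace each sample with the average of it and the n - 1 samples before it."""
    buffer = [0.0] * n            # the past n input samples, initially silence
    output = []
    for i, sample in enumerate(signal):
        buffer[i % n] = sample    # overwrite the oldest sample in the buffer
        output.append(sum(buffer) / n)
    return output

# The fastest possible wiggle (alternating +1 / -1) is wiped out by a 2-point average,
# while a constant signal passes through untouched once the buffer has filled.
print(moving_average([1, -1, 1, -1, 1, -1], 2))
print(moving_average([1, 1, 1, 1, 1, 1], 2))
```

That contrast is the smoothing effect in miniature: slow changes survive the averaging, fast ones get averaged away.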
One really powerful way of analyzing filters is through response curves. The two common response curves in DSP are the impulse response and the frequency response. The impulse response shows how a filter responds in the time domain by graphing how a single impulse of amplitude 1 at time 0 behaves in a filter. This “impulse” has many names in DSP, such as the unit impulse function, the delta function, Dirac's delta function, δ[n], but the name we'll be using in this guide is the unit impulse. The frequency response curve shows how a unit impulse responds in the frequency domain (you can also think of the frequency response being the Fourier Transform of the Impulse Response if you'd like). For the moving average filter, this impulse response can be done pretty simply in your head. If you are taking the average of k samples at time 0, one of those samples has an amplitude of 1, and the rest are at 0, the average will just be 1 divided by k. Once the moving average buffer moves out of the range of that impulse at time 0 (or when time ≥ k), the average will just be 0.
The frequency response curve, however, is a bit more complicated. The response can be graphed using this equation, which is derived by taking the Fourier Transform of the impulse response (k samples, each with an amplitude of 1/k):
The input variable, ω (omega), is the frequency as a fraction of the sampling rate. For example, if the frequency in hz is 20hz, and the sampling rate is 80hz, the corresponding ω value would be ¼. The k value here represents the buffer length of the moving average filter, and the output is the amplitude of a frequency ω passed through the moving average filter.
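A commonly quoted closed form for the magnitude response of a k-point moving average is |H(ω)| = |sin(πωk) / (k·sin(πω))|, with ω as a fraction of the sample rate and |H(0)| = 1. Treat that formula as an assumption about the equation meant here. The Python sketch below compares it against a direct Fourier sum over the impulse response (k samples of 1/k); the function names are made up for this example.

```python
import cmath
import math

def moving_average_response(omega, k):
    """Closed-form magnitude response; omega is frequency divided by sample rate."""
    denominator = k * math.sin(math.pi * omega)
    if abs(denominator) < 1e-12:      # omega = 0: the average of a constant is itself
        return 1.0
    return abs(math.sin(math.pi * omega * k) / denominator)

def response_from_impulse(omega, k):
    """Same quantity, computed directly from the impulse response (k samples of 1/k)."""
    return abs(sum((1.0 / k) * cmath.exp(-2j * math.pi * omega * n) for n in range(k)))

# 20 Hz at an 80 Hz sample rate gives omega = 1/4, which is a notch when k = 4.
for omega in (0.0, 0.1, 0.25, 0.4):
    print(omega,
          round(moving_average_response(omega, 4), 4),
          round(response_from_impulse(omega, 4), 4))
```

The zeros of the numerator land at whole multiples of 1/k, which is where the little notches come from, and a larger k packs more of them below the Nyquist frequency.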
Convolution, for lack of a better term, can be a pretty convoluted concept in DSP. Convolution is a binary operation. It takes in two signals, the input, and the kernel, and returns a singular output signal that blends the two signals together. Since this operation is commutative, there is really no difference mathematically which signal is the input and which is the kernel. However, the common convention in audio programming is to treat the input as the source to be filtered and the kernel as a parameter of said filter (more on why that is later). Here is the equation for the convolution of an input signal x[n] and kernel h[n] (the asterisk is the standard symbol for convolution):
In the convolution operation, the kernel is first flipped horizontally and then shifted by a horizontal offset n. The resulting output is the inner product of the input, and this flipped and shifted kernel (the inner product is when the samples at each index are multiplied together, and then all summed up).
In the animation above, you can see the input wave in yellow being convolved with the 3-point kernel in blue, resulting in the red output. This results in another sine wave with a different phase and amplitude. For inputs and kernels of different sizes, any place where the kernel or input is undefined is assumed to be 0, so in this case, the only 3 points that are multiplied and then added are the points where the kernel is. In the graph below, you can play around with the 3 points of the kernel, and see what the convolution will be.
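In code, the flip-shift-multiply-add recipe is a pair of nested loops. The Python sketch below assumes the usual convolution sum, y[n] = Σ x[k]·h[n − k] taken over all k, with anything outside either list treated as 0.

```python
def convolve(x, h):
    """Full convolution: y[n] = sum over k of x[k] * h[n - k], missing samples are 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):              # n is the horizontal offset of the kernel
        for k in range(len(x)):
            if 0 <= n - k < len(h):      # h[n - k] is the flipped, shifted kernel
                y[n] += x[k] * h[n - k]
    return y

x = [1.0, 2.0, 3.0]
h = [1.0, 1.0]
print(convolve(x, h))   # each output is the sum of two neighbouring inputs
print(convolve(h, x))   # swapping input and kernel gives the same result
```

The output is longer than the input by the kernel length minus 1, because the kernel still overlaps the input while it slides on and off the ends.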
Convolution has a property that makes it amazingly useful when it comes to filter design and implementation. That property is that any LTI system can be represented as a convolution operation. To understand why this is, let's look at a special example of convolution. We are going to make the kernel of the convolution equation the unit impulse function:
As you can see, using the unit impulse as the kernel does not change the input signal at all. In math, this is known as an identity. Much like how multiplying a number by 1 or adding 0 doesn't change the number, convolving a signal with the unit impulse doesn't change the signal. We can use this property to our advantage by decomposing our input signal into a sum of much smaller signals. For our purposes, we will decompose the signal into a sum of shifted and amplified unit impulses. Each sample in the time domain can be expressed in terms of a unit impulse that is shifted to the position of that sample, and then multiplied by the amplitude of that sample. In equation form, this decomposition looks like this (δ[n] being the unit impulse function and N being the length of the signal):
Since we know how a unit impulse responds through any given filter through its impulse response, we can now replace δ[n] with our impulse response. We are able to do this because the response will be the same regardless of if we add the impulses or filter them first (additivity), adjust the amplitude of the impulse (homogeneity), or shift the signals in the time domain (time invariance):
In the equation above, H is our filter and δh[n] is the impulse response of filter H. Finally, we can rewrite the right-hand side as a summation equation:
Look familiar? Yup. It's the convolution equation. What this tells us is that putting a signal through an LTI filter is an equivalent operation to convolving the signal with the filter's impulse response. This also has the implication that all information about a filter can be stored in just an impulse response. If you take a look at the frequency response of the unit impulse, this should make intuitive sense why:
The unit impulse signal contains all frequencies at the same magnitude of 1, with no deviations in phase. This means that when a unit impulse is run through a filter, the output represents how any signal will change in the filter because the unit impulse's response contains information about the response of every frequency.
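The claim that an LTI filter is fully captured by its impulse response can be checked numerically. The Python sketch below measures the impulse response of a 3-point moving average by feeding it a unit impulse, then confirms that convolving a signal with that response matches running the filter directly. The test signal is arbitrary and the helper names are made up for this example.

```python
def moving_average(signal, n):
    """Average of each sample and the n - 1 samples before it (silence before the start)."""
    return [sum(signal[max(0, i - n + 1): i + 1]) / n for i in range(len(signal))]

def convolve(x, h):
    """Full convolution of x with h, treating missing samples as 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y

unit_impulse = [1.0] + [0.0] * 7
impulse_response = moving_average(unit_impulse, 3)     # 1/3, 1/3, 1/3, then zeros

signal = [0.2, -0.5, 1.0, 0.3, -0.8, 0.0, 0.6, -0.1]
filtered = moving_average(signal, 3)                               # run the filter itself
convolved = convolve(signal, impulse_response)[:len(signal)]       # use only its response

print(all(abs(a - b) < 1e-12 for a, b in zip(filtered, convolved)))
```

Nothing about the filter's inner workings was needed for the second path, only the list of numbers it produced for a single impulse.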
While the moving average filter is simple, it isn't too great in separating content by frequency as it leaves those little notches in the frequency domain. Ideally, what we want is a “brick-wall filter”, where every frequency above a cutoff frequency is completely cut out. In the graph below, you'll see what this ideal frequency response looks like, and how the corresponding impulse response looks as well.
This impulse response can be calculated by taking the Inverse Fourier Transform of the frequency response curve. The general equation for this type of response is:
where fc is the cutoff frequency as a fraction of the sample rate. This function shows up so frequently in DSP that it has been given its own special name - the sinc function.
Using what we learned in our lesson on convolution, we should just be able to plug in the sinc function as our kernel and we have our filter, right? Well, there's a slight problem with that. The sinc function goes on forever in both directions, meaning that to convolve a signal with the sinc function, we would need to multiply an infinite number of times for every sample. To get around this, we need to “window” the function, which restricts the domain of the function in order to make it finite. The simplest window function is a rectangular window, which sets all samples more than M samples away from t=0 to 0. You can see how this works in practice with this code (try messing with the cutoff variable, fc, and the stopband variable, M).
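As a rough Python illustration of the rectangular window (not the linked code), the sketch below assumes the ideal low-pass impulse response is h[n] = sin(2π·fc·n) / (π·n), with h[0] = 2·fc, and simply keeps the 2M + 1 samples closest to n = 0. The function names are made up for this example.

```python
import cmath
import math

def sinc_kernel(fc, M):
    """Ideal low-pass impulse response, truncated to the samples from -M to M."""
    kernel = []
    for n in range(-M, M + 1):
        if n == 0:
            kernel.append(2 * fc)     # limit of sin(2*pi*fc*n) / (pi*n) as n goes to 0
        else:
            kernel.append(math.sin(2 * math.pi * fc * n) / (math.pi * n))
    return kernel

def gain(kernel, omega):
    """Magnitude response of a kernel at omega (frequency as a fraction of sample rate)."""
    return abs(sum(h * cmath.exp(-2j * math.pi * omega * n) for n, h in enumerate(kernel)))

kernel = sinc_kernel(fc=0.1, M=50)
for omega in (0.02, 0.09, 0.11, 0.3):
    print(omega, round(gain(kernel, omega), 3))
```

Frequencies well below fc come through at roughly full strength and frequencies well above it are mostly removed, but the values near the cutoff show the overshoot and ripple that the abrupt truncation causes.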
While this does work, the frequency response contains a huge boost before rolling off, and the phase is nearly inverted at that cutoff as well. This can be attributed to the very sudden snap to a horizontal line of 0 amplitude from t = M onward. What we'll try next is what's known as the “Hamming window”. Unlike the rectangular window, the Hamming window eases the signal to 0 rather than snapping it to 0 after a certain point. Here's the equation for the Hamming window:
In the graph below, you can play with the cutoff frequency and stopband (M) to see the corresponding Hamming window, and in this code here, you can hear how different cutoffs and stopband values filter a white noise signal.
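For reference, here is a hedged Python sketch of a Hamming-windowed sinc kernel. It assumes the window is centred on n = 0 and written as w[n] = 0.54 + 0.46·cos(π·n / M) for n from -M to M, which is the common 0.54 − 0.46·cos(2π·i / N) form re-indexed around the middle. It also rescales the kernel so 0 Hz passes with a gain of exactly 1, which is a choice made for this example.

```python
import cmath
import math

def hamming(n, M):
    """Hamming window centred on n = 0: 1.0 in the middle, easing down to 0.08 at +/-M."""
    return 0.54 + 0.46 * math.cos(math.pi * n / M)

def ideal_lowpass(n, fc):
    """Sample n of the ideal (infinitely long) low-pass impulse response."""
    if n == 0:
        return 2 * fc
    return math.sin(2 * math.pi * fc * n) / (math.pi * n)

def hamming_lowpass(fc, M):
    """Windowed-sinc low-pass kernel with 2*M + 1 taps."""
    kernel = [ideal_lowpass(n, fc) * hamming(n, M) for n in range(-M, M + 1)]
    total = sum(kernel)
    return [h / total for h in kernel]    # normalise so that 0 Hz has a gain of 1

def gain(kernel, omega):
    """Magnitude response of a kernel at omega (frequency as a fraction of sample rate)."""
    return abs(sum(h * cmath.exp(-2j * math.pi * omega * n) for n, h in enumerate(kernel)))

kernel = hamming_lowpass(fc=0.1, M=50)
for omega in (0.0, 0.05, 0.1, 0.15, 0.3):
    print(omega, round(gain(kernel, omega), 4))
```

Printing the gains for a few values of M is a quick way to see how the window length trades kernel size against the steepness of the roll-off.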
While this window does give more control over the speed of the roll-off through the stopband value, there is a pretty apparent resonance that forms at the cutoff frequency. The phase response isn't great either, with both a huge divot and a peak at the cutoff frequency.
So, what is the magic solution? Well, there really isn't one for this method. Every window has its own offshoots, roll-offs, and ripples that make them unique. Fortunately, in the context of audio programming, this is OK! There are times when you might want that resonating frequency at the cutoff frequency of a Hamming window or the roll-off of a rectangular window. The joy of audio programming is that it's one of the more creative applications of math and programming, so why not play around a little?
For this challenge, you're going to take the example code from the Hamming Window, and use it to make a filter using the Blackman Window. The equation for the Blackman Window is this:
Up until now, all of the filters we have looked at were “low-pass” filters, where the frequencies allowed through are below the cutoff frequency. Now, we're going to use what we've learned previously to find a kernel for a high-pass filter. To start, we'll use the windowed-sinc method for this exercise, only now, the ideal brick-wall frequency response is different. Instead of the amplitude being 1 below fc, the amplitude will be 1 above fc.
So for this equation, what would the corresponding impulse response be? We can take advantage of the fact that the Fourier transform (and its inverse) are linear systems, meaning they both follow the rules of additivity and homogeneity. To start, we can think about the frequency response of the brick-wall high-pass filter in terms of its corresponding low-pass frequency response. If we look at the low-pass response's graph, we can see that by multiplying by -1, and then adding 1, we get the high-pass graph. More formally, we can write this equation:
Now, since we want the impulse response from the frequency response, we'll want to take the inverse Fourier transform on both sides, and since this operation is linear, we can rewrite the right side as two separate inverse Fourier transforms (the homogeneous rule being applied by the -1 being multiplied by Blo(n), and the additive rule being applied by first taking the inverse Fourier transforms and then adding the two terms):
As we discussed in our look into windowed-sinc filter design, the inverse Fourier transform of a low-pass brick-wall frequency response is the sinc function, but what's less obvious is what the inverse Fourier transform of 1 is. If you remember the little intuition we gave at the end of the convolution section, you'll remember that the unit impulse has a magnitude of 1 at every frequency, or in other words, the Fourier transform of the unit impulse is a horizontal line at 1. Therefore, the inverse Fourier transform of 1 will just be the unit impulse. Finally, we can write the solution to the inverse Fourier transform of the high-pass brick-wall frequency response as the unit impulse minus the sinc function:
Luckily for us, subtracting the impulse response from the unit impulse is a general solution for finding the corresponding high-pass filter for any low-pass filter (for the ones we've looked at so far, at least). Here is the code for a high-pass Hamming window filter if you're interested in how the implementation looks.
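The "unit impulse minus low-pass kernel" trick fits in a few lines of Python. This sketch is not the linked implementation: it assumes a Hamming-windowed sinc low-pass kernel that has been normalised to a gain of 1 at 0 Hz, and it places the unit impulse at the centre tap so that it lines up with the middle of the symmetric kernel.

```python
import cmath
import math

def lowpass(fc, M):
    """Hamming-windowed sinc low-pass kernel with 2*M + 1 taps and a gain of 1 at 0 Hz."""
    kernel = []
    for n in range(-M, M + 1):
        ideal = 2 * fc if n == 0 else math.sin(2 * math.pi * fc * n) / (math.pi * n)
        kernel.append(ideal * (0.54 + 0.46 * math.cos(math.pi * n / M)))
    total = sum(kernel)
    return [h / total for h in kernel]

def highpass(fc, M):
    """Unit impulse (at the centre tap) minus the low-pass kernel."""
    kernel = [-h for h in lowpass(fc, M)]
    kernel[M] += 1.0                  # the centre tap is index M in a 2*M + 1 tap kernel
    return kernel

def gain(kernel, omega):
    """Magnitude response of a kernel at omega (frequency as a fraction of sample rate)."""
    return abs(sum(h * cmath.exp(-2j * math.pi * omega * n) for n, h in enumerate(kernel)))

hp = highpass(fc=0.1, M=50)
for omega in (0.0, 0.02, 0.1, 0.2, 0.4):
    print(omega, round(gain(hp, omega), 4))
```

The gains come out as the mirror image of the low-pass filter's: close to 0 below the cutoff and close to 1 above it.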
High-pass filters are cool, but there are more than just the two. For instance, if you run a signal through a high-pass filter and then a low-pass filter, you'll get a band-pass filter, which subdues frequencies outside a specific “band” between the high-pass cutoff frequency and low-pass cutoff frequency (assuming the high-pass cutoff is less than the low-pass cutoff). In fact, since convolution is an associative operation (the way you group successive convolutions doesn't matter), you can convolve the impulse responses of the high-pass and low-pass filter first, and then apply the combined kernel with a single convolution instead of two. If instead the high-pass cutoff is higher than the low-pass cutoff, and you add the outputs of the two filters rather than running one into the other, you'll get a band-reject filter, where everything inside the band of frequencies between the cutoffs is subdued. If you're up for a final challenge, try implementing a band pass / band reject filter using the code provided above.
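To illustrate just the band-pass half of this, the Python sketch below builds one combined kernel by convolving a high-pass kernel with a low-pass kernel. The kernels are Hamming-windowed sincs built the same way as in this section, but the helper names, cutoffs (0.1 and 0.3 of the sample rate), and kernel length are assumptions made for this example.

```python
import cmath
import math

def lowpass(fc, M):
    """Hamming-windowed sinc low-pass kernel with 2*M + 1 taps and a gain of 1 at 0 Hz."""
    kernel = []
    for n in range(-M, M + 1):
        ideal = 2 * fc if n == 0 else math.sin(2 * math.pi * fc * n) / (math.pi * n)
        kernel.append(ideal * (0.54 + 0.46 * math.cos(math.pi * n / M)))
    total = sum(kernel)
    return [h / total for h in kernel]

def highpass(fc, M):
    """Unit impulse at the centre tap minus the matching low-pass kernel."""
    kernel = [-h for h in lowpass(fc, M)]
    kernel[M] += 1.0
    return kernel

def convolve(x, h):
    """Full convolution of x with h, treating missing samples as 0."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y

def gain(kernel, omega):
    """Magnitude response of a kernel at omega (frequency as a fraction of sample rate)."""
    return abs(sum(h * cmath.exp(-2j * math.pi * omega * n) for n, h in enumerate(kernel)))

# High-pass at 0.1 followed by low-pass at 0.3, folded into a single kernel.
band_pass = convolve(highpass(0.1, 40), lowpass(0.3, 40))
for omega in (0.02, 0.2, 0.45):
    print(omega, round(gain(band_pass, omega), 4))
```

Only the middle frequency, which sits between the two cutoffs, keeps a gain near 1.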
As cool as the basic shapes might sound, they are, for lack of a better term, a little basic. Unless you're making music for a game on an Atari 2600, you'll probably want more dynamic sounds than just square and sawtooth waves. One of the ways that you can escape the depths of monotony is through changing the parameters of a signal over time, or in other words: modulating the parameters. For an introductory example to modulation, consider how you would make a sine-wave tone fade in and out over time. One way is to multiply the tone by a much slower sine wave:
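Here is a minimal Python sketch of that multiplication. The 440 Hz tone, the 0.5 Hz slow wave, and the 8,000 samples-per-second rate are arbitrary values chosen for this example.

```python
import math

def fading_tone_sample(t, tone_hz=440.0, slow_hz=0.5):
    """A tone multiplied by a much slower sine wave, so its loudness rises and falls."""
    tone = math.sin(2 * math.pi * tone_hz * t)
    slow = math.sin(2 * math.pi * slow_hz * t)    # takes a full second to go 0 -> 1 -> 0
    return tone * slow

SAMPLE_RATE = 8000
samples = [fading_tone_sample(n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]  # one second

quiet_start = max(abs(s) for s in samples[:400])       # the first 50 ms
loud_middle = max(abs(s) for s in samples[3800:4200])  # around the half-second mark
print(round(quiet_start, 3), round(loud_middle, 3))
```

The tone itself never changes; only the slow wave's value at each moment decides how much of it gets through.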
A slow modulating wave like this is called an LFO (low-frequency oscillator). What if we speed up an LFO such that it oscillates at the same rate as an audible tone? Well, the “O” wouldn’t be so “LF” anymore; it would just be a regular oscillator. The more general term for modulating the amplitude of one wave by another, regardless of frequency, is AM (amplitude modulation). Other common types of modulation are FM (frequency modulation), and PM (phase modulation), where one wave changes the frequency and phase of another respectively. Modulating signals like this is one (of many) approaches to creating waveforms with much more complicated timbres than those of the basic waveforms. For example, this uses AM synthesis to generate a screechy metallic sound. The initial 220 Hz sine wave is multiplied by three different waves at 400, 500, and 700 Hz respectively.
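The recipe for that sound can be sketched in Python as a plain product of sine waves (the function name is made up, and this is not the linked example's code). The second function checks a standard trigonometric identity that explains the new timbre: multiplying two sines produces the difference and sum of their frequencies instead of the originals.

```python
import math

def am_sample(t, carrier_hz=220.0, modulators_hz=(400.0, 500.0, 700.0)):
    """One sample of a carrier sine multiplied by each modulator sine in turn."""
    value = math.sin(2 * math.pi * carrier_hz * t)
    for modulator_hz in modulators_hz:
        value *= math.sin(2 * math.pi * modulator_hz * t)
    return value

def sum_and_difference(t, a_hz, b_hz):
    """sin(a) * sin(b) rewritten as half of cos(a - b) minus cos(a + b)."""
    return 0.5 * (math.cos(2 * math.pi * (a_hz - b_hz) * t) -
                  math.cos(2 * math.pi * (a_hz + b_hz) * t))

t = 0.00123
one_modulator = am_sample(t, 220.0, (400.0,))
print(abs(one_modulator - sum_and_difference(t, 220.0, 400.0)) < 1e-9)
```

With one 400 Hz modulator, the 220 Hz carrier is replaced by components at 180 Hz and 620 Hz; each further modulator splits every component again, and the resulting frequencies are not tidy multiples of one fundamental.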
The function sin(x) (where x is radians) makes a full oscillation for every 2π increase in the input. If the input increases by one unit per second (like our time variable), then the sine wave oscillates once every 2π seconds. You can multiply the input by 2π to make it oscillate once per second, or at 1 Hz. From here you can produce a sine wave that produces a tone at any arbitrary frequency in Hz via multiplying the input by the desired frequency. For instance, sin(2π*x*440) will produce an output tone at 440 Hz. This frequency (440 Hz) also happens to be the frequency of the musical note A4 (this is just an arbitrarily decided standard). To get A5, the “A” note that’s one octave above A4, you just have to multiply the frequency of A4 by two (i.e. the frequency of A5 is 880 Hz). To get notes other than “A” within the tuning system that most music in the western world uses (12-TET), you simply multiply or divide the note’s frequency by the twelfth root of two to go up or down by one “semitone”, which is the distance between adjacent notes. Multiplying “A” by the twelfth root of two results in A# (“A” sharp) or Bb (“B” flat), which are two names for the same note. Multiplying once again results in B, and multiplying a third time results in C rather than B# or Cb like you'd expect from the prior pattern. To see why this exception happens, take a look at the two octaves of this keyboard:
Not all the natural (white-key) notes have sharps/flats between them. Instead, “B” and “C” along with “E” and “F” are directly adjacent to each other. To play multiple notes at once, you can add sine waves of the various frequencies you want to play together. The twelfth root of two is used because there are 12 semitones in an octave, and you raise a frequency by one octave by multiplying it by 2; multiplying by the twelfth root of two 12 times in a row is the same as multiplying by 2, or going up exactly one octave.
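The semitone arithmetic can be sketched in a few lines of JavaScript. The function name `noteFrequency` and the choice to count semitones from A4 are assumptions made for this illustration.

```javascript
// Reference pitch: A4 = 440 hz.
const A4 = 440;

// Each semitone multiplies the frequency by the twelfth root of two,
// so n semitones multiply it by 2^(n/12). Negative n moves downward.
function noteFrequency(semitonesFromA4) {
  return A4 * Math.pow(2, semitonesFromA4 / 12);
}

console.log(noteFrequency(0));  // A4
console.log(noteFrequency(12)); // A5: twelve semitones up is exactly one octave
console.log(noteFrequency(3));  // C5: A -> A# -> B -> C, roughly 523.25 hz
```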
A melody (at its simplest) is just a sequence of tones at different frequencies, played one after another. Try composing a melody by filling an array with notes! The notes supplied are just frequencies to put inside a sine function.
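If you want a starting point, here is one possible sketch. The note table, the `noteLength` of a quarter second, and the function names are all assumptions for illustration; the notes supplied in the editor may be named differently.

```javascript
// Hypothetical note table: frequencies in hz, built from A4 = 440.
const A4 = 440;
const semitone = Math.pow(2, 1 / 12);
const notes = {
  A4: A4,
  B4: A4 * Math.pow(semitone, 2),
  C5: A4 * Math.pow(semitone, 3),
  E5: A4 * Math.pow(semitone, 7),
};

// The melody is just an array of frequencies, played in order and then repeated.
const melody = [notes.A4, notes.C5, notes.E5, notes.C5, notes.B4];
const noteLength = 0.25; // seconds per note (assumed)

// Pick the frequency that should be sounding at time t (seconds).
function melodyFrequency(t) {
  const index = Math.floor(t / noteLength) % melody.length;
  return melody[index];
}

// Put the current frequency inside a sine function.
function melodySample(t) {
  return Math.sin(2 * Math.PI * melodyFrequency(t) * t);
}
```

Switching frequency abruptly like this can cause small clicks at note boundaries, which is fine for experimenting.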
This idea can quite easily be expanded to playing chords and sequences of chords.
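One way to sketch this: store each chord as an array of semitone offsets and add one sine wave per note. The progression, the `chordLength` of one second, and the function names below are assumptions chosen for illustration.

```javascript
// Frequency of a note n semitones away from A4 (440 hz).
function noteFrequency(semitonesFromA4) {
  return 440 * Math.pow(2, semitonesFromA4 / 12);
}

// Each chord is a list of semitone offsets from A4.
const progression = [
  [0, 3, 7],    // A minor: A, C, E
  [-4, 0, 3],   // F major: F, A, C
  [-9, -5, -2], // C major: C, E, G
  [-2, 2, 5],   // G major: G, B, D
];
const chordLength = 1; // seconds per chord (assumed)

// Pick the chord that should be sounding at time t (seconds).
function chordAt(t) {
  return progression[Math.floor(t / chordLength) % progression.length];
}

// Add one sine per note, then divide by the note count so the sum stays within -1..1.
function chordSample(t) {
  const chord = chordAt(t);
  let sum = 0;
  for (const offset of chord) {
    sum += Math.sin(2 * Math.PI * noteFrequency(offset) * t);
  }
  return sum / chord.length;
}
```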
Amplitude refers to either the value of a single sample, or to the maximum value of an entire waveform, and it is a perfectly fine measurement for most cases. We’ve been using amplitude to measure signals so far, but another common measurement used in the field of DSP is the decibel (dB). Decibels are a logarithmic unit with respect to amplitude, which means linear changes in dB correspond to multiplicative changes in amplitude. In particular, an increase of about 6 dB is the same as doubling the amplitude, and likewise, a decrease of about 6 dB is the same as halving the amplitude. In the context of digital audio, decibels range from negative infinity to zero, where zero is the loudest possible signal, i.e., an amplitude of positive or negative one. Since human senses are roughly logarithmic rather than linear, decibels are a useful unit for capturing how “loud” a signal is. To demonstrate this, listen to this loop of a tone playing at four different amplitudes (not counting the amplitudes of zero). The increase in loudness between the first pair of amplitudes seems greater than that of the second pair of amplitudes. However, the increase in amplitude is the same 0.2 in both cases. If you look at the increase using decibels instead of amplitude, the first increase is around 9.5 dB, and the second is only around 1.9 dB. The effect is more pronounced if you listen to this louder (but don’t listen too loud). You can convert amplitude to decibels via: dB = 20 · log10(|amplitude|)
And the inverse is: amplitude = 10^(dB / 20)
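In JavaScript, both conversions can be sketched as below. This assumes the standard amplitude relationship dB = 20 · log10(|amplitude|), which matches the "about 6 dB per doubling" rule; the function names are made up for illustration.

```javascript
// Amplitude (where 1 is full scale) to decibels. An amplitude of 0 gives -Infinity.
function amplitudeToDb(amplitude) {
  return 20 * Math.log10(Math.abs(amplitude));
}

// Decibels back to a (positive) amplitude.
function dbToAmplitude(db) {
  return Math.pow(10, db / 20);
}

// The same 0.2 step in amplitude is a very different step in decibels:
console.log(amplitudeToDb(0.3) - amplitudeToDb(0.1)); // about 9.5 dB
console.log(amplitudeToDb(1.0) - amplitudeToDb(0.8)); // about 1.9 dB
```

The sign of the amplitude is lost in the conversion, so `dbToAmplitude` always returns a positive value.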
This all being said, decibels still don’t capture loudness perfectly, because different frequencies of equal amplitude are perceived as being different in loudness. Listen to this sine wave sweeping across the frequency spectrum! Its amplitude remains the same, but the extreme lows and extreme highs still sound quieter than the middle frequencies. A unit that takes frequencies into account to be more accurate to loudness is LUFS, but the specifics of LUFS won’t be covered here (it’s not as important or common as amplitude and decibels).
The term “noise” in the context of DSP refers to a signal with some element of randomness involved in its creation. The easiest type of noise to make is called “white noise,” and it’s just a (pseudo) random number generated for every sample. Take a look at white noise in the time and frequency domains. In the time domain, the signal is just a noisy mess, but the frequency domain is worth taking a closer look at. The properties of noise in the frequency domain are usually what’s used to differentiate between different types or “colors” of noise, and in the case of white noise, the frequency spectrum is a (roughly) flat line with its height corresponding to the amplitude of the noise in question. Another simple color of noise is “brown noise,” which is just the integral of white noise in the time domain. The frequency spectrum of brown noise is a line with a slope of negative 6.02 decibels per octave, or in other words: for every increase of frequency by a factor of two, the amplitude decreases by a factor of two. Another common type of noise you’ll come across in the field of DSP is “pink noise,” which is similar to brown noise, but has half the slope (negative 3.01 decibels per octave). Pink noise has the useful property that each octave is roughly consistent in volume or loudness rather than amplitude. Another common type of noise is “blue noise,” and it’s particularly common in fields other than DSP (such as game development and computer graphics). As far as DSP is concerned, blue noise is similar to the aforementioned colors of noise except that its slope in the frequency domain is positive 3.01 decibels per octave (the opposite of pink noise).
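White and brown noise are simple enough to sketch directly. In the snippet below, the names, the `step` size, and the clamping of the running sum are all assumptions: a raw integral of white noise can drift without limit, so this version keeps it inside -1..1, which makes it an approximation of brown noise rather than an exact one.

```javascript
// White noise: one pseudo-random number in [-1, 1) per sample.
function whiteNoise() {
  return Math.random() * 2 - 1;
}

// Brown noise: a running sum (discrete integral) of white noise.
// `step` scales each increment; the clamp is an added safeguard, not part of the definition.
function makeBrownNoise(step = 0.02) {
  let level = 0;
  return function brownNoise() {
    level += whiteNoise() * step;
    level = Math.max(-1, Math.min(1, level));
    return level;
  };
}

// Usage: create the generator once, then call it once per sample.
const brownNoise = makeBrownNoise();
```

Pink and blue noise need an actual filter to get their 3.01 dB per octave slopes, so they are left out of this sketch.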
Making music via raw DSP is really just combining the tools you’ve learned about in interesting ways! To start making the demonstrated loop, start with a variable to which we will add all our individual instruments’ signals. The first instrument in question will be what’s called a “hi-hat” or just “hat” for short. A hi-hat is a percussive instrument with a noisy, atonal, high-frequency sound. To simulate one, you can just apply an LFO to white noise. A neat trick being used here is the multiplication of two LFOs to shape the overall final LFO. I encourage you to output the LFO itself to see how it behaves, not just for this hi-hat, but for any LFO or signal you want to understand better. The next instrument will be another percussive instrument known as the kick drum or “kick” for short. The process of creating it is similar to that of the hi-hat, but the LFO is much steeper and it controls frequency as well as amplitude. The next instrument is an interesting one: a synthy bass sound with its timbre created via phase modulation! On top of that, the notes from C to B are included along with the values for changing a note by a semitone or a whole tone (two semitones). A variable is also used to keep track of whether the current measure is odd so that every other measure has a slight variation. Finally, an arpeggio, or “arp” for short, is generated using amplitude modulation. All in all, while the entirety of this loop is a complicated program with many separate parts, you should be able to recognize each individual tool used to construct this loop.
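To make the first two instruments a little more concrete, here is a rough sketch of the hat and the kick. It is not the code behind the demonstrated loop: the tempo, envelope shapes, frequencies, and mix levels are all assumed values, and `decay`, `hat`, `kick`, and `loopSample` are hypothetical names.

```javascript
// Hypothetical ramp LFO: falls from 1 toward 0 once every `period` seconds.
function decay(t, period) {
  return 1 - (t % period) / period;
}

// Hi-hat: white noise shaped by two LFOs multiplied together,
// a fast per-hit decay and a slower decay that accents the start of each beat.
function hat(t) {
  const envelope = Math.pow(decay(t, 0.125), 4) * decay(t, 0.5);
  return (Math.random() * 2 - 1) * envelope;
}

// Kick: a sine whose envelope is much steeper and also scales the frequency term,
// so the pitch drops as the hit fades (a rough sweep, not an exact instantaneous frequency).
function kick(t) {
  const envelope = Math.pow(decay(t, 0.5), 8);
  const timeInBeat = t % 0.5;
  const frequency = 40 + 120 * envelope;
  return Math.sin(2 * Math.PI * frequency * timeInBeat) * envelope;
}

// Start with one variable and add each instrument's signal to it.
function loopSample(t) {
  let output = 0;
  output += 0.3 * hat(t);
  output += 0.7 * kick(t);
  return output;
}
```

Returning `decay(t, 0.125)` or the hat's `envelope` on its own, instead of the mixed output, is a quick way to see how each LFO behaves.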
The code in this editor is executed in a function that is called for every processed sample; you just have to return a number for each sample.
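To picture what "called for every processed sample" means, here is a sketch of how a host could drive such a function. This is an assumption about the general idea, not the editor's real implementation; `render`, `sampleFn`, and the parameter names are hypothetical.

```javascript
// Calls `sampleFn` once per sample and stores each returned number in a buffer.
// `t` is the time in seconds of the sample being processed.
function render(sampleFn, sampleRate, seconds) {
  const length = Math.floor(sampleRate * seconds);
  const buffer = new Float32Array(length);
  for (let n = 0; n < length; n++) {
    const t = n / sampleRate;
    buffer[n] = sampleFn(t);
  }
  return buffer;
}

// Usage: a 440 hz sine, rendered for a hundredth of a second at 44100 samples per second.
const tone = render((t) => Math.sin(2 * Math.PI * 440 * t), 44100, 0.01);
```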