DFT audio filter

Ok, I was tired of using Audacity filters, so it is time to build our own low-pass filter.

AFAIK there are three main ways to design a digital filter, like a lowpass filter that we needed in AM modulation. The most common is a FIR filter. Another way is an IIR filter. And a third way is using Discrete Fourier Transform (DFT).

In this post, I will use the easiest way, which is DFT. Some post in the future will show a FIR version.

DFT can be implemented by a fast algorithm called FFT (Fast Fourier Transform), and most mathematical libraries have a FFT implementation. Python-Numeric is no exception.

DFT converts a time-domain signal, like an audio wave, to a frequency-domain representation. For example, the "spectrum analyzer" feature found in most PC MP3 players (like the iTunes picture above) and some mini-systems is (or could be) a direct result of DFT calculation.

Apart from showing this fancy spectrum analyzer, DFT has countless other applications.

DFT is not an one-way signal processing feature. Given a frequency domain representation, we can calculate the IDFT (Inverse DFT), to get back the time-domain wave. So, we can get some audio, transform into DFT, manipulate the frequencies (e.g. increasing bass, decreasing midrange, those equalizer things) and then convert back into time-domain.

In our case, we want to build a lowpass/highpass filter, so we just have to zero out the frequencies we don't want, in frequency domain. Such sharp cutting is NOT a good filter design; it causes distortions. But it is ok for a simple demonstration.

Converting to/from frequency domain seems too much work, and in fact there are ways to do filtering without this. FIR filters operate directly on time-domain samples, so they are normally the choice for practical filter applications. But they are not so easy to understand, so this will have to wait for a next post.

For this post, I used a female singer's voice that I like much: Núbia Lafayette. Another reason I used this particular sample is that I own the CD, so nobody can complain about piracy etc. Let's see the original audio:

(link to wav)

As usual, if your browser does not understand EMBED tags, you can get the file directly from http://epx.com.br/wav.

Now, the filtered version (1000-4000Hz bandpass filter), which sounds like a very small radio receiver:

(link to wav)

And the code which does the trick:

#!/usr/bin/env python

LOWPASS = 4000 # Hz
HIGHPASS = 1000 # Hz
SAMPLE_RATE = 44100 # Hz

import wave, struct, math
from numpy import fft
FFT_LENGTH = 2048

# The higher the better, but this seems enough
OVERLAP = 512

FFT_SAMPLE = FFT_LENGTH - OVERLAP

# Nyquist rate = bandwidth available for a given sample rate
NYQUIST_RATE = SAMPLE_RATE / 2.0

# Convert frequencies from Hz to our digital sampling units
LOWPASS /= (NYQUIST_RATE / (FFT_LENGTH / 2.0))
HIGHPASS /= (NYQUIST_RATE / (FFT_LENGTH / 2.0))

zeros = [ 0 for x in range(0, OVERLAP) ]

# Builds filter mask. Note that this filter is BAD, a
# good filter must have a smooth curve, respecting a
# dB/octave attenuation ramp!
mask = []
for f in range(0, FFT_LENGTH / 2 + 1):
    rampdown = 1.0
    if f > LOWPASS:
        rampdown = 0.0
    elif f < HIGHPASS:
        rampdown = 0.0
    mask.append(rampdown)

def bound(sample):
    # takes care of "burnt" samples, due some mistake in filter design
    if sample > 1.0:
        print "!",
        sample = 1.0
    elif sample < -1.0:
        print "!",
        sample = -1.0
    return sample

original = wave.open("NubiaCantaDalva.wav", "r")
filtered = wave.open("NubiaFiltered.wav", "w")
filtered.setnchannels(1)
filtered.setsampwidth(2)
filtered.setframerate(SAMPLE_RATE)

n = original.getnframes()
original = struct.unpack('%dh' % n, original.readframes(n))
# scale from 16-bit signed WAV to float
original = [s / 32768.0 for s in original]

saved_td = zeros

for pos in range(0, len(original), FFT_SAMPLE):
    time_sample = original[pos : pos + FFT_LENGTH]

    # convert frame to frequency domain representation
    frequency_domain = fft.fft(time_sample, FFT_LENGTH)
    l = len(frequency_domain)

    # mask positive frequencies (f[0] is DC component)
    for f in range(0, l/2+1):
        frequency_domain[f] *= mask[f]

    # mask negative frequencies
    # http://stackoverflow.com/questions/933088/clipping-fft-matrix
    for f in range(l-1, l/2, -1):
        cf = l - f
        frequency_domain[f] *= mask[cf]

    # convert frame back to time domain
    time_domain = fft.ifft(frequency_domain)

    # modified overlap-add logic: previously saved samples
    # prevail in the beginning, and they are ramped down
    # in favor of this frame's samples, to avoid 'clicks'
    # in either end of frame.
    for i in range(0, OVERLAP):
        time_domain[i] *= (i + 0.0) / OVERLAP
        time_domain[i] += saved_td[i] * (1.0 - (i + 0.00) / OVERLAP)

    # reserve last samples for the next frame
    saved_td = time_domain[FFT_SAMPLE:]
    # do not write reserved samples right now
    time_domain = time_domain[:FFT_SAMPLE]

    # scale back into WAV 16-bit and write
    time_domain = [ bound(sample) * 32767 for sample in time_domain ]
    filtered.writeframes(struct.pack('%dh' % len(time_domain),
                                     *time_domain))

There is one twist in this code that is worth mentioning. Since any filtering you do in frequency domain will cause phase distortion, you can't just piggyback the processed audio frames one after another, because the audio wave will not transition smoothly from one frame to the next. It will have a different phase at the next frame, so there will be a sharp discontinuity, very annoying to the ears. (If you don't believe me, you can just set OVERLAP to zero and see what happens.)

Both textbook solutions for this (overlap-save and overlap-add) seem not to solve this problem. I have modified the overlap-add method, which apparently solved the issue. My solution fades in the current frame while fading out the end of previous frame. Maybe there is a better solution for this, but I didn't find any.

Another thing that must be remembered by anyone that wants to play with FFT, is that frequency representation is linear. We are used to logarithmic spectrum analyzers, which match the octaves in music, but FFT is different. This means that frequency resolution of a FFT-generated array is much more precise at low octaves than at high octaves.

FFT generates two frequency "lobes": positive and negative frequencies. This sounds very strange, because there can't be negative frequencies in physical world, but DFT has this mathematical feature. In our filter, we had two choices: just zero out negative frequencies (which reduces the final audio volume by half), or filter them in the same fashion as positive frequencies, as we do.

The array generated by FFT consists of complex numbers. This sounds scary but it has a valid purpose: a complex number can represent amplitude AND phase in a single shot. For example, if we find 1+0j in the 3000Hz 'bar', it means that audio has a 3000Hz component with amplitiude 1.0 and in phase with cos(x). If the sample were 0.707+0.707j, it means that the 3000Hz component still has amplitude = 1 (the modulus) but it is 45 degrees ahead of cos(x).

Some applications, for example the spectrum analyzer, do not care about signal phase, so they could employ DCT (discrete cosine transform) whose output is an array of real numbers. The original waveform can be obtained by IDCT, but of course the reconstructed signal will be phase-distorted.

blog comments powered by Disqus