Where's that 2 coming from?
I have a somewhat unusual question that's been bugging me for a while, and I can't figure out the answer.
Looking at the SoundTouch web page explanation of time and pitch scaling - http://www.surina.net/article/time-and-pitch-scaling.html - there's a simple formula that confuses me.
It states that given the sampling frequency f_s, a Discrete Fourier Transform of N points gives N/2 equally spaced frequency "bins" of width f_s/(2*N).
My question is: where does this factor of 2 in the denominator come from? Shouldn't it be N/2 bins spaced at f_s/N?
A discrete FT transforms the input signal, or better yet the N-dimensional vector x, into its frequency-domain representation X. Thus F[x] = X.
But as the audio data is a real signal without imaginary components, i.e. Im{x[n]} = 0 for n = 0, 1, ..., N-1, there is a redundancy in X.
First, X[k] = X[-k]* (complex conjugate), but indices are taken modulo N, so X[k] = X[N-k]*. The immediate consequences show up at indices 0 and N/2 (taking N even). For the "DC" bin we have X[0] = X[N-0]* = X[0]*, therefore its imaginary part is 0. Similarly for the "Nyquist" bin: X[N/2] = X[N-N/2]* = X[N/2]*, so it is also a purely real number.
The other non-redundant values are X[1], X[2], ... X[N/2 - 1], giving N/2 - 1 real and N/2 - 1 imaginary coefficients, totalling N - 2. Those numbers, along with the real "DC" and "Nyquist" terms, give exactly N non-redundant coefficients, as expected.
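To make the symmetry argument above easy to check numerically, here is a minimal sketch using a naive DFT written with the Python standard library only. The function names (dft, check_real_input_symmetry), the tolerance, and the random 16-point test signal are my own choices for illustration, not anything taken from SoundTouch, and N is assumed to be even.

```python
import cmath
import random


def dft(x):
    """Naive O(N^2) DFT: X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N)."""
    n_points = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n in range(n_points))
        for k in range(n_points)
    ]


def check_real_input_symmetry(x, tol=1e-9):
    """For a real input of even length N, return a tuple:
    (X[k] == conj(X[N-k]) for all k, X[0] is real, X[N/2] is real,
     number of non-redundant real coefficients)."""
    n_points = len(x)
    X = dft(x)
    symmetric = all(
        abs(X[k] - X[(n_points - k) % n_points].conjugate()) < tol
        for k in range(n_points)
    )
    dc_is_real = abs(X[0].imag) < tol
    nyquist_is_real = abs(X[n_points // 2].imag) < tol
    # DC and Nyquist contribute one real number each;
    # bins 1 .. N/2 - 1 contribute a real and an imaginary part each.
    count = 2 + 2 * (n_points // 2 - 1)
    return symmetric, dc_is_real, nyquist_is_real, count


random.seed(0)  # arbitrary seed, only to make the run repeatable
signal = [random.uniform(-1.0, 1.0) for _ in range(16)]
print(check_real_input_symmetry(signal))
```

The last element of the tuple is just the count from the paragraph above, 2 + 2*(N/2 - 1), which equals N for any even N.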
Now, back to the frequency resolution. For a signal sampled at f_s, the highest representable frequency (per Shannon)*** is f_s/2. Taking N points of such a signal, f_s/2 is at the Nyquist bin N/2 (N/2 cycles per N samples). All bins are equally spaced, so the spacing (or resolution) is (f_s/2)/(N/2) = f_s/N.
The other way to look at this is to take coefficient X[1], representing exactly 1 full cycle (of a complex sinusoid) per N samples. Those N samples span N/f_s seconds, so the first bin corresponds to the frequency f_s/N. X[2] corresponds to two full cycles, each of length N/(2*f_s) seconds, and thus to the frequency 2*f_s/N.
Generalizing, X[m] (m < N/2) corresponds to frequency m*f_s/N (m full cycles per N points), and finally the Nyquist X[N/2] corresponds to N/2*f_s/N = f_s/2, the highest representable frequency.
Again, the spacing between frequencies is f_s/N.
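The "m full cycles per N points" reasoning can be tried out directly. The sketch below is only an illustration under my own assumptions: fs = 8000 Hz and N = 64 are arbitrary values, and peak_bin is a hypothetical helper that feeds a pure cosine into a naive DFT and reports which bin between 0 and N/2 has the largest magnitude.

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes |X[k]| of a naive DFT for bins k = 0 .. N/2 only."""
    n_points = len(x)
    return [
        abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
                for n in range(n_points)))
        for k in range(n_points // 2 + 1)
    ]


def peak_bin(tone_hz, fs, n_points):
    """Index of the strongest bin for a cosine of tone_hz sampled at fs."""
    x = [math.cos(2 * math.pi * tone_hz * n / fs) for n in range(n_points)]
    mags = dft_magnitudes(x)
    return max(range(len(mags)), key=mags.__getitem__)


fs = 8000.0          # assumed sampling frequency, Hz
n_points = 64        # assumed DFT length N
spacing = fs / n_points  # candidate bin spacing fs/N = 125 Hz

for m in (1, 2, 5, 31, 32):
    k = peak_bin(m * spacing, fs, n_points)
    print(f"tone {m * spacing:7.1f} Hz -> peak in bin {k}")
```

With these values a tone at m*fs/N should peak in bin m, and a tone at fs/2 should peak in the last bin, N/2, which is what a spacing of fs/N predicts.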
For example, taking f_s = 44100Hz and N = 44100, the resolution is 1Hz up to 22050Hz, as expected. But according to the page cited above, the resolution would be 0.5Hz up to 0.5*44100/2 = 11025Hz?!?
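The arithmetic of this example, spelled out as a small sketch. The helper name bin_layout is hypothetical, and the two spacings passed in are simply the two formulas being compared in this post (f_s/N from my derivation, f_s/(2*N) as I read the cited page).

```python
def bin_layout(n_points, spacing_hz):
    """Return (spacing, top frequency) for N/2 equally spaced bins."""
    top_bin = n_points // 2
    return spacing_hz, top_bin * spacing_hz


fs = 44100       # sampling frequency from the example, Hz
n_points = 44100  # DFT length N from the example

derived = bin_layout(n_points, fs / n_points)        # spacing fs/N
cited = bin_layout(n_points, fs / (2 * n_points))    # spacing fs/(2*N)

print("fs/N    :", derived)  # (1.0, 22050.0)
print("fs/(2*N):", cited)    # (0.5, 11025.0)
```

Only the first layout puts the last bin at f_s/2 = 22050Hz; the second one stops at f_s/4.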
Can someone please enlighten me and shed some light on where I might be wrong, or where I missed that 2 in the denominator?
It's far from a strict, formal description, but should suffice for clearing this "extraneous 2" problem. Plus, this forum doesn't have a specialized math-editor, so "writing formal" would be quite tedious.
Any help or pointers would be greatly appreciated.
Regards.
________
*** - Actually, the sampling frequency must exceed twice the maximum frequency contained in the sampled signal, i.e. there must be a strict inequality.