Why does FFT produce complex numbers instead of real numbers?

All the FFT implementations we have come across result in complex values (with real and imaginary parts), even if the input to the algorithm was a discrete set of real numbers (integers).
Is it not possible to represent frequency domain in terms of real numbers only?

The FFT is fundamentally a change of basis. The basis into which the FFT transforms your original signal is a set of sinusoids. For that basis to describe every possible input it must be able to represent phase as well as amplitude; the phase is represented using complex numbers.
For example, suppose you FFT a signal containing only a single sine wave. Depending on its phase, you might well get an entirely real FFT result. But if you shift the phase of your input by a few degrees, how else could the FFT output represent that input?
edit: This is a somewhat loose explanation, but I'm just trying to motivate the intuition.

The FFT provides you with amplitude and phase. The amplitude is encoded as the magnitude of the complex number (sqrt(x^2+y^2)) while the phase is encoded as the angle (atan2(y,x)). To have a strictly real result from the FFT, the incoming signal must have even symmetry (i.e. x[n]=conj(x[N-n])).
If all you care about is intensity, the magnitude of the complex number is sufficient for analysis.
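For instance (a minimal numpy sketch; the bin index and phase below are arbitrary choices), the magnitude and angle of an FFT coefficient recover the amplitude and phase of the corresponding sinusoid:
import numpy as np

N = 64
n = np.arange(N)
x = np.cos(2 * np.pi * 3 * n / N + np.pi / 4)   # cosine at bin 3, phase pi/4

X = np.fft.fft(x)
print(np.abs(X[3]) * 2 / N)   # ~1.0  : the amplitude
print(np.angle(X[3]))         # ~pi/4 : the phase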

Yes, it is possible to represent the FFT frequency domain results of strictly real input using only real numbers.
Those complex numbers in the FFT result are simply 2 real numbers, which are both required to give you the 2D coordinates of a result vector that has both a length and a direction angle (or equivalently, a magnitude and a phase). And every frequency component in the FFT result can have a unique amplitude and a unique phase (relative to some point in the FFT aperture).
One real number alone can't represent both magnitude and phase. If you throw away the phase information, that can massively distort the signal if you try to recreate it using an iFFT (unless the signal is symmetric). So a complete FFT result requires 2 real numbers per FFT bin. By common convention, some FFTs bundle these 2 real numbers into a complex data type, but the FFT result could just as easily be (and in some implementations is) 2 real vectors, one of cosine coordinates and one of sine coordinates.
There are also FFT routines that produce magnitude and phase directly, but they run more slowly than FFTs that produce a complex (or two-real-vector) result. There are also FFT routines that compute only the magnitude and throw away the phase, but they usually run no faster than doing that yourself after a more general FFT; they may save a coder a few lines of code at the cost of not being invertible. Many libraries don't bother to include these slower and less general forms, and just let the coder convert or discard what they do or don't need.
Plus, many consider the math involved to be a lot more elegant using complex arithmetic (where, for strictly real input, the cosine correlation or even component of an FFT result is put in the real component, and the sine correlation or odd component of the FFT result is put in the imaginary component of a complex number.)
(Added:) And, as yet another option, you can consider the two components of each FFT result bin, instead of as real and imaginary components, as even and odd components, both real.
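To make the even/odd view concrete, here is a minimal numpy sketch (the test signal is arbitrary): for a real signal, the real part of the FFT is the transform of the signal's even part, and the imaginary part comes from its odd part.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)            # arbitrary real signal

# even/odd decomposition around index 0 (indices taken mod N)
x_rev = np.roll(x[::-1], 1)            # x[(N-n) mod N]
x_even = (x + x_rev) / 2
x_odd = (x - x_rev) / 2

X = np.fft.fft(x)
print(np.allclose(np.fft.fft(x_even), X.real))       # True
print(np.allclose(np.fft.fft(x_odd), 1j * X.imag))   # True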

If your FFT coefficient for a given frequency f is x + i y, you can look at x as the coefficient of a cosine at that frequency, while y is the coefficient of the sine. If you add these two waves for a particular frequency, you will get a phase-shifted wave at that frequency; the magnitude of this wave is sqrt(x*x + y*y), equal to the magnitude of the complex coefficient.
The Discrete Cosine Transform (DCT) is a relative of the Fourier transform which yields all real coefficients. A two-dimensional DCT is used by many image/video compression algorithms.
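For example (a minimal scipy sketch; the input values are arbitrary), the DCT of a real signal is entirely real, while the FFT of the same signal is complex:
import numpy as np
from scipy.fft import dct, fft

x = np.array([1.0, 2.0, 0.5, -1.0, 3.0, 0.0, -2.0, 1.5])
print(fft(x).dtype)   # complex128 -- FFT output is complex
print(dct(x).dtype)   # float64    -- DCT output is real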

The discrete Fourier transform is fundamentally a transformation from a vector of complex numbers in the "time domain" to a vector of complex numbers in the "frequency domain" (I use quotes because if you apply the right scaling factors, the DFT is its own inverse). If your inputs are real, then you can perform two DFTs at once: Take the input vectors x and y and calculate F(x + i y). I forget how you separate the DFT afterwards, but I suspect it's something about symmetry and complex conjugates.
The discrete cosine transform sort-of lets you represent the "frequency domain" with the reals, and is common in lossy compression algorithms (JPEG, MP3). The surprising thing (to me) is that it works even though it appears to discard phase information, but this also seems to make it less useful for most signal processing purposes (I'm not aware of an easy way to do convolution/correlation with a DCT).
I've probably gotten some details wrong ;)
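The separation is indeed done with symmetry and complex conjugates; a minimal numpy sketch (the input vectors are arbitrary):
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
y = rng.standard_normal(8)

Z = np.fft.fft(x + 1j * y)          # one DFT for two real inputs
Zr = np.roll(Z[::-1], 1).conj()     # conj(Z[(N-k) mod N])

print(np.allclose((Z + Zr) / 2, np.fft.fft(x)))    # True: DFT of x
print(np.allclose((Z - Zr) / 2j, np.fft.fft(y)))   # True: DFT of y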

The way you've phrased this question, I believe you are looking for an intuitive way of thinking rather than a mathematical answer. I come from a mechanical engineering background, and this is how I think about the Fourier transform: I contextualize it with reference to a pendulum.

If we have only the x-velocity vs. time of a pendulum and are asked to estimate the energy of the pendulum (or the forcing source of the pendulum), the Fourier transform gives a complete answer. Since what we usually observe is only the x-velocity, we might conclude that the pendulum only needs to be provided energy equivalent to the sinusoidal variation of its kinetic energy. But the pendulum also has potential energy, which is 90 degrees out of phase with the kinetic energy. So to keep track of the potential energy, we simply keep track of the part that is 90 degrees out of phase with the real (kinetic) component. The imaginary part may be thought of as a 'potential velocity' that represents a manifestation of the potential energy the source must provide to force the oscillatory behaviour. What is helpful is that this extends easily to the electrical context, where capacitors and inductors also store energy in 'potential form'.

If the signal is not sinusoidal, the transform is of course trying to decompose it into sinusoids. I see this as assuming that the final signal was generated by the combined action of infinitely many sources, each with a distinct sinusoidal behaviour. What we are trying to determine is the strength and phase of each source that together create the observed signal at each time instant.
PS: 1) The last two statements are generally how I think of the Fourier transform itself.
2) I say 'potential velocity' rather than potential energy because the transform does not change the dimensions of the original signal or physical quantity, so it cannot shift from representing velocity to representing energy.

Short answer
Why does FFT produce complex numbers instead of real numbers?
The FT result is a complex array because a complex exponential multiplier is involved in calculating the coefficients, so the final result is complex. The FT uses this multiplier to correlate the signal against multiple frequencies. The principle is detailed further down.
Is it not possible to represent frequency domain in terms of real numbers only?
Of course, the 1D array of complex coefficients returned by the FT could be represented by a 2D array of real values, which can be either the Cartesian coordinates x and y, or the polar coordinates r and θ. However...
Complex exponential form is the most suitable form for signal processing
Having only real data is not so useful.
On one hand it is already possible to get these coordinates using one of the functions real, imag, abs and angle.
On the other hand such isolated information is of very limited interest. E.g. if we add two signals with the same amplitude and frequency, but in phase opposition, the result is zero. But if we discard the phase information, we just double the signal, which is totally wrong.
Contrary to a common belief, the use of complex numbers is not because such a number is a handy container which can hold two independent values. It's because processing periodic signals involves trigonometry all the time, and there is a simple way to move from sines and cosines to simpler complex-number algebra: Euler's formula.
So most of the time signals are just converted to their complex exponential form. E.g. a signal with frequency 10 Hz, amplitude 3 and phase π/4 radians:
can be described by x = 3·e^(i(2π·10·t + π/4)),
splitting the exponent: x = 3·e^(i·π/4) × e^(i·2π·10·t), t being the time.
The first factor is a constant called the phasor. A common compact form is 3∠π/4. The second factor is a time-dependent variable called the carrier.
This signal 3·e^(i·π/4) × e^(i·2π·10·t) is easily plotted, either as a cosine (real part) or a sine (imaginary part):
from numpy import arange, pi, e, real, imag
import matplotlib.pyplot as plt

t = arange(0, 0.2, 1/200)                      # 0.2 s sampled at 200 Hz
x = 3 * e ** (1j*pi/4) * e ** (1j*2*pi*10*t)   # phasor times carrier

fig, (ax1, ax2) = plt.subplots(2, 1)
ax1.stem(t, real(x))                           # cosine (real part)
ax2.stem(t, imag(x))                           # sine (imaginary part)
plt.show()
Now if we look at FT coefficients, we see they are phasors: they don't embed the frequency, which is determined only by the bin position, the number of samples, and the sampling frequency.
Actually if we want to plot a FT component in the time domain, we have to separately create the carrier from the frequency found, e.g. by calling fftfreq. With the phasor and the carrier we have the spectral component.
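A minimal sketch of that reconstruction (the test signal and sampling setup are arbitrary): take an FFT coefficient as the phasor, build the carrier from the frequency reported by fftfreq, and the product is the spectral component in the time domain.
import numpy as np

fs, N = 200, 40
t = np.arange(N) / fs
x = 3 * np.cos(2 * np.pi * 10 * t + np.pi / 4)    # arbitrary test signal

X = np.fft.fft(x)
freqs = np.fft.fftfreq(N, 1 / fs)
k = np.argmax(np.abs(X[:N // 2]))                 # dominant positive-frequency bin

phasor = X[k] / N                                 # constant: amplitude and phase
carrier = np.exp(1j * 2 * np.pi * freqs[k] * t)   # time-dependent

component = 2 * np.real(phasor * carrier)         # x2 accounts for the negative-frequency twin
print(np.allclose(component, x))                  # True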
A phasor is a vector, and a vector can turn
Cartesian coordinates are extracted using the real and imag functions: the phasor used above, 3·e^(i·π/4), is also the complex number 2.12 + 2.12j (i is written j by scientists and engineers). These coordinates can be plotted on a plane with the vertical axis representing i (left):
This point can also represent a vector (center). Polar coordinates can be used in place of Cartesian coordinates (right); they are extracted by abs and angle. Clearly this vector can also represent the phasor 3∠π/4 (the short form for 3·e^(i·π/4)).
This reminder about vectors is to introduce how phasors are manipulated. Say we have a real number of amplitude 1, which is nothing less than a complex number whose angle is 0, and hence also a phasor (1∠0). We also have a second phasor (3∠π/4), and we want the product of the two. We could compute the result in Cartesian coordinates with some trigonometry, but this is painful. The easiest way is the complex exponential form:
we just add the angles and multiply the magnitudes: 1·e^(i·0) × 3·e^(i·π/4) = 1×3·e^(i(0+π/4)) = 3·e^(i·π/4)
or we can just write: (1∠0) × (3∠π/4) = (3∠π/4).
Either way, the result is the same: the practical effect is to rotate the real number and scale its magnitude. In the FT, the real number is the sample amplitude, and the multiplier magnitude is actually 1, so only the rotation applies.
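A two-line check of that multiplication in Python (values from the example above):
import cmath

p1 = cmath.rect(1, 0)               # the phasor 1 at angle 0
p2 = cmath.rect(3, cmath.pi / 4)    # the phasor 3 at angle pi/4
print(cmath.polar(p1 * p2))         # (3.0, 0.785...): magnitude 3 at angle pi/4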
This long introduction was to explain the math behind FT.
How spectral coefficients are created by FT
The FT principle is, for each spectral coefficient to compute:
to multiply each of the sample amplitudes by a different phasor, so that the angle increases from the first sample to the last,
to sum all the resulting products.
If there are N samples xn (n = 0 to N-1), there are N spectral coefficients Xk to compute. Calculating coefficient Xk involves multiplying each sample amplitude xn by the phasor e^(-i·2π·k·n/N) and taking the sum, according to the FT equation:
X(k) = sum{n = 0..N-1} x(n)·e^(-i·2π·k·n/N)
In the N individual products, the multiplier angle varies according to 2π·n/N and k. Ignoring k for now, the angle goes from 0 to 2π as n runs over the samples. So while performing the products, we multiply a variable real amplitude by a phasor whose magnitude is 1 and whose angle goes from 0 to a full turn:
[Animation of the rotating multiplier. Source: A. Dieckmann, Physikalisches Institut der Universität Bonn]
Doing this summation actually correlates the signal samples with the phasor's angular velocity, i.e. how fast its angle varies with n/N. The result tells how strong this correlation is (amplitude), and how synchronous it is (phase).
This operation is repeated for each of the N spectral coefficients to compute (half with k negative, half with k positive). As k changes, the angle increment also changes, so the correlation is checked against another frequency.
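Written out directly, this is a minimal, unoptimized sketch of the equation above:
import numpy as np

def dft(x):
    # naive O(N^2) DFT: each coefficient correlates x against one phasor
    N = len(x)
    X = np.zeros(N, dtype=complex)
    for k in range(N):
        for n in range(N):
            X[k] += x[n] * np.exp(-1j * 2 * np.pi * k * n / N)
    return X

x = np.random.default_rng(2).standard_normal(16)
print(np.allclose(dft(x), np.fft.fft(x)))   # True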
Conclusion
FT results are neither sines nor cosines; they are not waves, they are phasors describing a correlation. A phasor is a constant, expressed as a complex exponential, embedding both amplitude and phase. Multiplied by a carrier, which is also a complex exponential but variable and time-dependent, they draw helices in the time domain.
When these helices are projected onto the horizontal plane, by taking the real part of the FT result, the function drawn is the cosine. When they are projected onto the vertical plane, by taking the imaginary part, the function drawn is the sine. The phase determines the angle at which the helix starts, and therefore without the phase the signal cannot be reconstructed using an inverse FT.
The complex exponential multiplier is a tool to transform the linear velocity of amplitude variations into angular velocity, which is 2π times the frequency. All of this revolves around Euler's formula linking sinusoids and complex exponentials.

For a signal containing only cosine waves, the Fourier transform (as computed by the FFT) produces completely real output. For a signal composed only of sine waves, it produces completely imaginary output. A phase shift in any of the components results in a mix of real and imaginary parts. Complex numbers (in this context) are merely another way to store phase and amplitude.
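A quick numpy check of that claim (the frequency and length are arbitrary):
import numpy as np

n = np.arange(32)
C = np.fft.fft(np.cos(2 * np.pi * 4 * n / 32))   # cosine input
S = np.fft.fft(np.sin(2 * np.pi * 4 * n / 32))   # sine input

print(np.allclose(C.imag, 0))   # True: purely real output
print(np.allclose(S.real, 0))   # True: purely imaginary output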


How to compute Discrete Fourier Transform?

I've been trying to find some resources to help me better understand the DFT and how to compute it, but to no avail. So I need help understanding the DFT and its computation with complex numbers.
Basically, I'm just looking for examples on how to compute DFT with an explanation on how it was computed because in the end, I'm looking to create an algorithm to compute it.
I assume 1D DFT/IDFT ...
All DFTs use this formula:
X(k) = sum{n = 0..N-1} x(n)·e^(-i·2π·k·n/N)
X(k) is transformed sample value (complex domain)
x(n) is input data sample value (real or complex domain)
N is number of samples/values in your dataset
This whole thing is usually multiplied by a normalization constant c. As you can see, a single value needs N computations, so all N samples take O(N^2), which is slow.
My Real<->Complex domain DFT/IDFT in C++ is here; in it you can also find hints on how to compute a 2D transform with 1D transforms, and how to compute an N-point DCT/IDCT by an N-point DFT/IDFT.
Fast algorithms
There are fast algorithms based on splitting this equation into the odd and even parts of the sum (which gives 2 sums of N/2 terms), which is still O(N) per single value; but the two halves are the same equation up to a constant tweak, so one half can be computed from the other directly. This leads to O(N/2) per single value, and applying the split recursively gives O(log(N)) per single value. The whole thing thus becomes O(N·log(N)), which is awesome, but it also adds this restriction:
All radix-2 DFFTs need an input dataset whose size is a power of two!
so that it can be recursively split. Zero padding to the nearest bigger power of 2 is used for other dataset sizes (in audio tech sometimes even phase shift). Look here:
my Complex->Complex domain DFT/DFFT in C++
some hints on constructing FFT-like algorithms
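For the structure of the recursion itself, here is a minimal Python sketch (no normalization; the input length must be a power of two):
import numpy as np

def fft_recursive(x):
    # radix-2 decimation-in-time FFT; len(x) must be a power of two
    N = len(x)
    if N == 1:
        return x
    even = fft_recursive(x[0::2])
    odd = fft_recursive(x[1::2])
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
    return np.concatenate([even + twiddle * odd,
                           even - twiddle * odd])

x = np.random.default_rng(3).standard_normal(8).astype(complex)
print(np.allclose(fft_recursive(x), np.fft.fft(x)))   # True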
Complex numbers
c = a + i*b
c is complex number
a is its real part (Re)
b is its imaginary part (Im)
i*i=-1 is imaginary unit
so the computation is like this
addition:
c0+c1=(a0+i.b0)+(a1+i.b1)=(a0+a1)+i.(b0+b1)
multiplication:
c0*c1=(a0+i.b0)*(a1+i.b1)
=a0.a1+i.a0.b1+i.b0.a1+i.i.b0.b1
=(a0.a1-b0.b1)+i.(a0.b1+b0.a1)
polar form
a = r.cos(θ)
b = r.sin(θ)
r = sqrt(a.a + b.b)
θ = atan2(b,a)
a+i.b = r|θ
sqrt
sqrt(r|θ) = (+/-)sqrt(r)|(θ/2)
sqrt(r.(cos(θ)+i.sin(θ))) = (+/-)sqrt(r).(cos(θ/2)+i.sin(θ/2))
real -> complex conversion:
complex = real+i.0
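In Python, the built-in complex type and the cmath module already implement all of this; a quick check of the rules above:
import cmath

c0 = complex(1, 2)                  # a0=1, b0=2
c1 = complex(3, -1)                 # a1=3, b1=-1

print(c0 + c1)                      # (4+1j): add the parts
print(c0 * c1)                      # (5+5j): (a0.a1-b0.b1)+i.(a0.b1+b0.a1)
print(abs(c0), cmath.phase(c0))     # r and theta of the polar form
print(cmath.sqrt(c0))               # principal square root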
[notes]
do not forget that you need to write the output to a different array (the transform is not in-place)
the normalization constant across the FFT recursion is tricky (usually something like /=log2(N); it also depends on the recursion stopping condition)
do not forget to stop the recursion at N=1 or 2 ...
beware the FPU can overflow on big datasets (when N is big)
here are some insights into DFT/DFFT
here is a 2D FFT and wrapping example
usually Euler's formula is used to compute e^(i.x) = cos(x) + i.sin(x)
see How do I obtain the frequencies of each value in an FFT?
where you can find how to obtain the Nyquist frequencies
[edit1] I also strongly recommend this amazing video (which I just found):
But what is the Fourier Transform? A visual introduction
It describes the (D)FT in a geometric representation. I would change some minor things in it, but it is still amazingly simple to understand.

Epanechnikov multivariate density

I have data consisting of vectors of size 1x5, each representing a pixel: [x, y, r, g, b], where x and y are the position (0 <= x <= M, 0 <= y <= N) and r, g, b are the colors of the pixel (0 <= r,g,b <= 255).
I want to estimate density estimation using the multivariate Epanechnikov kernel. I read that there are 2 ways to basically do that:
Multiplicative method - calculate the kernel for each dimension and then multiply them.
Calculate the norm of the vector and calculate the kernel for that value.
How exactly would each of the two methods work with my data? What do I need to normalize, knowing that the Epanechnikov kernel yields 0 for normalized values > 1 or < -1?
I am programming in C++.
Multiplicative method - calculate the kernel for each dimension and then multiply them.
Calculate the norm of the vector and calculate the kernel for that value.
Method 1 assumes that your x and y variables are statistically independent, an assumption method 2 does not make; on the other hand, method 2 gives a radially symmetric kernel (both are sketched in code at the end of this answer).
How exactly would each of the two methods work with my data?
I would try both and see which one gives a better result (e.g. which one gives a better likelihood on the data but taking care not to overfit the data e.g. by using cross validation).
In its most basic form this means that you split your sample, use one part to calculate the density estimation function (i.e. place kernels around data points), and evaluate the likelihood on the other part (the product of the values of the density estimation function at the test points, or better the log of that product), then see which method gives the higher probability on the 'other' sample (the one NOT used for calculating the estimate).
The same argument (cross validation) also applies to the choice of the width of the kernel ('scaling factor', make the kernel narrow or broad).
You can of course just select a kernel width by hand to start with. Choosing the kernel width too small will give a 'spiky' density estimate, choosing it too large will 'wash out' the important features of your data.
What do I need to normalize knowing that the Epanechnikov kernel yields 0 for normalized values > 1 or < -1.
The feature you mention is not related to normalization. You should use a normalized expression for the kernel itself, i.e. the integral over the range where the kernel is non-zero should be one. For your case 1, if the 1D kernels are normalized (which is the case, for example, for 3/4*(1-u^2) on [-1..1]), the 2D product will also be normalized. For case 2 one has to calculate the 2D integral.
Assuming the kernel is normalized, you can then normalize the density estimate as follows:
p(x,y) = (1/N) · sum{i = 1..N} K(x - x_i, y - y_i)
where N is the number of data points. This will be normalized, i.e. the integral of p(x,y) over the 2D plane is one.
Note that neither of the functional forms you mentioned allow arbitrary covariance matrices. One way to work around this is to first 'decorrelate' the dataset (i.e. apply a matrix transformation such that the covariance matrix of the dataset becomes the unit matrix), then perform the density estimate and then apply the inverse transformation.
Also there are extensions such as adaptive kernel density estimation where the width of the kernel varies itself as function of x and y if at some point you want to refine your estimate etc.
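To make the two methods concrete, here is a minimal 2D Python sketch (positions only; the helper names and the bandwidth h are hypothetical choices, and extending to all five dimensions is mechanical):
import numpy as np

def epanechnikov_1d(u):
    # normalized 1D Epanechnikov kernel: 3/4*(1-u^2) on [-1, 1]
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def kde_multiplicative(point, data, h):
    # method 1: product of per-dimension kernels
    u = (point - data) / h
    k = np.prod(epanechnikov_1d(u), axis=1)
    return k.sum() / (len(data) * h**data.shape[1])

def kde_radial(point, data, h):
    # method 2: kernel of the Euclidean norm; 2/pi normalizes it in 2D
    r = np.linalg.norm((point - data) / h, axis=1)
    k = np.where(r <= 1, (2 / np.pi) * (1 - r**2), 0.0)
    return k.sum() / (len(data) * h**data.shape[1])

data = np.random.default_rng(4).uniform(0, 10, size=(500, 2))
print(kde_multiplicative(np.array([5.0, 5.0]), data, h=2.0))
print(kde_radial(np.array([5.0, 5.0]), data, h=2.0))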

Discrete Cosine Transformation formula disparity

Well, I was programming something that required the use of DCT. I found 2 resources for the DCT formula:
Mathworks
Wikipedia
Initially I used the Wikipedia version of DCT-II. In the DCT-II section of the wiki page, it is written that some authors further multiply the X0 term by 1/√2 and multiply the resulting matrix by an overall scale factor, which makes the DCT-II matrix orthogonal but breaks the direct correspondence with a real-even DFT of half-shifted input. The MathWorks site does exactly this.
What is this property being talked about?
I believe they are trying to say that they are concerned with making the DCT-II transform matrix a unitary matrix. A unitary matrix is nice from a signal processing standpoint because when we transform the signal back to its original domain, we are not adding any more power into the signal.
However, the 1-D DFT:
X(k) = sum{n = 0..N-1} x(n)·e^(-i·2π·k·n/N)
can be rewritten in terms of sines and cosines (using Euler's identity). If the input is a real-even signal, the even terms of the DFT will correspond to the terms of the DCT. Some people like to simplify their algorithms by simply taking the DFT of a signal and concentrating only on the even terms.
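To see the orthogonality property concretely, here is a minimal scipy sketch (norm='ortho' applies the 1/√2 and overall scale factors discussed above):
import numpy as np
from scipy.fft import dct

N = 8
I = np.eye(N)
D = dct(I, norm='ortho', axis=0)    # orthonormal DCT-II matrix, column by column
print(np.allclose(D.T @ D, I))      # True: the transform matrix is orthogonal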

Algorithm to Match Time Dependent (1D) Signals

I was wondering if someone could point me to an algorithm/technique that is used to compare time dependent signals. Ideally, this hypothetical algorithm would take in 2 signals as inputs and return a number that would be the percentage similarity between the signals (0 being that the 2 signals are statistically unrelated and 1 being that they are a perfect match).
Of course, I realize that there are problems with my request, namely that I'm not sure how to properly define 'similarity' in the context of comparing these 2 signals, so if someone could also point me in the right direction (as to what I should look up/know, etc.), I'd appreciate it as well.
The cross-correlation function is the classic signal processing solution. If you have access to Matlab, see the XCORR function. max(abs(xcorr(Signal1, Signal2, 'coeff'))) would give you specifically what you're looking for and an equivalent exists in Python as well.
Cross-correlation assumes that the "similarity" you're looking for is a measure of the linear relationship between the two signals. The definition for real-valued finite-length signals with time index n = 0..N-1 is:
C[g] = sum{m = 0..N-1} (x1[m] * x2[g+m])
g runs from -(N-1)..(N-1) (outside that range the product inside the sum is 0).
Although you asked for a number, the function is pretty interesting. The function domain g is called the lag domain.
If x1 and x2 are related by a time shift, the cross-correlation function will have its peak at the lag corresponding to the shift. For instance, if you had x1 = sin[wn] and x2 = sin[wn + phi], so two sine waves at the same frequency and different phase, the cross-correlation function would have its peak at the lag corresponding to the phase shift.
If x2 is a scaled version of x1, the cross-correlation will scale also. You can normalize the function to a correlation coefficient by dividing by sqrt(sum(x1^2)*sum(x2^2)), and bring it into 0..1 by taking an absolute value (that line of Matlab has these operations).
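An equivalent of that Matlab line in numpy (a minimal sketch; the test signals are arbitrary):
import numpy as np

n = np.arange(200)
x1 = np.sin(2 * np.pi * 0.05 * n)
x2 = 0.5 * np.sin(2 * np.pi * 0.05 * n + 1.0)    # scaled, phase-shifted copy

c = np.correlate(x1, x2, mode='full')            # lags -(N-1)..(N-1)
c_norm = c / np.sqrt(np.sum(x1**2) * np.sum(x2**2))
print(np.max(np.abs(c_norm)))                    # close to 1: linearly related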
More generally, below is a summary of what cross-correlation is good/bad for.
Cross-correlation works well for determining if one signal is linearly related to another, that is if
x2(t) = sum{n = 0..K-1}(A_n * x1(t + phi_n))
where x1(t) and x2(t) are the signals in question, A_n are scaling factors, and phi_n are time shifts. The implications of this are:
If one signal is a time shifted version of the other (phi_n <> 0 for some n) the cross-correlation function will be non-zero.
If one signal is a scaled version of the other (A_n <> 0 for some n) the cross-correlation function will be non-zero.
If one signal is a combination of scaled and time shifted versions of the other (both A_n and phi_n are non-zero for some number of n's) the cross-correlation function will be non-zero. Note that this is also a definition of a linear filter.
To get more concrete, suppose x1 is a wideband random signal. Let x2=x1. Now the normalized cross-correlation function will be exactly 1 at g=0, and near 0 everywhere else. Now let x2 be a (linearly) filtered version of x1. The cross-correlation function will be non-zero near g=0. The width of the non-zero part will depend on the bandwidth of the filter.
For the special case of x1 and x2 being periodic, the information on the phase-shift in the original part of the answer applies.
Where cross-correlation will not help is if the two signals are not linearly related. For instance, two periodic signals at different frequencies are not linearly related. Nor are two random signals drawn from a wideband random process at different times. Nor are two signals that are similar in shape but with different time indexing - this is like the unequal fundamental frequency case.
In all cases, normalizing the cross-correlation function and looking at the maximum value will tell you if the signals are potentially linearly related - if the number is low, like under 0.1, I would be comfortable declaring them unrelated. Higher than that and I'd look into it more carefully, graphing both the normalized and unnormalized cross-correlation functions and looking at the structure. A periodic cross-correlation implies both signals are periodic, and a cross-correlation function that is noticeably higher around g=0 implies one signal is a filtered version of the other.
You could try a Fast Fourier Transform (look up FFT on Wikipedia; there are open source libraries for computing it).
FFTs transform your data from the time domain (i.e. a pulse at 1s, 2s, 3s, 4s...) to the frequency domain (i.e. a component repeating once per second).
Then you can compare frequencies and their relative strengths more easily. It should be a step in the right direction for you.
General solution: you can bin the data into histograms and use a Chi-squared test or a Kolmogorov test.
Both are explicitly intended to estimate the chance that the two distributions represent random samples from the same underlying distribution (that is, have the same shape to within statistics).
I don't know a C implementation off the top of my head, but ROOT provides C++ implementations of both:
TH1::Chi2Test
TH1::KolmogorovTest
I believe the docs point to some papers as well.
I think that CERNLIB provides both algorithms in Fortran 77, which you can link to C. Translating the ROOT code might be easier.
Dynamic Time Warping is an approach you can use if the signals should be matched by speeding up and slowing down time at different positions.
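A minimal sketch of the dynamic-programming recurrence behind DTW (distance only, no path recovery):
import numpy as np

def dtw_distance(a, b):
    # classic DTW: cost of the best monotonic alignment of a and b
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],       # stretch a
                                 D[i, j - 1],       # stretch b
                                 D[i - 1, j - 1])   # advance both
    return D[n, m]

print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 2, 2, 1, 0]))   # 0.0: same shape, different timing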
You don't say very much about what the signals are, or what measure of "sameness" would be meaningful to you. But if the signals are in-phase (that is, you want to compare the two signals instant-by-instant, and there is no time delay to consider), then I'd suggest you look at the Pearson correlation coefficient. It gives you a value of 1 if the two signals are identical, a value of 0 if they're entirely dissimilar, and something in between if they kinda rhyme. As an added advantage, Pearson's doesn't care if the signals are amplified differently (except that if one signal is the inverse of the other it gives you a result of -1).
Does that sound like what you're looking for?
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
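In numpy this is a one-liner (arbitrary test signals):
import numpy as np

y = np.sin(np.linspace(0, 10, 100))
z = 5 * y + 2                       # same shape, different gain and offset
print(np.corrcoef(y, z)[0, 1])      # 1.0: perfectly correlated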
I don't know signal processing, so this is a guess:
Is your signal effectively a list of ordered pairs (x,y), where x is the time and y the amplitude? If so, then perhaps you could throw away the time coordinate -- e.g.:
Signal 1: [(x0,y0), (x1,y1), (x2,y2), (x3,y3), ...]
Signal 2: [(x0,z0), (x1,z1), (x2,z2), (x3,z3), ...]
Throw away time:
Signal 1: [y0, y1, y2, y3, ...]
Signal 2: [z0, z1, z2, z3, ...]
Then you can compare the amplitudes with each other, perhaps by looking for a correlation. Perhaps you could plot y against z:
Comparing: [(y0,z0), (y1,z1), (y2,z2), (y3,z3), ...]
Or calculate one of the various correlation coefficients.

Determining if a dataset approximates a sine wave

Is there an algorithm that can be used to determine whether a sample of data taken at fixed time intervals approximates a sine wave?
Take the Fourier transform, which transforms the data into a frequency table (search for FFT, fast Fourier transform, for an implementation; for example, FFTW). If the data is a sine or cosine, the frequency table will contain one very high value at the frequency you're searching for and some noise at other frequencies.
Alternatively, generate sines at several frequencies and try to fit them by minimizing the sum of squared differences between your signal and the sine you're trying to fit. You would need to do this for sines over a range of frequencies, of course, and you would need to translate the sine along the x-axis to find the phase.
You can calculate the Fourier transform and look for a single spike. That would tell you that the dataset approximates a sine curve at that frequency.
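A minimal sketch of that spike test (the 0.95 threshold is an arbitrary choice):
import numpy as np

def looks_like_sine(x, threshold=0.95):
    # True if one frequency bin holds almost all the non-DC energy
    spectrum = np.abs(np.fft.rfft(x - np.mean(x))) ** 2
    return spectrum.max() / spectrum.sum() >= threshold

t = np.linspace(0, 1, 256, endpoint=False)
print(looks_like_sine(np.sin(2 * np.pi * 8 * t)))              # True
print(looks_like_sine(np.random.default_rng(5).random(256)))   # False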
A shot in the dark: you could take advantage of the fact that the derivative of a*sin(t) is a*cos(t). Keeping track of the min/max of your data should allow you to know a.
Check the least squares method.
@CookieOfFortune: I agree, but the Fourier series fit is optimal in the least squares sense (as the Wikipedia article says).
If you want to play around first with your own input data, check the Discrete Fourier Transform (DFT) on Wolfram Alpha. As noted before, if you want a fast implementation you should check out one of the several FFT libraries.
