Discrete Cosine Transformation formula disparity

Discrete Cosine Transformation formula disparity - algorithm

Well, I was programming something that required the use of DCT. I found 2 resources for the DCT formula:
Mathworks
Wikipedia
Initially I used the wikipedia version of DCT-II. In the DCT-II section of wiki page, it is written that some authors further multiply the X0 term by 1/√2 and multiply the resulting matrix by an overall scale factor, which makes the DCT-II matrix orthogonal, but breaks the direct correspondence with a real-even DFT of half-shifted input. And the mathworks site does this only.
What is this property being talked about?

I beleive that they are trying to say that that are concerned about making the DCT-II transform matrix a unitary matrix. It is nice from a signal processing standpoint to have a unitary matrix because when we transform the signal back to its original domain, we are not adding any more power into the signal.
However, the 1-D DFT:
can be rewritten in terms of sines and consies (using Euler's Identity). If the input is a real-even signal, the even terms of the DFT will correspond to the terms of the DCT. Some people like to simplify their algorithms by simply taking the DFT of a signal, and only concentrating on the even terms.

Related

Generate random numbers with specific Fourier spectrum

I have a set of random numbers that will be used for a simulation. However, I need this numbers to have a specific Fourier spectrum (that looks similar to the real data I have) but without changing the phase of the random numbers.
Does anyone have any idea on how I can use the amplitude of the Fourier transform of the real data to generate approximately similar Fourier spectrum for the random numbers?
What I thought of doing is:
Take the Fourier transform of the real data.
Multiply the spectrum (|F(w)|) of the real data by the Fourier transform of the random numbers.
Calculate the inverse Fourier transform of the multiplied signal to get the random numbers.
Would this approach work well?
What would be the effect on the phase angle (if any)?
Any suggestions on different ways to do that are welcome.

Your question is a classic one, since many people want to generate random numbers with a specific power spectral density. In my case, I was simulating random rough surfaces. I wrote a paper discussing how to do this:
Chris A. Mack, "Generating random rough edges, surfaces, and volumes", Applied Optics, Vol. 52, No. 7 (1 March 2013) pp. 1472-1480.
A copy of this paper can be found on my website (paper #178):
http://www.lithoguru.com/scientist/papers.html

How to compute Discrete Fourier Transform?

I've been trying to find some places to help me better understand DFT and how to compute it but to no avail. So I need help understanding DFT and it's computation of complex numbers.
Basically, I'm just looking for examples on how to compute DFT with an explanation on how it was computed because in the end, I'm looking to create an algorithm to compute it.

I assume 1D DFT/IDFT ...
All DFT's use this formula:
X(k) is transformed sample value (complex domain)
x(n) is input data sample value (real or complex domain)
N is number of samples/values in your dataset
This whole thing is usually multiplied by normalization constant c. As you can see for single value you need N computations so for all samples it is O(N^2) which is slow.
Here mine Real<->Complex domain DFT/IDFT in C++ you can find also hints on how to compute 2D transform with 1D transforms and how to compute N-point DCT,IDCT by N-point DFT,IDFT there.
Fast algorithms
There are fast algorithms out there based on splitting this equation to odd and even parts of the sum separately (which gives 2x N/2 sums) which is also O(N) per single value, but the 2 halves are the same equations +/- some constant tweak. So one half can be computed from the first one directly. This leads to O(N/2) per single value. if you apply this recursively then you get O(log(N)) per single value. So the whole thing became O(N.log(N)) which is awesome but also adds this restrictions:
All DFFT's need the input dataset is of size equal to power of two !!!
So it can be recursively split. Zero padding to nearest bigger power of 2 is used for invalid dataset sizes (in audio tech sometimes even phase shift). Look here:
mine Complex->Complex domain DFT,DFFT in C++
some hints on constructing FFT like algorithms
Complex numbers
c = a + i*b
c is complex number
a is its real part (Re)
b is its imaginary part (Im)
i*i=-1 is imaginary unit
so the computation is like this
addition:
c0+c1=(a0+i.b0)+(a1+i.b1)=(a0+a1)+i.(b0+b1)
multiplication:
c0*c1=(a0+i.b0)*(a1+i.b1)
=a0.a1+i.a0.b1+i.b0.a1+i.i.b0.b1
=(a0.a1-b0.b1)+i.(a0.b1+b0.a1)
polar form
a = r.cos(θ)
b = r.sin(θ)
r = sqrt(a.a + b.b)
θ = atan2(b,a)
a+i.b = r|θ
sqrt
sqrt(r|θ) = (+/-)sqrt(r)|(θ/2)
sqrt(r.(cos(θ)+i.sin(θ))) = (+/-)sqrt(r).(cos(θ/2)+i.sin(θ/2))
real -> complex conversion:
complex = real+i.0
[notes]
do not forget that you need to convert data to different array (not in place)
normalization constant on FFT recursion is tricky (usually something like /=log2(N) depends also on the recursion stopping condition)
do not forget to stop the recursion if N=1 or 2 ...
beware FPU can overflow on big datasets (N is big)
here some insights to DFT/DFFT
here 2D FFT and wrapping example
usually Euler's formula is used to compute e^(i.x)=cos(x)+i.sin(x)
here How do I obtain the frequencies of each value in an FFT?
you find how to obtain the Niquist frequencies
[edit1] Also I strongly recommend to see this amazing video (I just found):
But what is the Fourier Transform A visual introduction
It describes the (D)FT in geometric representation. I would change some minor stuff in it but still its amazingly simple to understand.

Why does FFT produce complex numbers instead of real numbers?

All the FFT implementations we have come across result in complex values (with real and imaginary parts), even if the input to the algorithm was a discrete set of real numbers (integers).
Is it not possible to represent frequency domain in terms of real numbers only?

The FFT is fundamentally a change of basis. The basis into which the FFT changes your original signal is a set of sine waves instead. In order for that basis to describe all the possible inputs it needs to be able to represent phase as well as amplitude; the phase is represented using complex numbers.
For example, suppose you FFT a signal containing only a single sine wave. Depending on phase you might well get an entirely real FFT result. But if you shift the phase of your input a few degrees, how else can the FFT output represent that input?
edit: This is a somewhat loose explanation, but I'm just trying to motivate the intuition.

The FFT provides you with amplitude and phase. The amplitude is encoded as the magnitude of the complex number (sqrt(x^2+y^2)) while the phase is encoded as the angle (atan2(y,x)). To have a strictly real result from the FFT, the incoming signal must have even symmetry (i.e. x[n]=conj(x[N-n])).
If all you care about is intensity, the magnitude of the complex number is sufficient for analysis.

Yes, it is possible to represent the FFT frequency domain results of strictly real input using only real numbers.
Those complex numbers in the FFT result are simply just 2 real numbers, which are both required to give you the 2D coordinates of a result vector that has both a length and a direction angle (or magnitude and a phase). And every frequency component in the FFT result can have a unique amplitude and a unique phase (relative to some point in the FFT aperture).
One real number alone can't represent both magnitude and phase. If you throw away the phase information, that could easily massively distort the signal if you try to recreate it using an iFFT (and the signal isn't symmetric). So a complete FFT result requires 2 real numbers per FFT bin. These 2 real numbers are bundled together in some FFTs in a complex data type by common convention, but the FFT result could easily (and some FFTs do) just produce 2 real vectors (one for cosine coordinates and one for sine coordinates).
There are also FFT routines that produce magnitude and phase directly, but they run more slowly than FFTs that produces a complex (or two real) vector result. There also exist FFT routines that compute only the magnitude and just throw away the phase information, but they usually run no faster than letting you do that yourself after a more general FFT. Maybe they save a coder a few lines of code at the cost of not being invertible. But a lot of libraries don't bother to include these slower and less general forms of FFT, and just let the coder convert or ignore what they need or don't need.
Plus, many consider the math involved to be a lot more elegant using complex arithmetic (where, for strictly real input, the cosine correlation or even component of an FFT result is put in the real component, and the sine correlation or odd component of the FFT result is put in the imaginary component of a complex number.)
(Added:) And, as yet another option, you can consider the two components of each FFT result bin, instead of as real and imaginary components, as even and odd components, both real.

If your FFT coefficient for a given frequency f is x + i y, you can look at x as the coefficient of a cosine at that frequency, while the y is the coefficient of the sine. If you add these two waves for a particular frequency, you will get a phase-shifted wave at that frequency; the magnitude of this wave is sqrt(x*x + y*y), equal to the magnitude of the complex coefficient.
The Discrete Cosine Transform (DCT) is a relative of the Fourier transform which yields all real coefficients. A two-dimensional DCT is used by many image/video compression algorithms.

The discrete Fourier transform is fundamentally a transformation from a vector of complex numbers in the "time domain" to a vector of complex numbers in the "frequency domain" (I use quotes because if you apply the right scaling factors, the DFT is its own inverse). If your inputs are real, then you can perform two DFTs at once: Take the input vectors x and y and calculate F(x + i y). I forget how you separate the DFT afterwards, but I suspect it's something about symmetry and complex conjugates.
The discrete cosine transform sort-of lets you represent the "frequency domain" with the reals, and is common in lossy compression algorithms (JPEG, MP3). The surprising thing (to me) is that it works even though it appears to discard phase information, but this also seems to make it less useful for most signal processing purposes (I'm not aware of an easy way to do convolution/correlation with a DCT).
I've probably gotten some details wrong ;)

The way you've phrased this question, I believe you are looking for a more intuitive way of thinking rather than a mathematical answer. I come from a mechanical engineering background and this is how I think about the Fourier transform. I contextualize the Fourier transform with reference to a pendulum. If we have only the x-velocity vs time of a pendulum and we are asked to estimate the energy of the pendulum (or the forcing source of the pendulum), the Fourier transform gives a complete answer. As usually what we are observing is only the x-velocity, we might conclude that the pendulum only needs to be provided energy equivalent to its sinusoidal variation of kinetic energy. But the pendulum also has potential energy. This energy is 90 degrees out of phase with the potential energy. So to keep track of the potential energy, we are simply keeping track of the 90 degree out of phase part of the (kinetic)real component. The imaginary part may be thought of as a 'potential velocity' that represents a manifestation of the potential energy that the source must provide to force the oscillatory behaviour. What is helpful is that this can be easily extended to the electrical context where capacitors and inductors also store the energy in 'potential form'. If the signal is not sinusoidal of course the transform is trying to decompose it into sinusoids. This I see as assuming that the final signal was generated by combined action of infinite sources each with a distinct sinusoid behaviour. What we are trying to determine is a strength and phase of each source that creates the final observed signal at each time instant.
PS: 1) The last two statements is generally how I think of the Fourier transform itself.
2) I say potential velocity rather the potential energy as the transform usually does not change dimensions of the original signal or physical quantity so it cannot shift from representing velocity to energy.

Short answer
Why does FFT produce complex numbers instead of real numbers?
The reason FT result is a complex array is a complex exponential multiplier is involved in the coefficients calculation. The final result is therefore complex. FT uses the multiplier to correlate the signal against multiple frequencies. The principle is detailed further down.
Is it not possible to represent frequency domain in terms of real numbers only?
Of course the 1D array of complex coefficients returned by FT could be represented by a 2D array of real values, which can be either the Cartesian coordinates x and y, or the polar coordinates r and θ (more here). However...
Complex exponential form is the most suitable form for signal processing
Having only real data is not so useful.
On one hand it is already possible to get these coordinates using one of the functions real, imag, abs and angle.
On the other hand such isolated information is of very limited interest. E.g. if we add two signals with the same amplitude and frequency, but in phase opposition, the result is zero. But if we discard the phase information, we just double the signal, which is totally wrong.
Contrary to a common belief, the use of complex numbers is not because such a number is a handy container which can hold two independent values. It's because processing periodic signals involves trigonometry all the time, and there is a simple way to move from sines and cosines to more simple complex numbers algebra: Euler's formula.
So most of the time signals are just converted to their complex exponential form. E.g. a signal with frequency 10 Hz, amplitude 3 and phase π/4 radians:
can be described by x = 3.ei(2π.10.t+π/4).
splitting the exponent: x = 3.ei.π/4 times ei.2π.10.t, t being the time.
The first number is a constant called the phasor. A common compact form is 3∠π/4. The second number is a time-dependent variable called the carrier.
This signal 3.ei.π/4 times ei.2π.10.t is easily plotted, either as a cosine (real part) or a sine (imaginary part):
from numpy import arange, pi, e, real, imag
t = arange(0, 0.2, 1/200)
x = 3 * e ** (1j*pi/4) * e ** (1j*2*pi*10*t)
ax1.stem(t, real(x))
ax2.stem(t, imag(x))
Now if we look at FT coefficients, we see they are phasors, they don't embed the frequency which is only dependent on the number of samples and the sampling frequency.
Actually if we want to plot a FT component in the time domain, we have to separately create the carrier from the frequency found, e.g. by calling fftfreq. With the phasor and the carrier we have the spectral component.
A phasor is a vector, and a vector can turn
Cartesian coordinates are extracted by using real and imag functions, the phasor used above, 3.e(i.π/4), is also the complex number 2.12 + 2.12j (i is j for scientists and engineers). These coordinates can be plotted on a plane with the vertical axis representing i (left):
This point can also represent a vector (center). Polar coordinates can be used in place of Cartesian coordinates (right). Polar coordinates are extracted by abs and angle. It's clear this vector can also represent the phasor 3∠π/4 (short form for 3.e(i.π/4))
This reminder about vectors is to introduce how phasors are manipulated. Say we have a real number of amplitude 1, which is not less than a complex which angle is 0 and also a phasor (x∠0). We also have a second phasor (3∠π/4), and we want the product of the two phasors. We could compute the result using Cartesian coordinates with some trigonometry, but this is painful. The easiest way is to use the complex exponential form:
we just add the angles and multiply the real coefficients: 1.e(i.0) times 3.e(i.π/4) = 1x3.ei(0+π/4) = 3.e(i.π/4)
we can just write: (1∠0) times (3∠π/4) = (3∠π/4).
Whatever, the result is this one:
The practical effect is to turn the real number and scale its magnitude. In FT, the real is the sample amplitude, and the multiplier magnitude is actually 1, so this corresponds to this operation, but the result is the same:
This long introduction was to explain the math behind FT.
How spectral coefficients are created by FT
FT principle is, for each spectral coefficient to compute:
to multiply each of the samples amplitudes by a different phasor, so that the angle is increasing from the first sample to the last,
to sum all the previous products.
If there are N samples xn (0 to N-1), there are N spectral coefficients Xk to compute. Calculation of coefficient Xk involves multiplying each sample amplitude xn by the phasor e-i2πkn/N and taking the sum, according to FT equation:
In the N individual products, the multiplier angle varies according to 2π.n/N and k, meaning the angle changes, ignoring k for now, from 0 to 2π. So while performing the products, we multiply a variable real amplitude by a phasor which magnitude is 1 and angle is going from 0 to a full round. We know this multiplication turns and scales the real amplitude:
Source: A. Dieckmann from Physikalisches Institut der Universität Bonn
Doing this summation is actually trying to correlate the signal samples to the phasor angular velocity, which is how fast its angle varies with n/N. The result tells how strong this correlation is (amplitude), and how much synchroneous it is (phase).
This operation is repeated for the k spectral coefficients to compute (half with k negative, half with k positive). As k changes, the angle increment also varies, so the correlation is checked against another frequency.
Conclusion
FT results are neither sines nor cosines, they are not waves, they are phasors describing a correlation. A phasor is a constant, expressed as a complex exponential, embedding both amplitude and phase. Multiplied by a carrier, which is also a complex exponential, but variable, dependent on time, they draw helices in time domain:
Source
When these helices are projected onto the horizontal plane, this is done by taking the real part of the FT result, the function drawn is the cosine. When projected onto the vertical plane, which is done by taking the imaginary part of the FT result, the function drawn is the sine. The phase determines at which angle the helix starts and therefore without the phase, the signal cannot be reconstructed using an inverse FT.
The complex exponential multiplier is a tool to transform the linear velocity of amplitude variations into angular velocity, which is frequency times 2π. All that revolves around Euler's formula linking sinusoid and complex exponential.

For a signal with only cosine waves, fourier transform, aka. FFT produces completely real output. For a signal composed of only sine waves, it produces completely imaginary output. A phase shift in any of the signals will result in a mix of real and complex. Complex numbers (in this context) are merely another way to store phase and amplitude.

Help me understand linear separability in a binary SVM

I'm cross-posting this from math.stackexchange.com because I'm not getting any feedback and it's a time-sensitive question for me.
My question pertains to linear separability with hyperplanes in a support vector machine.
According to Wikipedia:
...formally, a support vector machine
constructs a hyperplane or set of
hyperplanes in a high or infinite
dimensional space, which can be used
for classification, regression or
other tasks. Intuitively, a good
separation is achieved by the
hyperplane that has the largest
distance to the nearest training data
points of any class (so-called
functional margin), since in general
the larger the margin the lower the
generalization error of the
classifier.classifier.
The linear separation of classes by hyperplanes intuitively makes sense to me. And I think I understand linear separability for two-dimensional geometry. However, I'm implementing an SVM using a popular SVM library (libSVM) and when messing around with the numbers, I fail to understand how an SVM can create a curve between classes, or enclose central points in category 1 within a circular curve when surrounded by points in category 2 if a hyperplane in an n-dimensional space V is a "flat" subset of dimension n − 1, or for two-dimensional space - a 1D line.
Here is what I mean:
That's not a hyperplane. That's circular. How does this work? Or are there more dimensions inside the SVM than the two-dimensional 2D input features?
This example application can be downloaded here.
Edit:
Thanks for your comprehensive answers. So the SVM can separate weird data well by using a kernel function. Would it help to linearize the data before sending it to the SVM? For example, one of my input features (a numeric value) has a turning point (eg. 0) where it neatly fits into category 1, but above and below zero it fits into category 2. Now, because I know this, would it help classification to send the absolute value of this feature for the SVM?

As mokus explained, support vector machines use a kernel function to implicitly map data into a feature space where they are linearly separable:
Different kernel functions are used for various kinds of data. Note that an extra dimension (feature) is added by the transformation in the picture, although this feature is never materialized in memory.
(Illustration from Chris Thornton, U. Sussex.)

Check out this YouTube video that illustrates an example of linearly inseparable points that become separable by a plane when mapped to a higher dimension.

I am not intimately familiar with SVMs, but from what I recall from my studies they are often used with a "kernel function" - essentially, a replacement for the standard inner product that effectively non-linearizes the space. It's loosely equivalent to applying a nonlinear transformation from your space into some "working space" where the linear classifier is applied, and then pulling the results back into your original space, where the linear subspaces the classifier works with are no longer linear.
The wikipedia article does mention this in the subsection "Non-linear classification", with a link to http://en.wikipedia.org/wiki/Kernel_trick which explains the technique more generally.

This is done by applying what is know as a [Kernel Trick] (http://en.wikipedia.org/wiki/Kernel_trick)
What basically is done is that if something is not linearly separable in the existing input space ( 2-D in your case), it is projected to a higher dimension where this would be separable. A kernel function ( can be non-linear) is applied to modify your feature space. All computations are then performed in this feature space (which can be possibly of infinite dimensions too).
Each point in your input is transformed using this kernel function, and all further computations are performed as if this was your original input space. Thus, your points may be separable in a higher dimension (possibly infinite) and thus the linear hyperplane in higher dimensions might not be linear in the original dimensions.
For a simple example, consider the example of XOR. If you plot Input1 on X-Axis, and Input2 on Y-Axis, then the output classes will be:
Class 0: (0,0), (1,1)
Class 1: (0,1), (1,0)
As you can observe, its not linearly seperable in 2-D. But if I take these ordered pairs in 3-D, (by just moving 1 point in 3-D) say:
Class 0: (0,0,1), (1,1,0)
Class 1: (0,1,0), (1,0,0)
Now you can easily observe that there is a plane in 3-D to separate these two classes linearly.
Thus if you project your inputs to a sufficiently large dimension (possibly infinite), then you'll be able to separate your classes linearly in that dimension.
One important point to notice here (and maybe I'll answer your other question too) is that you don't have to make a kernel function yourself (like I made one above). The good thing is that the kernel function automatically takes care of your input and figures out how to "linearize" it.

For the SVM example in the question given in 2-D space let x1, x2 be the two axes. You can have a transformation function F = x1^2 + x2^2 and transform this problem into a 1-D space problem. If you notice carefully you could see that in the transformed space, you can easily linearly separate the points(thresholds on F axis). Here the transformed space was [ F ] ( 1 dimensional ) . In most cases , you would be increasing the dimensionality to get linearly separable hyperplanes.

SVM clustering
HTH

My answer to a previous question might shed some light on what is happening in this case. The example I give is very contrived and not really what happens in an SVM, but it should give you come intuition.

Determining if a dataset approximates a sine wave

Is there an algorithm that can be used to determine whether a sample of data taken at fixed time intervals approximates a sine wave?

Take the fourier transform which transforms the data into a frequency table (search for fft, fast fourier transformation, for an implementation. For example, FFTW). If it is a sinus or cosinus, the frequency table will contain one very high value corresponding to the frequency you're searching for and some noise at other frequencies.
Alternatively, match several sinussen at several frequencies and try to match them using cross correlation: the sum of squares of the differences between your signal and the sinus you're trying to fit. You would need to do this for sinussen at a range of frequencies of course. And you would need to do this while translating the sinus along the x-axis to find the phase.

You can calculate the fourier transform and look for a single spike. That would tell you that the dataset approximates a sine curve at that frequency.

Shot into the blue: You could take advantage of the fact that the integral of a*sin(t) is a*cos(t). Keeping track of min/max of your data should allow you to know a.

Check the least squares method.
#CookieOfFortune: I agree, but the Fourier series fit is optimal in the least squares sense (as is said on the Wikipedia article).
If you want to play around first with own input data, check the Discrete Fourier Transformation (DFT) on Wolfram Alpha. As noted before, if you want a fast implementation you should check out one of several FFT-libraries.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio