I have a task related to the Radon transform which contains a subtask that uses resampling by means of the DFT.
Let's consider a non-periodic discretized signal (Fig. 1), for example a row of pixels, 515 pixels long. My implementation of the resampling contains the following steps:
Cyclic left shift (Fig.2).
Add zeros in the center so that the signal length becomes 2^n (in our case we must add 1024 - 515 = 509 zeros) (Fig. 3).
Take the DFT of this signal (Fig. 4).
Cyclic right shift (to move the low frequencies to the center) (Fig. 5).
[Figures 1-5: the signal after each of the steps above; images not shown.]
The main question:
Why must we perform the cyclic shift of the signal and add the zeros exactly in the center? (I assumed this is what makes the signal periodic.)
Does zero-padding interpolate the DFT spectrum? Is that correct? (I asked around and was told it is not quite so.)
Maybe someone can explain in a simple way what happens to the signal after zero-padding.
I have made some experiments in Matlab and found out that no other sequence of actions gives the required result.
Now let's consider two cases:
a) (THE CORRECT VARIANT) We have a non-periodic discretized signal (for example a row of pixels) which is cyclically shifted to the left and padded with zeros in the center; after that its DFT is taken and shifted back.
b) We have a non-periodic discretized signal (for example a set of rows of pixels) which is padded with zeros on the left and right; after that its DFT is taken.
What is the difference between these DFT spectra?
I have read some books but have not found an answer for this zero-padding case. It seems it can only be found out by experiment.
Answer in book:
A. C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging, Society for Industrial and Applied Mathematics, 2001, page 25.
Zero-padding in the time domain corresponds to interpolation in the frequency domain.
Circular shifting in the time domain corresponds to a "phase twist" in the frequency domain; each bin has a complex rotation applied to it. I have no idea why you've been asked to do that in your application!
Shifting the data points and zero-padding the exact center of the FFT aperture has the property that all the even (symmetric) signals in the original data window end up in the real component of the complex FFT result, and all the odd signals end up in the imaginary component. In other words, the evenness-to-oddness ratio is preserved, which allows phase to be interpolated. Being able to interpolate phase is important in the case of a zero-padded FFT because the zero-padding interpolates the spectral magnitude as well.
If you don't center the zero-padding, then the phase has to be "untwisted" in the frequency domain before any additional interpolation can produce reasonable phase results.
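Here is a minimal numpy sketch of the two padding strategies (my own construction, not the original Matlab code; the exact split point is a convention):

import numpy as np

x = np.random.rand(515)               # non-periodic "row of pixels"
N = 1024                              # next power of two

# (a) cyclic left shift plus zeros in the center: equivalently, put the second
# half of the signal at the start of the buffer and the first half at the end,
# so the data stays centred on sample 0 (mod N).
half = len(x) // 2
a = np.zeros(N)
a[:len(x) - half] = x[half:]
a[-half:] = x[:half]
Xa = np.fft.fftshift(np.fft.fft(a))   # cyclic shift back: low frequencies in the center

# (b) plain zero-padding on the left and right, data in the middle of the buffer
b = np.zeros(N)
b[254:254 + len(x)] = x
Xb = np.fft.fftshift(np.fft.fft(b))

# Same magnitude, different phase: (b) carries an extra linear phase "twist".
print(np.allclose(np.abs(Xa), np.abs(Xb)))      # True (up to numerical noise)
print(np.allclose(np.angle(Xa), np.angle(Xb)))  # generally False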
Related
I am doing point pattern analysis using the package spatstat and ran Ripley's K (spatstat::Kest) on my points to see if there is any clustering. However, it appears that not all the lines that should appear in the graph (kFem) have plotted. For example, the red line (Ktrans) stops at around x=12 and the green line (Kbord) doesn't appear at all. I would appreciate any insights as to how to interpret this and if there might be a bug.
Here is my study window. It is an irregular shape because I am analyzing a point pattern along a transect line.
And here is a density plot of my point pattern:
It is unlikely (but not impossible) that there is a simple bug in Kest that causes this, since this particular function has been tested intensively by many users. More likely you have an observation window that is irregular, and there is a mathematical reason why the various estimates cannot be calculated at all distances. Please add a plot/summary of your point pattern so we have knowledge of the observation window (or, even better, give access to the observation window).
Furthermore, to manually inspect the estimates of the K-function you can convert the function value (fv) object to a data.frame and print it:
dat <- as.data.frame(kFem)
head(dat, n = 10)
Update:
Your window is indeed very irregular, and that is the explanation of why some corrections are not produced at large distances. I guess your transect is only a few metres wide while you are considering distances up to 50 m. The border correction can only be calculated for distances up to something like half the width of the transect.
Using Kest implies that you believe that your transect is a subset of a big homogeneous point process (of equal intensity everywhere and with same correlation structure throughout space). If that is true then Kest provides a sensible estimate of the unknown true homogeneous K-function. However, you provide a plot where you have divided the region into sections of high, medium and low intensity which doesn't agree with the assumption of homogeneity. The deviation from the theoretical Poisson line may just be due to inhomogeneous intensity and not actual correlation between points. You should probably only consider distances that are much smaller than 50 (you can set rmax when you call Kest).
I'm doing a video homework in which you program one of the famous video compression algorithms, I chose ARPS(adaptive rood pattern search).
Now if I understand it right, I must first divide the image into macroblocks (I've already done that), and second calculate the PMV (predicted motion vector) by taking the motion vector of the left neighboring macroblock (type D; there are other types in which you take the block above or above-left etc.; according to some paper they don't differ much in quality).
Last use pmv to calculate the mv of the current macroblock.
If I understand it correctly I have to calculate the first column of macroblocks using other algorithms(NTSS or FSS or etc) and then use that column to calculate the rest.
What will happen if my first column didn't move? Then pmv = (0,0), and applying the algorithm as I understand it from Wikipedia results in all MVs being (0,0) (i.e. if the first column didn't change, nothing changed!).
I doubt I understand the algorithm correctly and for some reason many papers don't address those issues, so can you shed some light on it ? I can implement it very well after that.
PS
this is a university homework and I'm at software-engineering department (not AI department) so no AI algorithms please .
As I understand it, the strength of the ARPS algorithm is its adaptive rood size, which is based on the predicted motion vector (MV). The rood is used as the first step of the method and can dramatically reduce the search effort by jumping straight to the right place. The second step is to find the best point(s) with a small diamond or any other fixed pattern you like.
So if you predict a zero MV, you simply apply the fine estimation (the second step).
In practice, a zero MV means a small metric value for the estimated block (by SAD or another metric). Natural static pictures always have some deviation between temporally adjacent samples, so any metric produces some value. It is your implementation's decision whether to mark such a block as having zero motion or a small motion vector.
About the first column in your implementation: for the first block of each row after the first, you may use the MV of the block above. That reduces your total computation.
In any case, you can check whether a change to your implementation is good or bad by applying motion compensation (the inverse of motion estimation) and comparing the result with the original picture by computing the PSNR metric.
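A minimal sketch of those two metrics in Python (frame arrays, block size and the 8-bit peak value are assumptions on my part):

import numpy as np

def sad(block_a, block_b):
    # Sum of absolute differences, the matching metric mentioned above.
    return float(np.sum(np.abs(block_a.astype(np.int32) - block_b.astype(np.int32))))

def psnr(original, reconstructed, peak=255.0):
    # Peak signal-to-noise ratio between the original frame and the
    # motion-compensated reconstruction (both 2D arrays of equal shape).
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)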
Wikipedia says you have no knowledge of what the first state is, so you have to assign each state equal probability in the prior state vector. But you do know what the transition probability matrix is, and the eigenvector of that matrix with eigenvalue 1 gives the long-run frequency of each state in the HMM (I think), so why don't you go with that vector for the prior state vector instead?
This is really a modelling decision. Your suggestion is certainly possible, because it pretty much corresponds to prefixing the observations with a large stretch of observations where the hidden states are not observed at all or have no effect - this will give whatever the original states are time to settle down to the equilibrium distribution.
But if you have a stretch of observations with a delimited start, such as a segment of speech that starts when the speaker starts, or a segment of text that starts at the beginning of a sentence, there is no particular reason to believe that the distribution of the very first state is the same as the equilibrium distribution: I doubt very much if 'e' is the most common character at the start of a sentence, whereas it is well known to be the most common character in English text.
It may not matter very much what you choose, unless you have a lot of very short sequences of observations that you are processing together. Most of the time I would only worry if you wanted to set one of the state probabilities to zero, because the EM algorithm or Baum-Welch algorithm often used to optimise HMM parameters can be reluctant to re-estimate parameters away from zero.
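For reference, a sketch of the equilibrium-distribution idea from the question, computed as the eigenvector of the transition matrix with eigenvalue 1 (the matrix below is just an illustrative example):

import numpy as np

# Row-stochastic transition matrix: P[i, j] = P(next state = j | current state = i)
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# The stationary distribution pi satisfies pi @ P = pi, i.e. it is the
# eigenvector of P.T with eigenvalue 1, normalised to sum to one.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi = pi / pi.sum()
print(pi)    # [0.75 0.25] for this example matrix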
All the FFT implementations we have come across result in complex values (with real and imaginary parts), even if the input to the algorithm was a discrete set of real numbers (integers).
Is it not possible to represent frequency domain in terms of real numbers only?
The FFT is fundamentally a change of basis. The basis into which the FFT changes your original signal is a set of sine waves (complex exponentials). In order for that basis to describe all the possible inputs, it needs to be able to represent phase as well as amplitude; the phase is represented using complex numbers.
For example, suppose you FFT a signal containing only a single sine wave. Depending on phase you might well get an entirely real FFT result. But if you shift the phase of your input a few degrees, how else can the FFT output represent that input?
edit: This is a somewhat loose explanation, but I'm just trying to motivate the intuition.
The FFT provides you with amplitude and phase. The amplitude is encoded as the magnitude of the complex number (sqrt(x^2+y^2)) while the phase is encoded as the angle (atan2(y,x)). To have a strictly real result from the FFT, the incoming signal must have even symmetry (i.e. x[n]=conj(x[N-n])).
If all you care about is intensity, the magnitude of the complex number is sufficient for analysis.
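For instance, a quick numpy illustration of pulling out magnitude and phase (the signal parameters here are arbitrary):

import numpy as np

t = np.arange(256) / 256.0
x = np.cos(2 * np.pi * 8 * t + np.pi / 3)   # real input: 8 cycles, phase pi/3

X = np.fft.fft(x)
magnitude = np.abs(X)     # sqrt(re^2 + im^2)
phase = np.angle(X)       # atan2(im, re)

print(magnitude[8])       # ~128 (= N/2 for a unit-amplitude cosine)
print(phase[8])           # ~pi/3, the phase of the input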
Yes, it is possible to represent the FFT frequency domain results of strictly real input using only real numbers.
Those complex numbers in the FFT result are simply just 2 real numbers, which are both required to give you the 2D coordinates of a result vector that has both a length and a direction angle (or magnitude and a phase). And every frequency component in the FFT result can have a unique amplitude and a unique phase (relative to some point in the FFT aperture).
One real number alone can't represent both magnitude and phase. If you throw away the phase information, that could easily massively distort the signal if you try to recreate it using an iFFT (and the signal isn't symmetric). So a complete FFT result requires 2 real numbers per FFT bin. These 2 real numbers are bundled together in some FFTs in a complex data type by common convention, but the FFT result could easily (and some FFTs do) just produce 2 real vectors (one for cosine coordinates and one for sine coordinates).
There are also FFT routines that produce magnitude and phase directly, but they run more slowly than FFTs that produce a complex (or two-real-vector) result. There also exist FFT routines that compute only the magnitude and just throw away the phase information, but they usually run no faster than letting you do that yourself after a more general FFT. Maybe they save a coder a few lines of code at the cost of not being invertible. But a lot of libraries don't bother to include these slower and less general forms of FFT, and just let the coder convert or ignore what they need or don't need.
Plus, many consider the math involved to be a lot more elegant using complex arithmetic (where, for strictly real input, the cosine correlation or even component of an FFT result is put in the real component, and the sine correlation or odd component of the FFT result is put in the imaginary component of a complex number.)
(Added:) And, as yet another option, you can consider the two components of each FFT result bin, instead of as real and imaginary components, as even and odd components, both real.
If your FFT coefficient for a given frequency f is x + i y, you can look at x as the coefficient of a cosine at that frequency, while the y is the coefficient of the sine. If you add these two waves for a particular frequency, you will get a phase-shifted wave at that frequency; the magnitude of this wave is sqrt(x*x + y*y), equal to the magnitude of the complex coefficient.
The Discrete Cosine Transform (DCT) is a relative of the Fourier transform which yields all real coefficients. A two-dimensional DCT is used by many image/video compression algorithms.
The discrete Fourier transform is fundamentally a transformation from a vector of complex numbers in the "time domain" to a vector of complex numbers in the "frequency domain" (I use quotes because if you apply the right scaling factors, the DFT is its own inverse). If your inputs are real, then you can perform two DFTs at once: Take the input vectors x and y and calculate F(x + i y). I forget how you separate the DFT afterwards, but I suspect it's something about symmetry and complex conjugates.
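For what it's worth, the separation does rely on symmetry and conjugates; here is a sketch of the trick (my own construction, assuming two equal-length real inputs):

import numpy as np

x = np.random.rand(64)     # first real signal
y = np.random.rand(64)     # second real signal

Z = np.fft.fft(x + 1j * y)            # one complex FFT covering both signals
Zr = np.conj(np.roll(Z[::-1], 1))     # conj(Z[(N - k) mod N])

X = (Z + Zr) / 2                      # spectrum of x
Y = (Z - Zr) / 2j                     # spectrum of y

print(np.allclose(X, np.fft.fft(x)))  # True
print(np.allclose(Y, np.fft.fft(y)))  # True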
The discrete cosine transform sort-of lets you represent the "frequency domain" with the reals, and is common in lossy compression algorithms (JPEG, MP3). The surprising thing (to me) is that it works even though it appears to discard phase information, but this also seems to make it less useful for most signal processing purposes (I'm not aware of an easy way to do convolution/correlation with a DCT).
I've probably gotten some details wrong ;)
The way you've phrased this question, I believe you are looking for a more intuitive way of thinking rather than a mathematical answer. I come from a mechanical engineering background, and this is how I think about the Fourier transform: I contextualize it with reference to a pendulum. If we have only the x-velocity of a pendulum versus time and we are asked to estimate the energy of the pendulum (or the forcing source of the pendulum), the Fourier transform gives a complete answer. Since what we usually observe is only the x-velocity, we might conclude that the pendulum only needs to be provided energy equivalent to its sinusoidal variation of kinetic energy. But the pendulum also has potential energy, which is 90 degrees out of phase with the kinetic energy. So to keep track of the potential energy, we simply keep track of the part that is 90 degrees out of phase with the (kinetic) real component. The imaginary part may be thought of as a 'potential velocity' that represents a manifestation of the potential energy the source must provide to force the oscillatory behaviour. Helpfully, this extends easily to the electrical context, where capacitors and inductors also store energy in 'potential form'. If the signal is not sinusoidal, the transform is of course trying to decompose it into sinusoids. I see this as assuming that the final signal was generated by the combined action of infinitely many sources, each with a distinct sinusoidal behaviour. What we are trying to determine is the strength and phase of each source that together create the final observed signal at each time instant.
PS: 1) The last two statements are generally how I think of the Fourier transform itself.
2) I say potential velocity rather than potential energy because the transform usually does not change the dimensions of the original signal or physical quantity, so it cannot shift from representing velocity to representing energy.
Short answer
Why does FFT produce complex numbers instead of real numbers?
The reason the FT result is a complex array is that a complex exponential multiplier is involved in the calculation of the coefficients, so the final result is complex. The FT uses this multiplier to correlate the signal against multiple frequencies. The principle is detailed further down.
Is it not possible to represent frequency domain in terms of real numbers only?
Of course the 1D array of complex coefficients returned by the FT could be represented by a 2D array of real values, which can be either the Cartesian coordinates x and y, or the polar coordinates r and θ. However...
Complex exponential form is the most suitable form for signal processing
Having only real data is not so useful.
On one hand it is already possible to get these coordinates using one of the functions real, imag, abs and angle.
On the other hand, such isolated information is of very limited interest. E.g. if we add two signals with the same amplitude and frequency but in phase opposition, the result is zero. If we discard the phase information and simply add the amplitudes, we get double the signal, which is totally wrong.
Contrary to a common belief, complex numbers are not used because such a number is a handy container that can hold two independent values. It's because processing periodic signals involves trigonometry all the time, and there is a simple way to move from sines and cosines to the simpler algebra of complex numbers: Euler's formula.
So most of the time signals are just converted to their complex exponential form. E.g. a signal with frequency 10 Hz, amplitude 3 and phase π/4 radians:
can be described by x = 3·e^(i(2π·10·t + π/4)),
or, splitting the exponent, x = 3·e^(i·π/4) × e^(i·2π·10·t), t being the time.
The first number is a constant called the phasor. A common compact form is 3∠π/4. The second number is a time-dependent variable called the carrier.
This signal 3·e^(i·π/4) × e^(i·2π·10·t) is easily plotted, either as a cosine (real part) or a sine (imaginary part):
from numpy import arange, pi, e, real, imag
import matplotlib.pyplot as plt

t = arange(0, 0.2, 1/200)                      # 0.2 s sampled at 200 Hz
x = 3 * e ** (1j*pi/4) * e ** (1j*2*pi*10*t)   # phasor times carrier
fig, (ax1, ax2) = plt.subplots(2, 1)
ax1.stem(t, real(x))   # cosine (real part)
ax2.stem(t, imag(x))   # sine (imaginary part)
plt.show()
Now if we look at FT coefficients, we see they are phasors; they don't embed the frequency, which depends only on the bin index, the number of samples and the sampling frequency.
Actually, if we want to plot an FT component in the time domain, we have to separately create the carrier from the frequency found, e.g. by calling fftfreq. With the phasor and the carrier we have the spectral component.
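For example, a sketch of rebuilding one spectral component from its phasor and a carrier derived with fftfreq (the sampling parameters are arbitrary):

import numpy as np

fs = 200                                   # sampling frequency, Hz
t = np.arange(0, 0.2, 1 / fs)
x = 3 * np.cos(2 * np.pi * 10 * t + np.pi / 4)

X = np.fft.fft(x)
freqs = np.fft.fftfreq(len(x), d=1 / fs)

k = np.argmax(np.abs(X[:len(x) // 2]))     # bin of the dominant component
phasor = X[k] / len(x)                     # half the amplitude, plus the phase
carrier = np.exp(1j * 2 * np.pi * freqs[k] * t)

# The real component is the phasor times the carrier, plus its conjugate pair.
component = 2 * np.real(phasor * carrier)
print(np.allclose(component, x))           # True: the signal is a single cosine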
A phasor is a vector, and a vector can turn
Cartesian coordinates are extracted with the real and imag functions: the phasor used above, 3·e^(i·π/4), is also the complex number 2.12 + 2.12j (i is written j by engineers). These coordinates can be plotted on a plane with the vertical axis representing i (left).
This point can also represent a vector (center). Polar coordinates can be used in place of Cartesian coordinates (right); they are extracted with abs and angle. It's clear this vector can also represent the phasor 3∠π/4 (the short form for 3·e^(i·π/4)).
This reminder about vectors is to introduce how phasors are manipulated. Say we have a real number of amplitude 1; that is nothing less than a complex number whose angle is 0, and hence also a phasor (1∠0). We also have a second phasor (3∠π/4), and we want the product of the two. We could compute the result in Cartesian coordinates with some trigonometry, but that is painful. The easiest way is to use the complex exponential form:
we just add the angles and multiply the magnitudes: 1·e^(i·0) × 3·e^(i·π/4) = (1×3)·e^(i(0+π/4)) = 3·e^(i·π/4),
or, in compact form: (1∠0) × (3∠π/4) = (3∠π/4).
Either way, the result is the same phasor.
The practical effect is to rotate the real number and scale its magnitude. In the FT, the real number is a sample amplitude, and the multiplier's magnitude is exactly 1, so the operation is a pure rotation of that amplitude.
This long introduction was to explain the math behind FT.
How spectral coefficients are created by FT
The FT principle is, for each spectral coefficient to be computed:
to multiply each of the sample amplitudes by a different phasor, so that the angle increases from the first sample to the last,
to sum all of these products.
If there are N samples x_n (n = 0 to N-1), there are N spectral coefficients X_k to compute. Calculation of coefficient X_k involves multiplying each sample amplitude x_n by the phasor e^(-i·2πkn/N) and taking the sum, according to the FT equation:
X_k = sum{n = 0..N-1} (x_n · e^(-i·2πkn/N))
In the N individual products, the multiplier angle varies according to 2π·n/N and k, meaning that (ignoring k for now) the angle goes from 0 to 2π. So while performing the products, we multiply a variable real amplitude by a phasor whose magnitude is 1 and whose angle goes from 0 through a full turn:
(Source of the illustration: A. Dieckmann, Physikalisches Institut der Universität Bonn.)
Doing this summation actually amounts to correlating the signal samples with the phasor's angular velocity, i.e. how fast its angle varies with n/N. The result tells how strong this correlation is (amplitude) and how synchronous it is (phase).
This operation is repeated for each of the N spectral coefficients to compute (half with k negative, half with k positive). As k changes, the angle increment also changes, so the correlation is checked against another frequency.
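Written out naively, the computation described above looks like this (a didactic sketch, not an efficient FFT):

import numpy as np

def dft_coefficient(x, k):
    # Correlate the samples against a phasor whose angle advances by 2*pi*k/N per sample.
    N = len(x)
    n = np.arange(N)
    phasor = np.exp(-1j * 2 * np.pi * k * n / N)   # magnitude 1, angle sweeps k full turns
    return np.sum(x * phasor)

x = np.random.rand(64)
X = np.array([dft_coefficient(x, k) for k in range(len(x))])
print(np.allclose(X, np.fft.fft(x)))   # True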
Conclusion
FT results are neither sines nor cosines, and they are not waves: they are phasors describing a correlation. A phasor is a constant, expressed as a complex exponential, embedding both amplitude and phase. Multiplied by a carrier, which is also a complex exponential but variable and dependent on time, they draw helices in the time domain.
When these helices are projected onto the horizontal plane (which is done by taking the real part of the FT result), the function drawn is a cosine. When they are projected onto the vertical plane (by taking the imaginary part), the function drawn is a sine. The phase determines the angle at which the helix starts, and therefore without the phase the signal cannot be reconstructed using an inverse FT.
The complex exponential multiplier is a tool to transform the linear velocity of amplitude variations into angular velocity, which is 2π times the frequency. All of this revolves around Euler's formula, which links sinusoids and complex exponentials.
For a signal made only of cosine waves, the Fourier transform (as computed by an FFT) produces completely real output. For a signal composed only of sine waves, it produces completely imaginary output. A phase shift in any of the components results in a mix of real and imaginary parts. Complex numbers (in this context) are merely another way to store phase and amplitude.
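A quick numpy check of that statement (the length and tolerance are my choices):

import numpy as np

n = np.arange(64)
C = np.fft.fft(np.cos(2 * np.pi * 4 * n / 64))   # pure cosine -> (numerically) real bins
S = np.fft.fft(np.sin(2 * np.pi * 4 * n / 64))   # pure sine   -> (numerically) imaginary bins

print(np.max(np.abs(C.imag)) < 1e-9)   # True
print(np.max(np.abs(S.real)) < 1e-9)   # True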
I was wondering if someone could point me to an algorithm/technique that is used to compare time dependent signals. Ideally, this hypothetical algorithm would take in 2 signals as inputs and return a number that would be the percentage similarity between the signals (0 being that the 2 signals are statistically unrelated and 1 being that they are a perfect match).
Of course, I realize that there are problems with my request, namely that I'm not sure how to properly define 'similarity' in the context of comparing these 2 signals, so if someone could also point me in the right direction (as to what I should look up/know, etc.), I'd appreciate it as well.
The cross-correlation function is the classic signal processing solution. If you have access to Matlab, see the XCORR function. max(abs(xcorr(Signal1, Signal2, 'coeff'))) would give you specifically what you're looking for and an equivalent exists in Python as well.
Cross-correlation assumes that the "similarity" you're looking for is a measure of the linear relationship between the two signals. The definition for real-valued finite-length signals with time index n = 0..N-1 is:
C[g] = sum{m = 0..N-1} (x1[m] * x2[g+m])
g runs from -(N-1)..(N-1) (outside that range the product inside the sum is 0).
Although you asked for a number, the function is pretty interesting. The function domain g is called the lag domain.
If x1 and x2 are related by a time shift, the cross-correlation function will have its peak at the lag corresponding to the shift. For instance, if you had x1 = sin[wn] and x2 = sin[wn + phi], so two sine waves at the same frequency and different phase, the cross-correlation function would have its peak at the lag corresponding to the phase shift.
If x2 is a scaled version of x1, the cross-correlation will scale also. You can normalize the function to a correlation coefficient by dividing by sqrt(sum(x1^2)*sum(x2^2)), and bring it into 0..1 by taking an absolute value (that line of Matlab has these operations).
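For reference, a rough numpy equivalent of that Matlab line (the hand-rolled normalisation approximates xcorr's 'coeff' option):

import numpy as np

def max_normalized_xcorr(x1, x2):
    # Peak of the normalised cross-correlation magnitude, in [0, 1].
    c = np.correlate(x1, x2, mode="full")                 # all lags
    c = c / np.sqrt(np.sum(x1 ** 2) * np.sum(x2 ** 2))    # 'coeff'-style normalisation
    return np.max(np.abs(c))

t = np.linspace(0, 1, 500)
s1 = np.sin(2 * np.pi * 5 * t)
s2 = np.sin(2 * np.pi * 5 * t + np.pi / 3)   # same frequency, shifted phase
print(max_normalized_xcorr(s1, s2))          # close to 1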
More generally, below is a summary of what cross-correlation is good/bad for.
Cross-correlation works well for determining if one signal is linearly related to another, that is if
x2(t) = sum{n = 0..K-1}(A_n * x1(t + phi_n))
where x1(t) and x2(t) are the signals in question, A_n are scaling factors, and phi_n are time shifts. The implications of this are:
If one signal is a time shifted version of the other (phi_n <> 0 for some n) the cross-correlation function will be non-zero.
If one signal is a scaled version of the other (A_n <> 0 for some n) the cross-correlation function will be non-zero.
If one signal is a combination of scaled and time shifted versions of the other (both A_n and phi_n are non-zero for some number of n's) the cross-correlation function will be non-zero. Note that this is also a definition of a linear filter.
To get more concrete, suppose x1 is a wideband random signal. Let x2=x1. Now the normalized cross-correlation function will be exactly 1 at g=0, and near 0 everywhere else. Now let x2 be a (linearly) filtered version of x1. The cross-correlation function will be non-zero near g=0. The width of the non-zero part will depend on the bandwidth of the filter.
For the special case of x1 and x2 being periodic, the information on the phase-shift in the original part of the answer applies.
Where cross-correlation will not help is if the two signals are not linearly related. For instance, two periodic signals at different frequencies are not linearly related. Nor are two random signals drawn from a wideband random process at different times. Nor are two signals that are similar in shape but with different time indexing - this is like the unequal fundamental frequency case.
In all cases, normalizing the cross-correlation function and looking at the maximum value will tell you if the signals are potentially linearly related - if the number is low, like under 0.1, I would be comfortable declaring them unrelated. Higher than that and I'd look into it more carefully, graphing both the normalized and unnormalized cross-correlation functions and looking at the structure. A periodic cross-correlation implies both signals are periodic, and a cross-correlation function that is noticeably higher around g=0 implies one signal is a filtered version of the other.
You could try a Fast Fourier Transform (look up FFT in Wikipedia, there are open source libraries for performing conversions).
FFTs will transform your data from the time domain (e.g. a pulse at 1 s, 2 s, 3 s, 4 s...) to the frequency domain (e.g. a component at 1 Hz for a pulse repeating every second).
Then you can compare frequencies and their relative strengths more easily. It should be a step in the right direction for you.
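A rough sketch of that idea, comparing the magnitude spectra of two equal-length signals (the correlation-based similarity measure is my own choice, not a standard one):

import numpy as np

def spectral_similarity(s1, s2):
    # Correlate the magnitude spectra of two equal-length signals; 1 means
    # the spectral shapes match, values near 0 mean they are unrelated.
    m1 = np.abs(np.fft.rfft(s1))
    m2 = np.abs(np.fft.rfft(s2))
    return float(np.corrcoef(m1, m2)[0, 1])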
General solution: you can bin the data into histograms and use a Chi-squared test or a Kolmogorov test.
Both are explicitly intended to estimate the chance that the two distributions represent random samples from the same underlying distribution (that is: have the same shape to within statistics).
I don't know a c implementation off the top of my head, but ROOT provides c++ implementation of both:
TH1::Chi2Test
TH1::KolmogorovTest
I believe the docs point to some papers as well.
I think that CERNLIB provides both algorithms in fortran77, which you can link to c. Translating the ROOT code might be easier.
Dynamic Time Warping is an approach you can use if the signals should be matched by speeding up and slowing down time at different positions.
You don't say very much about what the signals are, or what measure of "sameness" would be meaningful to you. But if the signals are in phase (that is, you want to compare the two signals instant by instant, and there's no time delay to consider) then I'd suggest you look at Pearson's correlation coefficient. It gives you a value of 1 if the two signals are identical, a value of 0 if they're entirely dissimilar, and something in between if they kinda rhyme. As an added advantage, Pearson's doesn't care if the signals are amplified differently (except that if one signal is the inverse of the other, it gives you a result of -1).
Does that sound like what you're looking for?
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
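For reference, a minimal computation of that coefficient with numpy (the sample data is made up):

import numpy as np

s1 = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0])
s2 = 5 * s1 + 2                       # same shape, different gain and offset

r = np.corrcoef(s1, s2)[0, 1]         # Pearson's correlation coefficient
print(r)                              # 1.0: amplification and offset don't matter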
I don't know signal processing, so this is a guess ..:
Is your signal effectively a list of ordered pairs (x,y), where x is the time and y the amplitude? If so, then perhaps you could throw away the time coordinate -- e.g.:
Signal 1: [(x0,y0), (x1,y1), (x2,y2), (x3,y3), ...]
Signal 2: [(x0,z0), (x1,z1), (x2,z2), (x3,z3), ...]
Throw away time:
Signal 1: [y0, y1, y2, y3, ...]
Signal 2: [z0, z1, z2, z3, ...]
Then you can compare the amplitudes with each other, perhaps by looking for a correlation. Perhaps you could plot y against z:
Comparing: [(y0,z0), (y1,z1), (y2,z2), (y3,z3), ...]
Or calculate one of the various correlation coefficients.