Related
I've been trying to find some places to help me better understand DFT and how to compute it but to no avail. So I need help understanding DFT and it's computation of complex numbers.
Basically, I'm just looking for examples on how to compute DFT with an explanation on how it was computed because in the end, I'm looking to create an algorithm to compute it.
I assume 1D DFT/IDFT ...
All DFT's use this formula:
X(k) is transformed sample value (complex domain)
x(n) is input data sample value (real or complex domain)
N is number of samples/values in your dataset
This whole thing is usually multiplied by normalization constant c. As you can see for single value you need N computations so for all samples it is O(N^2) which is slow.
Here mine Real<->Complex domain DFT/IDFT in C++ you can find also hints on how to compute 2D transform with 1D transforms and how to compute N-point DCT,IDCT by N-point DFT,IDFT there.
Fast algorithms
There are fast algorithms out there based on splitting this equation to odd and even parts of the sum separately (which gives 2x N/2 sums) which is also O(N) per single value, but the 2 halves are the same equations +/- some constant tweak. So one half can be computed from the first one directly. This leads to O(N/2) per single value. if you apply this recursively then you get O(log(N)) per single value. So the whole thing became O(N.log(N)) which is awesome but also adds this restrictions:
All DFFT's need the input dataset is of size equal to power of two !!!
So it can be recursively split. Zero padding to nearest bigger power of 2 is used for invalid dataset sizes (in audio tech sometimes even phase shift). Look here:
mine Complex->Complex domain DFT,DFFT in C++
some hints on constructing FFT like algorithms
Complex numbers
c = a + i*b
c is complex number
a is its real part (Re)
b is its imaginary part (Im)
i*i=-1 is imaginary unit
so the computation is like this
addition:
c0+c1=(a0+i.b0)+(a1+i.b1)=(a0+a1)+i.(b0+b1)
multiplication:
c0*c1=(a0+i.b0)*(a1+i.b1)
=a0.a1+i.a0.b1+i.b0.a1+i.i.b0.b1
=(a0.a1-b0.b1)+i.(a0.b1+b0.a1)
polar form
a = r.cos(θ)
b = r.sin(θ)
r = sqrt(a.a + b.b)
θ = atan2(b,a)
a+i.b = r|θ
sqrt
sqrt(r|θ) = (+/-)sqrt(r)|(θ/2)
sqrt(r.(cos(θ)+i.sin(θ))) = (+/-)sqrt(r).(cos(θ/2)+i.sin(θ/2))
real -> complex conversion:
complex = real+i.0
[notes]
do not forget that you need to convert data to different array (not in place)
normalization constant on FFT recursion is tricky (usually something like /=log2(N) depends also on the recursion stopping condition)
do not forget to stop the recursion if N=1 or 2 ...
beware FPU can overflow on big datasets (N is big)
here some insights to DFT/DFFT
here 2D FFT and wrapping example
usually Euler's formula is used to compute e^(i.x)=cos(x)+i.sin(x)
here How do I obtain the frequencies of each value in an FFT?
you find how to obtain the Niquist frequencies
[edit1] Also I strongly recommend to see this amazing video (I just found):
But what is the Fourier Transform A visual introduction
It describes the (D)FT in geometric representation. I would change some minor stuff in it but still its amazingly simple to understand.
All the FFT implementations we have come across result in complex values (with real and imaginary parts), even if the input to the algorithm was a discrete set of real numbers (integers).
Is it not possible to represent frequency domain in terms of real numbers only?
The FFT is fundamentally a change of basis. The basis into which the FFT changes your original signal is a set of sine waves instead. In order for that basis to describe all the possible inputs it needs to be able to represent phase as well as amplitude; the phase is represented using complex numbers.
For example, suppose you FFT a signal containing only a single sine wave. Depending on phase you might well get an entirely real FFT result. But if you shift the phase of your input a few degrees, how else can the FFT output represent that input?
edit: This is a somewhat loose explanation, but I'm just trying to motivate the intuition.
The FFT provides you with amplitude and phase. The amplitude is encoded as the magnitude of the complex number (sqrt(x^2+y^2)) while the phase is encoded as the angle (atan2(y,x)). To have a strictly real result from the FFT, the incoming signal must have even symmetry (i.e. x[n]=conj(x[N-n])).
If all you care about is intensity, the magnitude of the complex number is sufficient for analysis.
Yes, it is possible to represent the FFT frequency domain results of strictly real input using only real numbers.
Those complex numbers in the FFT result are simply just 2 real numbers, which are both required to give you the 2D coordinates of a result vector that has both a length and a direction angle (or magnitude and a phase). And every frequency component in the FFT result can have a unique amplitude and a unique phase (relative to some point in the FFT aperture).
One real number alone can't represent both magnitude and phase. If you throw away the phase information, that could easily massively distort the signal if you try to recreate it using an iFFT (and the signal isn't symmetric). So a complete FFT result requires 2 real numbers per FFT bin. These 2 real numbers are bundled together in some FFTs in a complex data type by common convention, but the FFT result could easily (and some FFTs do) just produce 2 real vectors (one for cosine coordinates and one for sine coordinates).
There are also FFT routines that produce magnitude and phase directly, but they run more slowly than FFTs that produces a complex (or two real) vector result. There also exist FFT routines that compute only the magnitude and just throw away the phase information, but they usually run no faster than letting you do that yourself after a more general FFT. Maybe they save a coder a few lines of code at the cost of not being invertible. But a lot of libraries don't bother to include these slower and less general forms of FFT, and just let the coder convert or ignore what they need or don't need.
Plus, many consider the math involved to be a lot more elegant using complex arithmetic (where, for strictly real input, the cosine correlation or even component of an FFT result is put in the real component, and the sine correlation or odd component of the FFT result is put in the imaginary component of a complex number.)
(Added:) And, as yet another option, you can consider the two components of each FFT result bin, instead of as real and imaginary components, as even and odd components, both real.
If your FFT coefficient for a given frequency f is x + i y, you can look at x as the coefficient of a cosine at that frequency, while the y is the coefficient of the sine. If you add these two waves for a particular frequency, you will get a phase-shifted wave at that frequency; the magnitude of this wave is sqrt(x*x + y*y), equal to the magnitude of the complex coefficient.
The Discrete Cosine Transform (DCT) is a relative of the Fourier transform which yields all real coefficients. A two-dimensional DCT is used by many image/video compression algorithms.
The discrete Fourier transform is fundamentally a transformation from a vector of complex numbers in the "time domain" to a vector of complex numbers in the "frequency domain" (I use quotes because if you apply the right scaling factors, the DFT is its own inverse). If your inputs are real, then you can perform two DFTs at once: Take the input vectors x and y and calculate F(x + i y). I forget how you separate the DFT afterwards, but I suspect it's something about symmetry and complex conjugates.
The discrete cosine transform sort-of lets you represent the "frequency domain" with the reals, and is common in lossy compression algorithms (JPEG, MP3). The surprising thing (to me) is that it works even though it appears to discard phase information, but this also seems to make it less useful for most signal processing purposes (I'm not aware of an easy way to do convolution/correlation with a DCT).
I've probably gotten some details wrong ;)
The way you've phrased this question, I believe you are looking for a more intuitive way of thinking rather than a mathematical answer. I come from a mechanical engineering background and this is how I think about the Fourier transform. I contextualize the Fourier transform with reference to a pendulum. If we have only the x-velocity vs time of a pendulum and we are asked to estimate the energy of the pendulum (or the forcing source of the pendulum), the Fourier transform gives a complete answer. As usually what we are observing is only the x-velocity, we might conclude that the pendulum only needs to be provided energy equivalent to its sinusoidal variation of kinetic energy. But the pendulum also has potential energy. This energy is 90 degrees out of phase with the potential energy. So to keep track of the potential energy, we are simply keeping track of the 90 degree out of phase part of the (kinetic)real component. The imaginary part may be thought of as a 'potential velocity' that represents a manifestation of the potential energy that the source must provide to force the oscillatory behaviour. What is helpful is that this can be easily extended to the electrical context where capacitors and inductors also store the energy in 'potential form'. If the signal is not sinusoidal of course the transform is trying to decompose it into sinusoids. This I see as assuming that the final signal was generated by combined action of infinite sources each with a distinct sinusoid behaviour. What we are trying to determine is a strength and phase of each source that creates the final observed signal at each time instant.
PS: 1) The last two statements is generally how I think of the Fourier transform itself.
2) I say potential velocity rather the potential energy as the transform usually does not change dimensions of the original signal or physical quantity so it cannot shift from representing velocity to energy.
Short answer
Why does FFT produce complex numbers instead of real numbers?
The reason FT result is a complex array is a complex exponential multiplier is involved in the coefficients calculation. The final result is therefore complex. FT uses the multiplier to correlate the signal against multiple frequencies. The principle is detailed further down.
Is it not possible to represent frequency domain in terms of real numbers only?
Of course the 1D array of complex coefficients returned by FT could be represented by a 2D array of real values, which can be either the Cartesian coordinates x and y, or the polar coordinates r and θ (more here). However...
Complex exponential form is the most suitable form for signal processing
Having only real data is not so useful.
On one hand it is already possible to get these coordinates using one of the functions real, imag, abs and angle.
On the other hand such isolated information is of very limited interest. E.g. if we add two signals with the same amplitude and frequency, but in phase opposition, the result is zero. But if we discard the phase information, we just double the signal, which is totally wrong.
Contrary to a common belief, the use of complex numbers is not because such a number is a handy container which can hold two independent values. It's because processing periodic signals involves trigonometry all the time, and there is a simple way to move from sines and cosines to more simple complex numbers algebra: Euler's formula.
So most of the time signals are just converted to their complex exponential form. E.g. a signal with frequency 10 Hz, amplitude 3 and phase π/4 radians:
can be described by x = 3.ei(2π.10.t+π/4).
splitting the exponent: x = 3.ei.π/4 times ei.2π.10.t, t being the time.
The first number is a constant called the phasor. A common compact form is 3∠π/4. The second number is a time-dependent variable called the carrier.
This signal 3.ei.π/4 times ei.2π.10.t is easily plotted, either as a cosine (real part) or a sine (imaginary part):
from numpy import arange, pi, e, real, imag
t = arange(0, 0.2, 1/200)
x = 3 * e ** (1j*pi/4) * e ** (1j*2*pi*10*t)
ax1.stem(t, real(x))
ax2.stem(t, imag(x))
Now if we look at FT coefficients, we see they are phasors, they don't embed the frequency which is only dependent on the number of samples and the sampling frequency.
Actually if we want to plot a FT component in the time domain, we have to separately create the carrier from the frequency found, e.g. by calling fftfreq. With the phasor and the carrier we have the spectral component.
A phasor is a vector, and a vector can turn
Cartesian coordinates are extracted by using real and imag functions, the phasor used above, 3.e(i.π/4), is also the complex number 2.12 + 2.12j (i is j for scientists and engineers). These coordinates can be plotted on a plane with the vertical axis representing i (left):
This point can also represent a vector (center). Polar coordinates can be used in place of Cartesian coordinates (right). Polar coordinates are extracted by abs and angle. It's clear this vector can also represent the phasor 3∠π/4 (short form for 3.e(i.π/4))
This reminder about vectors is to introduce how phasors are manipulated. Say we have a real number of amplitude 1, which is not less than a complex which angle is 0 and also a phasor (x∠0). We also have a second phasor (3∠π/4), and we want the product of the two phasors. We could compute the result using Cartesian coordinates with some trigonometry, but this is painful. The easiest way is to use the complex exponential form:
we just add the angles and multiply the real coefficients: 1.e(i.0) times 3.e(i.π/4) = 1x3.ei(0+π/4) = 3.e(i.π/4)
we can just write: (1∠0) times (3∠π/4) = (3∠π/4).
Whatever, the result is this one:
The practical effect is to turn the real number and scale its magnitude. In FT, the real is the sample amplitude, and the multiplier magnitude is actually 1, so this corresponds to this operation, but the result is the same:
This long introduction was to explain the math behind FT.
How spectral coefficients are created by FT
FT principle is, for each spectral coefficient to compute:
to multiply each of the samples amplitudes by a different phasor, so that the angle is increasing from the first sample to the last,
to sum all the previous products.
If there are N samples xn (0 to N-1), there are N spectral coefficients Xk to compute. Calculation of coefficient Xk involves multiplying each sample amplitude xn by the phasor e-i2πkn/N and taking the sum, according to FT equation:
In the N individual products, the multiplier angle varies according to 2π.n/N and k, meaning the angle changes, ignoring k for now, from 0 to 2π. So while performing the products, we multiply a variable real amplitude by a phasor which magnitude is 1 and angle is going from 0 to a full round. We know this multiplication turns and scales the real amplitude:
Source: A. Dieckmann from Physikalisches Institut der Universität Bonn
Doing this summation is actually trying to correlate the signal samples to the phasor angular velocity, which is how fast its angle varies with n/N. The result tells how strong this correlation is (amplitude), and how much synchroneous it is (phase).
This operation is repeated for the k spectral coefficients to compute (half with k negative, half with k positive). As k changes, the angle increment also varies, so the correlation is checked against another frequency.
Conclusion
FT results are neither sines nor cosines, they are not waves, they are phasors describing a correlation. A phasor is a constant, expressed as a complex exponential, embedding both amplitude and phase. Multiplied by a carrier, which is also a complex exponential, but variable, dependent on time, they draw helices in time domain:
Source
When these helices are projected onto the horizontal plane, this is done by taking the real part of the FT result, the function drawn is the cosine. When projected onto the vertical plane, which is done by taking the imaginary part of the FT result, the function drawn is the sine. The phase determines at which angle the helix starts and therefore without the phase, the signal cannot be reconstructed using an inverse FT.
The complex exponential multiplier is a tool to transform the linear velocity of amplitude variations into angular velocity, which is frequency times 2π. All that revolves around Euler's formula linking sinusoid and complex exponential.
For a signal with only cosine waves, fourier transform, aka. FFT produces completely real output. For a signal composed of only sine waves, it produces completely imaginary output. A phase shift in any of the signals will result in a mix of real and complex. Complex numbers (in this context) are merely another way to store phase and amplitude.
I am trying to apply Random Projections method on a very sparse dataset. I found papers and tutorials about Johnson Lindenstrauss method, but every one of them is full of equations which makes no meaningful explanation to me. For example, this document on Johnson-Lindenstrauss
Unfortunately, from this document, I can get no idea about the implementation steps of the algorithm. It's a long shot but is there anyone who can tell me the plain English version or very simple pseudo code of the algorithm? Or where can I start to dig this equations? Any suggestions?
For example, what I understand from the algorithm by reading this paper concerning Johnson-Lindenstrauss is that:
Assume we have a AxB matrix where A is number of samples and B is the number of dimensions, e.g. 100x5000. And I want to reduce the dimension of it to 500, which will produce a 100x500 matrix.
As far as I understand: first, I need to construct a 100x500 matrix and fill the entries randomly with +1 and -1 (with a 50% probability).
Edit:
Okay, I think I started to get it. So we have a matrix A which is mxn. We want to reduce it to E which is mxk.
What we need to do is, to construct a matrix R which has nxk dimension, and fill it with 0, -1 or +1, with respect to 2/3, 1/6 and 1/6 probability.
After constructing this R, we'll simply do a matrix multiplication AxR to find our reduced matrix E. But we don't need to do a full matrix multiplication, because if an element of Ri is 0, we don't need to do calculation. Simply skip it. But if we face with 1, we just add the column, or if it's -1, just subtract it from the calculation. So we'll simply use summation rather than multiplication to find E. And that is what makes this method very fast.
It turned out a very neat algorithm, although I feel too stupid to get the idea.
You have the idea right. However as I understand random project, the rows of your matrix R should have unit length. I believe that's approximately what the normalizing by 1/sqrt(k) is for, to normalize away the fact that they're not unit vectors.
It isn't a projection, but, it's nearly a projection; R's rows aren't orthonormal, but within a much higher-dimensional space, they quite nearly are. In fact the dot product of any two of those vectors you choose will be pretty close to 0. This is why it is a generally good approximation of actually finding a proper basis for projection.
The mapping from high-dimensional data A to low-dimensional data E is given in the statement of theorem 1.1 in the latter paper - it is simply a scalar multiplication followed by a matrix multiplication. The data vectors are the rows of the matrices A and E. As the author points out in section 7.1, you don't need to use a full matrix multiplication algorithm.
If your dataset is sparse, then sparse random projections will not work well.
You have a few options here:
Option A:
Step 1. apply a structured dense random projection (so called fast hadamard transform is typically used). This is a special projection which is very fast to compute but otherwise has the properties of a normal dense random projection
Step 2. apply sparse projection on the "densified data" (sparse random projections are useful for dense data only)
Option B:
Apply SVD on the sparse data. If the data is sparse but has some structure SVD is better. Random projection preserves the distances between all points. SVD preserves better the distances between dense regions - in practice this is more meaningful. Also people use random projections to compute the SVD on huge datasets. Random Projections gives you efficiency, but not necessarily the best quality of embedding in a low dimension.
If your data has no structure, then use random projections.
Option C:
For data points for which SVD has little error, use SVD; for the rest of the points use Random Projection
Option D:
Use a random projection based on the data points themselves.
This is very easy to understand what is going on. It looks something like this:
create a n by k matrix (n number of data point, k new dimension)
for i from 0 to k do #generate k random projection vectors
randomized_combination = feature vector of zeros (number of zeros = number of features)
sample_point_ids = select a sample of point ids
for each point_id in sample_point_ids do:
random_sign = +1/-1 with prob. 1/2
randomized_combination += random_sign*feature_vector[point_id] #this is a vector operation
normalize the randomized combination
#note that the normal random projection is:
# randomized_combination = [+/-1, +/-1, ...] (k +/-1; if you want sparse randomly set a fraction to 0; also good to normalize by length]
to project the data points on this random feature just do
for each data point_id in dataset:
scores[point_id, j] = dot_product(feature_vector[point_id], randomized_feature)
If you are still looking to solve this problem, write a message here, I can give you more pseudocode.
The way to think about it is that a random projection is just a random pattern and the dot product (i.e. projecting the data point) between the data point and the pattern gives you the overlap between them. So if two data points overlap with many random patterns, those points are similar. Therefore, random projections preserve similarity while using less space, but they also add random fluctuations in the pairwise similarities. What JLT tells you is that to make fluctuations 0.1 (eps)
you need about 100*log(n) dimensions.
Good Luck!
An R Package to perform Random Projection using Johnson- Lindenstrauss Lemma
RandPro
I'm searching a space of vectors of length 12, with entries 0, 1, 2. For example, one such vector is
001122001122. I have about a thousand good vectors, and about a thousand bad vectors. For each bad vector I need to locate the closest good vector. Distance between two vectors is just the number of coordinates which don't match. The good vectors aren't particularly nicely arranged, and the reason they're "good" doesn't seem to be helpful here. My main priority is that the algorithm be fast.
If I do a simple exhaustive search, I have to calculate about 1000*1000 distances. That seems pretty thick-headed.
If I apply Dijkstra's algorithm first using the good vectors, I can calculate the closest vector and minimal distance for every vector in the space, so that each bad vector requires a simple lookup. But the space has 3^12 = 531,441 vectors in it, so the precomputation is half a million distance computations. Not much savings.
Can you help me think of a better way?
Edit: Since people asked earnestly what makes them "good": Each vector represents a description of a hexagonal picture of six equilateral triangles, which is the 2D image of a 3D arrangement of cubes (think generalized Q-bert). The equilateral triangles are halves of faces of cubes (45-45-90), tilted into perspective. Six of the coordinates describe the nature of the triangle (perceived floor, left wall, right wall), and six coordinates describe the nature of the edges (perceived continuity, two kinds of perceived discontinuity). The 1000 good vectors are those that represent hexagons that can be witnessed when seeing cubes-in-perspective. The reason for the search is to apply local corrections to a hex map full of triangles...
Just to keep the things in perspective, and be sure you are not optimizing unnecessary things, the brute force approach without any optimization takes 12 seconds in my machine.
Code in Mathematica:
bad = Table[RandomInteger[5, 12], {1000}];
good = Table[RandomInteger[2, 12], {1000}];
distance[a_, b_] := Total[Sign#Abs[a - b]];
bestMatch = #[[2]] & /#
Position[
Table[Ordering#
Table[distance[good[[j]], bad[[i]]], {j, Length#good}], {i,
Length#bad}], 1] // Timing
As you may expect, the Time follows a O(n^2) law:
This sounds a lot like what spellcheckers have to do. The trick is generally to abuse tries.
The most basic thing you can do is build a trie over the good vectors, then do a flood-fill prioritizing branches with few mismatches. This will be very fast when there is a nearby vector, and degenerate to brute force when the closest vector is very far away. Not bad.
But I think you can do better. Bad vectors which share the same prefix will do the same initial branching work, so we can try to share that as well. So we also build a trie over the bad vectors and sortof do them all at once.
No guarantees this is correct, since both the algorithm and code are off the top of my head:
var goodTrie = new Trie(goodVectors)
var badTrie = new Trie(badVectors)
var result = new Map<Vector, Vector>()
var pq = new PriorityQueue(x => x.error)
pq.add(new {good: goodTrie, bad: badTrie, error: 0})
while pq.Count > 0
var g,b,e = q.Dequeue()
if b.Count == 0:
//all leafs of this path have been removed
continue
if b.IsLeaf:
//we have found a mapping with minimum error for this bad item
result[b.Item] = g.Item
badTrie.remove(b) //prevent redundant results
else:
//We are zipping down the tries. Branch to all possibilities.
q.EnqueueAll(from i in {0,1,2}
from j in {0,1,2}
select new {good: g[i], bad: b[j], error: e + i==j ? 0 : 1})
return result
A final optimization might be to re-order the vectors so positions with high agreement among the bad vectors come first and share more work.
3^12 isn't a very large search space. If speed is essential and generality of the algorithm is not, you could just map each vector to an int in the range 0..531440 and use it as an index into a precomputed table of "nearest good vectors".
If you gave each entry in that table a 32-bit word (which is more than enough), you'd be looking at about 2 MB for the table, in exchange for pretty much instantaneous "calculation".
edit: this is not much different from the precomputation the question suggests, but my point is just that depending on the application, there's not necessarily any problem with doing it that way, especially if you do all the precalculations before the application even runs.
My computational geometry is VERY rough, but it seems that you should be able to:
Calculate the Voronoi diagram for your set of good vectors.
Calculate the BSP tree for the cells of the diagram.
The Voronoi diagram will give you a 12th dimensional convex hull for each good vector that contains that all the points closest to that vector.
The BSP tree will give you a fast way to determine which cell a vector lies within and, therefore, which good vector it is closest to.
EDIT: I just noticed that you are using hamming distances instead of euclidean distances. I'm not sure how this could be adapted to fit that constraint. Sorry.
Assuming a packed representation for the vectors, one distance computation (comparing one good vector and one bad vector to yield the distance) can be completed in roughly 20 clock cycles or less. Hence a million such distance calculations can be done in 20 million cycles or (assuming a 2GHz cpu) 0.01 sec. Do these numbers help?
PS:- 20 cycles is a conservative overestimate.
I was wondering if someone could point me to an algorithm/technique that is used to compare time dependent signals. Ideally, this hypothetical algorithm would take in 2 signals as inputs and return a number that would be the percentage similarity between the signals (0 being that the 2 signals are statistically unrelated and 1 being that they are a perfect match).
Of course, I realize that there are problems with my request, namely that I'm not sure how to properly define 'similarity' in the context of comparing these 2 signals, so if someone could also point me in the right direction (as to what I should look up/know, etc.), I'd appreciate it as well.
The cross-correlation function is the classic signal processing solution. If you have access to Matlab, see the XCORR function. max(abs(xcorr(Signal1, Signal2, 'coeff'))) would give you specifically what you're looking for and an equivalent exists in Python as well.
Cross-correlation assumes that the "similarity" you're looking for is a measure of the linear relationship between the two signals. The definition for real-valued finite-length signals with time index n = 0..N-1 is:
C[g] = sum{m = 0..N-1} (x1[m] * x2[g+m])
g runs from -N..N (outside that range the product inside the sum is 0).
Although you asked for a number, the function is pretty interesting. The function domain g is called the lag domain.
If x1 and x2 are related by a time shift, the cross-correlation function will have its peak at the lag corresponding to the shift. For instance, if you had x1 = sin[wn] and x2 = sin[wn + phi], so two sine waves at the same frequency and different phase, the cross-correlation function would have its peak at the lag corresponding to the phase shift.
If x2 is a scaled version of x1, the cross-correlation will scale also. You can normalize the function to a correlation coefficient by dividing by sqrt(sum(x1^2)*sum(x2^2)), and bring it into 0..1 by taking an absolute value (that line of Matlab has these operations).
More generally, below is a summary of what cross-correlation is good/bad for.
Cross-correlation works well for determining if one signal is linearly related to another, that is if
x2(t) = sum{n = 0..K-1}(A_n * x1(t + phi_n))
where x1(t) and x2(t) are the signals in question, A_n are scaling factors, and phi_n are time shifts. The implications of this are:
If one signal is a time shifted version of the other (phi_n <> 0 for some n) the cross-correlation function will be non-zero.
If one signal is a scaled version of the other (A_n <> 0 for some n) the cross-correlation function will be non-zero.
If one signal is a combination of scaled and time shifted versions of the other (both A_n and phi_n are non-zero for some number of n's) the cross-correlation function will be non-zero. Note that this is also a definition of a linear filter.
To get more concrete, suppose x1 is a wideband random signal. Let x2=x1. Now the normalized cross-correlation function will be exactly 1 at g=0, and near 0 everywhere else. Now let x2 be a (linearly) filtered version of x1. The cross-correlation function will be non-zero near g=0. The width of the non-zero part will depend on the bandwidth of the filter.
For the special case of x1 and x2 being periodic, the information on the phase-shift in the original part of the answer applies.
Where cross-correlation will not help is if the two signals are not linearly related. For instance, two periodic signals at different frequencies are not linearly related. Nor are two random signals drawn from a wideband random process at different times. Nor are two signals that are similar in shape but with different time indexing - this is like the unequal fundamental frequency case.
In all cases, normalizing the cross-correlation function and looking at the maximum value will tell you if the signals are potentially linearly related - if the number is low, like under 0.1, I would be comfortable declaring them unrelated. Higher than that and I'd look into it more carefully, graphing both the normalized and unnormalized cross-correlation functions and looking at the structure. A periodic cross-correlation implies both signals are periodic, and a cross-correlation function that is noticeably higher around g=0 implies one signal is a filtered version of the other.
You could try a Fast Fourier Transform (look up FFT in Wikipedia, there are open source libraries for performing conversions).
FFTs will transform your data from time domain (i.e. a pulse at 1s, 2s, 3s, 4s...) to data in frequency domain (i.e. a pulse each second).
Then you can compare frequencies and their relative strenghts more easily. It should be a step in the right direction for you.
General solution: you can bin the data into histograms and use a Chi-squared test or a Kolomogrov test.
Both are explicitly intended to estimate the chances that the two distribution represent random samples from the same underling distribution (that is: have the same shape to within statistics).
I don't know a c implementation off the top of my head, but ROOT provides c++ implementation of both:
TH1::Chi2Test
TH1:KolmogorovTest
I believe the docs point to some papers as well.
I think that CERNLIB provides both algorithms in fortran77, which you can link to c. Translating the ROOT code might be easier.
Dynamic Time Warping is an approach you can use if the signals should be matched by speeding up and slowing down time at different positions.
You don't say very much about what the signals are, and what measure of "sameness" would be meaningful to you. But, if the signals are in-phase (that is, you want to compare the two signals instant-by-instant, and there's not going to be any time-delay to consider) then I'd suggest you look at Pearson's correlator. It gives you a value of 1 if the two signals are identical, a value of 0 if they're entirely dissimilar, and something in between if they kinda rhyme. As an added advantage, Pearson's doesn't care if the signals are amplified differently, (except if one signal is the inverse of the other it gives you a result of -1).
Does that sound like what you're looking for?
http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient
I don't know signal processing, so this is a guess ..:
Is your signal effectively a list of ordered pairs (x,y), where x is the time and y the amplitude? If so, then perhaps you could throw away then time coordinate -- e.g.:
Signal 1: [(x0,y0), (x1,y1), (x2,y2), (x3,y3), ...]
Signal 2: [(x0,z0), (x1,z1), (x2,z1), (x3,z3), ...]
Throw away time:
Signal 1: [y0, y1, y2, y3, ...]
Signal 2: [z0, z1, z2, z3, ...]
Then you can compare the amplitudes with each other, perhaps by looking for a correlation. Perhaps you could plot y against z:
Comparing: [(y0,z0), (y1,z1), (y2,z2), (y3,z3), ...]
Or calculate one of the various correlation coefficients.