I am trying to understand MATLAB's code for the Hough Transform.
Some items are clear to me in this picture,
binary_image is the monochrome version of input_image.
hough_lines is a vector containing the detected lines in the image. I see that four lines have been detected.
T contains the thetas in the (ϴ, ρ) space of the image.
R contains the rhos in the (ϴ, ρ) space of the image.
I have the following questions,
Why is the image rotated before applying Hough Transform?
What do the entries in H represent?
Why is H(Hough Matrix) of size 45x180? Where does this size come from?
Why is T of size 1x180? Where does this size come from?
Why is R of size 1x45? Where does this size come from?
What do the entries in P represent? Are they (x, y) or (ϴ, ρ) ?
29 162
29 165
28 170
21 5
29 158
Why is the value 5 passed into houghpeaks()?
What is the logic behind ceil(0.3*max(H(:)))?
Relevant source code
% Read image into workspace.
input_image = imread('Untitled.bmp');
%Rotate the image.
rotated_image = imrotate(input_image,33,'crop');
% Convert RGB to grayscale.
rotated_image = rgb2gray(rotated_image);
%Create a binary image.
binary_image = edge(rotated_image,'canny');
%Create the Hough transform using the binary image.
[H,T,R] = hough(binary_image);
%Find peaks in the Hough transform of the image.
P = houghpeaks(H,5,'threshold',ceil(0.3*max(H(:))));
%Find lines
hough_lines = houghlines(binary_image,T,R,P,'FillGap',5,'MinLength',7);
% Plot the detected lines
figure, imshow(rotated_image), hold on
max_len = 0;
for k = 1:length(hough_lines)
    xy = [hough_lines(k).point1; hough_lines(k).point2];
    plot(xy(:,1),xy(:,2),'LineWidth',2,'Color','green');
    % Plot beginnings and ends of lines
    plot(xy(1,1),xy(1,2),'x','LineWidth',2,'Color','yellow');
    plot(xy(2,1),xy(2,2),'x','LineWidth',2,'Color','red');
    % Determine the endpoints of the longest line segment
    len = norm(hough_lines(k).point1 - hough_lines(k).point2);
    if (len > max_len)
        max_len = len;
        xy_long = xy;
    end
end
% Highlight the longest line segment by coloring it cyan.
plot(xy_long(:,1),xy_long(:,2),'LineWidth',2,'Color','cyan');
Those are some good questions. Here are my answers for you:
Why is the image rotated before applying Hough Transform?
I don't believe this is MATLAB's "official example". I just took a quick look at the documentation page for the function; I believe you pulled this from another website that we don't have access to. In any case, in general it is not necessary to rotate the images prior to using the Hough Transform. The goal of the Hough Transform is to find lines in the image in any orientation, so rotating them should not affect the results. However, if I were to guess, the rotation was performed as a preemptive measure because the lines in the "example image" were most likely oriented at a 33 degree angle clockwise. Performing the reverse rotation would make the lines more or less straight.
What do the entries in H represent?
H is what is known as an accumulator matrix. Before we get into what the purpose of H is and how to interpret the matrix, you need to know how the Hough Transform works. With the Hough transform, we first perform an edge detection on the image. This is done using the Canny edge detector in your case. If you recall the Hough Transform, we can parameterize a line using the following relationship:
rho = x*cos(theta) + y*sin(theta)
x and y are points in the image and most customarily they are edge points. theta would be the angle made from the intersection of a line drawn from the origin meeting with the line drawn through the edge point. rho would be the perpendicular distance from the origin to this line drawn through (x, y) at the angle theta.
Note that the equation can yield infinitely many lines located at (x, y), so it's common to bin or discretize the total number of possible angles to a predefined amount. MATLAB by default assumes there are 180 possible angles that range over [-90, 90) with a sampling factor of 1: [-90, -89, -88, ..., 88, 89]. What you generally do is, for each edge point, search over the predefined set of angles and determine the corresponding rho. Afterwards, you count how many times you see each rho and theta pair. Here's a quick example pulled from Wikipedia:
Source: Wikipedia: Hough Transform
Here we see three black dots that follow a straight line. Ideally, the Hough Transform should determine that these black dots together form a straight line. To give you a sense of the calculations, take a look at the example at 30 degrees: referring back to the relationship above, when we draw a line through each point such that the line from the origin meets it at an angle of 30 degrees, we find the perpendicular distance from that line to the origin.
Now what's interesting is if you see the perpendicular distance shown at 60 degrees for each point, the distance is more or less the same at about 80 pixels. Seeing this rho and theta pair for each of the three points is the driving force behind the Hough Transform. Also, what's nice about the above formula is that it will implicitly find the perpendicular distance for you.
The process of the Hough Transform is very simple. Suppose we have an edge detected image I and a set of angles theta:
For each edge point (x, y) in the image:
    For each angle theta in the set of angles:
        Substitute into rho = x*cos(theta) + y*sin(theta)
        Solve for rho to find the perpendicular distance
        Increment the count for this (rho, theta) pair by 1
So ideally, if we had edge points that follow a straight line, we should see a rho and theta pair where the count of how many times we see this pair is relatively high. This is the purpose of the accumulator matrix H. The rows denote a unique rho value and the columns denote a unique theta value.
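To make that voting loop concrete, here is a minimal NumPy sketch of an accumulator built exactly as described above (a toy illustration, not MATLAB's hough implementation; the function and variable names are my own):

import numpy as np

def hough_accumulate(edge_img, rho_res=1.0):
    # edge_img: boolean array that is True at edge pixels
    thetas = np.deg2rad(np.arange(-90, 90))                # 180 angles over [-90, 90)
    rows, cols = edge_img.shape
    d = int(np.ceil(np.hypot(rows - 1, cols - 1) / rho_res))
    rhos = np.arange(-d, d + 1) * rho_res                  # 2*d + 1 rho bins
    H = np.zeros((len(rhos), len(thetas)), dtype=int)
    ys, xs = np.nonzero(edge_img)
    for x, y in zip(xs, ys):                               # for each edge point
        for j, t in enumerate(thetas):                     # for each candidate angle
            rho = x * np.cos(t) + y * np.sin(t)
            i = int(round(rho / rho_res)) + d              # map rho to its row (bin)
            H[i, j] += 1                                   # cast a vote
    return H, thetas, rhos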
An example of this is shown below:
Source: Google Patents
Therefore, to use an example from this matrix: the bin located at theta between 25 and 30 with rho between 4 and 4.5 tells us that 8 edge points would be characterized by a line with parameters in that (rho, theta) range.
Note that the range of rho is also infinitely many values so you need to not only restrict the range of rho that you have, but you also have to discretize the rho with a sampling interval. The default in MATLAB is 1. Therefore, if you calculate a rho value it will inevitably have floating point values, so you remove the decimal precision to determine the final rho.
For the above example the rho resolution is 0.5, so that means that for example if you calculated a rho value that falls between 2 to 2.5, it falls in the first column. Also note that the theta values are binned in intervals of 5. You traditionally would compute the Hough Transform with a theta sampling interval of 1, then you merge the bins together. However for the defaults of MATLAB, the bin size is 1. This accumulator matrix tells you how many times an edge point fits a particular rho and theta combination. Therefore, if we see many points that get mapped to a particular rho and theta value, this is a great potential for a line to be detected here and that is defined by rho = x*cos(theta) + y*sin(theta).
Why is H(Hough Matrix) of size 45x180? Where does this size come from?
This is a consequence of the previous point. Take note that the largest distance we would expect from the origin to any point in the image is bounded by the diagonal of the image. This makes sense because going from the top left corner to the bottom right corner, or from the bottom left corner to the top right corner, gives you the greatest distance expected in the image. MATLAB measures this as the distance between opposite corner pixels, D = sqrt((rows-1)^2 + (cols-1)^2), where rows and cols are the rows and columns of the image.
For the MATLAB defaults, rho spans from -ceil(D) to ceil(D) in steps of 1. Your image is 16 x 16, so D = sqrt(15^2 + 15^2) ≈ 21.21 and rho spans from -22 to 22, which gives 45 unique rho values. Remember that the default resolution of theta goes from [-90, 90) (in steps of 1), resulting in 180 unique angle values. Going with this, we have 45 rows and 180 columns in the accumulator matrix, and hence H is 45 x 180.
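If it helps, here is a tiny Python sketch of that sizing rule (my reading of the defaults described above, not a quote of MATLAB's source), which reproduces the 45 x 180 shape for a 16 x 16 image:

import math

def hough_accumulator_shape(rows, cols, rho_res=1.0, theta_res=1.0):
    d = math.hypot(rows - 1, cols - 1)        # corner-to-corner distance
    n_rho = 2 * math.ceil(d / rho_res) + 1    # rho bins: -ceil(D) .. ceil(D)
    n_theta = int(180 / theta_res)            # theta bins: [-90, 90)
    return n_rho, n_theta

print(hough_accumulator_shape(16, 16))        # (45, 180)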
Why is T of size 1x180? Where does this size come from?
This is an array that tells you all of the angles that were being used in the Hough Transform. This should be an array going from -90 to 89 in steps of 1.
Why is R of size 1x45? Where does this size come from?
This is an array that tells you all of the rho values that were being used in the Hough Transform. This should be an array that spans from -22 to 22 in steps of 1.
What you should take away from this is that each value in H determines how many times we have seen a particular pair of rho and theta such that for R(i) <= rho < R(i + 1) and T(j) <= theta < T(j + 1), where i spans from 1 to 44 and j spans from 1 to 179, this determines how many times we see edge points for a particular range of rho and theta defined previously.
What do the entries in P represent? Are they (x, y) or (ϴ, ρ)?
P is the output of the houghpeaks function. Basically, this determines what the possible lines are by finding where the peaks in the accumulator matrix happen. This gives you the actual physical locations in P where there is a peak. These locations are:
29 162
29 165
28 170
21 5
29 158
Each row gives you a gateway to the rho and theta parameters required to generate the detected line. Specifically, the first line is characterized by rho = R(29) and theta = T(162), the second line by rho = R(29) and theta = T(165), and so on. To answer your question, the values in P are neither (x, y) nor (ρ, ϴ). They are row and column locations into H; cross-referencing them with R and T gives you the parameters that characterize each line that was detected in the image.
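For example, assuming the default R (-22..22) and T (-90..89) described above, pulling the peak parameters out of P looks something like this (a Python sketch, with MATLAB's 1-based indices converted to 0-based):

import numpy as np

T = np.arange(-90, 90)   # theta values, as described above
R = np.arange(-22, 23)   # rho values, as described above

# Rows of P (copied from the question) are 1-based [row, col] indices into H
P = np.array([[29, 162], [29, 165], [28, 170], [21, 5], [29, 158]])

rho_peaks = R[P[:, 0] - 1]     # first line: rho   = R(29) ->  6
theta_peaks = T[P[:, 1] - 1]   # first line: theta = T(162) -> 71 degrees
print(list(zip(rho_peaks, theta_peaks)))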
Why is the value 5 passed into houghpeaks()?
The extra 5 in houghpeaks is the total number of lines you'd like to detect, ideally. We can see that P has 5 rows, corresponding to 5 lines. If 5 lines can't be found, MATLAB will return as many lines as possible.
What is the logic behind ceil(0.3*max(H(:)))?
The logic behind this is that if you want to determine peaks in the accumulator matrix, you have to define a minimum threshold that tells you whether a particular rho and theta combination should be considered a valid line. Making this threshold too low reports a lot of false lines, and making it too high misses a lot of lines. What they decided to do here was find the largest bin count in the accumulator matrix, take 30% of that, take the mathematical ceiling, and treat any values in the accumulator matrix larger than this amount as candidate lines.
Hope this helps!
This is actually more of a theoretical question, but here it goes:
I'm developing an effect audio unit and it needs an equal power crossfade between dry and wet signals.
But I'm confused about the right way to do the mapping function from the linear fader to the scaling factor (gain) for the signal amplitudes of dry and wet streams.
Basically, I've seen it done with cos / sin functions or square roots... essentially approximating logarithmic curves. But if our perception of amplitude is logarithmic to start with, shouldn't the curves mapping the fader position to an amplitude actually be exponential?
This is what I mean:
Assumptions:
signal[i] means the ith sample in a signal.
each sample is a float ranging [-1, 1] for amplitudes between [0,1].
our GUI control is an NSSlider ranging from [0,1], so it is in principle linear.
fader is a variable with the value of the NSSlider.
First Observation:
We perceive amplitude in a logarithmic way. So if we have a linear fader and merely adjust a signal's amplitude by doing: signal[i] * fader what we are perceiving (hearing, regardless of the math) is something along the lines of:
This is the so-called crappy fader effect: we go from silence to a drastic volume increase across the leftmost segment of the slider, and past the middle the volume doesn't seem to get that much louder.
So to do the fader "right", we instead either express it on a dB scale and then, as far as the signal is concerned, do signal[i] * 10^(fader/20), or, if we want to keep our fader units in [0,1], we can do signal[i] * (.001*10^(3*fader)).
Either way, our new mapping from the NSSlider to the fader variable which we'll use for multiplying in our code, looks like this now:
Which is what we actually want, because since we perceive amplitude logarithmically, we are essentially mapping from linear (NSSlider range 0-1) to exponential and feeding this exponential output to our logarithmic perception. And it turns out that log(10^x) = x, so we end up perceiving the amplitude change in a linear (aka correct) way.
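For what it's worth, here is a small Python sketch of those two equivalent fader-to-gain mappings (the function names are mine), just to make the shape concrete:

def gain_from_fader_db(fader_db):
    # fader expressed directly in dB, e.g. -60 .. 0
    return 10 ** (fader_db / 20.0)

def gain_from_fader_01(fader):
    # fader in [0, 1]; sweeps from 0.001 (-60 dB) up to 1.0 (0 dB)
    return 0.001 * 10 ** (3 * fader)

print(gain_from_fader_01(0.0), gain_from_fader_01(0.5), gain_from_fader_01(1.0))
# -> 0.001, ~0.0316, 1.0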
Great.
Now, my thought is that an equal-power crossfade between two signals (in this case a dry / wet horizontal NSSlider to mix together the input to the AU and the processed output from it) is essentially the same only that with one slider acting on both hypothetical signals dry[i] and wet[i].
So if my slider ranges from 0 to 100 (dry full-left, wet full-right), I'd end up with code along the lines of:
Float32 outputSample, wetSample, drySample = <assume proper initialization>
Float32 mixLevel = .01 * GetParameter(kParameterTypeMixLevel);
Float32 wetPowerLevel = .001 * pow(10, (mixLevel*3));
Float32 dryPowerLevel = .001 * pow(10, ((-3*mixLevel)+1));
outputSample = (wetSample * wetPowerLevel) + (drySample * dryPowerLevel);
The graph of which would be:
And same as before, because we perceive amplitude logarithmically, this exponential mapping should actually make it where we hear the crossfade as linear.
However, I've seen implementations of the crossfade using approximations to log curves. Meaning, instead:
But wouldn't these curves actually emphasize our logarithmic perception of amplitude?
The "equal power" crossfade you're thinking of has to do with keeping the total output power of your mix constant as you fade from wet to dry. Keeping total power constant serves as a reasonable approximation to keeping total perceived loudness constant (which in reality can be fairly complicated).
If you are crossfading between two uncorrelated signals of equal power, you can maintain a constant output power during the crossfade by using any two functions whose squared values sum to 1. A common example of this is the set of functions
g1(k) = ( 0.5 + 0.5*cos(pi*k) )^.5
g2(k) = ( 0.5 - 0.5*cos(pi*k) )^.5,
where 0 <= k <= 1 (note that g1(k)^2 + g2(k)^2 = 1 is satisfied, as mentioned). Here's a proof that this results in a constant power crossfade for uncorrelated signals:
Say we have two signals x1(t) and x2(t) with equal powers E[ x1(t)^2 ] = E[ x2(t)^2 ] = Px, which are also uncorrelated ( E[ x1(t)*x2(t) ] = 0 ). Note that any set of gain functions satisfying the previous condition will have that g2(k) = (1 - g1(k)^2)^.5. Now, forming the sum y(t) = g1(k)*x1(t) + g2(k)*x2(t), we have that:
E[ y(t)^2 ] = E[ (g1(k) * x1(t))^2 + 2*g1(k)*(1 - g1(k)^2)^.5 * x1(t) * x2(t) + (1 - g1(k)^2) * x2(t)^2 ]
= g1(k)^2 * E[ x1(t)^2 ] + 2*g1(k)*(1 - g1(k)^2)^.5 * E[ x1(t)*x2(t) ] + (1 - g1(k)^2) * E[ x2(t)^2 ]
= g1(k)^2 * Px + 0 + (1 - g1(k)^2) * Px = Px,
where we have used that g1(k) and g2(k) are deterministic and can thus be pulled outside the expectation operator E[ ], and that E[ x1(t)*x2(t) ] = 0 by definition because x1(t) and x2(t) are assumed to be uncorrelated. This means that no matter where we are in the crossfade (whatever k we choose) our output will still have the same power, Px, and thus hopefully equal perceived loudness.
Note that for completely correlated signals, you can achieve constant output power by doing a "linear" fade - using any two functions that sum to one (g1(k) + g2(k) = 1). When mixing signals that are somewhat correlated, gain functions between those two extremes would theoretically be appropriate.
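Here is a quick Python sketch of the gain pair above, mainly to show that the squared gains really do sum to one at every slider position (the function names are mine):

import numpy as np

def equal_power_gains(k):
    # k in [0, 1]: g1 fades one signal out while g2 fades the other in
    g1 = np.sqrt(0.5 + 0.5 * np.cos(np.pi * k))
    g2 = np.sqrt(0.5 - 0.5 * np.cos(np.pi * k))
    return g1, g2

k = np.linspace(0.0, 1.0, 5)
g1, g2 = equal_power_gains(k)
print(g1**2 + g2**2)   # all ones, so output power stays constant for uncorrelated inputs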
What you're thinking of when you say
And same as before, because we perceive amplitude logarithmically,
this exponential mapping should actually make it where we hear the
crossfade as linear.
is that one signal should perceptually decrease in loudness as a linear function of slider position (k), while the other signal should perceptually increase in loudness as a linear function of slider position, when applying your derived crossfade. While your derivation of that seems pretty spot on, unfortunately it may not be the best way to blend your dry and wet signals in terms of consistency - often, maintaining equal output loudness regardless of slider position is the better thing to shoot for. In any case, it might be worth trying a couple of different functions to see what is most usable and consistent.
I'm working on a data mining algorithm where I want to pick a random direction from a particular point in the feature space.
If I pick a random number for each of the n dimensions from [-1,1] and then normalize the vector to a length of 1 will I get an even distribution across all possible directions?
I'm speaking only theoretically here since computer generated random numbers are not actually random.
One simple trick is to select each dimension from a gaussian distribution, then normalize:
from random import gauss
def make_rand_vector(dims):
vec = [gauss(0, 1) for i in range(dims)]
mag = sum(x**2 for x in vec) ** .5
return [x/mag for x in vec]
For example, if you want a 7-dimensional random vector, select 7 random values (from a Gaussian distribution with mean 0 and standard deviation 1). Then, compute the magnitude of the resulting vector using the Pythagorean formula (square each value, add the squares, and take the square root of the result). Finally, divide each value by the magnitude to obtain a normalized random vector.
If your number of dimensions is large then this has the strong benefit of always working immediately, while generating random vectors until you find one which happens to have magnitude less than one will cause your computer to simply hang at more than a dozen dimensions or so, because the probability of any of them qualifying becomes vanishingly small.
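To put a number on "vanishingly small", here is a quick sketch of the acceptance probability of the rejection approach (the ratio of the unit ball's volume to the cube's volume in d dimensions):

import math

def accept_probability(d):
    # volume of the unit d-ball divided by the volume of the cube [-1, 1]^d
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) / 2 ** d

for d in (2, 5, 10, 15, 20):
    print(d, accept_probability(d))
# roughly 0.79, 0.16, 2.5e-3, 1.2e-5, 2.5e-8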
You will not get a uniformly distributed ensemble of angles with the algorithm you described. The angles will be biased toward the corners of your n-dimensional hypercube.
This can be fixed by eliminating any points with distance greater than 1 from the origin. Then you're dealing with a spherical rather than a cubical (n-dimensional) volume, and your set of angles should then be uniformly distributed over the sample space.
Pseudocode:
Let n be the number of dimensions, K the desired number of vectors:
vec_count = 0
while vec_count < K
    generate n uniformly distributed values a[0..n-1] over [-1, 1]
    r_squared = sum over i=0,n-1 of a[i]^2
    if 0 < r_squared <= 1.0
        b[i] = a[i]/sqrt(r_squared) for each i   ; normalize to length 1
        add vector b[0..n-1] to output list
        vec_count = vec_count + 1
    else
        reject this sample
end while
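A minimal Python version of that pseudocode, under the same assumptions (uniform draws from the cube, rejection outside the unit ball):

import math
import random

def random_unit_vectors(n, K):
    # n: number of dimensions, K: desired number of vectors
    out = []
    while len(out) < K:
        a = [random.uniform(-1.0, 1.0) for _ in range(n)]
        r_squared = sum(x * x for x in a)
        if 0.0 < r_squared <= 1.0:
            r = math.sqrt(r_squared)
            out.append([x / r for x in a])    # normalize to length 1
        # otherwise reject this sample and draw again
    return out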
There is a boost implementation of the algorithm that samples from normal distributions: random::uniform_on_sphere
I had the exact same question when developing an ML algorithm.
I came to the same conclusion as Jim Lewis after drawing samples for the 2-D case and plotting the resulting distribution of the angle.
Furthermore, if you derive the density distribution of the direction in 2D when you draw x and y at random from [-1,1], you will see that:
f_X(x) = 1/(4*cos²(x)) if 0 < x < 45°
and
f_X(x) = 1/(4*sin²(x)) if x > 45°
where x is the angle and f_X is the probability density distribution.
I have written about this here:
https://aerodatablog.wordpress.com/2018/01/14/random-hyperplanes/
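If you want to check the shape of that density empirically, a quick Monte Carlo sketch (up to the normalization constant, which depends on how the angle's range is defined) looks like this:

import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(1_000_000, 2))                 # uniform draws from [-1, 1]^2
ang = np.degrees(np.arctan2(pts[:, 1], pts[:, 0])) % 90       # fold into [0, 90) by symmetry

hist, edges = np.histogram(ang, bins=90, range=(0, 90), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# predicted shape: proportional to 1/cos^2 below 45 degrees, 1/sin^2 above
pred = np.where(centers < 45,
                1 / np.cos(np.radians(centers)) ** 2,
                1 / np.sin(np.radians(centers)) ** 2)
print(np.corrcoef(hist, pred)[0, 1])                          # very close to 1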
#define SCL1 (M_SQRT2/2)
#define SCL2 (M_SQRT2*2)
// unitrand in [-1,1].
double u = SCL1 * unitrand();
double v = SCL1 * unitrand();
double w = SCL2 * sqrt(1.0 - u*u - v*v);
double x = w * u;
double y = w * v;
double z = 1.0 - 2.0 * (u*u + v*v);
I am trying to understand the FFT algorithm and so far I think that I understand the main concept behind it. However I am confused as to the difference between 'framesize' and 'window'.
Based on my understanding, it seems that they are redundant with each other? For example, I present as input a block of samples with a framesize of 1024. So I have byte[1024] presented as input.
What then is the purpose of the windowing function? Initially, I thought the purpose of the windowing function was to select the block of samples from the original data.
Thanks!
What then is the purpose of the windowing function?
It's to deal with so-called "spectral leakage": the FFT assumes an infinite series that repeats the given sample frame over and over again. If you have a sine wave that is an integral number of cycles within the sample frame, then all is good, and the FFT gives you a nice narrow peak at the proper frequency. But if you have a sine wave that is not an integral number of cycles, there's a discontinuity between the last and first sample, and the FFT gives you false harmonics.
Windowing functions lower the amplitudes at the beginning and the end of the sample frame, to reduce the harmonics caused by this discontinuity.
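As a minimal sketch (NumPy, with a made-up 123.4 Hz tone that is deliberately not periodic in the frame), windowing is just an element-wise multiply before the FFT:

import numpy as np

fs = 1000
n = np.arange(1024)
frame = np.sin(2 * np.pi * 123.4 * n / fs)     # not an integral number of cycles per frame

spectrum_rect = np.abs(np.fft.rfft(frame))                      # rectangular (no) window
spectrum_hann = np.abs(np.fft.rfft(frame * np.hanning(1024)))   # Hann-windowed: less leakage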
Some diagrams from a National Instruments webpage on windowing illustrate this: one shows an integral number of cycles, the other a non-integer number of cycles.
for additional information:
http://www.tmworld.com/article/322450-Windowing_Functions_Improve_FFT_Results_Part_I.php
http://zone.ni.com/reference/en-XX/help/371361B-01/lvanlsconcepts/char_smoothing_windows/
http://www.physik.uni-wuerzburg.de/~praktiku/Anleitung/Fremde/ANO14.pdf
A rectangular window of length M has frequency response of sin(ω*M/2)/sin(ω/2), which is zero when ω = 2*π*k/M, for k ≠ 0. For a DFT of length N, where ω = 2*π*n/N, there are nulls at n = k * N/M. The ratio N/M isn't necessarily an integer. For example, if N = 40, and M = 32, then there are nulls at multiples of 1.25, but only the integer multiples will appear in the DFT, which is bins 5, 10, 15, and 20 in this case.
Here's a plot of the 1024-point DFT of a 32-point rectangular window:
from pylab import *   # pulls in numpy (ones, rfft, r_) and matplotlib plotting

M = 32
N = 1024
w = ones(M)
W = rfft(w, N)
K = N // M                        # null spacing, in bins
nulls = abs(W[K::K])
plot(abs(W))
plot(r_[K:N//2 + 1:K], nulls, 'ro')
xticks(r_[:512:64])
grid(); axis('tight')
Note the nulls at every N/M = 32 bins. If N=M (i.e. the window length equals the DFT length), then there are nulls at all bins except at n = 0.
When you multiply a window by a signal, the corresponding operation in the frequency domain is the circular convolution of the window's spectrum with the signal's spectrum. For example, the DTFT of a sinusoid is a weighted delta function (i.e. an impulse with infinite height, infinitesimal extension, and finite area) located at the positive and negative frequency of the sinusoid. Convolving a spectrum with a delta function just shifts it to the location of the delta and scales it by the delta's weight. Therefore when you multiply a window by a sinusoid in the sample domain, the window's frequency response is scaled and shifted to the frequency of the sinusoid.
There are a couple of scenarios to examine regarding the length of a rectangular window. First let's look at the case where the window length is an integer multiple of the sinusoid's period, e.g. a 32-sample rectangular window of a cosine with a period of 32/8 = 4 samples:
x1 = cos(2*pi*8*r_[:32]/32) # ω0 = 8π/16, bin 8/32 * 1024 = 256
X1 = rfft(x1 * w, 1024)
plot(abs(X1))
xticks(r_[:513:64])
grid(); axis('tight')
As before, there are nulls at multiples of N/M = 32. But the window's spectrum has been shifted to bin 256 of the sinusoid and scaled by its magnitude, which is 0.5 split between the positive frequency and the negative frequency (I'm only plotting positive frequencies). If the DFT length had been 32, the nulls would line up at every bin, prompting the appearance that there's no leakage. But that misleading appearance is only a function of the DFT length. If you pad the windowed signal with zeros (as above), you'll get to see the sinc-like response at frequencies between the nulls.
Now let's look at a case where the window length is not an integer multiple of the sinusoid's period, e.g. a cosine with an angular frequency of 7.5π/16 (the period is 64 samples):
x2 = cos(2*pi*15*r_[:32]/64) # ω0 = 7.5π/16, bin 15/64 * 1024 = 240
X2 = rfft(x2 * w, 1024)
plot(abs(X2))
xticks(r_[-16:513:64])
grid(); axis('tight')
The center bin location is no longer at an integer multiple of 32, but shifted by a half down to bin 240. So let's see what the corresponding 32-point DFT would look like (inferring a 32-point rectangular window). I'll compute and plot the 32-point DFT of x2[n] and also superimpose a 32x decimated copy of the 1024-point DFT:
X2_32 = rfft(x2, 32)
X2_sample = X2[::32]
stem(r_[:17],abs(X2_32))
plot(abs(X2_sample), 'rs') # red squares
grid(); axis([0,16,0,11])
As you can see in the previous plot, the nulls are no longer aligned at multiples of 32, so the magnitude of the 32-point DFT is non-zero at each bin. In the 32 point DFT, the window's nulls are still spaced every N/M = 32/32 = 1 bin, but since ω0 = 7.5π/16, the center is at 'bin' 7.5, which puts the nulls at 0.5, 1.5, etc, so they're not present in the 32-point DFT.
The general message is that spectral leakage of a windowed signal is always present, but it can be masked in the DFT if the signal spectrum, window length, and DFT length come together in just the right way to line up the nulls. Beyond that, you should just ignore these DFT artifacts and concentrate on the DTFT of your signal (i.e. pad with zeros to sample the DTFT at higher resolution so you can clearly examine the leakage).
Spectral leakage caused by convolving with a window's spectrum will always be there, which is why the art of crafting particularly shaped windows is so important. The spectrum of each window type has been tailored for a specific task, such as dynamic range or sensitivity.
Here's an example comparing the output of a rectangular window vs a Hamming window:
from pylab import *
import wave

fs = 44100
M = 4096
N = 16384

# load a sample of guitar playing an open string 6
# with a fundamental frequency of 82.4 Hz
g = fromstring(wave.open('dist_gtr_6.wav').readframes(-1), dtype='int16')
L = len(g) // 4
g_t = g[L:L+M]
g_t = g_t / float64(max(abs(g_t)))

# compute the response with rectangular vs Hamming window
g_rect = rfft(g_t, N)
g_hamm = rfft(g_t * hamming(M), N)

def make_plot():
    fmax = int(82.4 * 4.5 / fs * N)   # 4 harmonics
    subplot(211); title('Rectangular Window')
    plot(abs(g_rect[:fmax])); grid(); axis('tight')
    subplot(212); title('Hamming Window')
    plot(abs(g_hamm[:fmax])); grid(); axis('tight')

if __name__ == "__main__":
    make_plot()
If you don't modify the sample values, and select the same length of data as the FFT length, this is equivalent to using a rectangular window, in which case the frame and the window are identical. However multiplying your input data by a rectangular window in the time domain is the same as convolving the input signal's spectrum with a Sinc function in the frequency domain, which will spread any spectral peaks for frequencies which are not exactly periodic in the FFT aperture across the entire spectrum.
Non-rectangular windows are often used so that the resulting FFT spectrum is convolved with something a bit more "focused" than a Sinc function.
You can also use a rectangular window that is a different size than the FFT length or aperture. In the case of a shorter data window, the FFT frame can be zero padded, which can result in a smoother looking interpolated FFT result spectrum. You can even use a rectangular window that is longer than the length of the FFT by wrapping data around the FFT aperture in a summed circular manner for some interesting effects with the frequency resolution.
ADDED due to a request:
Multiplying by a window in the time domain produces the same result as convolving with the transform of that window in the frequency domain.
In general, a narrower time domain window will produce a wider looking frequency domain transform. This is the reason that zero-padding produces a smoother frequency plot. The narrower time domain window produces a wider Sinc with fatter and smoother curves in relation to the frame width than would a window the full width of the FFT frame, thus making the interpolated frequency results look smoother than a non-zero-padded FFT of the same frame length.
The converse is also true to some extent. A wider rectangular window will produce a narrower Sinc, with the nulls closer to the peak. Thus you might be able to use a carefully chosen wider window to produce a narrower looking Sinc to null a frequency closer to a bin of interest than 1 frequency bin away. How do you use a wider window? Wrap the data around and sum, which is identical to using FT basis vectors that are not truncated to 1 FFT frame in length. However, since when doing this the FFT result vector is shorter than the data, this is a lossy process which will introduce artifacts, and introduce some new novel aliasing. But it will give you a sharper frequency selection peak at each bin, and notch filters that can be placed less than 1 bin away, say halfway between bins, etc.
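As a rough sketch of the "wrap the data around and sum" idea above (my reading of it, not a canonical implementation): fold a segment longer than the FFT length back into the aperture in FFT-sized chunks, sum them, then transform.

import numpy as np

def wrapped_fft(x, n_fft):
    # fold the data back into the FFT aperture and sum, then transform
    n_frames = len(x) // n_fft
    folded = x[:n_frames * n_fft].reshape(n_frames, n_fft).sum(axis=0)
    return np.fft.rfft(folded)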
I want to calculate the average of a set of angles, which represents source bearing (0 to 360 deg) - (similar to wind-direction)
I know it has been discussed before (several times). The accepted answer was Compute unit vectors from the angles and take the angle of their average.
However this answer defines the average in a non intuitive way. The average of 0, 0 and 90 will be atan( (sin(0)+sin(0)+sin(90)) / (cos(0)+cos(0)+cos(90)) ) = atan(1/2)= 26.56 deg
I would expect the average of 0, 0 and 90 to be 30 degrees.
So I think it is fair to ask the question again: How would you calculate the average, so such examples will give the intuitive expected answer.
Edit 2014:
After asking this question, I've posted an article on CodeProject which offers a thorough analysis. The article examines the following reference problems:
Given time-of-day [00:00-24:00) for each birth occurred in US in the year 2000 - Calculate the mean birth time-of-day
Given a multiset of direction measurements from a stationary transmitter to a stationary receiver, using a measurement technique with a wrapped normal distributed error – Estimate the direction.
Given a multiset of azimuth estimates between two points, made by “ordinary” humans (assumed to be subject to a wrapped truncated normally distributed error) – Estimate the direction.
[Note the OP's question (but not title) appears to have changed to a rather specialised question ("...the average of a SEQUENCE of angles where each successive addition does not differ from the running mean by more than a specified amount." ) - see #MaR comment and mine. My following answer addresses the OP's title and the bulk of the discussion and answers related to it.]
This is not a question of logic or intuition, but of definition. This has been discussed on SO before without any real consensus. Angles should be defined within a range (which might be -PI to +PI, or 0 to 2*PI, or might be -Inf to +Inf). The answers will be different in each case.
The word "angle" causes confusion as it means different things. The angle of view is an unsigned quantity (and is normally PI > theta > 0). In that case "normal" averages might be useful. Angle of rotation (e.g. total rotation of an ice skater) might or might not be signed and might include theta > 2PI and theta < -2PI.
What is defined here is angle = direction, which requires vectors. If you use the word "direction" instead of "angle" you will have captured the OP's (apparent original) intention, and it will help to move away from scalar quantities.
Wikipedia shows the correct approach when angles are defined circularly such that
theta = theta+2*PI*N = theta-2*PI*N
The answer for the mean is NOT a scalar but a vector. The OP may not feel this is intuitive but it is the only useful correct approach. We cannot redefine the square root of -4 to be -2 because it's more intuitive - it has to be +-2*i. Similarly the average of bearings -90 degrees and +90 degrees is a vector of zero length, not 0.0 degrees.
Wikipedia (http://en.wikipedia.org/wiki/Mean_of_circular_quantities) has a special section and states (The equations are LaTeX and can be seen rendered in Wikipedia):
Most of the usual means fail on circular quantities, like angles, daytimes, fractional parts of real numbers. For those quantities you need a mean of circular quantities.
Since the arithmetic mean is not effective for angles, the following method can be used to obtain both a mean value and a measure for the variance of the angles:
Convert all angles to corresponding points on the unit circle, e.g., α to (cos α, sin α). That is, convert polar coordinates to Cartesian coordinates. Then compute the arithmetic mean of these points. The resulting point will lie on the unit disk. Convert that point back to polar coordinates. The angle is a reasonable mean of the input angles. The resulting radius will be 1 if all angles are equal. If the angles are uniformly distributed on the circle, then the resulting radius will be 0, and there is no circular mean. In other words, the radius measures the concentration of the angles.
Given the angles \alpha_1,\dots,\alpha_n the mean is computed by
M_\alpha = \operatorname{atan2}\left(\frac{1}{n}\sum_{j=1}^n \sin\alpha_j,\ \frac{1}{n}\sum_{j=1}^n \cos\alpha_j\right)
using the atan2 variant of the arctangent function, or
M_\alpha = \arg\left(\frac{1}{n}\sum_{j=1}^n \exp(i\cdot\alpha_j)\right)
using complex numbers.
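In Python (assuming NumPy, angles in degrees), that recipe is only a few lines:

import numpy as np

def circular_mean_deg(angles_deg):
    # convert to unit vectors, average them, convert back; result in [0, 360)
    a = np.radians(angles_deg)
    return np.degrees(np.arctan2(np.sin(a).mean(), np.cos(a).mean())) % 360

print(circular_mean_deg([0, 0, 90]))   # ~26.57, the vector mean discussed in the question
print(circular_mean_deg([350, 10]))    # 0.0, not 180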
Note that in the OP's question an angle of 0 is purely arbitrary - there is nothing special about wind coming from 0 as opposed to 180 (except in this hemisphere it's colder on the bicycle). Try changing 0,0,90 to 289, 289, 379 and see how the simple arithmetic no longer works.
(There are some distributions where angles of 0 and PI have special significance but they are not in scope here).
Here are some intense previous discussions which mirror the current spread of views :-)
Link
How do you calculate the average of a set of circular data?
http://forums.xkcd.com/viewtopic.php?f=17&t=22435
http://www.allegro.cc/forums/thread/595008
Thank you all for helping me see my problem more clearly.
I found what I was looking for.
It is called Mitsuta method.
The inputs and output are in the range [0..360).
This method is good for averaging data that was sampled using constant sampling intervals.
The method assumes that the difference between successive samples is less than 180 degrees (which means that if we don't sample fast enough, a 330 degree change in the sampled signal would be incorrectly detected as a 30 degree change in the other direction and would insert an error into the calculation). Nyquist–Shannon sampling theorem, anybody?
Here is some C++ code:
double AngAvrg(const vector<double>& Ang)
{
    vector<double>::const_iterator iter = Ang.begin();
    double fD    = *iter;
    double fSigD = *iter;

    while (++iter != Ang.end())
    {
        double fDelta = *iter - fD;
        if      (fDelta < -180.) fD += fDelta + 360.;
        else if (fDelta >  180.) fD += fDelta - 360.;
        else                     fD += fDelta;
        fSigD += fD;
    }

    double fAvrg = fSigD / Ang.size();
    if (fAvrg >= 360.) return fAvrg - 360.;
    if (fAvrg <   0. ) return fAvrg + 360.;
    return fAvrg;
}
It is explained on page 51 of Meteorological Monitoring Guidance for Regulatory Modeling Applications (PDF)(171 pp, 02-01-2000, 454-R-99-005)
Thank you MaR for sending the link as a comment.
If the sampled data is constant, but our sampling device has an inaccuracy with a Von Mises distribution, a unit-vectors calculation will be appropriate.
This is incorrect on every level.
Vectors add according to the rules of vector addition. The "intuitive, expected" answer might not be that intuitive.
Take the following example. If I have one unit vector (1, 0), with origin at (0,0) that points in the +x-direction and another (-1, 0) that also has its origin at (0,0) that points in the -x-direction, what should the "average" angle be?
If I simply add the angles and divide by two, I can argue that the "average" is either +90 or -90. Which one do you think it should be?
If I add the vectors according to the rules of vector addition (component by component), I get the following:
(1, 0) + (-1, 0) = (0, 0)
In polar coordinates, that's a vector with zero magnitude and angle zero.
So what should the "average" angle be? I've got three different answers here for a simple case.
I think the answer is that vectors don't obey the same intuition that numbers do, because they have both magnitude and direction. Maybe you should describe what problem you're solving a bit better.
Whatever solution you decide on, I'd advise you to base it on vectors. It'll always be correct that way.
What does it even mean to average source bearings? Start by answering that question, and you'll get closer to being to define what you mean by the average of angles.
In my mind, an angle with tangent equal to 1/2 is the right answer. If I have a unit force pushing me in the direction of the vector (1, 0), another force pushing me in the direction of the vector (1, 0), and a third force pushing me in the direction of the vector (0, 1), then the resulting force (the sum of these forces) is the force pushing me in the direction of (2, 1). These are the vectors representing the bearings 0 degrees, 0 degrees and 90 degrees. The angle represented by the vector (2, 1) has tangent equal to 1/2.
Responding to your second edit:
Let's say that we are measuring wind direction. Our 3 measurements were 0, 0, and 90 degrees. Since all measurements are equivalently reliable, why shouldn't our best estimate of the wind direction be 30 degrees? Setting it to 26.56 degrees is a bias toward 0...
Okay, here's an issue. The unit vector with angle 0 doesn't have the same mathematical properties that the real number 0 has. Using the notation 0v to represent the vector with angle 0, note that
0v + 0v = 0v
is false but
0 + 0 = 0
is true for real numbers. So if 0v represents wind with unit speed and angle 0, then 0v + 0v is wind with double unit speed and angle 0. And then if we have a third wind vector (which I'll represent using the notation 90v) which has angle 90 and unit speed, then the wind that results from the sum of these vectors does have a bias because it's traveling at twice unit speed in the horizontal direction but only unit speed in the vertical direction.
In my opinion, this is about angles, not vectors. For that reason the average of 360 and 0 is truly 180.
The average of one turn and no turns should be half a turn.
Edit: Equivalent, but more robust algorithm (and simpler):
divide angles into 2 groups, [0-180) and [180-360)
numerically average both groups
average the 2 group averages with proper weighting
if wraparound occurred, correct by 180˚
This works because number averaging works "logically" if all the angles are in the same half-circle. We then delay the wraparound error until the very last step, where it is easily detected and corrected. I also threw in some code for handling opposite-angle cases. If the averages are opposite, we favor the half-circle that had more angles in it, and in the case of an equal number of angles in both halves we return None, because no average would make sense.
The new code:
def averageAngles2(angles):
    newAngles = [a % 360 for a in angles]
    smallAngles = []
    largeAngles = []
    # split the angles into 2 groups: [0-180) and [180-360)
    for angle in newAngles:
        if angle < 180:
            smallAngles.append(angle)
        else:
            largeAngles.append(angle)
    smallCount = len(smallAngles)
    largeCount = len(largeAngles)
    # averaging each of the groups will work with standard averages
    smallAverage = sum(smallAngles) / float(smallCount) if smallCount else 0
    largeAverage = sum(largeAngles) / float(largeCount) if largeCount else 0
    if smallCount == 0:
        return largeAverage
    if largeCount == 0:
        return smallAverage
    average = (smallAverage * smallCount + largeAverage * largeCount) / \
              float(smallCount + largeCount)
    if largeAverage < smallAverage + 180:
        # average will not hit wraparound
        return average
    elif largeAverage > smallAverage + 180:
        # average will hit wraparound, so will be off by 180 degrees
        return (average + 180) % 360
    else:
        # opposite angles: return whichever has more weight
        if smallCount > largeCount:
            return smallAverage
        elif smallCount < largeCount:
            return largeAverage
        else:
            return None
>>> averageAngles2([0, 0, 90])
30.0
>>> averageAngles2([30, 350])
10.0
>>> averageAngles2([0, 200])
280.0
Here's a slightly naive algorithm:
remove all opposite angles from the list
take a pair of angles
rotate them to the first and second quadrant and average them
rotate average angle back by same amount
for each remaining angle, average in same way, but with successively increasing weight to the composite angle
Some Python code (step 1 not implemented):
def averageAngles(angles):
    newAngles = [a % 360 for a in angles]
    average = 0
    weight = 0
    for ang in newAngles:
        theta = 0
        if 0 < ang - average <= 180:
            theta = 180 - ang
        else:
            theta = 180 - average
        r_ang = (ang + theta) % 360
        r_avg = (average + theta) % 360
        average = ((r_avg * weight + r_ang) / float(weight + 1) - theta) % 360
        weight += 1
    return average
Here's the answer I gave to this same question:
How do you calculate the average of a set of circular data?
It gives answers in line with what the OP says he wants, but attention should be paid to this:
"I would also like to stress that even though this is a true average of angles, unlike the vector solutions, that does not necessarily mean it is the solution you should be using, the average of the corresponding unit vectors may well be the value you actually should to be using."
You are correct that the accepted answer of using traditional average is wrong.
An average of a set of points x_1 ... x_n in a metric space X is an element x in X that minimizes the sum of squared distances to each point (see Fréchet mean). If you try to find this minimum using simple calculus with regular real numbers, you will recover the standard "add up and divide by n" formula.
For an angle, our elements are actually points on the unit circle S1. Our metric isn't Euclidean distance, but arc length, which is proportional to angle.
So the average angle is the one that minimizes the sum of squared angular differences to the other angles. In other words, if you have a function angleBetween(a, b), you want to find the angle a such that the sum over i of angleBetween(a_i, a)^2 is minimized.
This is an optimization problem which can be solved using a numerical optimizer. Several of the answers here claim to provide simpler closed forms, or at least better approximations.
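For illustration, a small Python sketch of that optimization (a brute-force grid search over candidate means; angleBetween here is the wrapped absolute difference, and the names are mine):

import numpy as np

def angle_between(a, b):
    # smallest absolute difference between two angles, in degrees
    return np.abs((a - b + 180) % 360 - 180)

def frechet_mean_angle(angles, step=0.1):
    candidates = np.arange(0, 360, step)
    costs = sum(angle_between(candidates, x) ** 2 for x in angles)
    return candidates[np.argmin(costs)]

print(frechet_mean_angle([0, 0, 90]))   # ~30, the 'intuitive' average from the question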
Statistics
As you point out in your article, you need to assume that errors follow a Gaussian distribution to justify using least squares as the maximum likelihood estimator. So in this application, where is the error? Is the random error in the position of two things, and the angle just the normal of the line between them? If so, that normal will not follow a Gaussian distribution, even if the error in point position does. Taking means of angles only really makes sense if the random error is observed in the angle itself.
You could do this: say you have a set of angles in an array angle. First do angle[i] = angle[i] mod 360 for each element, then perform a simple average over the array. So when you have 360, 10 and 20, you are averaging 0, 10 and 20 - the result is intuitive.
What is wrong with taking the set of angles as real values and just computing the arithmetic average of those numbers? Then you would get the intuitive (0+0+90)/3 = 30 deg.
Edit: Thanks for useful comments and pointing out that angles may exceed 360. I believe the answer could be the normal arithmetic average reduced "modulo" 360: we sum all the values, divide by the number of angles and then subtract/add a multiple of 360 so that the result lies in the interval [0..360).
I think the problem stems from how you treat angles greater than 180 (and those greater than 360 as well). If you reduce the angles to a range of +180 to -180 before adding them to the total, you get something more reasonable:
int AverageOfAngles(int angles[], int count)
{
    int total = 0;
    for (int index = 0; index < count; index++)
    {
        int angle = angles[index] % 360;
        if (angle > 180) { angle -= 360; }
        total += angle;
    }
    return (int)((float)total / count);
}
Maybe you could represent the angles as quaternions, take the average of those quaternions, and convert the result back to an angle.
I don't know if it gives you what you want, because quaternions are rotations rather than angles. I also don't know if it will give you anything different from the vector solution.
Quaternions in 2D simplify to complex numbers, so I guess it's just vectors, but maybe some interesting quaternion averaging algorithm like http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20070017872_2007014421.pdf would behave better than a plain vector average when simplified to 2D.
Here you go! The reference is https://www.wxforum.net/index.php?topic=8660.0
import math
import numpy as np

def avgWind(directions):
    sinSum = 0
    cosSum = 0
    d2r = math.pi / 180   # degrees to radians
    r2d = 180 / math.pi   # radians to degrees
    for i in range(len(directions)):
        sinSum += math.sin(directions[i] * d2r)
        cosSum += math.cos(directions[i] * d2r)
    return (r2d * math.atan2(sinSum, cosSum) + 360) % 360

a = np.random.randint(low=0, high=360, size=6)
print(a)
print(avgWind(a))