I am running a speech enhancement algorithm based on a Gaussian Mixture Model. The problem is that the estimation algorithm underflows during training.
I am trying to calculate the PDF of a log-spectrum frame X given a Gaussian cluster, which is the product of the PDFs of the individual frequency components X_k (the FFT is computed for k=1..256).
What I get is a product of 256 terms of the form exp(-v(k)) with v(k)>=0.
Here is a snippet of the MATLAB calculation:
N - number of frames; M- number of mixtures; c_i weight for each mixture;
gamma(n,i) = c_i*f(X_n|I = i)
for i=1 : N
rep_DataMat(:,:,i) = repmat(DataMat(:,i),1,M);
gamma_exp(:,:) = (1./sqrt((2*pi*sigmaSqr_curr))).*exp(((-1)*((rep_DataMat(:,:,i) - mue_curr).^2)./(2*sigmaSqr_curr)));
gamma_curr(i,:) = c_curr.*(prod(10*gamma_exp(:,:),1));
alpha_curr(i,:) = gamma_curr(i,:)./sum(gamma_curr(i,:));
end
The product quickly goes to zero because K = 256 and the factors are smaller than one. Is there a way I can calculate this without causing an underflow (like logsum or similar)?
You can perform the computations in the log domain.
The conversion of products into sums is straightforward.
Sums on the other hand can be converted with something such as logsumexp.
This works using the formula:
log(a + b) = log(exp(log(a)) + exp(log(b)))
= log(exp(loga) + exp(logb))
Where loga and logb are the respective representation of a and b in the log domain.
The basic idea is then to factor out the exponential with the largest argument (e.g. loga, for the sake of illustration):
log(exp(loga)+exp(logb)) = log(exp(loga)*(1+exp(logb-loga)))
= loga + log(1+exp(logb-loga))
Note that the same idea applies if you have more than 2 terms to add.
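As a rough illustration of the log-domain approach applied to the mixture weights in the question, here is a minimal NumPy sketch. The function name log_responsibilities, the (N, K) frame layout and the random test data are assumptions made for this example; they are not taken from the original MATLAB code.

```python
import numpy as np

def log_responsibilities(X, mu, var, weights):
    """Log posteriors log(alpha) of shape (N, M).

    X: (N, K) frames, mu and var: (M, K) per-bin means and variances,
    weights: (M,) mixture weights. (Layout is an assumption of this sketch.)
    """
    # log N(x_k | mu_k, var_k) for every frame, mixture and bin: shape (N, M, K)
    diff = X[:, None, :] - mu[None, :, :]
    log_pdf = -0.5 * (np.log(2.0 * np.pi * var)[None, :, :]
                      + diff ** 2 / var[None, :, :])
    # The product over the K bins becomes a sum of logs
    log_gamma = np.log(weights)[None, :] + log_pdf.sum(axis=2)
    # logsumexp over the mixtures: factor out the largest term first
    m = log_gamma.max(axis=1, keepdims=True)
    log_norm = m + np.log(np.exp(log_gamma - m).sum(axis=1, keepdims=True))
    return log_gamma - log_norm

rng = np.random.default_rng(0)
N, K, M = 5, 256, 3
X = rng.normal(size=(N, K))
mu = rng.normal(size=(M, K))
var = rng.uniform(0.5, 2.0, size=(M, K))
weights = np.array([0.2, 0.3, 0.5])
alpha = np.exp(log_responsibilities(X, mu, var, weights))
```

Only the final, already normalized values are exponentiated, so the K-fold product never has to be represented directly.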
Is there a term for leveraging the fact that data is comprised of a few much-repeated values to speed computation?
As an example when trying to compute Sample Entropy on a long discrete sequence (Length=64.000.000.000, Distinct elements = 11, Length of substring=3) I was finding the running time too long (over 10 minutes). I realised that I should be able to make use of the relatively few distinct elements to speed up computation but was unable to find any literature relating to doing this (I suspect because I don't know what to Google).
The algorithm for Sample Entropy involves counting the pairs of substrings that are within a certain tolerance. This was the computationally expensive aspect of the algorithm O(n^2). By taking only the distinct substrings (of which there were at most 1331) I was able to find the pairs of distinct substrings within the tolerance, I then used the counts of each distinct substring to find the total number of pairs of (non-distinct) substrings that are within a certain tolerance. This method substantially sped up my computation.
Is there specific terminology for algorithms that exploit the property of relatively few, much-repeated elements?
import numpy as np

def sampen(L, m, r):
    N = len(L)
    B = 0.0
    A = 0.0
    # Split time series and save all templates of length m
    xmi = np.array([L[i : i + m] for i in range(N - m)])
    xmj = np.array([L[i : i + m] for i in range(N - m + 1)])
    # Save all matches minus the self-match, compute B
    B = np.sum([np.sum(np.abs(xmii - xmj).max(axis=1) <= r) - 1 for xmii in xmi])
    # Similar for computing A
    m += 1
    xm = np.array([L[i : i + m] for i in range(N - m + 1)])
    A = np.sum([np.sum(np.abs(xmi - xm).max(axis=1) <= r) - 1 for xmi in xm])
    # Return SampEn
    return -np.log(A / B)
def sampen2(L, m, r):
    N = L.shape[0]
    # Split time series and save all templates of length m
    xmi = np.array([L[i : i + m] for i in range(N - m)])
    xmj = np.array([L[i : i + m] for i in range(N - m + 1)])
    # Find the unique subsequences and their counts
    uni_xmi, uni_xmi_counts = np.unique(xmi, axis=0, return_counts=True)
    uni_xmj, uni_xmj_counts = np.unique(xmj, axis=0, return_counts=True)
    # Save all matches minus the self-match, compute B
    B = np.sum(np.array([np.sum((np.abs(unii - uni_xmi).max(axis=1) <= r)*uni_xmj_counts)-1 for unii in uni_xmi])*uni_xmi_counts)
    # Similar for computing A
    m += 1
    xm = np.array([L[i : i + m] for i in range(N - m + 1)])
    uni_xm, uni_xm_counts = np.unique(xm, axis=0, return_counts=True)
    A = np.sum(np.array([np.sum((np.abs(unii - uni_xm).max(axis=1) <= r)*uni_xm_counts)-1 for unii in uni_xm])*uni_xm_counts)
    return -np.log(A / B)
It's a broad concept with several related terms.
A common, closely related term is Memoization, wherein the results of computing a subproblem for different inputs are stored, and reused when a previously-seen input is re-encountered. That's slightly different from what you're doing here, since memoization is a form of lazy evaluation where values are recognized opportunistically rather than the code performing an up-front exhaustive enumeration of the inputs which will be processed.
Materialization is also worth mentioning. It's encountered in the context of databases, and refers to the results of a query (a.k.a. tabular processing including possible filtering and/or reduction) being stored for reuse. The active concerns with materialization are largely around long-term considerations like dynamic updates, so it's not a perfect match for a run-and-forget algorithm.
Speaking of 'dynamic', one could also maybe describe this as a form of dynamic programming, with a problem solved by exhaustively enumerating and solving a sequence of subproblems. In dynamic programming, though, one expects those subproblems to have a more regular and inductive form, so I think that one's a stretch.
I would describe the precise strategy here as a sort of "eager memoization", to contrast with the lazy-evaluation assumption normally inherent with memoization.
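To make the "eager" flavour concrete, here is a minimal sketch of the counting trick on scalar values rather than substrings. The function names count_pairs_brute and count_pairs_unique and the test data are hypothetical and chosen only for illustration: the expensive comparison is done once per pair of distinct values, and the multiplicities are applied afterwards.

```python
import numpy as np

def count_pairs_brute(values, r):
    """Count ordered pairs (i, j), i != j, with |v_i - v_j| <= r in O(n^2)."""
    v = np.asarray(values)
    close = np.abs(v[:, None] - v[None, :]) <= r
    return int(close.sum() - len(v))  # drop the n self-matches

def count_pairs_unique(values, r):
    """Same count, computed over distinct values weighted by multiplicity."""
    uniq, counts = np.unique(values, return_counts=True)
    # Compare each pair of distinct values only once (u x u instead of n x n)
    close = (np.abs(uniq[:, None] - uniq[None, :]) <= r).astype(np.int64)
    # Value classes a and b contribute counts[a] * counts[b] ordered pairs
    total = counts @ close @ counts
    return int(total - counts.sum())  # drop the n self-matches

rng = np.random.default_rng(1)
data = rng.integers(0, 11, size=500)  # 11 distinct symbols, many repeats
fast = count_pairs_unique(data, 1)
slow = count_pairs_brute(data, 1)
```

The cost of the second version depends on the number of distinct values, not on the sequence length, apart from the initial pass that finds the unique values.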
I have a question related to the Fast Fourier Transform. I want to calculate the phase and take the FFT to plot the power spectral density. However, when I calculate the frequency f, I get errors. This is my program code:
n = 1:32768;
T = 0.2*10^-9; % Sampling period
Fs = 1/T; % Sampling frequency
Fn = Fs/2; % Nyquist frequency
omega = 2*pi*200*10^6; % Carrier frequency
L = 32768; % % Length of signal
t = (0:L-1)*T; % Time vector
x_signal(n) = cos(omega*T*n + 0.1*randn(size(n))); % Additive phase noise (random)
y_signal(n) = sin(omega*T*n + 0.1*randn(size(n))); % Additive phase noise (random)
theta(n) = atan(y_signal(n)/x_signal(n));
f = (theta(n)-theta(n-1))/(2*pi)
Y = fft(f,t);
PSD = Y.*conj(Y); % Power Spectral Density
%Fv = linspace(0, 1, fix(L/2)+1)*Fn; % Frequency Vector
As posted, you would get the error
error: subscript indices must be either positive integers less than 2^31 or logicals
which refers to the operation theta(n-1): when n=1 it results in an index of 0, which is out of bounds since Matlab uses 1-based indexing. To avoid that, you could use a subset of the indices in n:
f = (theta(n(2:end))-theta(n(1:end-1)))/(2*pi);
That said, if you are doing this to try to obtain an instantaneous measure of the frequency, then you will have a few more issues to deal with. The most trivial one is that you should also divide by T. Not as obvious is the fact that as given, theta is a scalar due to the use of the / operator (see Matlab's mrdivide) rather than the ./ operator which performs element-wise division. So a better expression would be:
theta(n) = atan(y_signal(n)./x_signal(n));
Now, the next problem you might notice is that you are actually losing some phase information, since the result of atan lies in [-pi/2,pi/2] instead of the full [-pi,pi] range. To avoid this you should instead be using atan2:
theta(n) = atan2(y_signal(n), x_signal(n));
Even with this, you are likely to notice that the estimated frequency regularly has spikes whenever the phase jumps between near -pi and near pi. This can be avoided by computing the phase difference modulo 2*pi:
f = mod(theta(n(2:end))-theta(n(1:end-1)),2*pi)/(2*pi*T);
A final thing to note: when calling fft, you should not pass in a time variable (the input is implicitly assumed to be sampled at regular time intervals). You may, however, specify the desired length of the FFT. So you would compute Y as follows:
Y = fft(f, L);
And you could then plot the resulting PSD using:
Fv = linspace(0, 1, fix(L/2)+1)*Fn; % Frequency Vector
plot(Fv, abs(PSD(1:L/2+1)));
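For comparison, the whole corrected pipeline can be sketched in NumPy. This is an illustrative variant under a few assumptions, not a translation of the code above: the same phase noise is applied to both x_signal and y_signal, and the phase difference is wrapped into [-pi, pi) rather than [0, 2*pi), so that small negative phase steps caused by the noise do not show up as jumps of nearly 2*pi.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 32768                      # length of signal
T = 0.2e-9                     # sampling period
f0 = 200e6                     # carrier frequency
n = np.arange(L)

# Assumption of this sketch: one shared phase-noise sequence for x and y
phase = 2 * np.pi * f0 * T * n + 0.1 * rng.standard_normal(L)
x_signal = np.cos(phase)
y_signal = np.sin(phase)

# Four-quadrant phase, element-wise
theta = np.arctan2(y_signal, x_signal)

# Phase difference between consecutive samples, wrapped into [-pi, pi)
dtheta = np.diff(theta)
dtheta = (dtheta + np.pi) % (2 * np.pi) - np.pi

# Instantaneous frequency in Hz (note the division by T)
f = dtheta / (2 * np.pi * T)

# One-sided spectrum of f, zero-padded to L samples
Y = np.fft.rfft(f, n=L)
PSD = np.abs(Y) ** 2
Fv = np.fft.rfftfreq(L, d=T)   # frequency vector from 0 to Fs/2
```

The mean of f should sit close to the 200 MHz carrier, which is a quick sanity check on the scaling.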
Consider the operation (a-b)/(c-d), where a, b, c and d are floating-point numbers (namely, of type double in C++). Both (a-b) and (c-d) are (sum-correction) pairs, as in the Kahan summation algorithm. Briefly, what is special about these (sum-correction) pairs is that sum holds a value that is large relative to what is in correction. More precisely, correction contains what did not fit in sum during summation due to limited precision (53 bits of mantissa in the double type).
What is the numerically most precise way to calculate (a-b)/(c-d) given this special property of the numbers?
Bonus question: it would be better to get the result also as a (sum-correction) pair, as in the Kahan summation algorithm. That is, to find (e-f)=(a-b)/(c-d), rather than just e=(a-b)/(c-d).
The div2 algorithm of Dekker (1971) is a good approach.
It requires a mul12(p,q) algorithm that exactly computes a pair u+v = p*q. Dekker uses a method known as Veltkamp splitting, but if you have access to an fma function, then a much simpler method is
u = p*q
v = fma(p,q,-u)
the actual division then looks like (I've had to change some of the signs since Dekker uses additive pairs instead of subtractive):
r = a/c
u,v = mul12(r,c)
s = (a - u - v - b + r*d)/c
Then the sum r+s is an accurate approximation to (a-b)/(c-d).
UPDATE: The subtraction and addition are assumed to be left-associative, i.e.
s = ((((a-u)-v)-b)+r*d)/c
This works because if we let rr be the error in the computation of r (i.e. r + rr = a/c exactly), then since u+v = r*c exactly, we have that rr*c = a-u-v exactly, so therefore (a-u-v-b)/c gives a fairly good approximation to the correction term of (a-b)/c.
The final r*d arises due to the following:
(a-b)/(c-d) = (a-b)/c * c/(c-d) = (a-b)/c *(1 + d/(c-d))
= [a-b + (a-b)/(c-d) * d]/c
Now r is also a fairly good initial approximation to (a-b)/(c-d) so we substitute that inside the [...], so we find that (a-u-v-b+r*d)/c is a good approximation to the correction term of (a-b)/(c-d)
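Here is a small Python sketch of the scheme, for experimentation only. Since an fma function is not assumed to be available, mul12 uses the Veltkamp splitting variant mentioned above; the helper names split and div2 and the sample inputs are choices made for this example. Exact rational arithmetic from the standard fractions module is used only to measure the error.

```python
from fractions import Fraction

def split(x):
    """Veltkamp splitting of a double into high and low parts."""
    t = 134217729.0 * x          # 2**27 + 1
    hi = t - (t - x)
    return hi, x - hi

def mul12(p, q):
    """Return (u, v) with u = fl(p*q) and u + v = p*q (barring over/underflow)."""
    u = p * q
    ph, pl = split(p)
    qh, ql = split(q)
    v = ((ph * qh - u) + ph * ql + pl * qh) + pl * ql
    return u, v

def div2(a, b, c, d):
    """Approximate (a-b)/(c-d) as the pair r + s."""
    r = a / c
    u, v = mul12(r, c)
    # Left-associative evaluation, as in the update above
    s = ((((a - u) - v) - b) + r * d) / c
    return r, s

a, b, c, d = 1.0, 1e-17, 3.0, 2e-17
r, s = div2(a, b, c, d)

# Compare against the exact rational value of the same double inputs
exact = (Fraction(a) - Fraction(b)) / (Fraction(c) - Fraction(d))
err_pair = abs(Fraction(r) + Fraction(s) - exact)
err_naive = abs(Fraction((a - b) / (c - d)) - exact)
```

For these inputs the naive double computation loses the corrections b and d entirely, while the pair r + s retains them.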
For tiny corrections, maybe think of
(a - b) / (c - d) = a/c (1 - b/a) / (1 - d/c) ~ a/c (1 - b/a + d/c)
I am writing a MATLAB implementation of the SARSA algorithm, and have successfully written a one-step implementation.
I am now trying to extend it to use eligibility traces, but the results I obtain are worse than with one-step (i.e. the algorithm converges at a slower rate and the final path followed by the agent is longer).
e_trace(action_old, state_old) = e_trace(action_old, state_old) + 1;
% Update weights but only if we are past the first step
if(step > 1)
delta = (reward + discount*qval_new - qval_old);
% SARSA-lambda (Eligibility Traces)
dw = e_trace.*delta;
% One-step SARSA
dw = zeros(actions, states);
dw(action_old, state_old) = delta;
weights = weights + learning_rate*dw;
end
e_trace = discount*decay*e_trace;
Essentially, my q-values are stored in an n x m weights matrix where n = number of actions and m = number of states. Eligibility trace values are stored in the e_trace matrix. Depending on whether I want to use one-step or eligibility traces, I use one of the two definitions of dw. I am not sure where I am going wrong. The algorithm is implemented as shown here: http://webdocs.cs.ualberta.ca/~sutton/book/ebook/node77.html
The
dw = e_trace .* delta
Defines the weight change for all weights in the network (Ie: The change in value for all Q(s,a) pairs), which is then fed into the network adjusted by the learning-rate.
I should add that initially my weights and e-values are set to 0.
Any advice?
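For reference, one tabular SARSA(lambda) step with accumulating traces, in the same order of operations as the snippet above, can be sketched in NumPy as follows. The function name sarsa_lambda_update and the toy numbers are hypothetical; this only restates the update, it is not a diagnosis of the slower convergence.

```python
import numpy as np

def sarsa_lambda_update(weights, e_trace, s_old, a_old, reward, q_new,
                        learning_rate, discount, decay):
    """One accumulating-trace step on (actions x states) tables, in place."""
    # TD error for the transition that was just observed
    delta = reward + discount * q_new - weights[a_old, s_old]
    # Bump the trace of the visited pair, then update every pair by its trace
    e_trace[a_old, s_old] += 1.0
    weights += learning_rate * delta * e_trace
    # Decay all traces for the next step
    e_trace *= discount * decay
    return delta

actions, states = 2, 3
weights = np.zeros((actions, states))
e_trace = np.zeros((actions, states))

# Two toy transitions; q_new would normally be weights[a_new, s_new]
sarsa_lambda_update(weights, e_trace, s_old=1, a_old=0, reward=1.0, q_new=0.0,
                    learning_rate=0.5, discount=0.9, decay=0.8)
sarsa_lambda_update(weights, e_trace, s_old=2, a_old=1, reward=2.0, q_new=0.0,
                    learning_rate=0.5, discount=0.9, decay=0.8)
```

After the second call the earlier pair (action 0, state 1) has also moved, in proportion to its decayed trace, which is the only difference from the one-step update.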
I want to make a linear fit to a few data points, as shown in the image. Since I know the intercept (in this case, say, 0.05), I want to fit only the points that lie in the linear region with this particular intercept. In this case that would be, let's say, points 5:22 (but not 22:30).
I'm looking for the simple algorithm to determine this optimal amount of points, based on... hmm, that's the question... R^2? Any Ideas how to do it?
I was thinking about probing R^2 for fits using points 1 to 2:30, 2 to 3:30, and so on, but I don't really know how to enclose it into clear and simple function. For fits with fixed intercept I'm using polyfit0 (http://www.mathworks.com/matlabcentral/fileexchange/272-polyfit0-m) . Thanks for any suggestions!
EDIT:
sample data:
intercept = 0.043;
x = 0.01:0.01:0.3;
y = [0.0530642513911393,0.0600786706929529,0.0673485248329648,0.0794662409166333,0.0895915873196170,0.103837395346484,0.107224784565365,0.120300492775786,0.126318699218730,0.141508831492330,0.147135757370947,0.161734674733680,0.170982455701681,0.191799936622712,0.192312642057298,0.204771365716483,0.222689541632988,0.242582251060963,0.252582727297656,0.267390860166283,0.282890010610515,0.292381165948577,0.307990544720676,0.314264952297699,0.332344368808024,0.355781519885611,0.373277721489254,0.387722683944356,0.413648156978284,0.446500064130389;];
What you have here is a rather difficult problem to find a general solution of.
One approach would be to compute the slopes/intercepts between all consecutive pairs of points, and then do cluster analysis on the intercepts:
slopes = diff(y)./diff(x);
intersepts = y(1:end-1) - slopes.*x(1:end-1);
idx = kmeans(intersepts, 3);
x([idx; 3] == 2) % the points with the intersepts closest to the linear one.
This requires the statistics toolbox (for kmeans). This is the best of all methods I tried, although the range of points found this way might have a few small holes in it; e.g., when the slopes of two points in the start and end range lie close to the slope of the line, these points will be detected as belonging to the line. This (and other factors) will require a bit more post-processing of the solution found this way.
Another approach (which I failed to construct successfully) is to do a linear fit in a loop, each time increasing the range of points from some point in the middle towards both of the endpoints, and see if the sum of the squared error remains small. This I gave up very quickly, because defining what "small" is is very subjective and must be done in some heuristic way.
I tried a more systematic and robust approach of the above:
function test
%% example data
slope = 2;
intercept = 1.5;
x = linspace(0.1, 5, 100).';
y = slope*x + intercept;
y(1:12) = log(x(1:12)) + y(12)-log(x(12));
y(74:100) = y(74:100) + (x(74:100)-x(74)).^8;
y = y + 0.2*randn(size(y));
%% simple algorithm
[X,fn] = fminsearch(@(ii)P(ii, x,y,intercept), [0.5 0.5])
[~,inds] = P(X, x,y,intercept)
end
function [C, inds] = P(ii, x,y,intercept)
% ii represents fraction of range from center to end,
% So ii lies between 0 and 1.
N = numel(x);
n = round(N/2);
ii = round(ii*n);
inds = min(max(1, n+(-ii(1):ii(2))), N);
% Solve linear system with fixed intercept
A = x(inds);
b = y(inds) - intercept;
% and return the sum of squared errors, divided by
% the number of points included in the set. This
% last step is required to prevent fminsearch from
% reducing the set to 1 point (= minimum possible
% squared error).
C = sum(((A\b)*A - b).^2)/numel(inds);
end
which only finds a rough approximation to the desired indices (12 and 74 in this example).
When fminsearch is run a few dozen times with random starting values (really just rand(1,2)), it gets more reliable, but I still wouldn't bet my life on it.
If you have the statistics toolbox, use the kmeans option.
Depending on the number of data values, I would split the data into a relatively small number of overlapping segments, and for each segment calculate the linear fit, or rather the first-order coefficient (remember that you know the intercept, which will be the same for all segments).
Then, for each coefficient calculate the MSE between this hypothetical line and entire dataset, choosing the coefficient which yields the smallest MSE.
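A minimal NumPy sketch of this segment-based idea follows. The function name best_fixed_intercept_slope, the segment length, the step and the synthetic data are all assumptions made for illustration. With the intercept fixed, the least-squares slope of a segment is sum(x*(y - intercept)) / sum(x^2).

```python
import numpy as np

def best_fixed_intercept_slope(x, y, intercept, seg_len=8, step=4):
    """Try overlapping segments; keep the slope with the lowest MSE on all data."""
    best_slope, best_start, best_mse = None, None, np.inf
    for start in range(0, len(x) - seg_len + 1, step):
        xs = x[start:start + seg_len]
        ys = y[start:start + seg_len]
        # Least-squares slope of the segment with the intercept held fixed
        slope = np.dot(xs, ys - intercept) / np.dot(xs, xs)
        # Score the hypothetical line against the entire dataset
        mse = np.mean((slope * x + intercept - y) ** 2)
        if mse < best_mse:
            best_slope, best_start, best_mse = slope, start, mse
    return best_slope, best_start, best_mse

# Synthetic data: a line with slope 2 and intercept 1.5, curved at the far end
x = np.linspace(0.1, 5, 100)
y = 2.0 * x + 1.5
y[80:] += (x[80:] - x[80]) ** 2

slope, start, mse = best_fixed_intercept_slope(x, y, intercept=1.5)
```

Because each candidate is scored against the whole dataset, curved tails pull the selected slope toward the global fit, so the result is only as good as the linear region is dominant.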