Does somebody know an algorithm for endpoint detection in continuous speech? I can't find one; the existing algorithms are for isolated words, not continuous speech. Please help. If possible, MATLAB source code would be helpful.
This is my algorithm:
index1=[];
for i=1:length(spektral)
if abs(spektral(i))> 0.025
y(i)=spektral(i);
index1=[index1 i];
else y(i)=0;
end
end
spasi=[];
for i=2:length(index1)-1
if index1(i)>(index1(i-1)+1)
spasi=[spasi ; index1(i-1) index1(i)]; % determine the spaces between words
end
end
The first loop can be omitted completely:
[row,col,val] = find(abs(spektral)>0.025);
Note that the original loop thresholds abs(spektral), so the comparison needs the abs() as well, and val will contain logical ones rather than the signal values; the indices are the useful output. Depending on the shape of spektral, either row or col will contain your index1: if spektral is a column vector it will be row, if spektral is a row vector it will be col.
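For completeness, a minimal sketch reconstructing both y and index1 without the loop:
mask   = abs(spektral) > 0.025;   % same threshold test as the loop
y      = spektral .* mask;        % below-threshold samples become zero
index1 = find(mask);              % indices of the above-threshold samples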
The second loop you can omit as well:
[row,col,val] = find(index1(2:end,:)>index1(1:end-1,:)+1);
Note that index1 will have to be either row or col as output from the first find command.
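A diff-based sketch of the same gap detection (assuming index1 is a column vector; note it also picks up the gap at the final pair, which the i = 2:length(index1)-1 bound in the original loop skipped):
g     = find(diff(index1) > 1);    % positions where the run of indices breaks
spasi = [index1(g), index1(g+1)];  % [last sample before gap, first sample after]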
If I understand correctly, you want spectral energy below the threshold to be treated as noise, and more than four seconds of it to be classified as a silence. In that case:
[row,col,val] = find(spektral<0.025);
tmp = cummin(row); % use cummin(col) if spektral is a row vector
Here I always struggle to find a short, vectorised way to count the number of subsequent ones in the column; I'll add it when I find the solution (one possible sketch follows the loop below).
You can do this with a nested while loop, but there's bound to be a vectorised way:
kk = 1;
while kk < length(tmp)
    silence1 = 0;
    while kk < length(tmp) && tmp(kk) == tmp(kk+1)
        silence1 = silence1+1; % Sum the length of each silence
        kk = kk+1;
    end
    silence(kk) = silence1;
    kk = kk+1; % advance, so the loop terminates even when no run was found
end
silence(silence==0)=[]; % Remove zero entries
TotalSilences = (sum(silence>4)); % Find the total number of silences
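For the vectorised run-length check left open above, one possible sketch. It assumes row holds the sorted indices of the below-threshold samples from the last find, and that "four seconds" translates to 4*Fs samples for a sample rate Fs; both of those are assumptions on my part:
d        = diff(row);                % 1 within a silent run, >1 between runs
runEnds  = find([d; 2] > 1);         % position of the last sample of each run
runStart = [1; runEnds(1:end-1)+1];  % position of the first sample of each run
runLen   = runEnds - runStart + 1;   % run lengths, in samples
TotalSilences = sum(runLen > 4*Fs);  % silences longer than four seconds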
I have a complex algorithm which calculates the result of a function f(x). In the real world, f(x) is a continuous function; however, due to rounding errors in the algorithm, this is not the case in the computer program.
Furthermore, I have a list of several thousand values Fi.
I am looking for all the x values that meet an Fi value, i.e. f(xi) = Fi.
I can solve this problem by simply iterating through the x values, as in the following pseudo code:
for i=0 to NumberOfChecks-1 do
begin
//calculate the function result with the algorithm
x=i*(xmax-xmin)/NumberOfChecks;
FunctionResult=CalculateFunctionResultWithAlgorithm(x);
//loop through the value list to see if the function result matches a value in the list
for j=0 to NumberOfValuesInTheList-1 do
begin
if Abs(FunctionResult-ListValues[j])<Epsilon then
begin
//mark that element j of the list matches
//and store the corresponding x value in the list
end
end
end
Of course it is necessary to use a high number of checks; otherwise I will miss some x values. The higher the number of checks, the more complete and accurate the result. It is acceptable if the list is 90% or 95% complete.
The problem is that this brute force approach takes too much time. As I mentioned before, the algorithm for f(x) is quite complex, and with a high number of checks it takes too long.
What would be a better solution for this problem?
Another way to do this is in two parts: generate all of the results, sort them, and then merge with the sorted list of existing results.
First step is to compute all of the results and save them along with the x value that generated them. That is:
results = list of <x, result>
for i = 0 to numberOfChecks
//calculate the function result with the algorithm
x=i*(xmax-xmin)/NumberOfChecks;
FunctionResult=CalculateFunctionResultWithAlgorithm(x);
results.Add(x, FunctionResult)
end for
Now sort the results list by FunctionResult, and sort the ListValues array as well.
You now have two sorted lists that you can move through linearly:
i = 0, j = 0;
while (i < results.length && j < ListValues.length)
{
diff = ListValues[j] - results[i];
if (Abs(diff) < Epsilon)
{
// mark this one with the x value
// and move to the next result
i = i + 1
}
else if (diff > 0)
{
// list value is much larger than result. Move to next result.
i = i + 1
}
else
{
// list value is much smaller than result. Move to next list value.
j = j + 1
}
}
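In MATLAB terms, the same merge might look like this. This is a sketch: it assumes the generation loop above also stored xs(i), the x that produced results(i), and matchX is an illustrative name of mine.
[rSorted, rIdx] = sort(results);       % sort the computed results
[lSorted, lIdx] = sort(ListValues);    % sort the target list
matchX = nan(size(ListValues));        % x value found for each list entry
i = 1; j = 1;
while i <= numel(rSorted) && j <= numel(lSorted)
    d = lSorted(j) - rSorted(i);
    if abs(d) < Epsilon
        matchX(lIdx(j)) = xs(rIdx(i)); % record the matching x
        i = i + 1;                     % and move to the next result
    elseif d > 0
        i = i + 1;                     % result too small: advance results
    else
        j = j + 1;                     % list value too small: advance list
    end
end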
Sort the list, producing an array SortedListValues that contains the sorted ListValues and an array SortedListValueIndices that contains the index in the original array of each entry in SortedListValues. You only actually need the second of these, and you can create both of them with a single sort, by sorting an array of (value, index) tuples using value as the sort key.
Iterate over your range 0..NumberOfChecks-1 and compute the value of the function at each step, then use a binary chop to search for it in the sorted list.
Pseudo-code:
// sort as described above
SortedListValueIndices = sortIndices(ListValues);
for i=0 to NumberOfChecks-1 do
begin
//calculate the function result with the algorithm
x=i*(xmax-xmin)/NumberOfChecks;
FunctionResult=CalculateFunctionResultWithAlgorithm(x);
// do a binary chop to find the closest element in the list
highIndex = NumberOfValuesInTheList-1;
lowIndex = 0;
while true do
begin
if Abs(FunctionResult-ListValues[SortedListValueIndices[lowIndex]])<Epsilon then
begin
// find all elements in the range that match, breaking out
// of the loop as soon as one doesn't
for j=lowIndex to NumberOfValuesInTheList-1 do
begin
if Abs(FunctionResult-ListValues[SortedListValueIndices[j]])>=Epsilon then
break
//mark that element SortedListValueIndices[j] of the list matches
//and store the corresponding x value in the list
end
// break out of the binary chop loop
break
end
// break out of the loop once the indices match
if highIndex <= lowIndex then
break
// do the binary chop searching, adjusting the indices:
middleIndex = (lowIndex + 1 + highIndex) / 2;
if ListValues[SortedListValueIndices[middleIndex]] < FunctionResult then
lowIndex = middleIndex;
else
begin
highIndex = middleIndex;
lowIndex = lowIndex + 1;
end
end
end
Possible complications:
The binary chop isn't taking the epsilon into account. Depending on your data this may or may not be an issue. If it is acceptable that the list is only 90 or 95% complete, this might be OK; if not, you'll need to widen the range to take the epsilon into account (see the sketch after this list).
I've assumed you want to be able to match multiple x values for each FunctionResult. If that's not necessary you can simplify the code.
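One way to widen the range, as a MATLAB sketch; the two find calls stand in for what would be binary chops over the sorted array in practice:
[sortedVals, sortedIdx] = sort(ListValues);
lo = FunctionResult - Epsilon;
hi = FunctionResult + Epsilon;
first = find(sortedVals >= lo, 1, 'first');  % binary chop these two
last  = find(sortedVals <= hi, 1, 'last');   % lookups in practice
if ~isempty(first) && ~isempty(last) && first <= last
    matches = sortedIdx(first:last);         % all list entries within Epsilon
else
    matches = [];                            % nothing in range
end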
Naturally this depends very much on the data, and especially on the numeric distribution of Fi. Another problem is that f(x) looks very jumpy, which eliminates any "nearby value" assumption. But one could optimise the search:
Walking through f(x) at sufficient granularity, define a rough min and max envelope (the red and green lines in the original picture), using a suitable tolerance (the "air" or "gap" in between). The area between min and max is "AREA".
See where each Fi value hits AREA, and make a stacked marking ("MARKING") on the x-axis accordingly (this can be multiple segments of x).
Where many MARKINGs sit on top of each other (a higher sum), do dense hit tests, increasing the overall chance of getting as many hits as possible. Elsewhere do sparser tests.
Tighten this scheme (decrease the tolerance) as much as you dare.
EDIT: Fi is a bit confusing. Is it an ordered array, or does it have a random order (as I assumed)?
Jim Mischel's solution runs in O(i+j) instead of the O(i*j) of your current approach. But there is a (very) minor bug in his code. The correct code would be:
diff = ListValues[j] - results[i]; //no abs() here
if (abs(diff) < Epsilon) //add abs() here
{
// mark this one with the x value
// and move to the next result
i = i + 1
}
The best methods will rely on the nature of your function f(x).
The best solution is if you can invert f(x) and use the inverse directly.
As you said, f(x) is continuous, so you can start by evaluating a small number of widely spaced points, then find ranges that make sense, and refine your estimate of the x for which f(x)=Fi.
It is not bulletproof, but it is an option.
e.g. Fi=5.7; with f(1)=1.4, f(4)=4, f(5)=5.1, f(6)=5.8, f(7)=6.5, f(10)=10.1, f(16)=12.6, you can take 5 < x < 6, since f(5) < 5.7 < f(6) and f is continuous.
Along the same lines, if f(x) is hard to calculate, you can use interpolation and then evaluate f(x) only at the values that are probable.
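As a concrete sketch of that refinement step in MATLAB: f is assumed to be a function handle, tol a tolerance of your choosing, and f is assumed monotonic across the bracket; all three are assumptions here.
Fi = 5.7;  lo = 5;  hi = 6;     % bracket found by the coarse scan
while hi - lo > tol
    mid = (lo + hi)/2;
    if f(mid) < Fi
        lo = mid;               % the crossing lies in the upper half
    else
        hi = mid;               % the crossing lies in the lower half
    end
end
x = (lo + hi)/2;                % x now locates f(x) = Fi to within tol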
I wrote this code to solve Problem 4 of Project Euler, but it takes too long to give me the answer.
Is there any trick to make it faster?
function S=Problem4(n)
tic
Interval=10^(n-1):10^(n)-1;
[Product1,Product2]=meshgrid(Interval);
Func=@(X,Y) X*Y;
Temp=cell2mat(arrayfun(Func,Product1,Product2,'UniformOutput',false));
Palindrome=@(X) all(num2str(X)==fliplr(num2str(X)));
Temp2=unique(Temp(:));
S=max(Temp2(arrayfun(Palindrome,Temp2)));
toc
end
and it takes about 39 secs.
Any help would be appreciated.
Only a partial answer here, but a big performance hit is often caused by using strings to handle numbers.
And here you have a function that even does it twice in one line!
First try to get rid of one conversion by saving the intermediate result in a variable. If that saves a significant amount of time, it is probably worth removing the other one as well.
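For example, a palindrome test with a single num2str call. This is a sketch: it has to be a regular or local function, since anonymous functions cannot hold intermediate variables, and isPalindrome is just an illustrative name.
function tf = isPalindrome(X)
s  = num2str(X);           % convert once, use twice
tf = all(s == fliplr(s));  % compare against the reversed string
end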
Here is my own approach from a few years back. It is not that great, but perhaps it can inspire you.
Note that it does use num2str, but only once, and on all relevant numbers at once. In your code you use arrayfun, which basically runs a loop internally and probably results in many separate calls to num2str.
clear
field = (100:999)'*(100:999);   % all products of two 3-digit numbers
field = field(:);
fieldstr = num2str(field);      % one conversion for all numbers at once
idx = fieldstr(:,1) == fieldstr(:,end);      % first digit == last digit
idx2 = fieldstr(:,2) == fieldstr(:,end-1);   % second == second-to-last
idx3 = fieldstr(:,3) == fieldstr(:,end-2);   % third == third-to-last
list = fieldstr(idx & idx2 & idx3,:);        % keep only the palindromes
% note: 5-digit products keep a leading space and never match, but the
% largest palindrome here has 6 digits anyway
listnum = str2num(list);
max(listnum)
From Project Euler:
Largest palindrome product
Problem 4
A palindromic number reads the same both ways. The largest palindrome made from the product of two 2-digit numbers is 9009 = 91 × 99.
Find the largest palindrome made from the product of two 3-digit numbers.
Instead of analyzing your code, I'll give you another way of doing it, which you might find useful. It makes use of vectorization, avoiding arrayfun and anonymous functions, which may be slow:
[n1, n2] = ndgrid(100:999); %// all combinations of 3-digit numbers
pr = n1(:).*n2(:); %// product of each combination
de = dec2base(pr, 10); %// decimal expression of those products
sm = pr<1e5; %// these have 5 figures: initial digit "0" should be disregarded
pa = false(1,numel(pr)); %// this will indicate if each product is palindromic or not
pa(sm) = all(de(sm,2:end) == fliplr(de(sm,2:end)), 2); %// 5-figure palindromic
pa(~sm) = all(de(~sm,:) == fliplr(de(~sm,:)), 2); %// 6-figure palindromic
result = max(pr(pa)); %// find maximum among all products indicated by pa
You can save almost half the time by avoiding duplicate products, as follows. The three new lines are marked:
[n1, n2] = ndgrid(100:999); %// all combinations of 3-digit numbers
un = n1(:)<=n2(:); %// NEW
n1 = n1(un); %// NEW
n2 = n2(un); %// NEW
pr = n1(:).*n2(:); %// product of each combination
de = dec2base(pr, 10); %// decimal expression of those products
sm = pr<1e5; %// these have 5 figures: initial digit "0" should be disregarded
pa = false(1,numel(pr)); %// this will indicate if each product is palindromic or not
pa(sm) = all(de(sm,2:end) == fliplr(de(sm,2:end)), 2); %// 5-figure palindromic
pa(~sm) = all(de(~sm,:) == fliplr(de(~sm,:)), 2); %// 6-figure palindromic
result = max(pr(pa)); %// find maximum among all products indicated by pa
Some Discussion and Solution Code
Since you are looking for the maximum palindrome, after you have collected the possible products for that Interval you can search one digit count at a time, starting from the highest. With n = 3 the products run from 10000 to 998001, so you can look for the maximum palindrome among the 6-digit numbers first, then among the 5-digit ones, and so on. The benefit of such an iterative approach is that you can leave the function as soon as you find the maximum. Here's the code to fulfil the promises laid out in the discussion -
function S = problem4_try1(n)
Interval=10^(n-1):10^(n)-1; %// Define interval definition here
prods = bsxfun(@times,Interval,Interval'); %// or use: Interval'*Interval
allnums = prods(:);
numd = floor(log10(allnums))+1; %// number of digits (ceil(log10(x)) fails for powers of 10)
dig = sort(unique(numd),'descend'); %// unique digits starting from highest one
for iter = 1:numel(dig)
numd_iter = dig(iter);
numd_iter_halflen = floor(numd_iter/2);
all_crit = allnums(numd==numd_iter); %//all numbers in current iteration
all_crit_dg = dec2base(all_crit,10)-'0'; %// separate digits for a 2D array
all_curit_digits_pal = all_crit(all(all_crit_dg(:,1:numd_iter_halflen) == ...
all_crit_dg(:,end:-1:end-numd_iter_halflen+1) ,2)); %// palindrome matches
%// Find the max of palindrom matches and get out
if ~isempty(all_curit_digits_pal)
S = max(all_curit_digits_pal);
return; %// *** Get Outta Here!!
end
end
A few things about the code itself:
bsxfun(@times,Interval,Interval') efficiently gets us the product values you have in Temp, without the intermediate Product1 and Product2, so it must be pretty efficient.
Because of the iterative nature, it should stay efficient for higher n's, provided the system can handle the pre-processing calculation of all the products at the start.
I encourage you to see this link (improving MATLAB performance). Also, I recommend using a cache/memoization function: when you repeat calculations, you can store the parameters and answers, so the next time you see the same parameters you just return the stored answer, skipping the recomputation.
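For instance, a minimal memoization sketch using containers.Map; f here stands in for whatever expensive function you are caching, and the wrapper name is mine:
function y = memoF(x)
persistent cache                 % survives between calls
if isempty(cache)
    cache = containers.Map('KeyType','double','ValueType','double');
end
if isKey(cache, x)
    y = cache(x);                % return the stored answer
else
    y = f(x);                    % expensive computation
    cache(x) = y;                % store it for next time
end
end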
Hope it helps in some way; tell me if you have more doubts.
I want to make a k x n matrix (k rows and n columns) whose rank is k. My idea is to check the current rank of the matrix as each column is generated. If the current rank is smaller than the index of the current column j, I regenerate that column until the rank equals the column index. This is my code. However, it works very slowly (due to the rank check at every step). Please help me modify it.
function G=fullRank(k,n)
%% make matrix kxn
j=1;
while(j<=n)
d=randi(k,1);
column = [ones(1,d) zeros(1,k-d)];
column = column(randperm(k));
G(:,j)=column';
%% check full rank- Modify here
if((j>=2) && (rank(full(G))<j) && (j<=k))
%% Set current column of G to zeros
column =zeros(1,k);
G(:,j) = column';
else
j=j+1;
end
end
The probability of your matrix not being full-rank depends on how you choose the random values for its entries, but I would guess it is low. In that case, you can save time by checking the rank only once at the end, and generating the full matrix again if needed:
maxrank = min(k,n); %// precompute to save a little time
G = []; %// this is just to enter the while loop at least once
while rank(G)<maxrank
G = randi(k,k,n); %// replace by your procedure to generate G
end
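Plugging the column-generation procedure from your question into that check-once loop might look like this; a sketch based on your code, regenerating the whole matrix on each retry:
maxrank = min(k,n);
G = zeros(k,n);                     % rank 0, so the loop runs at least once
while rank(G) < maxrank
    for j = 1:n
        d = randi(k,1);             % random number of ones in this column
        col = [ones(1,d) zeros(1,k-d)];
        G(:,j) = col(randperm(k))'; % shuffle the positions of the ones
    end
end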
I am a new student learning to use MATLAB.
Could anyone please tell me if there is a faster way, possibly without loops, to assign, for each row, just two values (1 and -1) at different positions of a big sparse matrix?
My code builds a bimatrix (or bibimatrix) for the MILP condition f^k_{ij} <= y_{ij} for every arc (i,j) and all k ~= r, in a multi-commodity flow model.
Naive approach:
bimatrix=[];
% create each row and then add to bimatrix
newrow4= zeros(1,n*(n+1)^2);
for k=1:n
for i=0:n
for j=1: n
if j~=i
%change value of some positions to -1 and 1
newrow4(i*n^2+(j-1)*n+k)=1;
newrow4((n+1)*n^2+i*n+j)=-1;
% add to bimatrix
bimatrix=[bimatrix; newrow4];
% change newrow4 back to zeros row.
newrow4(i*n^2+(j-1)*n+k)=0;
newrow4((n+1)*n^2+i*n+j)=0;
end
end
end
end
OR:
% Generate the big sparse matrix first.
bibimatrix=zeros(n^3 ,n*(n+1)^2);
t=1;
for k=1:n
for i=0:n
for j=1: n
if j~=i
%Change 2 positions in each row to -1 and 1 in each row.
bibimatrix(t,i*n^2+(j-1)*n+k)=1;
bibimatrix(t,(n+1)*n^2+i*n+j)=-1;
t=t+1; % semicolon added: printing t on every iteration is itself a big slowdown
end
end
end
end
With the above code in MATLAB, the time to generate this matrix with n ~ 12 is more than 3 s. I need to generate a larger matrix in less time.
Thank you.
Suggestion: Use sparse matrices.
You should be able to create two vectors containing the column number where you want your +1 and -1 in each row. Let's call these two vectors vec_1 and vec_2. You should be able to do this without loops (if not, I still think the procedure below will be faster).
Let the size of your matrix be (max_row X max_col). Then you can create your matrix like this:
bibimatrix = sparse(1:max_row,vec_1,1,max_row,max_col);
bibimatrix = bibimatrix + sparse(1:max_row, vec_2,-1,max_row,max_col);
If you want to see the entire matrix (which you don't, since it's huge) you can write: full(bibimatrix).
EDIT:
You may also do it this way:
col_vec = [vec_1, vec_2];
row_vec = [1:max_row, 1:max_row];
s = [ones(1,max_row), -1*ones(1,max_row)];
bibimatrix = sparse(row_vec, col_vec, s, max_row, max_col);
Disclaimer: I don't have MATLAB available, so it might not be error-free.
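For this particular constraint, vec_1 and vec_2 can be built without loops from the column formulas in the question. Here is a sketch: ndgrid's first output varies fastest, which plays the role of the innermost j-loop, so the row order matches the original k-i-j nesting.
[jj, ii, kk] = ndgrid(1:n, 0:n, 1:n);  % jj varies fastest, like the inner j-loop
keep = jj(:) ~= ii(:);                 % drop the j == i cases
jj = jj(keep); ii = ii(keep); kk = kk(keep);
vec_1 = ii*n^2 + (jj-1)*n + kk;        % +1 columns (formula from the question)
vec_2 = (n+1)*n^2 + ii*n + jj;         % -1 columns
max_row = numel(vec_1);
max_col = n*(n+1)^2;
bibimatrix = sparse([1:max_row, 1:max_row], [vec_1; vec_2]', ...
    [ones(1,max_row), -ones(1,max_row)], max_row, max_col);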
Background
I have a large set of vectors (orientation data in an axis-angle representation... the axis is the vector) that I want to apply a clustering algorithm to. I tried kmeans, but the computational time was too long (it never finished). So instead I am trying to implement the KFCG algorithm, which is faster (Kirke 2010):
Initially we have one cluster with the entire training vectors and the codevector C1 which is centroid. In the first iteration of the algorithm, the clusters are formed by comparing first element of training vector Xi with first element of code vector C1. The vector Xi is grouped into the cluster 1 if xi1< c11 otherwise vector Xi is grouped into cluster2 as shown in Figure 2(a) where codevector dimension space is 2. In second iteration, the cluster 1 is split into two by comparing second element Xi2 of vector Xi belonging to cluster 1 with that of the second element of the codevector. Cluster 2 is split into two by comparing the second element Xi2 of vector Xi belonging to cluster 2 with that of the second element of the codevector as shown in Figure 2(b). This procedure is repeated till the codebook size is reached to the size specified by user.
I'm unsure what ratio is appropriate for the codebook, but it shouldn't matter for the code optimization. Also note that mine is 3-D, so the same process is done for the 3rd dimension.
My code attempts
I've tried implementing the above algorithm in Matlab 2013 (Student Version). Here are some different structures I've tried - BUT they take way too long (I have never seen them complete):
%training vectors:
% Atgood: an N-by-4 matrix (see the test data below if you want to test)
vecA = Atgood(:,1:3);
roA = size(vecA,1);
%Codebook size, Nsel, is ratio of data
remainFrac2=0.5;
Nseltemp = remainFrac2*roA; %codebook size
%Ensure selected size after nearest power of 2 is NOT greater than roA
if 2^round(log2(Nseltemp)) < roA
NselIter = round(log2(Nseltemp));
else
NselIter = ceil(log2(Nseltemp)-1);
end
Nsel = 2^NselIter; %power of 2 - for LGB and other algorithms
MAIN BLOCK TO OPTIMIZE:
%KFCG:
%%cluster = cell(1,Nsel); %Unsure #rows - Don't know how to initialize if need mean...
codevec(1,1:3) = mean(vecA,1);
count1=1;
count2=1;
ind=1;
for kk = 1:NselIter
hh2 = 1:2:size(codevec,1)*2;
for hh1 = 1:length(hh2)
hh=hh2(hh1);
% for ii = 1:roA
% if vecA(ii,ind) < codevec(hh1,ind)
% cluster{1,hh}(count1,1:4) = Atgood(ii,:); %want all 4 elements
% count1=count1+1;
% else
% cluster{1,hh+1}(count2,1:4) = Atgood(ii,:); %want all 4
% count2=count2+1;
% end
% end
%EDIT: My ATTEMPT at optimizing above for loop:
repcv=repmat(codevec(hh1,ind),[size(vecA,1),1]);
splitind = vecA(:,ind)>=repcv;
splitind2 = vecA(:,ind)<repcv;
cluster{1,hh}=vecA(splitind,:);
cluster{1,hh+1}=vecA(splitind2,:);
end
clear codevec
%Only mean the 1x3 vector portion of the cluster - for centroid
codevec = cell2mat((cellfun(@(x) mean(x(:,1:3),1),cluster,'UniformOutput',false))');
if ind < 3
ind = ind+1;
else
ind=1;
end
end
if length(codevec) ~= Nsel
warning('codevec ~= Nsel');
end
Alternatively, instead of cells, I thought 3-D matrices would be faster? I tried, but it was slower using my method of appending the next row on each iteration (temp=[]; for ... temp=[temp;new];).
Also, I wasn't sure what was best to loop with, for or while:
%If initialize cell to full length
while length(find(~cellfun('isempty',cluster))) < Nsel
Well, anyways, the first method was fastest for me.
Questions
Is the logic standard? Not in the sense that it matches the algorithm described, but from a coding perspective: did I use any weird constructs (especially those multiple inner loops) that slow it down? Where can I speed things up (you can just point me to resources or previous questions)?
My array, Atgood, is 1,000,000x4, making NselIter = 19 - do I just need to find a way to decrease this size, or can the code be optimized?
Should this be asked on CodeReview? If so, I'll move it.
Testing Data
Here's some random vectors you can use to test:
for ii=1:1000 %My size is ~ 1,000,000
omega = 2*rand(3,1)-1;
omega = (omega/norm(omega))';
Atgood(ii,1:4) = [omega,57];
end
Your biggest issue is re-iterating through all of vecA FOR EACH CODEVECTOR, rather than through just the samples that belong to the corresponding cluster. You're supposed to split each cluster on its own codevector. As it is, your cluster structure grows and grows, and each iteration processes more and more samples.
Your second issue is the loop around the comparisons, and the appending of samples to build up the clusters. Both of those can be solved by vectorizing the comparison operation. Oh, I just saw your edit, where this was optimized - much better. But codevec(hh1,ind) is just a scalar, so you don't even need the repmat:
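A repmat-free sketch of that part of your edit; MATLAB expands the scalar comparison automatically:
splitind        = vecA(:,ind) >= codevec(hh1,ind); % compare against the scalar
cluster{1,hh}   = vecA(splitind,:);
cluster{1,hh+1} = vecA(~splitind,:);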
Try this version:
% (preallocs added in edit)
cluster = cell(1,Nsel);
codevec = zeros(Nsel, 3);
codevec(1,:) = mean(Atgood(:,1:3),1);
cluster{1} = Atgood;
nClusters = 1;
ind = 1;
while nClusters < Nsel
for c = 1:nClusters
lower_cluster_logical = cluster{c}(:,ind) < codevec(c,ind);
cluster{nClusters+c} = cluster{c}(~lower_cluster_logical,:);
cluster{c} = cluster{c}(lower_cluster_logical,:);
codevec(c,:) = mean(cluster{c}(:,1:3), 1);
codevec(nClusters+c,:) = mean(cluster{nClusters+c}(:,1:3), 1);
end
ind = rem(ind,3) + 1;
nClusters = nClusters*2;
end