Increase speed of array population when using nested for loops - matlab

I have written this function to find indices based on certain criteria. It should work, but it will take 2-3 days to run on my PC. Is there any way to get it below an hour (or to make it faster at all)? It doesn't need to be very fast, but 2 days is unacceptably slow.
I don't expect an in-depth analysis of the function (though it would be nice), just some general improvements.
Essentially, it is three nested for-loops that populate 8 large 3-D arrays using another 256x8 matrix, Logic, followed by a few logical tests to find the desired index.
%These are sample values from the g.u.i. and other functions -
%ignore up til the loops unless you need it to understand something in the loops.
PriceMat=[58867 55620 16682 97384 11660 18175 25896 16300];
CapMat=[1400 1200 450 3600 150 1330 2000 250];
RepMat=[58 53 31 127 15 164 242 27];
DesiredRep=293.04;
DesiredCap=2600;
prevmin=99999999;
P=perms(0:7);
D=zeros(256,8,40320);
Cap=zeros(size(D,3),8);
Rep=zeros(size(D,3),8);
Price=zeros(size(D,3),8);
SufRep=zeros(1,size(D,3));
SufCap=zeros(1,size(D,3));
CapTot=zeros(1,size(D,3));
RepTot=zeros(1,size(D,3));
PriceTot=zeros(1,size(D,3));
for i=1:40320
    for x=1:8
        for j=1:256
            D(j,x,i)=P(i,x)*Logic(j,x);
            Cap(i,x)=D(j,x,i)*CapMat(x);
            Price(i,x)=D(j,x,i)*PriceMat(x);
            Rep(i,x)=D(j,x,i)*RepMat(x);
            CapTot=sum(Cap,2);
            RepTot=sum(Rep,2);
            PriceTot=sum(Price,2);
            if CapTot(i)>=DesiredCap
                SufCap(i)=true;
            else
                SufCap(i)=false;
            end
            if RepTot(i)>=DesiredRep
                SufRep(i)=true;
            else
                SufRep(i)=false;
            end
            if SufRep(i)==true && SufCap(i)==true
                if PriceTot(i)<=prevmin
                    prevmin=i;
                end
            end
        end
    end
end
return prevmin

use bsxfun it's so much FUN!
Here's how you can compute matrix D in a single line (no loops):
D = bsxfun( @times, permute( P, [3 2 1] ), Logic );
I guess you can take it from here and compute the rest of the matrices this way - no loops.

You said "it would be nice" to get some in-depth analysis of your function. It's pretty complex, and it can be greatly simplified. I am a little bit worried about the amount of memory that my solution would take: one of your 256x8x40320 arrays is about 660 MB, and I create four. If that's not a problem, great. Otherwise you might have to choose a more conservative data type to keep memory requirements down; if you start swapping to disk you are dead, timing-wise.
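The 660 MB figure follows from the array shape and 8 bytes per double. The arithmetic is sketched below in Python (rather than MATLAB) so it can be run anywhere. The narrower element sizes only illustrate what a more conservative data type would buy, assuming the stored values actually fit in that type.

```python
# Size of one 256 x 8 x 40320 array for a few element widths.
rows, cols, pages = 256, 8, 40320
n_elements = rows * cols * pages

# Bytes per element for MATLAB double, single, and uint8.
bytes_per_element = {"double": 8, "single": 4, "uint8": 1}

size_mb = {name: n_elements * width / 1e6
           for name, width in bytes_per_element.items()}

for name, mb in size_mb.items():
    print(f"{name}: {mb:.1f} MB")
```

Four double arrays of this shape are therefore a little over 2.6 GB before any temporaries are counted.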
So let's assume you are not limited by RAM, then the following will speed things up considerably (note - I am stealing Shai's suggestion to use bsxfun). Note also that I am clearing the "really big" arrays after taking their sum - this could all be done in one line but it would be even harder for you to follow:
D = bsxfun( @times, permute( P, [ 3 2 1] ), Logic );
Cap = bsxfun( @times, D, CapMat );
CapTot = sum( Cap, 2 );
clear Cap
Price = bsxfun( @times, D, PriceMat );
PriceTot = sum( Price, 2 );
clear Price
Rep = bsxfun( @times, D, RepMat ); % <<<<< STRONGLY recommend not to use RepMat -
% <<<<< to avoid confusion with built in repmat()
RepTot = sum( Rep, 2 );
clear Rep
CapRepOK = ( CapTot >= DesiredCap & RepTot >= DesiredRep ); % logical array - fast, small
[minPrice, minPriceInd] = min(PriceTot(CapRepOK)); % find minimum value and index
% convert index to correct value of `i` in original code:
cs = cumsum(CapRepOK(:)); % increases by one for every value that meets criteria
% but we need to know which original index that corresponds to:
possibleValues = find( cs == minPriceInd );
[j, i] = ind2sub( size(CapRepOK), possibleValues(1) );
prevmin = i;
Obviously I don't have your data so it's a bit hard to be sure this replicates your functionality exactly - but I believe it does. If not - that's what comments are for.
I suspect it is possible never to create the largest arrays (D etc) with some careful thought - if you are truly memory starved that may be needed.
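One possible way to skip D, sketched in Python on a toy problem. The assumption is that each total is meant to be the sum over x of P(i,x)*Logic(j,x)*CapMat(x). Under that reading, every total is a dot product between a weighted row of Logic and a row of P, so the whole table is an ordinary matrix product. The sizes, the 0/1 contents of `logic`, and the weights here are made up for illustration.

```python
from itertools import permutations, product

cap = [1400, 1200, 450]                   # toy stand-in for CapMat
perms = list(permutations(range(3)))      # toy stand-in for P = perms(0:2)
logic = list(product([0, 1], repeat=3))   # hypothetical 0/1 Logic rows

def totals_via_big_array():
    # Mirrors the D-based approach: build every product, then sum over x.
    d = [[[p[x] * row[x] for x in range(3)] for row in logic] for p in perms]
    return [[sum(d[i][j][x] * cap[x] for x in range(3))
             for i in range(len(perms))] for j in range(len(logic))]

def totals_via_matrix_product():
    # Scale each Logic row by cap once, then take dot products with P rows.
    weighted = [[row[x] * cap[x] for x in range(3)] for row in logic]
    return [[sum(w[x] * p[x] for x in range(3)) for p in perms]
            for w in weighted]

big = totals_via_big_array()
small = totals_via_matrix_product()
print(big == small)
```

In MATLAB the same idea would be a product of the form bsxfun(@times, Logic, CapMat) * P', which yields a 256x40320 result and never stores the 256x8x40320 intermediate.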

Related

Julia: FAST way of calculating the smallest distances between two sets of points

I have 5000 3D points in a Matrix A and another 5000 3D point in a matrix B.
For each point in A i want to find the smallest distance to a point in B. These distances should be stored in an array with 5000 entries.
So far I have this solution, running in about 0.145342 seconds (23 allocations: 191.079 MiB). How can I improve this further?
using Distances
A = rand(5000, 3)
B = rand(5000, 3)
mis = @time minimum(Distances.pairwise(SqEuclidean(), A, B, dims=1), dims=2)

This is a standard way to do it as it will have a better time complexity (especially for larger data):
using NearestNeighbors
nn(KDTree(B'; leafsize = 10), A')[2] .^ 2
Two comments:
by default Euclidean distance is computed (so I square it)
by default NearestNeighbors.jl assumes observations are stored in columns (so I need B' and A' in the solution; if your original data were transposed it would not be needed; the reason why it is designed this way is that Julia uses column-major matrix storage)

Generating a big distance matrix using Distances.pairwise(SqEuclidean(), A, B, dims=1) is not efficient because the main memory is pretty slow nowadays compared to CPU caches and the computing power of modern CPUs and this is not gonna be better any time soon (see "memory wall"). It is faster to compute the minimum on-the-fly using two basic nested for loops. Additionally, one can use multiple cores to compute this faster using multiple threads.
function computeMinDist(A, B)
    n, m = size(A, 1), size(B, 1)
    result = zeros(n)
    Threads.@threads for i = 1:n
        minSqDist = Inf
        @inbounds for j = 1:m
            dx = A[i,1] - B[j,1]
            dy = A[i,2] - B[j,2]
            dz = A[i,3] - B[j,3]
            sqDist = dx*dx + dy*dy + dz*dz
            if sqDist < minSqDist
                minSqDist = sqDist
            end
        end
        result[i] = minSqDist
    end
    return result
end
mis = @time computeMinDist(A, B)
Note that Julia uses 1 thread by default, but this can be tuned using the environment variable JULIA_NUM_THREADS=auto or just by running it with the flag --threads=auto. See the multi-threading documentation for more information.
Performance results
Here are performance results on my i5-9600KF machine with 6 cores (with two 5000x3 matrices):
Initial implementation: 93.4 ms
This implementation: 4.4 ms
This implementation is thus 21 times faster.
Results are the same to within a few ULP.
Note the code can certainly be optimized further using loop tiling, and possibly by transposing A and B so the JIT can generate a more efficient implementation using SIMD instructions.
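For readers who want to check the idea independently of Julia, here is a small Python sketch comparing the two strategies: reducing a full distance table versus keeping a running minimum. It only demonstrates that both give the same answer and that the second never stores the table. Pure Python will not reproduce the timings above, and the point counts are arbitrary.

```python
import random

def sq_dist(p, q):
    dx, dy, dz = p[0] - q[0], p[1] - q[1], p[2] - q[2]
    return dx * dx + dy * dy + dz * dz

def min_sq_dists_full_matrix(a, b):
    # Builds the whole len(a) x len(b) table first, then reduces each row.
    table = [[sq_dist(p, q) for q in b] for p in a]
    return [min(row) for row in table]

def min_sq_dists_streaming(a, b):
    # Keeps only a running minimum per point of a; no table is stored.
    result = []
    for p in a:
        best = float("inf")
        for q in b:
            d = sq_dist(p, q)
            if d < best:
                best = d
        result.append(best)
    return result

rng = random.Random(0)
a = [[rng.random() for _ in range(3)] for _ in range(50)]
b = [[rng.random() for _ in range(3)] for _ in range(60)]
full = min_sq_dists_full_matrix(a, b)
stream = min_sq_dists_streaming(a, b)
print(full == stream)
```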

How to speed up this matlab code which is already vectorized

I'm trying to speed up steps 1-4 in the following code (the rest is setup that will be predetermined for my actual problem.)
% Given sizes:
m = 200;
n = 1e8;
% Given vectors:
value_vector = rand(m, 1);
index_vector = randi([0 200], n, 1);
% Objective: Determine the values for the values_grid based on indices provided by index_grid, which
% correspond to the indices of the value in value_vector
% 0. Preallocate
values = zeros(n, 1);
% 1. Remove "0" indices since these won't have values assigned
nonzero_inds = (index_vector ~= 0);
% 2. Examine only nonzero indices
value_inds = index_vector(nonzero_inds);
% 3. Get the values for these indices
nonzero_values = value_vector(value_inds);
% 4. Assign values to output (0 for those with 0 index)
values(nonzero_inds) = nonzero_values;
Here's my analysis of these portions of the code:
1. Necessary since index_vector will contain zeros, which need to be filtered out. O(n), since it's just a matter of going through the vector one element at a time and checking whether the value is 0.
2. Should be O(n) to go through index_vector and retain those elements that were nonzero in the previous step.
3. Should be O(n), since we have to visit each nonzero index_vector element, and for each one we access value_vector, which is O(1).
4. Should be O(n) to go through each element of nonzero_inds, access the corresponding values index, access the corresponding nonzero_values element, and assign it to the values vector.
The code above takes about 5 seconds to run through steps 1-4 on 4 cores, 3.8GHz. Do you all have any ideas on how this could be sped up? Thanks.
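To make the data flow of steps 1-4 concrete, here is a toy Python version with made-up values. It also shows one alternative that is not part of the question: padding the lookup table with a leading 0 so that index 0 maps straight to 0 and no mask is needed. Whether that is faster in MATLAB has not been measured here.

```python
value_vector = [0.25, 0.5, 0.75]     # m = 3 toy values (1-based in MATLAB)
index_vector = [0, 2, 3, 0, 1, 1]    # 0 means "no value assigned"

# Steps 1-4 as in the question: mask, gather, scatter.
nonzero_inds = [idx != 0 for idx in index_vector]
value_inds = [idx for idx, keep in zip(index_vector, nonzero_inds) if keep]
nonzero_values = [value_vector[idx - 1] for idx in value_inds]
values = [0.0] * len(index_vector)
gathered = iter(nonzero_values)
for pos, keep in enumerate(nonzero_inds):
    if keep:
        values[pos] = next(gathered)

# Alternative (not from the question): pad the table so index 0 maps to 0.
padded = [0.0] + value_vector
values_padded = [padded[idx] for idx in index_vector]

print(values == values_padded)
```

In MATLAB the padded variant would read padded = [0; value_vector]; values = padded(index_vector + 1);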

Wow, I found something really interesting. I saw this link in the "related" section about indexing vectors being inefficient in Matlab sometimes, so I decided to try a for loop. This code ended up being an order of magnitude faster!
for i = 1:n
if index_vector(i) > 0
values(i) = value_vector(index_vector(i));
end
end
EDIT: Another interesting thing, unfortunately detrimental to my problem though. The speed of this solution depends on the amount of zeros in the index_vector. With index_vector = randi([0 200]);, a small proportion of the values are zeros, but if I try index_vector = randi([0 1]), approximately half of the values will be zero and then the above for loop is actually an order of magnitude slower. However, using ~= instead of > speeds the loop back up so that it's on a similar order of magnitude. Very interesting and odd behavior.

If you stick to MATLAB and the flow of the algorithm you want, rather than doing this in Fortran or C, here's a small start:
Replace randi with rand, round by casting to uint8, and use the > logical operation, which for some reason is faster at my end.
To sum up:
value_vector = rand(m, 1 );
index_vector = uint8(-0.5+201*rand(n,1) );
values = zeros(n, 1);
values=value_vector(index_vector(index_vector>0));
This improved the run time at my end by a factor of 1.6.

Huge memory allocation running a julia function?

I am trying to run the following function at the Julia prompt, but when timing it I see a huge amount of memory allocation, and I can't figure out why.
function pdpf(L::Int64, iters::Int64)
    snr_dB = -10
    snr = 10^(snr_dB/10)
    Pf = 0.01:0.01:1
    thresh = rand(100)
    Pd = rand(100)
    for m = 1:length(Pf)
        i = 0
        for k = 1:iters
            n = randn(L)
            s = sqrt(snr) * randn(L)
            y = s + n
            energy_fin = (y'*y) / L
            @inbounds thresh[m] = erfcinv(2Pf[m]) * sqrt(2/L) + 1
            if energy_fin[1] >= thresh[m]
                i += 1
            end
        end
        @inbounds Pd[m] = i/iters
    end
    #thresh = erfcinv(2Pf) * sqrt(2/L) + 1
    #Pd_the = 0.5 * erfc(((thresh - (snr + 1)) * sqrt(L)) / (2*(snr + 1)))
end
Running that function in the julia command on my laptop, I get the following shocking numbers:
julia> @time pdpf(1000, 10000)
17.621551 seconds (9.00 M allocations: 30.294 GB, 7.10% gc time)
What is wrong with my code? Any help is appreciated.

I don't think this memory allocation is so surprising. For instance, consider all of the times that the inner loop gets executed:
for m = 1:length(Pf) this gives you 100 executions
for k = 1:iters this gives you 10,000 executions based on the arguments you supply to the function.
randn(L) this gives you a random vector of length 1,000, based on the arguments you supply to the function.
Thus, just considering these, you've got 100*10,000*1000 = 1 billion Float64 random numbers being generated. Each one of them takes 64 bits = 8 bytes. I.e. 8GB right there. And, you've got two calls to randn(L) which means that you're at 16GB allocations already.
You then have y = s + n which means another 8GB allocations, taking you up to 24GB. I haven't looked in detail on the remaining code to get you from 24GB to 30GB allocations, but this should show you that it's not hard for the GB allocations to start adding up in your code.
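The tally can be written out as a short Python calculation. Two things in it are assumptions rather than facts from the post: that the scaled product sqrt(snr) * randn(L) is stored as its own temporary array (a fourth length-L vector per iteration), and that the figure reported by @time is in units of 2^30 bytes. If both hold, four vectors per iteration would account for most of the gap between 24 and 30.

```python
m_steps, iters, length = 100, 10_000, 1_000   # length(Pf), iters, L
bytes_per_float64 = 8

vectors_per_kind = m_steps * iters            # one vector per inner iteration
bytes_per_kind = vectors_per_kind * length * bytes_per_float64

# Length-L arrays created per inner iteration in the original loop body
# (the third entry is the assumed temporary for the scaled product).
kinds = ["n = randn(L)", "randn(L) for s", "s = sqrt(snr) * (...)", "y = s + n"]
total_bytes = bytes_per_kind * len(kinds)

print(bytes_per_kind / 1e9, "GB per kind of array")
print(total_bytes / 1e9, "GB total, or", round(total_bytes / 2**30, 1), "GiB")
```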
If you're looking at places to improve, I'll give you a hint that these lines can be improved by using the properties of normal random variables:
n = randn(L)
s = sqrt(snr) * randn(L)
y = s + n
You should easily be able to cut down the allocations here from 24GB to 8GB in this way. Note that y will be a normal random variable here as you've defined it, and think up a way to generate a normal random variable with an identical distribution to what y has now.
Another small thing, snr is a constant inside your function. Yet, you keep taking its sqrt 1 million separate times. In some settings, 'checking your work' can be helpful, but I think that you can be confident the computer will get it right the first time and thus you don't need to make it keep re-doing this calculation ; ). There are other similar places you can improve your code to avoid duplicate computations here that I'll leave to you to locate.

aireties gives a good answer for why you have so many allocations. You can do more to reduce their number. Because the sum of independent normal random variables is itself normal, y = s+n, which is really y = sqrt(snr) * randn(L) + randn(L), can instead be generated as y = rvvar*randn(L), where rvvar = sqrt(1+sqrt(snr)^2) is defined outside the loop (thanks for the fix!). This will halve the number of random variables needed.
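A quick numerical sanity check of that substitution, sketched in Python with the standard library's generator. Despite its name, rvvar is a standard deviation: the variance of y is 1 + snr. The sample size and seed are arbitrary choices.

```python
import math
import random

snr = 10 ** (-10 / 10)          # snr_dB = -10, as in the question
rvvar = math.sqrt(1 + snr)      # same value as sqrt(1 + sqrt(snr)^2)

def sample_variance(samples):
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)

rng = random.Random(0)
n_samples = 200_000

# Two draws per sample: y = sqrt(snr) * randn + randn.
two_draws = [math.sqrt(snr) * rng.gauss(0, 1) + rng.gauss(0, 1)
             for _ in range(n_samples)]
# One draw per sample: y = rvvar * randn.
one_draw = [rvvar * rng.gauss(0, 1) for _ in range(n_samples)]

var_two = sample_variance(two_draws)
var_one = sample_variance(one_draw)
print(var_two, var_one, 1 + snr)
```

Both sample variances should land close to 1.1, which is what makes the single-draw version a drop-in replacement for the energy statistic.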
Outside the loop you can save sqrt(2/L) to cut down a little bit of time.
I don't think transpose is special-cased yet, so try using dot(y,y) instead of y'*y. I know dot for sure is just a loop without having to transpose, while the other may transpose depending on the version of Julia.
Something that would help performance (but not allocations) would be to use one big randn(L,iters) and loop through that. If you make all of your random numbers at once it's faster, since it can use SIMD and a bunch of other goodies. If you want to do that implicitly without changing your code much, you can use ChunkedArrays.jl: initialize it with rands = ChunkedArray(randn,L), and then every time you want a randn(L), use next(rands) instead. Inside, the ChunkedArray actually makes bigger vectors and replenishes them as needed, so you can just get your randn(L) without having to keep track of all of that.
Edit:
ChunkedArrays probably only save time when L is smaller. This gives the code:
function pdpf(L::Int64, iters::Int64)
    snr_dB = -10
    snr = 10^(snr_dB/10)
    Pf = 0.01:0.01:1
    thresh = rand(100)
    Pd = rand(100)
    rvvar= sqrt(1+sqrt(snr)^2)
    for m = 1:length(Pf)
        i = 0
        for k = 1:iters
            y = rvvar*randn(L)
            energy_fin = (y'*y) / L
            @inbounds thresh[m] = erfcinv(2Pf[m]) * sqrt(2/L) + 1
            if energy_fin[1] >= thresh[m]
                i += 1
            end
        end
        @inbounds Pd[m] = i/iters
    end
end
which runs in half the time of the version using two randn calls. Indeed, from ProfileView we get:
@profile pdpf(1000, 10000)
using ProfileView
ProfileView.view()
I circled the two parts for the line y = rvvar*randn(L), so the vast majority of the time is spent on random number generation. Last time I checked, you could still get a decent speedup on random number generation by switching to the VSL.jl library, but you need MKL linked to your Julia build. Note that from the Google Summer of Code page you can see that there is a project to make a repo, RNG.jl, with faster pseudo-RNGs. It looks like it already has a few new ones implemented. You may want to check them out and see if they give speedups (or help out with that project!)

matlab code optimization - clustering algorithm KFCG

Background
I have a large set of vectors (orientation data in an axis-angle representation; the axis is the vector) that I want to apply a clustering algorithm to. I tried kmeans, but the computational time was too long (it never finished). So instead I am trying to implement the KFCG algorithm, which is faster (Kirke 2010):
Initially we have one cluster with the entire training vectors and the codevector C1 which is centroid. In the first iteration of the algorithm, the clusters are formed by comparing first element of training vector Xi with first element of code vector C1. The vector Xi is grouped into the cluster 1 if xi1< c11 otherwise vector Xi is grouped into cluster2 as shown in Figure 2(a) where codevector dimension space is 2. In second iteration, the cluster 1 is split into two by comparing second element Xi2 of vector Xi belonging to cluster 1 with that of the second element of the codevector. Cluster 2 is split into two by comparing the second element Xi2 of vector Xi belonging to cluster 2 with that of the second element of the codevector as shown in Figure 2(b). This procedure is repeated till the codebook size is reached to the size specified by user.
I'm unsure what ratio is appropriate for the codebook, but it shouldn't matter for the code optimization. Also note mine is 3-D so the same process is done for the 3rd dimension.
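A minimal Python sketch of the splitting rule quoted above, assuming each cluster is split on its own centroid and that the comparison axis cycles through the dimensions. The function names are made up for illustration, and passing empty clusters through unchanged is a choice made here, not something the quoted description specifies.

```python
import random

def centroid(points, dim):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(dim)]

def kfcg(points, n_splits):
    """Split every cluster in two, n_splits times, one axis per pass."""
    dim = len(points[0])
    clusters = [list(points)]
    for it in range(n_splits):
        axis = it % dim                       # 1st, 2nd, 3rd element, repeat
        next_clusters = []
        for cluster in clusters:
            if not cluster:                   # nothing to split
                next_clusters += [[], []]
                continue
            c = centroid(cluster, dim)        # codevector of this cluster
            lower = [p for p in cluster if p[axis] < c[axis]]
            upper = [p for p in cluster if p[axis] >= c[axis]]
            next_clusters += [lower, upper]
        clusters = next_clusters
    codebook = [centroid(c, dim) for c in clusters if c]
    return clusters, codebook

rng = random.Random(1)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
clusters, codebook = kfcg(points, 3)
print(len(clusters), sum(len(c) for c in clusters))
```

Each pass touches every point exactly once, so the total work grows with the number of passes times the number of points, not with the number of clusters.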
My code attempts
I've tried implementing the above algorithm in Matlab 2013 (Student Version). Here are some different structures I've tried, but they all take way too long (I have never seen one complete):
%training vectors:
Atgood = Nx4 vector (see test data below if want to test);
vecA = Atgood(:,1:3);
roA = size(vecA,1);
%Codebook size, Nsel, is ratio of data
remainFrac2=0.5;
Nseltemp = remainFrac2*roA; %codebook size
%Ensure selected size after nearest power of 2 is NOT greater than roA
if 2^round(log2(Nseltemp)) < roA
NselIter = round(log2(Nseltemp));
else
NselIter = ceil(log2(Nseltemp)-1);
end
Nsel = 2^NselIter; %power of 2 - for LGB and other algorithms
MAIN BLOCK TO OPTIMIZE:
%KFCG:
%%cluster = cell(1,Nsel); %Unsure #rows - Don't know how to initialize if need mean...
codevec(1,1:3) = mean(vecA,1);
count1=1;
count2=1;
ind=1;
for kk = 1:NselIter
    hh2 = 1:2:size(codevec,1)*2;
    for hh1 = 1:length(hh2)
        hh=hh2(hh1);
        % for ii = 1:roA
        %     if vecA(ii,ind) < codevec(hh1,ind)
        %         cluster{1,hh}(count1,1:4) = Atgood(ii,:); %want all 4 elements
        %         count1=count1+1;
        %     else
        %         cluster{1,hh+1}(count2,1:4) = Atgood(ii,:); %want all 4
        %         count2=count2+1;
        %     end
        % end
        %EDIT: My ATTEMPT at optimizing above for loop:
        repcv=repmat(codevec(hh1,ind),[size(vecA,1),1]);
        splitind = vecA(:,ind)>=repcv;
        splitind2 = vecA(:,ind)<repcv;
        cluster{1,hh}=vecA(splitind,:);
        cluster{1,hh+1}=vecA(splitind2,:);
    end
    clear codevec
    %Only mean the 1x3 vector portion of the cluster - for centroid
    codevec = cell2mat((cellfun(@(x) mean(x(:,1:3),1),cluster,'UniformOutput',false))');
    if ind < 3
        ind = ind+1;
    else
        ind=1;
    end
end
if length(codevec) ~= Nsel
warning('codevec ~= Nsel');
end
Alternatively, instead of cells I thought 3D Matrices would be faster? I tried but it was slower using my method of appending the next row each iteration (temp=[]; for...temp=[temp;new];)
Also, I wasn't sure what was best to loop with, for or while:
%If initialize cell to full length
while length(find(~cellfun('isempty',cluster))) < Nsel
Well, anyways, the first method was fastest for me.
Questions
Is the logic standard? Not in the sense that it matches with the algorithm described, but from a coding perspective, any weird methods I employed (especially with those multiple inner loops) that slows it down? Where can I speed up (you can just point me to resources or previous questions)?
My array size, Atgood, is 1,000,000x4 making NselIter=19; - do I just need to find a way to decrease this size or can the code be optimized?
Should this be asked on CodeReview? If so, I'll move it.
Testing Data
Here's some random vectors you can use to test:
for ii=1:1000 %My size is ~ 1,000,000
omega = 2*rand(3,1)-1;
omega = (omega/norm(omega))';
Atgood(ii,1:4) = [omega,57];
end

Your biggest issue is re-iterating through all of vecA FOR EACH CODEVECTOR, rather than just the samples that are part of the corresponding cluster. You're supposed to split each cluster on its codevector. As it is, your cluster structure grows and grows, and each iteration is processing more and more samples.
Your second issue is the loop around the comparisons, and the appending of samples to build up the clusters. Both of those can be solved by vectorizing the comparison operation. Oh, I just saw your edit, where this was optimized. Much better. But codevec(hh1,ind) is just a scalar, so you don't even need the repmat.
Try this version:
% (preallocs added in edit)
cluster = cell(1,Nsel);
codevec = zeros(Nsel, 3);
codevec(1,:) = mean(Atgood(:,1:3),1);
cluster{1} = Atgood;
nClusters = 1;
ind = 1;
while nClusters < Nsel
    for c = 1:nClusters
        lower_cluster_logical = cluster{c}(:,ind) < codevec(c,ind);
        cluster{nClusters+c} = cluster{c}(~lower_cluster_logical,:);
        cluster{c} = cluster{c}(lower_cluster_logical,:);
        codevec(c,:) = mean(cluster{c}(:,1:3), 1);
        codevec(nClusters+c,:) = mean(cluster{nClusters+c}(:,1:3), 1);
    end
    ind = rem(ind,3) + 1;
    nClusters = nClusters*2;
end

Finding the best scale/shift between two vectors

I have two vectors: one represents a function f(x), and the other represents f(ax+b), i.e. a scaled and shifted version of f(x). I would like to find the best scale and shift factors.
*best - in the sense of least-squares error, maximum likelihood, etc.
Any ideas?
for example:
f1 = [0;0.450541598502498;0.0838213779969326;0.228976968716819;0.91333736150167;0.152378018969223;0.825816977489547;0.538342435260057;0.996134716626885;0.0781755287531837;0.442678269775446;0];
f2 = [-0.029171964726699;-0.0278570165494982;0.0331454732535324;0.187656956432487;0.358856370923984;0.449974662483267;0.391341738643094;0.244800719791534;0.111797007617227;0.0721767235173722;0.0854437239807415;0.143888234591602;0.251750993723227;0.478953530572365;0.748209818420035;0.908044924557262;0.811960826711455;0.512568916956487;0.22669198638799;0.168136111568694;0.365578085161896;0.644996661336714;0.823562159983554;0.792812945867018;0.656803251999341;0.545799498053254;0.587013303815021;0.777464637372241;0.962722388208354;0.980537136457874;0.734416947254272;0.375435649393553;0.106489547770962;0.0892376361668696;0.242467741982851;0.40610516900965;0.427497319032133;0.301874099075184;0.128396341665384;0.00246347624097456;-0.0322120242872125]
*note that f(x) may be irreversible...
Thanks,
Ohad

For each f(x), take the absolute value of f(x) and normalize it such that it can be considered a probability mass function over its support. Calculate the expected value E[x] and the variance Var[x]. Then, we have that
E[a x + b] = a E[x] + b
Var[a x + b] = a^2 Var[x]
Use the above equations and the known values of E[x] and Var[x] to calculate a and b. Taking your values of f1 and f2 from your example, the following Octave script performs this procedure:
% Octave script
% f1, f2 are defined as given in your example
f1 = [zeros(length(f2) - length(f1), 1); f1];
save_f1 = f1; save_f2 = f2;
f1 = abs( f1 ); f2 = abs( f2 );
f1 = f1 ./ sum( f1 ); f2 = f2 ./ sum( f2 );
mean = @(x)sum(((1:length(x))' .* x));
var = @(x)sum((((1:length(x))'-mean(x)).^2) .* x);
m1 = mean(f1); m2 = mean(f2);
v1 = var(f1); v2 = var(f2)
a = sqrt( v2 / v1 ); b = m2 - a * m1;
plot( a .* (1:length( save_f1 )) + b, save_f1, ...
1:length( save_f2 ), save_f2 );
axis([0 length( save_f1 )]);
And the output is: [figure: the rescaled and shifted f1 plotted over f2]
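As a language-neutral check of the moment-matching idea, here is a small Python sketch on idealized toy data. The second signal is built by placing the samples of the first at positions a*i + b exactly, which is an assumption made so the relations E[ax+b] = aE[x] + b and Var[ax+b] = a^2 Var[x] hold without error. Real resampled data is interpolated, so there the estimates are only approximate.

```python
def moments(signal):
    # Treat |signal| as a probability mass function over indices 1..n.
    weights = [abs(v) for v in signal]
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(i * p for i, p in enumerate(probs, start=1))
    var = sum((i - mean) ** 2 * p for i, p in enumerate(probs, start=1))
    return mean, var

# Toy data: g holds the samples of f at positions a*i + b (a = 3, b = 2).
f = [0.0, 0.2, 0.9, 0.4, 0.1, 0.0]
a_true, b_true = 3, 2
g = [0.0] * (a_true * len(f) + b_true)
for i, v in enumerate(f, start=1):
    g[a_true * i + b_true - 1] = v    # -1 converts to 0-based storage

m1, v1 = moments(f)
m2, v2 = moments(g)
a_est = (v2 / v1) ** 0.5              # from Var[a x + b] = a^2 Var[x]
b_est = m2 - a_est * m1               # from E[a x + b] = a E[x] + b
print(a_est, b_est)
```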

Here's a simple, effective, but perhaps somewhat naive approach.
First make sure you make a generic interpolator through both functions. That way you can evaluate both functions in between the given data points. I used a cubic-splines interpolator, since that seems general enough for the type of smooth functions you provided (and does not require additional toolboxes).
Then you evaluate the source function ("original") at a large number of points. Use this number also as a parameter in an inline function, that takes as input X, where
X = [a b]
(as in ax+b). For any input X, this inline function will compute
the function values of the target function at the same x-locations, but then scaled and offset by a and b, respectively.
The sum of the squared-differences between the resulting function values, and the ones of the source function you computed earlier.
Use this inline function in fminsearch with some initial estimate (one that you have obtained visually or by via automatic means). For the example you provided, I used a few random ones, which all converged to near-optimal fits.
All of the above in code:
function s = findScaleOffset
%% initialize
f2 = [0;0.450541598502498;0.0838213779969326;0.228976968716819;0.91333736150167;0.152378018969223;0.825816977489547;0.538342435260057;0.996134716626885;0.0781755287531837;0.442678269775446;0];
f1 = [-0.029171964726699;-0.0278570165494982;0.0331454732535324;0.187656956432487;0.358856370923984;0.449974662483267;0.391341738643094;0.244800719791534;0.111797007617227;0.0721767235173722;0.0854437239807415;0.143888234591602;0.251750993723227;0.478953530572365;0.748209818420035;0.908044924557262;0.811960826711455;0.512568916956487;0.22669198638799;0.168136111568694;0.365578085161896;0.644996661336714;0.823562159983554;0.792812945867018;0.656803251999341;0.545799498053254;0.587013303815021;0.777464637372241;0.962722388208354;0.980537136457874;0.734416947254272;0.375435649393553;0.106489547770962;0.0892376361668696;0.242467741982851;0.40610516900965;0.427497319032133;0.301874099075184;0.128396341665384;0.00246347624097456;-0.0322120242872125];
figure(1), clf, hold on
h(1) = subplot(2,1,1); hold on
plot(f1);
legend('Original')
h(2) = subplot(2,1,2); hold on
plot(f2);
linkaxes(h)
axis([0 max(length(f1),length(f2)), min(min(f1),min(f2)),max(max(f1),max(f2))])
%% make cubic interpolators and test points
pp1 = spline(1:numel(f1), f1);
pp2 = spline(1:numel(f2), f2);
maxX = max(numel(f1), numel(f2));
N = 100 * maxX;
x2 = linspace(1, maxX, N);
y1 = ppval(pp1, x2);
%% search for parameters
s = fminsearch(@(X) sum( (y1 - ppval(pp2,X(1)*x2+X(2))).^2 ), [0 0])
%% plot results
y2 = ppval( pp2, s(1)*x2+s(2));
figure(1), hold on
subplot(2,1,2), hold on
plot(x2,y2, 'r')
legend('before', 'after')
end
Results:
s =
2.886234493867320e-001 3.734482822175923e-001
Note that this computes the opposite transformation from the one you generated the data with. Reversing the numbers:
>> 1/s(1)
ans =
3.464721948700991e+000 % seems pretty decent
>> -s(2)
ans =
-3.734482822175923e-001 % hmmm...rather different from 7/11!
(I'm not sure about the 7/11 value you provided; using the exact values you gave to make a plot results in a less accurate approximation to the source function...Are you sure about the 7/11?)
Accuracy can be improved by either
using a different optimizer (fmincon, fminunc, etc.)
demanding a higher accuracy from fminsearch through optimset
having more sample points in both f1 and f2 to improve the quality of the interpolations
Using a better initial estimate
Anyway, this approach is pretty general and gives nice results. It also requires no toolboxes.
It has one major drawback though -- the solution found may not be the global optimizer, e.g., the quality of the outcomes of this method could be quite sensitive to the initial estimate you provide. So, always make a (difference) plot to make sure the final solution is accurate, or if you have a large number of such things to do, compute some sort of quality factor upon which you decide to re-start the optimization with a different initial estimate.
It is of course very possible to use the results of the Fourier+Mellin transforms (as suggested by chaohuang below) as an initial estimate to this method. That might be overkill for the simple example you provide, but I can easily imagine situations where this could indeed be very useful.

For the scale factor a, you can estimate it by computing the ratio of the amplitude spectra of the two signals, since the magnitude of the Fourier transform is invariant to shift.
Similarly, you can estimate the shift factor b by using the Mellin transform, whose magnitude is scale invariant.
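The shift-invariance claim is easy to check numerically. The Python sketch below uses a plain O(n^2) DFT and a circular shift, for which the magnitudes match exactly up to rounding. For a non-circular shift of a finite signal the match is only approximate. The sample values are arbitrary.

```python
import cmath

def dft_magnitudes(signal):
    # Direct evaluation of |X[k]| for k = 0..n-1.
    n = len(signal)
    return [abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

signal = [0.0, 0.45, 0.08, 0.23, 0.91, 0.15, 0.83, 0.54]
shift = 3
shifted = signal[-shift:] + signal[:-shift]   # circular shift by 3 samples

mags = dft_magnitudes(signal)
mags_shifted = dft_magnitudes(shifted)
max_diff = max(abs(x - y) for x, y in zip(mags, mags_shifted))
print(max_diff)
```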

Here's a super simple approach to estimate the scale a that works on your example data:
a = length(f2) / length(f1)
This gives 3.4167 which is close to your stated value of 3.4. If that estimate is good enough, you can use correlation to estimate the shift.
I realize that this is not exactly what you asked, but it may be an acceptable alternative depending on the data.

Both Rody Oldenhuis and jstarr's answers are correct. I'm adding my own answer just to sum things up, and connect between them.
I've messed up Rody's code a little bit and ended up with the following:
function findScaleShift
load f1f2
x0 = [length(f1)/length(f2) 0]; %initial guess, can do better
n=length(f1);
costFunc = @(z) sum((eval_f1(z,f2,n)-f1).^2);
opt.TolFun = eps;
xopt=fminsearch(costFunc,x0,opt);
f1r=eval_f1(xopt,f2,n);
subplot(211);
plot(1:n,f1,1:n,f1r,'--','linewidth',5)
title(xopt);
subplot(212);
plot(1:n,(f1-f1r).^2);
title('squared error')
end
function y = eval_f1(x,f2,n)
t = maketform('affine',[x(1) 0 x(2); 0 1 0 ; 0 0 1]');
y=imtransform(f2',t,'cubic','xdata',[1 n ],'ydata',[1 1])';
end
This gives zero results:
This method is accurate but exhaustive, and it may take some time. Another disadvantage is that it finds only a local minimum, and may give false results if the initial guess (x0) is far from the true solution.
On the other hand, jstarr method gave the following results:
xopt = [ 3.49655562549115 -0.676062367063033]
which is a 10% deviation from the correct answer. It is a pretty fast solution and, although not as accurate as I requested, it should still be noted.
I think that in order to get the best results, jstarr's method should be used as an initial guess for the method proposed by Rody, giving an accurate solution.
Ohad
