matlab curve fitting: restrictions on parameters - algorithm

I have 5 non-parametric models, each with 5 to 8 parameters. These models are used to fit longitudinal data y(t), with t being time. Every data file is fitted by all 5 models for comparison. The models themselves cannot be altered.
For fitting, starting values are passed to lsqcurvefit, which uses a Levenberg-Marquardt algorithm. So I've written a script for the several models and one function for the curve fitting.
When I perform the curve fitting, a lot of the parameters wander off from their starting values to extreme values. This is what I want to avoid: the parameters should stay in the proximity of their starting values and should only change within a well-defined range, or such that only curve fits within a standard deviation are included. Important to note here is that these restrictions should be imposed during the curve fitting (the iterative numerical technique) and not afterwards.
The function I've written to fit models into height:
% Fit a specific model for all valid persons
opts = optimoptions(@lsqcurvefit, 'Algorithm', 'levenberg-marquardt');
[personalParams,personalRes,personalResidual] = lsqcurvefit(heightModel,initialValues,personalData(:,1),personalData(:,2),[],[],opts);
The function I've written for one of my models
elseif strcmpi(model,'jpss')
% y = h_1(1-(1/(1+((t+0.75)^c_1/d_1)+((t+0.75)^c_2/d_2)+((t+0.75)^c_3/d_3)))
% heightModel = @(params,ages) params(1).*(1-1./(1+((ages+0.75).^params(2))./params(3) + ((ages+0.75).^params(4))./params(5) + ((ages+0.75).^params(6))./params(7)));
heightModel = @(params,ages) params(1).*(1-1./(1+(((ages+0.75)./params(3)).^params(2)) + (((ages+0.75)./params(5)).^params(4)) + ((ages+0.75)./params(7)).^params(6))); % Adapted 25/07
modelStrings = {'h1','c1','d1','c2','d2','c3','d3'};
% Define initial values
if strcmpi('male',gender)
    initialValues = [174.8 0.6109 2.9743 3.614 9.88 22.393 13.59];
else
    initialValues = [162.7 0.6546 2.43 4.011 8.579 18.394 11.846];
end
What I would like to do:
Is it possible to place restrictions on every starting value in initialValues? I don't think putting restrictions on lsqcurvefit itself would be a good idea, since there are different models with different starting values and different allowed ranges.
I had 2 things in my mind:
1. Using a range for each of the initial values:
initialValues = [162.7 0.6546 2.43 4.011 8.579 18.394 11.846]
with range a1 = [150,180], range a2 = [0.3,0.8], and so on.
2. Placing lb and ub restrictions separately on all my initial values in the lsqcurvefit call:
if heightModel = 'name model'
ub = initial value * 1.2 and lb = initial value * 0.8
Can someone give me some hints or pointers because I can't make it work.
Thanks in advance
You state: "there are different models with different starting values and different ranges that are allowed." This is exactly where you can use ub and lb, since they are passed per call and can differ for every model. How to do this is outlined in the lsqcurvefit documentation:
X=LSQCURVEFIT(FUN,X0,XDATA,YDATA,LB,UB) defines a set of lower and
upper bounds on the design variables, X, so that the solution is in the
range LB <= X <= UB. Use empty matrices for LB and UB if no bounds
exist. Set LB(i) = -Inf if X(i) is unbounded below; set UB(i) = Inf if
X(i) is unbounded above.
For instance in the following example the parameters are constrained within limits during the fit. The lower bound (lb) and upper bound (ub) are set to 20% below and above the starting values, respectively.
heightModel = @(params,ages) abs(params(1).*(1-1./(1+(params(2).* (ages+params(8) )).^params(5) +(params(3).* (ages+params(8) )).^params(6) +(params(4) .*(ages+params(8) )).^params(7) )));
initialValues = [161.92 0.4173 0.1354 0.090 0.540 2.87 14.281 0.3701];
lb = 0.8*initialValues; % <-- lower bound is 20% smaller than initial par values
ub = 1.2*initialValues;
[parsout,resnorm,residual] = lsqcurvefit(heightModel,initialValues,t,ht,lb,ub);
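A minimal sketch of how the two ideas from the question could be combined: a relative band around every starting value, with explicit ranges overriding it for selected parameters. The helper name `makeBounds` is hypothetical, and the band is built with `abs` so that it also works if a starting value is negative (plain `0.8*x0` and `1.2*x0` would swap the bounds in that case). The `lsqcurvefit` call is left commented out because it needs the Optimization Toolbox and the data; note that in older toolbox releases the Levenberg-Marquardt option does not accept bounds, so it is worth checking the documentation for your release.

```matlab
% Hypothetical helper: bounds at +/- frac around x0, safe for negative entries
makeBounds = @(x0, frac) deal(x0 - frac*abs(x0), x0 + frac*abs(x0));

initialValues = [162.7 0.6546 2.43 4.011 8.579 18.394 11.846];
[lb, ub] = makeBounds(initialValues, 0.2);   % 20% band around each start value

% Explicit ranges (taken from the question) override the relative band
lb(1) = 150;  ub(1) = 180;   % h1
lb(2) = 0.3;  ub(2) = 0.8;   % c1

% The starting point must lie inside the bounds
assert(all(lb <= initialValues & initialValues <= ub));

% Assumed call, with heightModel, t and ht defined as in the answer:
% [parsout, resnorm, residual] = lsqcurvefit(heightModel, initialValues, t, ht, lb, ub);
```

Because `lb` and `ub` are ordinary vectors, each model branch can build its own pair next to its own `initialValues`.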


Why do higher learning rates in logistic regression produce NaN costs?

I am building a classifier for spam vs. ham emails using Octave and the Ling-Spam corpus; my method of classification is logistic regression.
Higher learning rates lead to NaN values being calculated for the cost, yet it does not break/decrease the performance of the classifier itself.
My Attempts
NB: My dataset is already normalised using mean normalisation.
When trying to choose my learning rate, I started with it as 0.1 and 400 iterations. This resulted in the following plot:
1 - Graph 1
When the lines completely disappear after a few iterations, it is due to a NaN value being produced; I thought this would result in broken parameter values and thus bad accuracy, but when checking the accuracy, I saw it was 95% on the test set (meaning that gradient descent was apparently still functioning). I checked different values of the learning rate and iterations to see how the graphs changed:
2 - Graph 2
The lines no longer disappeared, meaning no NaN values, BUT the accuracy was 87% which is substantially lower.
I did two more tests with more iterations and a slightly higher learning rate, and in both of them, the graphs both decreased with iterations as expected, but the accuracy was ~86-88%. No NaNs there either.
I realised that my dataset was skewed, with only 481 spam emails and 2412 ham emails. I therefore calculated the FScore for each of these different combinations, hoping to find the later ones had a higher FScore and the accuracy was due to the skew. That was not the case either - I have summed up my results in a table:
3 - Table
So there is no overfitting and the skew does not seem to be the problem; I don't know what to do now!
The only thing I can think of is that my calculations for accuracy and FScore are wrong, or that my initial debugging of the line 'disappearing' was wrong.
EDIT: This question is crucially about why the NaN values occur for those chosen learning rates. So the temporary fix I had of lowering the learning rate did not really answer my question - I always thought that higher learning rates simply diverged instead of converging, not producing NaN values.
My Code
My main.m code (bar getting the dataset from files):
numRecords = length(labels);
trainingSize = ceil(numRecords*0.6);
CVSize = trainingSize + ceil(numRecords*0.2);
featureData = normalise(data);
featureData = [ones(numRecords, 1), featureData];
numFeatures = size(featureData, 2);
featuresTrain = featureData(1:(trainingSize-1),:);
featuresCV = featureData(trainingSize:(CVSize-1),:);
featuresTest = featureData(CVSize:numRecords,:);
labelsTrain = labels(1:(trainingSize-1),:);
labelsCV = labels(trainingSize:(CVSize-1),:);
labelsTest = labels(CVSize:numRecords,:);
paramStart = zeros(numFeatures, 1);
learningRate = 0.0001;
iterations = 400;
[params] = gradDescent(featuresTrain, labelsTrain, learningRate, iterations, paramStart, featuresCV, labelsCV);
threshold = 0.5;
[accuracy, precision, recall] = predict(featuresTest, labelsTest, params, threshold);
fScore = (2*precision*recall)/(precision+recall);
My gradDescent.m code:
function [optimParams] = gradDescent(features, labels, learningRate, iterations, paramStart, featuresCV, labelsCV)
x_axis = [];
J_axis = [];
J_CV = [];
params = paramStart;
for i=1:iterations
    [cost, grad] = costFunction(features, labels, params);
    [cost_CV] = costFunction(featuresCV, labelsCV, params);
    params = params - (learningRate.*grad);
    x_axis = [x_axis;i];
    J_axis = [J_axis;cost];
    J_CV = [J_CV;cost_CV];
end
plot(x_axis, J_axis, 'r', x_axis, J_CV, 'b');
legend("Training", "Cross-Validation");
title("Cost as a function of iterations");
optimParams = params;
My costFunction.m code:
function [cost, grad] = costFunction(features, labels, params)
numRecords = length(labels);
hypothesis = sigmoid(features*params);
cost = (-1/numRecords)*sum((labels).*log(hypothesis)+(1-labels).*log(1-hypothesis));
grad = (1/numRecords)*(features'*(hypothesis-labels));
My predict.m code:
function [accuracy, precision, recall] = predict(features, labels, params, threshold)
numRecords = length(labels);
predictions = sigmoid(features*params)>threshold;
correct = predictions == labels;
% Confusion-matrix counts, with label 1 as the positive class
truePositives = sum(predictions == 1 & labels == 1);
falsePositives = sum(predictions == 1 & labels == 0);
falseNegatives = sum(predictions == 0 & labels == 1);
precision = truePositives/(truePositives+falsePositives);
recall = truePositives/(truePositives+falseNegatives);
accuracy = 100*(sum(correct)/numRecords);
Credit where it's due:
A big help here was this answer: so this question is kind of a duplicate, but I didn't realise it, and it isn't obvious at first... I will do my best to try to explain why the solution works too, to avoid simply copying the answer.
The issue was in fact the 0*log(0) = NaN result that occurred in my data. To fix it, in my calculation of the cost, it became:
cost = (-1/numRecords)*sum((labels).*log(hypothesis)+(1-labels).*log(1-hypothesis+eps(numRecords, 1)));
(see the question for the variables' values etc., it seems redundant to include the rest when just this line changes)
The eps() function is defined as follows:
Return a scalar, matrix or N-dimensional array whose elements are all
eps, the machine precision.
More precisely, eps is the relative spacing between any two adjacent
numbers in the machine’s floating point system. This number is
obviously system dependent. On machines that support IEEE floating
point arithmetic, eps is approximately 2.2204e-16 for double precision
and 1.1921e-07 for single precision.
When called with more than one argument the first two arguments are
taken as the number of rows and columns and any further arguments
specify additional matrix dimensions. The optional argument class
specifies the return type and may be either "double" or "single".
So adding this value to 1-hypothesis (which was previously so close to 0 that it was rounded to exactly 0) gives log() a small positive argument. It then returns a large but finite negative number instead of -Inf, and multiplying that by 0 gives 0 rather than NaN.
When testing with the learning rate as 0.1 and iterations as 2000/1000/400, the full graph was plotted and no NaN values were produced when checking.
NB: Just in case anyone was wondering, the accuracy and FScores did not change after this, so the accuracy really was that good despite the error in calculating the cost with a higher learning rate.
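The mechanism can be reproduced in isolation. The sketch below uses a made-up input of 40 for the sigmoid; the assumption is that a larger learning rate pushes features*params to magnitudes like this, where the sigmoid saturates to exactly 1 in double precision. The parameters themselves stay finite, which is consistent with the classifier still working while only the cost turns into NaN.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

h = sigmoid(40);   % exp(-40) is below eps/2, so 1 + exp(-40) rounds to 1 and h == 1
y = 1;             % a correctly classified positive example

% Unguarded cost term: (1-y)*log(1-h) = 0 * log(0) = 0 * (-Inf) = NaN
naiveCost = -(y*log(h) + (1-y)*log(1-h));

% Guarded cost term: log(eps) is finite, so 0 * log(eps) = 0
guardedCost = -(y*log(h) + (1-y)*log(1-h+eps));

disp([naiveCost, guardedCost]);
```

A single NaN term is enough to make the sum over all records NaN, which is why the plotted line disappears.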

MATLAB: Speeding up a discretization function using bsxfun

For a current project, I have to discretize quasi-continuous values into bins defined by some pre-defined binning resolution. For this purpose, I have written a function which I expected to be highly efficient, as it can process both scalar and vector inputs using bsxfun. However, after some profiling, I found that almost all the processing time of my much larger project is spent in this function, and within the function it is mainly the bsxfun part that takes time, with the min query in second place. Long story short, I am looking for advice on how to solve this task MUCH faster in MATLAB. Side note: I am usually passing vectors with some 50k elements.
Here's the code:
function sampleNo = value2sample(value,bins)
%Make sure both vectors have orientations fitting bsxfun
value = value(:);
bins = bins(:)';
%Recover bin resolution (avoids passing another parameter)
delta = median(diff(bins));
%Calculate distance matrix between all combinations
dist = abs(bsxfun(@minus,value,bins));
%What we really want to know is the minimum distance per row
[minval,ind] = min(dist,[],2);
%Make sure we don't accidentally further process NaNs as 1st bin
sampleNo = ind;
sampleNo(minval>delta) = NaN;
The reason that your function is slow is because you are computing the distance between every element of values and bins and storing them all in an array - if there are N values and M bins then you will require NM elements to store all the distances, and this is probably a really big number (e.g. if each input has 50,000 elements then you need 2.5 billion elements in the output array).
Moreover, since your bins are sorted (you didn't state this, but it looks like you are assuming it in your code) you do not need to compute the distance from every value to every bin. You can be much smarter:
function ind = value2sample(value, bins)
% Find median bin distance
delta = median(diff(bins));
% Bucket into 'nearest' bin by using midpoints
value = value(:);
bins = bins(:);
mids = [-Inf; 0.5 * (bins(1:end-1) + bins(2:end)); Inf];
[~, ind] = histc(value, mids);
% Ensure that NaN values and points that aren't near any bin are returned as NaN
% (histc returns index 0 for NaN inputs, which must not be used to index bins)
valid = ind >= 1 & ind <= numel(bins);
tooFar = true(size(ind));
tooFar(valid) = abs(value(valid) - bins(ind(valid))) > delta;
ind(tooFar) = NaN;
In my tests, with values = randn(10000, 1) and bins = -50:50 it takes around 4.5 milliseconds to run the original function, and 485 microseconds to run the code above, so you are getting around a 10x speedup (and the speedup will be even greater as you increase the size of the inputs).
Thanks to @Chris Taylor, I was able to solve the problem very efficiently. The code now runs almost 400 times faster than before. The only changes I had to make from his version are reflected in the code below. The main issue was to replace histc (whose use is not encouraged anymore) by discretize.
function ind = value2sample(value, bins)
% Make sure the vectors are standing
value = value(:);
bins = bins(:);
% Bucket into 'nearest' bin by using midpoints
mids = [eps; 0.5 * (bins(1:end-1) + bins(2:end))];
ind = discretize(value, mids);
The only thing is, that in this implementation your bins must be non-negative. Other than that, this code does exactly what I want, including the fact that ind has the same size as value and contains NaNs whenever a value is NaN or out of the range of bins.
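For comparison, here is a small worked example of nearest-bin lookup on toy data using interp1 with the 'nearest' method. This is a swapped-in technique, not the histc or discretize approach from the thread, and it makes no timing claim. It assumes the bins are sorted and distinct, and it differs from the delta test above in that every value outside [bins(1), bins(end)] becomes NaN. It does accept negative bins.

```matlab
bins  = [-1 0 1 2];                 % sorted bin centres, may be negative
value = [-0.8; 0.6; 1.4; NaN; 7];   % toy inputs, including NaN and an out-of-range value

% Interpolating the index vector 1:numel(bins) with 'nearest' returns,
% for each value, the index of the closest bin centre
ind = interp1(bins, 1:numel(bins), value, 'nearest');

disp(ind');   % 1 3 3 NaN NaN
```

The result has the same shape as value, with NaN wherever the input is NaN or lies outside the range of the bins.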

Matlab - Least Squares data fitting - Cost function with extra constraint

I am currently working on some MATLAB code to fit experimental data to a sum of exponentials, following a method described in this paper.
According to the paper, the data has to follow the following equation (written in pseudo-code):
y = sum(v(i)*exp(-x/tau(i)),i=1..n)
Here tau(i) is a set of n predefined constants. The number of constants also determines the size of the summation, and hence the size of v. For example, we can try to fit a sum of 100 exponentials, each with a different tau(i) to our data. However, due to the nature of the fitting and the exponential sum, we need to add another constraint to the problem, and hence to the cost function of the least-squares method used.
Normally, the cost function of the least-squares method is given by:
(y_data - sum(v(i)*exp(-x/tau(i)),i=1..n))^2
And this has to be minimized. However, to prevent over-fitting that would make the time-constant spectrum extremely noisy, the paper adds the following constraint to the cost function:
|v(i) - v(i+1)|^2
Because of this extra constraint, as far as I know, the regular algorithms like lsqcurvefit aren't usable any longer, and I have to use fminsearch to search for the minimum of my least-squares cost function with the constraint. The function that has to be minimized, as I understand it, is the following:
(y_data - sum(v(i)*exp(-x/tau(i)),i=1..n))^2 + sum(|v(j) - v(j+1)|^2,j=1..n-1)
My attempt to code this in MATLAB is the following. First we define the cost function in a function file, then we use fminsearch to actually minimize it and get values for v.
function res = funcost( v )
%FUNCOST Definition of the function that has to be minimised
%We define a function yvalues with 2 exponentials with known time-constants
% so we know the result that should be given by minimising.
xvalues = linspace(0,50,10000);
yvalues = 3-2*exp(-xvalues/1)-exp(-xvalues/10);
%Definition of 30 equidistant points in the logarithmic scale
terms = 30;
termsvector = [1:terms];
tau = termsvector;
for i = 1:terms
    tau(i) = 10^(-1+3/terms*i);
end
%Definition of the regular function
res_1 = 3;
for i=1:terms
    res_1 =res_1+ v(i).*exp(-xvalues./tau(i));
end
res_1 = res_1-yvalues;
%Added constraint
k = 1;      % weight of the added term (not defined in the original; value assumed)
res_2 = 0;  % the accumulator has to be initialised before the loop
for i=1:terms-1
    res_2 = res_2 + (v(i)-v(i+1))^2;
end
res=sum(res_1.*res_1) + k*res_2;
However, this code is giving me inaccurate results (no error, just inaccurate results), which leads me to believe I either made a mistake in the coding or in the interpretation of the added constraint for the least-squares method.
I would try to introduce the additional constraint in the following way:
res_2 = max((v(1:(end-1))-v(2:end)).^2);
i.e. instead of minimizing an integrated (summed-up) error, it minimizes the largest one (minimax).
You may also make this constraint stiff by
if res_2 > some_number
    k = a_very_big_number;
else
    k = 0; % or k = a_small_number
end
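A different route, not proposed in the thread and offered only as a sketch: since the tau(i) are fixed, the model is linear in v, so the summed-squares version of the cost from the question can be written as one stacked linear least-squares problem and solved with backslash instead of fminsearch. The grid size (200 points) and the weight k = 1 are assumptions; the offset 3 and the tau grid follow funcost.

```matlab
x   = linspace(0, 50, 200)';            % sample points as a column (coarser grid, assumed)
y   = 3 - 2*exp(-x/1) - exp(-x/10);     % same synthetic data as in funcost
n   = 30;
tau = 10.^(-1 + 3/n*(1:n));             % fixed time constants, as in funcost

A = exp(-bsxfun(@rdivide, x, tau));     % A(j,i) = exp(-x(j)/tau(i))
D = diff(eye(n));                       % (n-1)-by-n, so D*v = v(2:end) - v(1:end-1)
k = 1;                                  % assumed penalty weight

% Minimise ||3 + A*v - y||^2 + k*||D*v||^2 as a single least-squares solve
v = [A; sqrt(k)*D] \ [y - 3; zeros(n-1, 1)];

% The same cost written out directly, for checking candidate solutions
cost = @(v) sum((3 + A*v - y).^2) + k*sum(diff(v).^2);
disp(cost(v));
```

If v must also be non-negative, the same stacked matrix and right-hand side could be handed to lsqnonneg; whether that matches the paper's method is not something the thread confirms.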

Parallelising gradient calculation in Julia

I was persuaded some time ago to drop my comfortable MATLAB programming and start programming in Julia. I have been working with neural networks for a long time, and I thought that, now with Julia, I could get things done faster by parallelising the calculation of the gradient.
The gradient need not be calculated on the entire dataset in one go; instead one can split the calculation. For instance, by splitting the dataset in parts, we can calculate a partial gradient on each part. The total gradient is then calculated by adding up the partial gradients.
Though the principle is simple, when I parallelise with Julia I get a performance degradation, i.e. one process is faster than two processes! I am obviously doing something wrong... I have consulted other questions asked in the forum but I still could not piece together an answer. I think my problem is that there is a lot of unnecessary data movement going on, but I can't fix it properly.
In order to avoid posting messy neural network code, I am posting below a simpler example that replicates my problem in the setting of linear regression.
The code-block below creates some data for a linear regression problem. The code explains the constants, but X is the matrix containing the data inputs. We randomly create a weight vector w which when multiplied with X creates some targets Y.
# This code implements a simple linear regression problem
MAXITER = 100 # number of iterations for simple gradient descent
N = 10000 # number of data items
D = 50 # dimension of data items
X = randn(N, D) # create random matrix of data, data items appear row-wise
Wtrue = randn(D,1) # create arbitrary weight matrix to generate targets
Y = X*Wtrue # generate targets
The next code-block below defines functions for measuring the fitness of our regression (i.e. the negative log-likelihood) and the gradient of the weight vector w:
@everywhere begin
    function negative_loglikelihood(Y,X,W)
        # number of data items
        N = size(X,1)
        # accumulate here log-likelihood
        ll = 0
        for nn=1:N
            ll = ll - 0.5*sum((Y[nn,:] - X[nn,:]*W).^2)
        end
        return ll
    end
    function negative_loglikelihood_grad(Y,X,W, first_index,last_index)
        # number of data items
        N = size(X,1)
        # accumulate here gradient contributions by each data item
        grad = zeros(similar(W))
        for nn=first_index:last_index
            grad = grad + X[nn,:]' * (Y[nn,:] - X[nn,:]*W)
        end
        return grad
    end
end
Note that the above functions are on purpose not vectorised! I choose not to vectorise, as the final code (the neural network case) will also not admit any vectorisation (let us not get into more details regarding this).
Finally, the code-block below shows a very simple gradient descent that tries to recover the parameter weight vector w from the given data Y and X:
# start from random initial solution
W = randn(D,1)
# learning rate, set here to some arbitrary small constant
eta = 0.000001
# the following for-loop implements simple gradient descent
for iter=1:MAXITER
    # get gradient
    ref_array = Array(RemoteRef, nworkers())
    # let each worker process part of matrix X
    for index=1:length(workers())
        # first index of subset of X that worker should work on
        first_index = (index-1)*int(ceil(N/nworkers())) + 1
        # last index of subset of X that worker should work on
        last_index = min((index)*(int(ceil(N/nworkers()))), N)
        ref_array[index] = @spawn negative_loglikelihood_grad(Y,X,W, first_index,last_index)
    end
    # gather the gradients calculated on parts of matrix X
    grad = zeros(similar(W))
    for index=1:length(workers())
        grad = grad + fetch(ref_array[index])
    end
    # now that we have the gradient we can update parameters W
    W = W + eta*grad;
    # report progress, monitor optimisation
    @printf("Iter %d neg_loglikel=%.4f\n",iter, negative_loglikelihood(Y,X,W))
end
As is hopefully visible, I tried to parallelise the calculation of the gradient in the easiest possible way here. My strategy is to break the calculation of the gradient in as many parts as available workers. Each worker is required to work only on part of matrix X, which part is specified by first_index and last_index. Hence, each worker should work with X[first_index:last_index,:]. For instance, for 4 workers and N = 10000, the work should be divided as follows:
worker 1 => first_index = 1, last_index = 2500
worker 2 => first_index = 2501, last_index = 5000
worker 3 => first_index = 5001, last_index = 7500
worker 4 => first_index = 7501, last_index = 10000
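The index arithmetic and the claim that partial gradients add up to the full gradient can be checked serially. The sketch below is written in MATLAB rather than Julia, purely to verify the arithmetic on a smaller made-up problem (N = 1000, D = 5); it says nothing about parallel performance.

```matlab
N = 1000; D = 5; nChunks = 4;           % smaller sizes than in the question (assumed)
X = randn(N, D);
Y = X * randn(D, 1);
W = zeros(D, 1);

fullGrad = X' * (Y - X*W);              % gradient over the whole data set

chunkSize = ceil(N / nChunks);
ranges    = zeros(nChunks, 2);
partGrad  = zeros(D, 1);
for c = 1:nChunks
    first = (c-1)*chunkSize + 1;        % same formulas as first_index / last_index
    last  = min(c*chunkSize, N);
    ranges(c, :) = [first, last];
    Xc = X(first:last, :);
    partGrad = partGrad + Xc' * (Y(first:last) - Xc*W);
end

disp(ranges);                           % 1-250, 251-500, 501-750, 751-1000
disp(norm(fullGrad - partGrad));        % zero up to rounding error
```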
Unfortunately, this entire code works faster if I have only one worker. If I add more workers via addprocs(), the code runs slower. One can aggravate this issue by creating more data items, for instance by using N=20000 instead.
With more data items, the degradation is even more pronounced.
In my particular computing environment with N=20000 and one core, the code runs in ~9 secs. With N=20000 and 4 cores it takes ~18 secs!
I tried many many different things inspired by the questions and answers in this forum but unfortunately to no avail. I realise that the parallelisation is naive and that data movement must be the problem, but I have no idea how to do it properly. It seems that the documentation is also a bit scarce on this issue (as is the nice book by Ivo Balbaert).
I would appreciate your help as I have been stuck for quite some while with this and I really need it for my work. For anyone wanting to run the code, to save you the trouble of copying-pasting you can get the code here.
Thanks for taking the time to read this very lengthy question! Help me turn this into a model answer that anyone new in Julia can then consult!
I would say that GD is not a good candidate for parallelization by any of the proposed methods: SharedArray, DistributedArray, or your own implementation of distributing chunks of data.
The problem does not lie in Julia, but in the GD algorithm.
Consider the code:
Main process:
for iter = 1:iterations #iterations: "the more the better"
    δ = _gradient_descent_shared(X, y, θ)
    θ = θ - α * (δ/N)
end
The problem is in the above for-loop which is a must. No matter how good _gradient_descent_shared is, the total number of iterations kills the noble concept of the parallelization.
After reading the question and the above suggestion I've started implementing GD using SharedArray. Please note, I'm not an expert in the field of SharedArrays.
The main process parts (simple implementation without regularization):
run_gradient_descent(X::SharedArray, y::SharedArray, θ::SharedArray, α, iterations) = begin
    N = length(y)
    for iter = 1:iterations
        δ = _gradient_descent_shared(X, y, θ)
        θ = θ - α * (δ/N)
    end
    θ
end
_gradient_descent_shared(X::SharedArray, y::SharedArray, θ::SharedArray, op=(+)) = begin
    if size(X,1) <= length(procs(X))
        return _gradient_descent_serial(X, y, θ)
    end
    rrefs = map(p -> (@spawnat p _gradient_descent_serial(X, y, θ)), procs(X))
    return mapreduce(r -> fetch(r), op, rrefs)
end
The code common to all workers:
#= Returns the range of indices of a chunk for every worker on which it can work.
The function splits data examples (N rows into chunks),
not the parts of the particular example (features dimensionality remains intact).=#
@everywhere function _worker_range(S::SharedArray)
    idx = indexpids(S)
    if idx == 0
        return 1:size(S,1), 1:size(S,2)
    end
    nchunks = length(procs(S))
    splits = [round(Int, s) for s in linspace(0,size(S,1),nchunks+1)]
    splits[idx]+1:splits[idx+1], 1:size(S,2)
end
#Computations on the chunk of the all data.
@everywhere _gradient_descent_serial(X::SharedArray, y::SharedArray, θ::SharedArray) = begin
    prange = _worker_range(X)
    pX = sdata(X[prange[1], prange[2]])
    py = sdata(y[prange[1],:])
    tempδ = pX' * (pX * sdata(θ) .- py)
end
The data loading and training. Let me assume that we have:
features in X::Array of the size (N,D), where N - number of examples, D-dimensionality of the features
labels in y::Array of the size (N,1)
The main code might look like this:
X=[ones(size(X,1)) X] #adding the artificial coordinate
N, D = size(X)
α = 0.01
initialθ = SharedArray(Float64, (D,1))
sX = convert(SharedArray, X)
sy = convert(SharedArray, y)
X = nothing
y = nothing
finalθ = run_gradient_descent(sX, sy, initialθ, α, MAXITER);
After implementing this and running it (on the 8 cores of my Intel Core i7), I got only a very slight acceleration over serial GD (1 core) on my multiclass (19 classes) training data (715 sec for serial GD / 665 sec for shared GD).
If my implementation is correct (please check it, I'm counting on that), then parallelizing the GD algorithm is not worth the effort. You would likely get better acceleration using stochastic GD on 1 core.
If you want to reduce the amount of data movement, you should strongly consider using SharedArrays. You could preallocate just one output vector, and pass it as an argument to each worker. Each worker sets a chunk of it, just as you suggested.

matlab code optimization - clustering algorithm KFCG

I have a large set of vectors (orientation data in an axis-angle representation... the axis is the vector) to which I want to apply a clustering algorithm. I tried kmeans but the computational time was too long (it never finished). So instead I am trying to implement the KFCG algorithm, which is faster (Kirke 2010):
Initially we have one cluster with the entire training vectors and the codevector C1 which is centroid. In the first iteration of the algorithm, the clusters are formed by comparing first element of training vector Xi with first element of code vector C1. The vector Xi is grouped into the cluster 1 if xi1< c11 otherwise vector Xi is grouped into cluster2 as shown in Figure 2(a) where codevector dimension space is 2. In second iteration, the cluster 1 is split into two by comparing second element Xi2 of vector Xi belonging to cluster 1 with that of the second element of the codevector. Cluster 2 is split into two by comparing the second element Xi2 of vector Xi belonging to cluster 2 with that of the second element of the codevector as shown in Figure 2(b). This procedure is repeated till the codebook size is reached to the size specified by user.
I'm unsure what ratio is appropriate for the codebook, but it shouldn't matter for the code optimization. Also note mine is 3-D so the same process is done for the 3rd dimension.
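To make the quoted description concrete, here is a small worked example of the first two splits on made-up 2-D vectors. It is a sketch of the splitting rule as described in the quote, with hypothetical variable names, not the code from the paper.

```matlab
% Toy 2-D training vectors (made-up numbers, one vector per row)
X = [1 5; 2 1; 3 4; 6 2];

% Iteration 1: compare the first element with the first element of the centroid
c1 = mean(X, 1);                      % [3 3]
isLower  = X(:,1) < c1(1);
cluster1 = X(isLower, :);             % [1 5; 2 1]
cluster2 = X(~isLower, :);            % [3 4; 6 2]

% Iteration 2: split cluster 1 on the second element of its own centroid
c11 = mean(cluster1, 1);              % [1.5 3]
isLower   = cluster1(:,2) < c11(2);
cluster1a = cluster1(isLower, :);     % [2 1]
cluster1b = cluster1(~isLower, :);    % [1 5]
```

Each split only looks at the vectors already in the cluster being split, so the total work per iteration stays proportional to the number of training vectors.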
My code attempts
I've tried implementing the above algorithm into Matlab 2013 (Student Version). Here's some different structures I've tried - BUT take way too long (have never seen it completed):
%training vectors:
% Atgood = Nx4 matrix (see test data below if you want to test)
vecA = Atgood(:,1:3);
roA = size(vecA,1);
%Codebook size, Nsel, is ratio of data
Nseltemp = remainFrac2*roA; %codebook size
%Ensure selected size after nearest power of 2 is NOT greater than roA
if 2^round(log2(Nseltemp)) < roA
    NselIter = round(log2(Nseltemp));
else
    NselIter = ceil(log2(Nseltemp)-1);
end
Nsel = 2^NselIter; %power of 2 - for LGB and other algorithms
%%cluster = cell(1,Nsel); %Unsure #rows - Don't know how to initialize if need mean...
codevec(1,1:3) = mean(vecA,1);
for kk = 1:NselIter
    hh2 = 1:2:size(codevec,1)*2;
    for hh1 = 1:length(hh2)
        % for ii = 1:roA
        %     if vecA(ii,ind) < codevec(hh1,ind)
        %         cluster{1,hh}(count1,1:4) = Atgood(ii,:); %want all 4 elements
        %         count1=count1+1;
        %     else
        %         cluster{1,hh+1}(count2,1:4) = Atgood(ii,:); %want all 4
        %         count2=count2+1;
        %     end
        % end
        %EDIT: My ATTEMPT at optimizing above for loop:
        splitind = vecA(:,ind)>=repcv;
        splitind2 = vecA(:,ind)<repcv;
    end
    clear codevec
    %Only mean the 1x3 vector portion of the cluster - for centroid
    codevec = cell2mat((cellfun(@(x) mean(x(:,1:3),1),cluster,'UniformOutput',false))');
    if ind < 3
        ind = ind+1;
    end
end
if length(codevec) ~= Nsel
    warning('codevec ~= Nsel');
end
Alternatively, instead of cells I thought 3D Matrices would be faster? I tried but it was slower using my method of appending the next row each iteration (temp=[]; for...temp=[temp;new];)
Also, I wasn't sure what was best to loop with, for or while:
%If initialize cell to full length
while length(find(~cellfun('isempty',cluster))) < Nsel
Well, anyways, the first method was fastest for me.
Is the logic standard? Not in the sense that it matches with the algorithm described, but from a coding perspective, any weird methods I employed (especially with those multiple inner loops) that slows it down? Where can I speed up (you can just point me to resources or previous questions)?
My array size, Atgood, is 1,000,000x4 making NselIter=19; - do I just need to find a way to decrease this size or can the code be optimized?
Should this be asked on CodeReview? If so, I'll move it.
Testing Data
Here's some random vectors you can use to test:
for ii=1:1000 %My size is ~ 1,000,000
    omega = 2*rand(3,1)-1;
    omega = (omega/norm(omega))';
    Atgood(ii,1:4) = [omega,57];
end
Your biggest issue is re-iterating through all of vecA FOR EACH CODEVECTOR, rather than just the vectors that are part of the corresponding cluster. You're supposed to split each cluster on its own codevector. As it is, your cluster structure grows and grows, and each iteration is processing more and more samples.
Your second issue is the loop around the comparisons, and the appending of samples to build up the clusters. Both of those can be solved by vectorizing the comparison operation. Oh, I just saw your edit, where this was optimized. Much better. But codevec(hh1,ind) is just a scalar, so you don't even need the repmat.
Try this version:
% (preallocs added in edit)
cluster = cell(1,Nsel);
codevec = zeros(Nsel, 3);
codevec(1,:) = mean(Atgood(:,1:3),1);
cluster{1} = Atgood;
nClusters = 1;
ind = 1;
while nClusters < Nsel
    for c = 1:nClusters
        lower_cluster_logical = cluster{c}(:,ind) < codevec(c,ind);
        cluster{nClusters+c} = cluster{c}(~lower_cluster_logical,:);
        cluster{c} = cluster{c}(lower_cluster_logical,:);
        codevec(c,:) = mean(cluster{c}(:,1:3), 1);
        codevec(nClusters+c,:) = mean(cluster{nClusters+c}(:,1:3), 1);
    end
    ind = rem(ind,3) + 1;
    nClusters = nClusters*2;
end
