Multiple linear optimizations - performance

I'm interested in solving a few hundred linear optimization problems in MATLAB. At the moment this is done in a for-loop that calls linprog.
The vectors used have identical dimensions and are rows/columns of larger arrays.
for combination_id = 1:1000
    [tempOperatingPointsVectors, tempTargetValue, exitflag] = ...
        linprog( lo_c(combination_id,:), ...
                 [], [], ...
                 lo_G(:,:,combination_id), lo_d(:,combination_id), ...
                 lo_u(:,combination_id), lo_v(:,combination_id), ...
                 x0_in, options);
end
Is there a way of calling linprog with the whole arrays instead of picking out each row/column?
I also tried a parfor loop, but since the work done in each iteration is very small there was no speed improvement.

Why can't you set up one big linear program and then solve all of it at once?
Since I don't have your data, I cannot test the following code, but the basic idea should work.
xVar = 1:size(lo_c,2);
uBound = lo_u(:, 1);
vBound = lo_v(:, 1);
dMat = lo_d(:, 1);
gMat = lo_G(:,:, 1);
objMat = lo_c(1,:);
x0_inMat = x0_in;
for combination_id = 2:1000
    xVar = [xVar, xVar(end)+1:xVar(end)+size(lo_c,2)];
    uBound = [uBound; lo_u(:, combination_id)];
    vBound = [vBound; lo_v(:, combination_id)];
    dMat = [dMat; lo_d(:, combination_id)];
    % each sub-problem gets its own block of variables, so the equality
    % constraints belong on the block diagonal
    gMat = blkdiag(gMat, lo_G(:,:, combination_id));
    objMat = [objMat, lo_c(combination_id,:)];   % objective coefficients form one long vector
    x0_inMat = [x0_inMat; x0_in];
end
[tempOperatingPointsVectors, tempTargetValue, exitflag] = ...
    linprog( objMat, ...
             [], [], ...
             gMat, dMat, ...
             uBound, vBound, ...
             x0_inMat, options);
Should do the trick.
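For a larger number of sub-problems it may also be worth assembling everything without growing arrays inside a loop, and keeping the block-diagonal equality matrix sparse. A minimal sketch along those lines (untested, using the same lo_c, lo_G, lo_d, lo_u, lo_v, x0_in and options as above):
nProb  = size(lo_c, 1);                      % number of individual problems
gCell  = squeeze(num2cell(lo_G, [1 2]));     % one equality-constraint matrix per cell
gMat   = blkdiag(gCell{:});                  % block-diagonal Aeq; wrap the blocks in sparse() for large nProb
objMat = reshape(lo_c.', [], 1);             % stacked objective coefficients
dMat   = lo_d(:);                            % stacked beq
uBound = lo_u(:);                            % stacked bounds
vBound = lo_v(:);
x0Mat  = repmat(x0_in(:), nProb, 1);         % stacked starting point
[xAll, fvalAll, exitflag] = linprog(objMat, [], [], gMat, dMat, uBound, vBound, x0Mat, options);
xPerProblem = reshape(xAll, [], nProb);      % column k is the solution of problem k; fvalAll is the sum of all objectives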

Related

Parallel for-loop with SharedArray

I would like the most time- and memory-efficient, load-balanced way of applying an operation to each column of a SharedArray, writing the corresponding result columns in place into a pre-allocated output SharedArray. How should I improve the following code?
using SharedArrays, Distributed
nCores = length(Sys.cpu_info())
addprocs(nCores - 1);

@everywhere using SharedArrays, Distributed
@everywhere using Statistics            # needed for mean
@everywhere addConstant = 10000;
@everywhere function divideColByMeanConstant(x)
    return (x ./ mean(x)) .+ addConstant
end

inputMAT = SharedArray(rand(10000, 20000))
outputMAT = SharedArray(Array{Float64,2}(undef, size(inputMAT)))

function Array2ArrayColumnwise_forloop!(output::SharedArray{Float64,2}, operation::Function,
                                        input::SharedArray{Float64,2},
                                        rowRange::Array{Int64,1}, colRange::Array{Int64,1})
    @async @distributed for colInd in colRange
        @views output[rowRange, colInd] = operation(input[rowRange, colInd])
    end
end

rowRange = [1, 4, 6, 8, 10, 15];
colRange = [1, 4, 5, 6, 7, 10, 15];
Array2ArrayColumnwise_forloop!(outputMAT, divideColByMeanConstant, inputMAT, rowRange, colRange)
Thank you in advance
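One thing worth noting (not part of the original post): with @async the function returns a Task immediately, possibly before the workers have finished writing. If the caller needs the results to be ready on return, the usual pattern is to wrap the loop in @sync instead. A minimal sketch with a hypothetical name:
function Array2ArrayColumnwise_sync!(output::SharedArray{Float64,2}, operation::Function,
                                     input::SharedArray{Float64,2},
                                     rowRange::Array{Int64,1}, colRange::Array{Int64,1})
    # Block until every worker has written its columns before returning.
    @sync @distributed for colInd in colRange
        @views output[rowRange, colInd] = operation(input[rowRange, colInd])
    end
    return output
end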

How to do parallel processing in pytorch

I am working on a deep learning problem, which I am solving in PyTorch. I have two GPUs on the same machine (16273 MiB and 12193 MiB). I want to use both GPUs for training on a video dataset.
I get a warning:
There is an imbalance between your GPUs. You may want to exclude GPU 1 which
has less than 75% of the memory or cores of GPU 0. You can do so by setting
the device_ids argument to DataParallel, or by setting the CUDA_VISIBLE_DEVICES
environment variable.
warnings.warn(imbalance_warn.format(device_ids[min_pos], device_ids[max_pos]))
I also get an error:
raise TypeError('Broadcast function not implemented for CPU tensors')
TypeError: Broadcast function not implemented for CPU tensors
if __name__ == '__main__':
    opt.scales = [opt.initial_scale]
    for i in range(1, opt.n_scales):
        opt.scales.append(opt.scales[-1] * opt.scale_step)
    opt.arch = '{}-{}'.format(opt.model, opt.model_depth)
    opt.mean = get_mean(opt.norm_value)
    opt.std = get_std(opt.norm_value)
    print("opt", opt)
    with open(os.path.join(opt.result_path, 'opts.json'), 'w') as opt_file:
        json.dump(vars(opt), opt_file)

    torch.manual_seed(opt.manual_seed)
    model, parameters = generate_model(opt)
    # print(model)
    pytorch_total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print("Total number of trainable parameters: ", pytorch_total_params)

    # Define class weights
    if opt.weighted:
        print("Weighted Loss is created")
        if opt.n_finetune_classes == 2:
            weight = torch.tensor([1.0, 3.0])
        else:
            weight = torch.ones(opt.n_finetune_classes)
    else:
        weight = None

    criterion = nn.CrossEntropyLoss()
    if not opt.no_cuda:
        criterion = nn.DataParallel(criterion.cuda())

    if opt.no_mean_norm and not opt.std_norm:
        norm_method = Normalize([0, 0, 0], [1, 1, 1])
    elif not opt.std_norm:
        norm_method = Normalize(opt.mean, [1, 1, 1])
    else:
        norm_method = Normalize(opt.mean, opt.std)

    train_loader = torch.utils.data.DataLoader(
        training_data,
        batch_size=opt.batch_size,
        shuffle=True,
        num_workers=opt.n_threads,
        pin_memory=True)
    train_logger = Logger(
        os.path.join(opt.result_path, 'train.log'),
        ['epoch', 'loss', 'acc', 'precision', 'recall', 'lr'])
    train_batch_logger = Logger(
        os.path.join(opt.result_path, 'train_batch.log'),
        ['epoch', 'batch', 'iter', 'loss', 'acc', 'precision', 'recall', 'lr'])

    if opt.nesterov:
        dampening = 0
    else:
        dampening = opt.dampening
    optimizer = optim.SGD(
        parameters,
        lr=opt.learning_rate,
        momentum=opt.momentum,
        dampening=dampening,
        weight_decay=opt.weight_decay,
        nesterov=opt.nesterov)
    # scheduler = lr_scheduler.ReduceLROnPlateau(
    #     optimizer, 'min', patience=opt.lr_patience)

    if not opt.no_val:
        spatial_transform = Compose([
            Scale(opt.sample_size),
            CenterCrop(opt.sample_size),
            ToTensor(opt.norm_value), norm_method
        ])

    print('run')
    for i in range(opt.begin_epoch, opt.n_epochs + 1):
        if not opt.no_train:
            adjust_learning_rate(optimizer, i, opt.lr_steps)
            train_epoch(i, train_loader, model, criterion, optimizer, opt,
                        train_logger, train_batch_logger)
I have also made changes in my train file:
model = nn.DataParallel(model(),device_ids=[0,1]).cuda()
outputs = model(inputs)
It does not seem to work properly and gives the error above. Please advise; I am new to PyTorch.
Thanks
As mentioned in this link, you have to call model.cuda() before passing the model to nn.DataParallel.
net = nn.DataParallel(model.cuda(), device_ids=[0,1])
https://github.com/pytorch/pytorch/issues/17065
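Applied to the code in the question, the order would look roughly like this. This is only a sketch: generate_model, opt, inputs and targets are taken from or assumed from the question, and leaving the criterion out of DataParallel is a common pattern rather than part of the linked answer.
import torch.nn as nn

model, parameters = generate_model(opt)
model = nn.DataParallel(model.cuda(), device_ids=[0, 1])  # move the model to the GPU first, then wrap it

criterion = nn.CrossEntropyLoss().cuda()                  # plain CUDA loss, not wrapped in DataParallel

# inside the training loop
outputs = model(inputs.cuda())                            # inputs must be CUDA tensors as well
loss = criterion(outputs, targets.cuda())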

How to speed up MATLAB integration?

I have the following code:
function [] = Solver( t )
%pre-declaration
foo=[1,1,1];
fooCell = num2cell(foo);
[q, val(q), star]=fooCell{:};
%functions used in prosomoiwsh
syms q val(q) star;
qd1=symfun(90*pi/180+30*pi/180*cos(q),q);
qd2=symfun(90*pi/180+30*pi/180*sin(q),q);
p1=symfun(79*pi/180*exp(-1.25*q)+pi/180,q);
p2=symfun(79*pi/180*exp(-1.25*q)+pi/180,q);
e1=symfun(val-qd1,q);
e2=symfun(val-qd2,q);
T1=symfun(log(-(1+star)/star),star);
T2=symfun(log(star/(1-star)),star);
%anonymous function handles
lambda=[0.75;10.494441313222076];
calcEVR_handles={@(t,x)[double(subs(diff(subs(T1,star,e1/p1),q)+subs(lambda(1)*T1,star,e1/p1),{diff(val,q);val;q},{x(2);x(1);t})),double(subs(diff(subs(T1,star,e1/p1),q)+subs(lambda(1)*T1,star,e1/p1),{diff(val,q);val;q},{0;x(1);t})),double(subs(double(subs(subs(diff(T1,star),star,e1/p1),{val;q},{x(1);t}))/p1,q,t))];@(t,x)[double(subs(diff(subs(T2,star,e2/p2),q)+subs(lambda(2)*T2,star,e2/p2),{diff(val,q);val;q},{x(4);x(3);t})),double(subs(diff(subs(T2,star,e2/p2),q)+subs(lambda(2)*T2,star,e2/p2),{diff(val,q);val;q},{0;x(3);t})),double(subs(double(subs(subs(diff(T2,star),star,e2/p2),{val;q},{x(3);t}))/p2,q,t))]};
options = odeset('AbsTol',1e-1,'RelTol',1e-1);
[T,x_r] = ode23(@prosomoiwsh,[0 t],[80*pi/180;0;130*pi/180;0;2.4943180186983711;11.216948999754299],options);
save newresult T x_r
function dx_th = prosomoiwsh(t,x_th)
%declarations
k=0.80773938740480955;
nf=6.2860930902603602;
hGa=0.16727117784664769;
hGb=0.010886618389781832;
dD=0.14062935253218495;
s=0.64963817519705203;
IwF={[4.5453398382686956 5.2541234145178066 -6.5853972592002235 7.695225990702979];[-4.4358339284697337 -8.1138542053372298 -8.2698210582548395 3.9739729629084071]};
IwG={[5.7098975358444752 4.2470526600975802 -0.83412489434697168 0.53829395964565041] [1.8689492167233894 -0.0015017513794517434 8.8666804106266461 -1.0775021663921467];[6.9513235639494155 -0.8133752392893685 7.4032432556804162 3.1496138243338709] [5.8037182454981568 2.0933267947187457 4.852362963697928 -0.10745559204132382]};
IbF={-1.2165533594615545;7.9215291787744917};
IbG={2.8425752327892844 2.5931576770598168;9.4789237295474873 7.9378928037841252};
p=2;
m=2;
signG=1;
n_vals=[2;2];
nFixedStates=4;
gamma_nn=[0.31559428834175318;9.2037894041383641];
th_star_guess=[2.4943180186983711;11.216948999754299];
%solution
x = x_th(1:nFixedStates);
th = x_th(nFixedStates+1:nFixedStates+p);
f = zeros(m,1);
G = zeros(m,m);
ZF = zeros(p,m);
ZG = zeros(p,m,m);
for i=1:m
[f(i), ZF(:,i)] = calculate_neural_output(x, IwF{i}, IbF{i}, th);
for j=1:m
[G(i,j), ZG(:,i,j)] = calculate_neural_output(x, IwG{i,j}, IbG{i,j}, th);
end
end
detG = det(G);
if m == 1
adjG = 1;
else
adjG = detG*G^-1;
end
E = zeros(m,1);
V = zeros(m,1);
R = zeros(m,m);
for i=1:m
EVR=calcEVR_handles{i}(t,x);
E(i)=EVR(1);
V(i)=EVR(2);
R(i,i)=EVR(3);
end
Rinv = R^-1;
prod_R_E = R*E;
ub = f + Rinv * (V + k*E) + nf*prod_R_E;
ua = - detG / (detG^2+dD) * (adjG * ub) ;
u = ua - signG * (hGa*(ua'*ua) + hGb*(ub'*ub)) * prod_R_E;
dx_th = zeros(nFixedStates+p, 1); %preallocation
%System in form (1) of the IEEE paper
[vec_sys_f, vec_sys_G] = sys_f_G(x);
dx_nm = vec_sys_f + vec_sys_G*u;
%Calculation of dx
index_start = 1;
index_end = -1;
for i=1:m
index_end = index_end + n_vals(i);
for j=index_start:index_end
dx_th(j) = x(j+1);
end
dx_th(index_end+1) = dx_nm(i);
index_start = index_end + 2;
end
%Calculation of dth
AFvalueT = zeros(p,m);
for i=1:m
AFvalueT(:,i) = 0;
for j=1:m
AFvalueT(:,i) = AFvalueT(:,i)+ZG(:,i,j)*ua(j);
end
end
dx_th(nFixedStates+1:nFixedStates+p) = diag(gamma_nn)*( (ZF+AFvalueT)*prod_R_E -s*(th-th_star_guess) );
display(t)
end
function [y, Z] = calculate_neural_output(input, Iw, Ib, state)
Z = [tanh(Iw*input+Ib);1];
y = state' * Z;
end
function [ f,g ] = sys_f_G( x )
Iz1=0.96;
Iz2=0.81;
m1=3.2;
m2=2.0;
l1=0.5;
l2=0.4;
g=9.81;
q1=x(1);
q2=x(3);
q1dot=x(2);
q2dot=x(4);
M=[Iz1+Iz2+m1*l1^2/4+m2*(l1^2+l2^2/4+l1*l2*cos(q2)),Iz2+m2*(l2^2/4+l1*l2*cos(q2)/2);Iz2+m2*(l2^2/4+l1*l2*cos(q2)/2),Iz2+m2*l2^2/4];
c=0.5*m2*l1*l2*sin(q2);
C=[-c*q2dot,-c*(q1dot+q2dot);c*q1dot,0];
G=[0.5*m1*g*l1*cos(q1)+m2*g*(l1*cos(q1)+0.5*l2*cos(q1+q2));0.5*m2*g*l2*cos(q1+q2)];
f=-M\(C*[q1dot;q2dot]+G);
g=inv(M);
end
end
Its goal is to simulate the control of a 2-DOF robotic arm using a certain control law. The results I get after running the simulation are correct (I have a graph of the output I should expect), but it takes ages to finish!
Is there anything I could do to speed up the process?
In order to improve the computational speed of any integration in MATLAB, a few options are available to you:
Reduce the required accuracy (which you have already done).
Use an adapted integrator. As mentioned by @sanchises, ode23 can sometimes be slower than another ODE solver in MATLAB (if your equation is stiff, for instance). You could try to determine which solver is best suited from the documentation... or simply try them all! (See the sketch at the end of this answer.)
The best solution, but by far the most time-consuming, would be to use a compiled language such as C or Fortran. If the integration is only one part of your MATLAB program, you could use MEX files and translate only the integration to a compiled language. You could also create dynamic libraries in your compiled language and load them in MATLAB using loadlibrary. I use loadlibrary and an integration routine written in Fortran for the integration of orbits and trajectories, and I get over a 100x speedup with Fortran vs. MATLAB! Of course, technically, the integration is not in MATLAB anymore... but the library or MEX-file trick lets you convert only the integration part of your program to a different language. A number of open-source integrators are available, such as ODEPACK or RKSUITE in Fortran; then you only need to create a wrapper and your dynamics function in the correct language.
So, in a nutshell: if you're going to use this integration a lot, I would advise a compiled language; if not, make do with MATLAB and be patient!
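For instance, a minimal sketch of the second option, swapping the solver while leaving everything else in Solver unchanged (ode15s is MATLAB's general-purpose stiff solver; whether it actually helps depends on the problem):
options = odeset('AbsTol',1e-1,'RelTol',1e-1);
% Same call as before, only the integrator has changed from ode23 to ode15s.
[T,x_r] = ode15s(@prosomoiwsh,[0 t],[80*pi/180;0;130*pi/180;0;2.4943180186983711;11.216948999754299],options);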

Dealing with underflow while calculating GMM parameters using EM

I am currently running training in MATLAB on a matrix of log-spectrum samples, and I am constantly dealing with underflow problems. I understood that I need to work with logs in order to deal with the underflow.
I am still struggling with underflow, though: when I calculate the mean (mue), because it is negative I can't work with logs, so I need the real values, which underflow.
These are the equations I am working with:
In the MATLAB code I calculate log_tau in order to avoid underflow, but when calculating mue I need exp(log(tau)), which goes to zero.
I am attaching the relevant MATLAB code.
(In the code, the variable I called alpha is tau.)
for i = 1 : 50
    log_c = Logsum(log_alpha,1) - log(N);
    c = exp(log_c);
    mue = DataMat*alpha./(repmat(exp(Logsum(log_alpha,1)),FrameSize,1));
    log_abs_mue = log(abs(mue));
    log_SigmaSqr = log((DataMat.^2)*alpha) - repmat(Logsum(log_alpha,1),FrameSize,1) - 2*log_abs_mue;
    SigmaSqr = exp(log_SigmaSqr);
    for j = 1:N
        rep_DataMat(:,:,j) = repmat(DataMat(:,j),1,M);
        log_gamma(j,:) = log_c - 0.5*(FrameSize*log(2*pi)+sum(log_SigmaSqr)) + sum((rep_DataMat(:,:,j) - mue).^2./(2*SigmaSqr));
    end
    log_alpha = log_gamma - repmat(Logsum(log_gamma,2),1,M);
    alpha = exp(log_alpha);
end
c = exp(log_c);
SigmaSqr = exp(log_SigmaSqr);
Does anyone see how I can avoid this, or what needs to be fixed in the code?
What I did was add this line to the MATLAB code:
mue(isnan(mue)) = 0; % fix 0/0 problem
and this one:
SigmaSqr(SigmaSqr==0) = 1; % fix the case mue_k = x_k
I'm not sure this is the best solution, but it seems to work...
Does anyone have a better idea?
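For reference, the standard way to keep such sums in the log domain is the log-sum-exp trick. A minimal sketch of such a helper (this is an assumption about what a function like Logsum typically does, not the poster's actual implementation):
function s = logsumexp(logw, dim)
% Numerically stable log(sum(exp(logw), dim)):
% subtracting the maximum first makes the largest term exp(0) = 1, so the sum cannot underflow to zero.
m = max(logw, [], dim);
s = m + log(sum(exp(bsxfun(@minus, logw, m)), dim));
end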

Viterbi Block Decoding

I don't know if this fits here, but here goes:
I have some noisy data encoded using a Hamming block code, and I'd like to decode it using a Viterbi decoder.
I have done my homework, so I know how a Viterbi block decoder works, but I'd like to avoid implementing it all by myself, as it would take quite some time and might be suboptimal.
My question is the following: do you know of any MATLAB function for a Viterbi block decoder?
I found comm.ViterbiDecoder, but it only handles convolutional codes.
In the meantime, I tried to retrieve as much information as I could from the encoder, in case it is needed in the future (see code below).
% Parameters
codewordLength = 31;
messageLength = 26;
data = randi([0, 1], 1, messageLength);

% Encoder 1
encoder1 = comm.BCHEncoder(...
    'CodewordLength', codewordLength, ...
    'MessageLength', messageLength);

% Encoder 2
M = ceil(log2(codewordLength+1));
primitivePolynomialDe = primpoly(M, 'nodisplay');
primitivePolynomialBi = fliplr(de2bi(primitivePolynomialDe));
SL = (2^M-1) - codewordLength;
generatorPolynomial = bchgenpoly(codewordLength + SL, ...
    messageLength + SL, primitivePolynomialDe);
encoder2 = comm.BCHEncoder(...
    'CodewordLength', codewordLength, ...
    'MessageLength', messageLength, ...
    'PrimitivePolynomialSource', 'Property', ...
    'PrimitivePolynomial', primitivePolynomialBi, ...
    'GeneratorPolynomialSource', 'Property', ...
    'GeneratorPolynomial', generatorPolynomial);

dataEncoded1 = step(encoder1, data.').';
dataEncoded2 = step(encoder2, data.').';
sum(dataEncoded1 ~= dataEncoded2)
