How to solve the ODE dx = A*x + B*u in MATLAB, where the B matrix has a delay (delay in input)? - controls

I have an ODE written in matrix form, dx = A*x + B*u, where A = [1 2; 3 4] is a 2-by-2 matrix and B*u = [1*v + f; 0.9*v(t-5) + f], where v = 8 is a step input (undelayed in the first row, delayed by 5 in the second) and f = 0.05 is another input.
How can I implement this in MATLAB ODE code? A related question is how to incorporate an input delay into a state-space model at all. In an s-domain transfer-function model it is easy to incorporate the delay as exp(-td*s), or as a Pade approximation of it.
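Since the delay here is only in the input and not in the state, v(t-5) is a known function of time, and an ordinary solver is enough. A minimal sketch of one way to set this up (my assumptions: zero initial conditions, v(t) = 8 for t >= 0, so the delayed term switches on at t = 5):

A  = [1 2; 3 4];
v  = 8; f = 0.05; td = 5;
Bu = @(t) [1*v + f; 0.9*v*(t >= td) + f];   % delayed step in the 2nd row
[t, x] = ode45(@(t, x) A*x + Bu(t), [0 10], [0; 0]);
plot(t, x)

If the delay were in the state instead, i.e. a term in x(t-5), you would need dde23 rather than ode45.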

Related

How to fix skewed trapezoidal distribution sampling output sample size

I am trying to generate a skewed trapezoidal distribution using inverse transform sampling.
The inputs are the values where the ramps start and end (a, b, c, d) and the sample size.
a=-3;b=-1;c=1;d=8;
SampleSize=10e4;
h=2/(d+c-a-b);
Then I calculate the ratio of the lengths of the ramps and the flat component to the whole, to get a sample size for each:
firstramp=round(((b-a)/(d-a)),3);
flat=round((c-b)/(d-a),3);
secondramp=round((d-c)/(d-a),3);
n1=firstramp*SampleSize; %sample size for first ramp
n3=secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
And then finally I get the histogram from the following code:
quartile1=h/2*(b-a);
quartile2=1-h/2*(d-c);
y1=linspace(0,quartile1,n1);
y2=linspace(quartile1,quartile2,n2);
y3=linspace(quartile2,1,n3);
%inverse cumulative distribution functions
invcdf1=a+sqrt(2*(b-a)/h)*sqrt(y1);
invcdf2=(a+b)/2+y2/h;
invcdf3=d-sqrt(2*(d-c)/h)*sqrt(1-y3);
distr=[invcdf1 invcdf2 invcdf3];
histogram(distr,100)
However, the sampling of the ramp and flat components is not equal; the histogram looks like this:
I fixed this by trial and error, by reducing the sample size of the ramps by half:
n1=0.5*firstramp*SampleSize; %sample size for first ramp
n3=0.5*secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
This made the distribution look like this:
However, this makes the output sample smaller than the size requested in the input.
I've also tried different combinations of changing the sample sizes of ramps and flat.
This also works:
n1=0.75*firstramp*SampleSize; %sample size for first ramp
n3=0.75*secondramp*SampleSize; %sample size for second ramp
n2=1.5*flat*SampleSize;
It increases the output samples, but it's still not close.
Any help will be appreciated.
Full code:
a=-3;b=-1;c=1;d=8;
SampleSize=10e4;%*1.33333333333333;
h=2/(d+c-a-b);
firstramp=round(((b-a)/(d-a)),3);
flat=round((c-b)/(d-a),3);
secondramp=round((d-c)/(d-a),3);
n1=firstramp*SampleSize; %sample size for first ramp
n3=secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
quartile1=h/2*(b-a);
quartile2=1-h/2*(d-c);
y1=linspace(0,quartile1,.75*n1);
y2=linspace(quartile1,quartile2,1.5*n2);
y3=linspace(quartile2,1,.75*n3);
%inverse cumulative distribution functions
invcdf1=a+sqrt(2*(b-a)/h)*sqrt(y1);
invcdf2=(a+b)/2+y2/h;
invcdf3=d-sqrt(2*(d-c)/h)*sqrt(1-y3);
distr=[invcdf1 invcdf2 invcdf3];
histogram(distr,100)
%end
I don't know Matlab, so I was hoping somebody else would jump in on this, but since nobody did, here goes.
If I'm reading your code correctly, what you did is not an inversion. Inversion is 1-1, i.e., one uniform input produces one outcome. You seem to be using a technique known as the "composition method". In composition, the overall distribution is made up of component pieces, each of which is straightforward to generate, and you choose which component to generate from based on its proportion/probability relative to the whole.
For density functions, probability is the area under the density curve, so your first mistake was sampling the components in proportion to their widths rather than their areas. The correct sampling proportions are 2/13, 4/13, and 7/13 for what you designated the firstramp, flat, and secondramp components, respectively.
A second, relatively minor mistake was to assign exact sample sizes to each of the components. Having probability 2/13 does not mean that exactly 2*SampleSize/13 of your samples will come from the firstramp; it means that is the expected sample size for that component. The expected value of a random variate is not necessarily (or even likely to be) the outcome you actually get.
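To see where those proportions come from (my worked numbers, using the post's values a = -3, b = -1, c = 1, d = 8 and the height h = 2/(d+c-a-b) = 2/13):

area(firstramp)  = (1/2)*(b-a)*h = (1/2)*2*(2/13) = 2/13
area(flat)       = (c-b)*h       = 2*(2/13)       = 4/13
area(secondramp) = (1/2)*(d-c)*h = (1/2)*7*(2/13) = 7/13

These sum to 1, as the areas under a density must.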
In pseudocode, the composition approach would be
generate U ~ Uniform(0,1)
if U <= 2/13:
    generate and return a value from firstramp
else if U <= 6/13:
    generate and return a value from flat
else:
    generate and return a value from secondramp
Note that since each of the generate options will use one or more uniforms, and choosing between the options requires a uniform U, this is not an inversion.
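A minimal MATLAB translation of that pseudocode (my sketch, not the poster's code; it uses the conditional inverse CDF of each piece, with the post's breakpoints a = -3, b = -1, c = 1, d = 8):

a = -3; b = -1; c = 1; d = 8;
n = 1e5; X = zeros(n, 1);
for k = 1:n
    u = rand;                          % selects the component
    v = rand;                          % generates within the component
    if u <= 2/13
        X(k) = a + (b-a)*sqrt(v);      % ascending ramp
    elseif u <= 6/13
        X(k) = b + (c-b)*v;            % flat section
    else
        X(k) = d - (d-c)*sqrt(1-v);    % descending ramp
    end
end
histogram(X, 100)

Note it consumes two uniforms per sample, which is exactly why it is not an inversion.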
If you want an actual inversion, you need to quantify your density, integrate it to get the cumulative distribution function, then apply the inversion technique by setting F(X) = U and solving for X. Since your distribution is made of distinct components, both the density and cumulative density will be piecewise functions.
After deriving the height based on the requirement that the areas of the two triangles and the flat section must add up to 1, I came up with the following for your density:
| (x + 3) / 13 -3 <= x <= -1
|
f(x) = | 2 / 13 -1 <= x <= 1
|
| 2 * (8 - x) / 91 1 <= x <= 8
Integrating this and collecting terms produces the CDF:
| (x + 3)**2 / 26 -3 <= x <= -1
|
F(x) = | (2 + x) * 2 / 13 -1 <= x <= 1
|
| 6 / 13 + [49 - (x - 8)**2] / 91 1 <= x <= 8
Finally, determining the values of F(x) at the break points between the segments and applying inversion yields the following pseudocode algorithm:
generate U ~ Uniform(0,1)
if U <= 2 / 13:
    return 2 * sqrt( (13 * U) / 2 ) - 3
else if U <= 6 / 13:
    return (13 * U) / 2 - 2
else:
    return 8 - sqrt( 91 * (1 - U) )
Note that this is a true inversion. The outcome is determined by generating a single U, and transforming it in different ways depending on which range it falls in.
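In MATLAB the inversion can be vectorised directly; a minimal sketch (my translation of the pseudocode above):

U = rand(1e5, 1);                  % one uniform per outcome
X = zeros(size(U));
i1 = U <= 2/13;                    % first ramp
i2 = U > 2/13 & U <= 6/13;         % flat section
i3 = U > 6/13;                     % second ramp
X(i1) = 2*sqrt(13*U(i1)/2) - 3;
X(i2) = 13*U(i2)/2 - 2;
X(i3) = 8 - sqrt(91*(1 - U(i3)));
histogram(X, 100)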

I don't understand how the coefficients for the Givens rotation matrix have been coded here

I found this Fortran 90 code for finding the eigenvalues and eigenvectors of a real symmetric matrix using the Jacobi algorithm. I understand the general concept, but there are 3-4 lines that I don't understand.
Here is a piece of the code :
1 do while (b2.gt.abserr)
2 do i=1,n-1
3 do j=i+1,n
4 if (a(j,i)**2 <= bar) cycle ! do not touch small elements
5 b2 = b2 - 2.0*a(j,i)**2
6 bar = 0.5*b2/float(n*n)
7 ! calculate coefficient c and s for Givens matrix
8 beta = (a(j,j)-a(i,i))/(2.0*a(j,i))
9 coeff = 0.5*beta/sqrt(1.0+beta**2)
10 s = sqrt(max(0.5+coeff,0.0))
11 c = sqrt(max(0.5-coeff,0.0))
So, b2 is defined as the sum of the squares of the off-diagonal entries; abserr is the precision threshold; bar is another threshold used to pick out the entries, largest in absolute value, that we want to zero; and a(i,j) are the entries of the initial matrix.
What I don't understand are lines 9, 10 and 11. In every course, video, or PDF I have read or watched, the coefficient coeff here (usually called t) is never defined like this. Instead, it is defined as t = 1.0/(abs(beta) + sqrt(beta**2 + 1.0)), with t = -t if beta is negative. Consequently I'm also unsure about lines 10 and 11, since the values of c and s are related to beta.
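For what it's worth, the two parameterisations can be reconciled with half-angle identities; this is my own sketch of a derivation, so please double-check it. The Jacobi rotation angle theta satisfies cot(2*theta) = beta, so taking sin(2*theta) = 1/sqrt(1 + beta**2) > 0 gives

cos(2*theta) = beta/sqrt(1 + beta**2),   i.e.   coeff = 0.5*cos(2*theta)

and lines 10-11 then compute

0.5 + coeff = (1 + cos(2*theta))/2 = cos(theta)**2
0.5 - coeff = (1 - cos(2*theta))/2 = sin(theta)**2

So this code builds c and s directly from cos(2*theta), while the textbook formulation first computes t = tan(theta) via t = 1.0/(abs(beta) + sqrt(beta**2 + 1.0)) and then c = 1.0/sqrt(1 + t**2), s = t*c. Which square root ends up named c versus s is a convention tied to how the rotation is applied.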
If anyone knows the Jacobi eigenvalue method for symmetric matrices and understands the code above, I would highly appreciate an explanation.

Parallelising gradient calculation in Julia

I was persuaded some time ago to drop my comfortable Matlab programming and start programming in Julia. I have been working for a long time with neural networks, and I thought that with Julia I could get things done faster by parallelising the calculation of the gradient.
The gradient need not be calculated on the entire dataset in one go; instead one can split the calculation. For instance, by splitting the dataset in parts, we can calculate a partial gradient on each part. The total gradient is then calculated by adding up the partial gradients.
Though the principle is simple, when I parallelise with Julia I get a performance degradation: one process is faster than two processes! I am obviously doing something wrong... I have consulted other questions asked in the forum, but I could still not piece together an answer. I think my problem lies in a lot of unnecessary data movement, but I can't fix it properly.
In order to avoid posting messy neural network code, I am posting below a simpler example that replicates my problem in the setting of linear regression.
The code-block below creates some data for a linear regression problem. The code explains the constants, but X is the matrix containing the data inputs. We randomly create a weight vector w which when multiplied with X creates some targets Y.
######################################
## CREATE LINEAR REGRESSION PROBLEM ##
######################################
# This code implements a simple linear regression problem
MAXITER = 100 # number of iterations for simple gradient descent
N = 10000 # number of data items
D = 50 # dimension of data items
X = randn(N, D) # create random matrix of data, data items appear row-wise
Wtrue = randn(D,1) # create arbitrary weight matrix to generate targets
Y = X*Wtrue # generate targets
The next code-block below defines functions for measuring the fitness of our regression (i.e. the negative log-likelihood) and the gradient of the weight vector w:
####################################
## DEFINE FUNCTIONS ##
####################################
@everywhere begin
#-------------------------------------------------------------------
function negative_loglikelihood(Y,X,W)
#-------------------------------------------------------------------
# number of data items
N = size(X,1)
# accumulate here log-likelihood
ll = 0
for nn=1:N
ll = ll - 0.5*sum((Y[nn,:] - X[nn,:]*W).^2)
end
return ll
end
#-------------------------------------------------------------------
function negative_loglikelihood_grad(Y,X,W, first_index,last_index)
#-------------------------------------------------------------------
# number of data items
N = size(X,1)
# accumulate here gradient contributions by each data item
grad = zeros(similar(W))
for nn=first_index:last_index
grad = grad + X[nn,:]' * (Y[nn,:] - X[nn,:]*W)
end
return grad
end
end
Note that the above functions are on purpose not vectorised! I chose not to vectorise, as the final code (the neural network case) will also not admit any vectorisation (let us not get into more detail regarding this).
Finally, the code-block below shows a very simple gradient descent that tries to recover the parameter weight vector w from the given data Y and X:
####################################
## SOLVE LINEAR REGRESSION ##
####################################
# start from random initial solution
W = randn(D,1)
# learning rate, set here to some arbitrary small constant
eta = 0.000001
# the following for-loop implements simple gradient descent
for iter=1:MAXITER
# get gradient
ref_array = Array(RemoteRef, nworkers())
# let each worker process part of matrix X
for index=1:length(workers())
# first index of subset of X that worker should work on
first_index = (index-1)*int(ceil(N/nworkers())) + 1
# last index of subset of X that worker should work on
last_index = min((index)*(int(ceil(N/nworkers()))), N)
ref_array[index] = @spawn negative_loglikelihood_grad(Y,X,W, first_index,last_index)
end
# gather the gradients calculated on parts of matrix X
grad = zeros(similar(W))
for index=1:length(workers())
grad = grad + fetch(ref_array[index])
end
# now that we have the gradient we can update parameters W
W = W + eta*grad;
# report progress, monitor optimisation
@printf("Iter %d neg_loglikel=%.4f\n",iter, negative_loglikelihood(Y,X,W))
end
As is hopefully visible, I tried to parallelise the calculation of the gradient in the easiest possible way. My strategy is to break the calculation of the gradient into as many parts as there are available workers. Each worker works only on the part of matrix X specified by first_index and last_index, i.e. on X[first_index:last_index,:]. For instance, for 4 workers and N = 10000, the work should be divided as follows:
worker 1 => first_index = 1, last_index = 2500
worker 2 => first_index = 2501, last_index = 5000
worker 3 => first_index = 5001, last_index = 7500
worker 4 => first_index = 7501, last_index = 10000
Unfortunately, this entire code works faster if I have only one worker. If I add more workers via addprocs(), the code runs slower. One can aggravate the issue by creating more data items, for instance N = 20000; with more data items, the degradation is even more pronounced.
In my particular computing environment, with N = 20000 and one core the code runs in ~9 secs, while with N = 20000 and 4 cores it takes ~18 secs!
I tried many different things inspired by the questions and answers in this forum, but unfortunately to no avail. I realise that the parallelisation is naive and that data movement must be the problem, but I have no idea how to do it properly. It seems that the documentation is also a bit scarce on this issue (as is the nice book by Ivo Balbaert).
I would appreciate your help, as I have been stuck with this for quite a while and I really need it for my work. For anyone wanting to run the code, to save you the trouble of copy-pasting, you can get the code here.
Thanks for taking the time to read this very lengthy question! Help me turn this into a model answer that anyone new in Julia can then consult!
I would say that GD is not a good candidate for parallelisation with any of the proposed methods: SharedArray, DistributedArray, or your own implementation distributing chunks of data.
The problem does not lie in Julia, but in the GD algorithm.
Consider the code:
Main process:
for iter = 1:iterations #iterations: "the more the better"
δ = _gradient_descent_shared(X, y, θ)
θ = θ - α * (δ/N)
end
The problem is in the above for-loop, which is unavoidable. No matter how good _gradient_descent_shared is, the total number of iterations kills the noble concept of parallelisation.
After reading the question and the above suggestion, I started implementing GD using SharedArray. Please note that I'm not an expert in the field of SharedArrays.
The main process parts (simple implementation without regularization):
run_gradient_descent(X::SharedArray, y::SharedArray, θ::SharedArray, α, iterations) = begin
N = length(y)
for iter = 1:iterations
δ = _gradient_descent_shared(X, y, θ)
θ = θ - α * (δ/N)
end
θ
end
_gradient_descent_shared(X::SharedArray, y::SharedArray, θ::SharedArray, op=(+)) = begin
if size(X,1) <= length(procs(X))
return _gradient_descent_serial(X, y, θ)
else
rrefs = map(p -> (@spawnat p _gradient_descent_serial(X, y, θ)), procs(X))
return mapreduce(r -> fetch(r), op, rrefs)
end
end
The code common to all workers:
#= Returns the range of indices of a chunk for every worker on which it can work.
The function splits data examples (N rows into chunks),
not the parts of the particular example (features dimensionality remains intact).=#
@everywhere function _worker_range(S::SharedArray)
idx = indexpids(S)
if idx == 0
return 1:size(S,1), 1:size(S,2)
end
nchunks = length(procs(S))
splits = [round(Int, s) for s in linspace(0,size(S,1),nchunks+1)]
splits[idx]+1:splits[idx+1], 1:size(S,2)
end
# Computations on a chunk of the data.
@everywhere _gradient_descent_serial(X::SharedArray, y::SharedArray, θ::SharedArray) = begin
prange = _worker_range(X)
pX = sdata(X[prange[1], prange[2]])
py = sdata(y[prange[1],:])
tempδ = pX' * (pX * sdata(θ) .- py)
end
The data loading and training. Let me assume that we have:
features in X::Array of the size (N,D), where N - number of examples, D-dimensionality of the features
labels in y::Array of the size (N,1)
The main code might look like this:
X=[ones(size(X,1)) X] #adding the artificial coordinate
N, D = size(X)
MAXITER = 500
α = 0.01
initialθ = SharedArray(Float64, (D,1))
sX = convert(SharedArray, X)
sy = convert(SharedArray, y)
X = nothing
y = nothing
gc()
finalθ = run_gradient_descent(sX, sy, initialθ, α, MAXITER);
After implementing this and running it (on the 8 cores of my Intel Core i7), I got a very slight speedup over serial GD (1 core) on my multiclass (19 classes) training data: 715 sec for serial GD vs. 665 sec for shared GD.
If my implementation is correct (please check this out - I'm counting on that), then parallelisation of the GD algorithm is not worth it. You would likely get better acceleration using stochastic GD on one core.
If you want to reduce the amount of data movement, you should strongly consider using SharedArrays. You could preallocate just one output vector and pass it as an argument to each worker. Each worker then sets a chunk of it, just as you suggested.

Gaussian Mixture Model - Matlab training for parameters

I am running a speech enhancement algorithm based on a Gaussian Mixture Model. The problem is that the estimation algorithm underflows during training.
I am trying to calculate the PDF of a log-spectrum frame X given a Gaussian cluster, which is a product of the PDFs of each frequency component X_k (the FFT is done for k = 1..256).
What I get is a product of 256 terms exp(-v(k)) with v(k) >= 0.
Here is a snippet of the MATLAB calculation:
N - number of frames; M - number of mixtures; c_i - weight of each mixture;
gamma(n,i) = c_i*f(X_n|I = i)
for i=1 : N
rep_DataMat(:,:,i) = repmat(DataMat(:,i),1,M);
gamma_exp(:,:) = (1./sqrt((2*pi*sigmaSqr_curr))).*exp(((-1)*((rep_DataMat(:,:,i) - mue_curr).^2)./(2*sigmaSqr_curr)));
gamma_curr(i,:) = c_curr.*(prod(10*gamma_exp(:,:),1));
alpha_curr(i,:) = gamma_curr(i,:)./sum(gamma_curr(i,:));
end
The product quickly goes to zero because K = 256 and the factors are smaller than one. Is there a way I can calculate this without causing an underflow (logsum or similar)?
You can perform the computations in the log domain.
The conversion of products into sums is straightforward.
Sums on the other hand can be converted with something such as logsumexp.
This works using the formula:
log(a + b) = log(exp(log(a)) + exp(log(b)))
= log(exp(loga) + exp(logb))
Where loga and logb are the respective representation of a and b in the log domain.
The basic idea is then to factor out the exponential with the largest argument (e.g. loga, for the sake of illustration):
log(exp(loga)+exp(logb)) = log(exp(loga)*(1+exp(logb-loga)))
= loga + log(1+exp(logb-loga))
Note that the same idea applies if you have more than 2 terms to add.
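A minimal MATLAB sketch of the trick described above (my illustration; the function name logsumexp is just a convention):

function s = logsumexp(logv)
% LOGSUMEXP  log(sum(exp(logv))) computed without underflow.
m = max(logv);                       % factor out the largest exponent
s = m + log(sum(exp(logv - m)));     % remaining exponents are all <= 0
end

In the posted loop you would accumulate sum(log(gamma_exp), 1) over the 256 components instead of prod(gamma_exp, 1), and normalise with alpha = exp(loggamma - logsumexp(loggamma)).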

Using Fourier Analysis to fit function to data

I have 24 values for Y and 24 corresponding values for t. The Y values are measured experimentally, while t takes the values t = [1, 2, 3, ..., 24].
I want to find the relationship between Y and t as an equation, using Fourier analysis.
What I have tried and done so far: I wrote the following MATLAB code:
Y=[10.6534
9.6646
8.7137
8.2863
8.2863
8.7137
9.0000
9.5726
11.0000
12.7137
13.4274
13.2863
13.0000
12.7137
12.5726
13.5726
15.7137
17.4274
18.0000
18.0000
17.4274
15.7137
14.0297
12.4345];
ts=1; % step
t=1:ts:24; % the period is 24
f=[-length(t)/2:length(t)/2-1]/(length(t)*ts); % computing frequency interval
M=abs(fftshift(fft(Y)));
figure;plot(f,M,'LineWidth',1.5);grid % plot of harmonic components
figure;
plot(t,Y,'LineWidth',1.5);grid % plot of original data Y
figure;bar(f,M);grid % plot of harmonic components as bar shape
The resulting bar figure is:
Now I want to find the equation for these harmonic components, which represent the data. After that, I want to plot the original data Y together with the values from the fitted function; the two curves should be close to each other.
Should I use cos or sin, or -sin or -cos?
Put another way, what is the rule for representing these harmonics as a function Y = f(t)?
Here is an example done with your data in Mathematica, using the discrete sine transform. Hopefully you can translate it to Matlab:
n = 24;
xg = N[Range[n]]/n
fg = l (*your list *)
fp = ListPlot[Transpose[{xg, fg}], PlotRange -> All] (*points plot*)
coef = FourierDST[fg, 1]/Sqrt[n/2]; (*Fourier transform*)
Show[fp, Plot[Sum[coef[[r]]*Sin[Pi r x], {r, n - 1}], {x, -1, 1},
PlotRange -> All]]
The coefficients are:
{16.6411, -4.00062, 5.31557, -1.38863, 2.89762, 0.898562,
1.54402, -0.116046, 1.54847, 0.136079, 1.16729, 0.156489,
0.787476, -0.0879736, 0.747845, 0.00903859, 0.515012, 0.021791,
0.35001, 0.0159676, 0.215619, 0.0122281, 0.0943376, -0.00150218}
More detailed view:
Edit
However, since an even function seems to fit better, I also did a discrete Fourier cosine transform of type 3, which works much better:
In this case the coefficients are:
{14.7384, -8.93197, 4.56404, -2.85262, 2.42847, -0.249488,
0.565181,-0.848594, 0.958699, -0.468337, 0.660136, -0.317903,
0.390689,-0.457621, 0.427875, -0.260669, 0.278931, -0.166846,
0.18547, -0.102438, 0.111731, -0.0425396, 0.0484102, -0.00559378}
And the coefficients and the plot of the function are obtained by:
coef = FourierDCT[fg, 3]/Sqrt[n];(*Fourier transform*)
f[x_]:= Sum[coef[[r]]*Cos[Pi (r - 1/2) x], {r, n - 1}]
You'll have to experiment a little ...
It depends on what MATLAB gave you back: it's either sines and cosines or a complex exponential.
Most FFT algorithms that I know of demand that the number of data points be an integer power of two. The closest one for your data set is 32, so you should pad it out with zeros.
Thanks for your help.
I found the solution I was aiming for, but for some reason everything is shifted by 1.
Here is the code:
ts = 1; % time step
t = [1:ts:24];
fs = 1/ts; % frequency step
f=[-length(t)/2:length(t)/2-1]/(length(t)*ts); % frequency formula
%data
P=[10.7083
9.7003
8.9780
8.4531
8.1653
8.2633
8.8795
9.9850
11.3289
12.5172
13.2012
13.2720
12.9435
12.6647
12.8940
13.8516
15.3819
17.0033
18.1227
18.3039
17.4531
15.8322
13.9056
12.1154];
plot(t,P,'LineWidth',1.5);grid
xlabel('time (hours)');ylabel('Power (MW)')
title('Power Profile for 2nd Feb, 1998')
% fourier transform analysis
P1 = fft(P)/length(t);
P2=fftshift(P1);
amp=abs(P2); % amplitude
phi = angle(P2); % phase angle
figure
subplot(211),stem(f,amp,'LineWidth',1.5),grid
xlabel('frequency (Hz)');ylabel('amplitude (MW)')
subplot(212),stem(f,phi,'LineWidth',1.5),grid
xlabel('frequency (Hz)');ylabel('phase angle (rad)')
% NOW, I WILL CONSTRUCT THE MODEL FROM THE FIGURE
% THE STRUCTURE IS:
% Pmodel=Ai*COS(i*w*t+phii)
% where, w=2*pi/24 and i is the harmonic order
% Here, up to the third harmonic is enough
% and using Parseval's Theorem, the model is:
% PP=12.6635+2*(1.9806*cos(w*tt+1.807)+0.86388*cos(2*w*tt+2.0769)+0.39683*cos(3*w*tt- 1.8132));
w=2*pi/24;
Pmodel=12.6635+2*(1.9806*cos(w*t+1.807)+0.86388*cos(2*w*t+2.0769)+0.39686*cos(3*w*t-1.8132));
figure
plot(t,P,'LineWidth',1.5);grid on
hold on;
plot(t,Pmodel,'r','LineWidth',1.5)
legend('original','model');xlabel('time (hours )');ylabel('Power (MW)')
% But here is a problem, the modeled signal is shifted
% by 1 comparing to the original one
% I redraw the two figures together by plotting Pmodeled vs t+1
% Actually, I don't know why it is shifted, but they are
% exactly identical with shifting by 1
figure
plot(t,P,'LineWidth',1.5);grid on
hold on;
plot(t+1,Pmodel,'r','LineWidth',1.5)
legend('original','model');xlabel('time (hours )');ylabel('Power (MW)')
Why has this shifting problem happened, and how can I solve it?
The problem is with line 2, "t = [1:ts:24];". It should be "t = 0:ts:23;". The fft treats the first sample as occurring at t = 0, so building the model on a time axis that starts at 1 shifts everything by one step.
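A short sketch (my addition, reusing P from the post) that builds the same model directly from the fft output on the corrected zero-based time axis, instead of reading the amplitudes and phases off the figure:

t  = 0:23;                           % zero-based time axis
w  = 2*pi/24;
P1 = fft(P)/length(t);               % P as defined above
Pmodel = real(P1(1))*ones(size(t));  % DC term, i.e. the mean
for k = 1:3                          % up to the third harmonic
    Pmodel = Pmodel + 2*abs(P1(k+1))*cos(k*w*t + angle(P1(k+1)));
end
plot(t, P, 'LineWidth', 1.5); grid on; hold on
plot(t, Pmodel, 'r', 'LineWidth', 1.5)
legend('original', 'model')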
