How to deal with discrete time system in GEKKO? - gekko

I am dealing with a discrete time system with sampling time of 300s.
My question is how to express the state equation or output equation, like
x(k+1)=A*x(k)+B*u(k)
y(k)=C*x(k)
where x(k) is the state and y(k) is the output. I have all the values of the A, B, and C matrices.
I found some information about discrete time system on webpage https://apmonitor.com/wiki/index.php/Apps/DiscreteStateSpace
I want to know whether there is another way to express state equation other than
x,y,u = m.state_space(A,B,C,D=None,discrete=True)

The discrete state space model is the preferred way to pose your model. You could also convert your equations to a discrete time series form or to a continuous state space form; these are all equivalent. Another way to write your model is to use IMODE=2 (algebraic equations), but this is much more complicated. Here is an example of MIMO identification where we estimate ARX parameters with IMODE=2. I recommend the m.state_space model, used with IMODE>=4.
Here are a pendulum state space model example and a flight control state space model.
Both use continuous state space models, but the methods are similar to what your application needs.
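As a cross-check for whichever GEKKO formulation you choose, the recursion from the question can be simulated directly in NumPy. This is only a sketch of x(k+1)=A*x(k)+B*u(k), y(k)=C*x(k); it does not use GEKKO, and the matrices, initial state, and input sequence below are made-up placeholders, not values from the question.

```python
import numpy as np

def simulate_discrete_ss(A, B, C, x0, u_seq):
    """Step x(k+1) = A x(k) + B u(k), y(k) = C x(k) over an input sequence.

    u_seq has shape (n_steps, n_inputs); each row is one sample (e.g. 300 s apart).
    Returns states with shape (n_steps + 1, n_states) and outputs (n_steps, n_outputs).
    """
    A, B, C = np.asarray(A, float), np.asarray(B, float), np.asarray(C, float)
    x = np.asarray(x0, dtype=float)
    xs, ys = [x.copy()], []
    for u in np.asarray(u_seq, dtype=float):
        ys.append(C @ x)          # output at step k
        x = A @ x + B @ u         # state at step k+1
        xs.append(x.copy())
    return np.array(xs), np.array(ys)

# Placeholder 2-state, 1-input, 1-output system (assumed values).
A = [[0.9, 0.1],
     [0.0, 0.8]]
B = [[0.0],
     [1.0]]
C = [[1.0, 0.0]]

u_seq = np.ones((5, 1))           # unit step input for 5 samples
xs, ys = simulate_discrete_ss(A, B, C, x0=[0.0, 0.0], u_seq=u_seq)
print(xs)
print(ys)
```

Comparing the trajectory from this loop against the x and y values GEKKO returns for the same A, B, C and inputs is a simple way to confirm the model was posed as intended.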

Related

How to define lmfit model for complex functions?

I am working on the Cole-Cole model, which describes how the permittivity varies with frequency and is given by:
Where, ε_∞ is the higher permittivity,
ε_s is the static permittivity ε_s>ε_∞,
ε_(o) is 8.854e-12,
ω=2πf,
α is the relaxation time 0≪α≪1
σ is the conductivity (can be a CONSTANT)
I tried one of the examples given here https://lmfit.github.io/lmfit-py/examples/example_complex_resonator_model.html but was not able to follow along.
Currently I have frequency vs. real permittivity data. I want to fit the model to my measured data using lmfit and estimate the parameter values from the fit. Basically, I want to fit the real part of the permittivity to the model.

How do I intuitively interpret a sigmoidal neural network model?

There are multiple sources, but they explain at a bit too high a level for me to actually understand.
Here is my knowledge of how this model works;
We feed-forward information in prior layer's nodes using the weight * value. We do NOT use the sigmoid function here. This is because any hidden layers will force the value to be POSITIVE if we use the sigmoid function here. If it is always positive, then subsequent values can never be less than 0.5.
When we have fed forward to the output, we then use the sigmoid function on the output.
So in total we only use the sigmoid function on the output layer values only.
I will try to include a hopefully not terrible diagram
https://imgur.com/a/4EzkpH5
I have tested with my own code, and evidently it should not be the sigmoid function on every value and weight, but I am unsure if it is just the sum of weight*value
So basically you have a set of features for your model. These features are independent variables that are responsible for producing the output. So the features are the inputs and the predicted values are the outputs. This is indeed a function.
It is easy to understand neural networks if we study them in terms of functions.
First, multiply the feature vector by the vector of weights; that is, take the dot product of the two vectors.
The dot product is a scalar if you have a single node (neuron). Apply the sigmoid function to that product. The output is the final prediction.
The whole model could be expressed as a single composite function like,
y = sigmoid( dot( w , x ) )
Also, backpropagation (gradient descent) for a NN becomes more intuitive if we treat the NN as a function.
In the above function,
sigmoid : applies sigmoid activation function to the argument.
dot : returns the dot product of two vectors.
Also, use vector notation as far as possible. It saves you from the confusion related with summations.
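The composite function above can be written out in a few lines of NumPy. This is a minimal sketch for a single neuron; the weight and feature values are arbitrary numbers chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, x):
    """Single-neuron model: y = sigmoid(dot(w, x))."""
    return sigmoid(np.dot(w, x))

w = np.array([0.5, -1.0, 2.0])   # weights (illustrative values)
x = np.array([1.0, 2.0, 0.5])    # features (illustrative values)

y = predict(w, x)                # dot(w, x) = 0.5 - 2.0 + 1.0 = -0.5
print(y)
```

Because the dot product here is negative, the prediction lands below 0.5; a positive dot product would push it above 0.5.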
Hope it helps.
Activation functions serve an important role in neural network models: they can, given the choice of activation function, grant the network the capability to model non-linear datasets.
The example illustrated in the figure you posted is limited to modelling linear problems where the output value is between 0 and 1 (the range of the sigmoid function). However, the model would support non-linear datasets if the sigmoid were applied to the two nodes in the middle. StackOverflow is not the place to discuss the theoretical foundation of why this works; instead I recommend some light reading such as this ebook: Neural Networks and Deep Learning (no affiliation).
As a side note: the final output layer of a network is sometimes instantiated as a simple sum, or a ReLU. This widens the range of the network's output.
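To see why a network with no activation on its hidden nodes stays linear before the final sigmoid, here is a small sketch. The layer sizes (3 inputs, 2 hidden nodes, 1 output) and the random weights are assumptions for illustration, not taken from the posted figure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))   # input -> hidden weights (assumed sizes)
W2 = rng.normal(size=(1, 2))   # hidden -> output weights
x = rng.normal(size=3)         # one input sample

# Hidden layer without activation: two matrix products in a row...
linear_hidden = sigmoid(W2 @ (W1 @ x))
# ...are the same as one product with the combined matrix W2 @ W1.
collapsed = sigmoid((W2 @ W1) @ x)

# Applying the sigmoid to the hidden nodes breaks that equivalence.
nonlinear_hidden = sigmoid(W2 @ sigmoid(W1 @ x))

print(linear_hidden, collapsed, nonlinear_hidden)
```

The first two values match, which shows that the hidden layer adds nothing a single weight vector could not already express; only the third variant can bend the decision boundary.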

Validation Split and Checkpoint Best Model in Keras

Let us use a validation split of 0.3 when fitting a Sequential model. What will be used for validation, the first or the last 30% samples?
Secondly, checkpointing the best model saves the best model weights in .hdf5 file format. Does this mean that, for a certain experiment, the saved model is the best tuned model?
For your first question, the last 30% samples will be used for validation.
From Keras documentation:
validation_split: Float between 0 and 1. Fraction of the training data to be used as validation data. The model will set apart this fraction of the training data, will not train on it, and will evaluate the loss and any model metrics on this data at the end of each epoch. The validation data is selected from the last samples in the x and y data provided, before shuffling
For your second question, I assume that you're talking about ModelCheckpoint with save_best_only=True. In this case, this callback saves the weights of a given epoch only if monitor ('val_loss', by default) is better than the best monitored value. Concretely, this happens here. If monitor is 'val_loss', this should be the tuned model for a particular setting of hyperparameters, according to the validation loss.
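Both behaviours can be mimicked in plain Python to make them concrete. This sketch does not call Keras; it only mirrors what the documentation quoted above describes, and the helper names `split_last_fraction` and `epochs_that_would_be_saved` are hypothetical.

```python
import math

def split_last_fraction(samples, validation_split):
    """Hold out the LAST fraction of the samples, taken before any shuffling."""
    split_at = int(math.floor(len(samples) * (1.0 - validation_split)))
    return samples[:split_at], samples[split_at:]

def epochs_that_would_be_saved(val_losses):
    """Mimic save_best_only=True: save only when val_loss improves on the best so far."""
    best_loss = float("inf")
    saved_epochs = []
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            saved_epochs.append(epoch)
    return saved_epochs, best_loss

train, val = split_last_fraction(list(range(10)), validation_split=0.3)
print(train, val)

# Made-up validation losses for five epochs.
saved_epochs, best_loss = epochs_that_would_be_saved([0.9, 0.7, 0.8, 0.6, 0.65])
print(saved_epochs, best_loss)
```

Note that if your data is ordered (for example, sorted by class), the held-out tail may not be representative, so shuffle it yourself before calling fit.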

Time series / state space model conceptual

I want to predict a value. I have a time series as well as a bunch of other time series that may be interesting to use to augment the prediction.
Someone is arguing with me that it is the same thing to find the correlation between 2 non stationary time series and finding the correlation when making both stationary by some sort of differencing. Their logic is that a state space model doesn't care.
Isn't the whole idea of regression to exploit correlations to predict values? Doesn't there have to exist a correlation to incorporate an explanation of variance in the data and not increase the variance in the predictions? Also, I am 100% convinced that finding the correlation between two non stationary time series without doing anything is wrong.... And you'll end up with correlations to time and not the variables themselves.
Any input is helpful. Thanks.
It depends on the models you employ later on. You say that there has to exist a correlation or else the variance in the predictions will increase. That might hold for some models. Instead, I'd recommend models that have some model selection built in.
Think of LASSO, for example, that gives sparse vectors for the coefficients. Or think of a model that allows you to calculate Variable Importance and base your decisions on that outcome.
Second, let's do some math:
Correlation original = E[X(t)*Y(t)]
Correlation differencing = E[(X(t)-X(t-1))*(Y(t)-Y(t-1))] = E[X(t)Y(t)] - E[X(t)Y(t-1)] - E[X(t-1)Y(t)] + E[X(t-1)Y(t-1)]
If you assume that each time series is uncorrelated with the other series' previous sample, the two cross terms vanish and this reduces to
= E[X(t)Y(t)] + E[X(t-1)Y(t-1)]
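A quick numerical check of the concern raised in the question: two series that share nothing except a drift in time look strongly correlated in levels, while their first differences do not. The series below are synthetic (independent random walks with made-up trend slopes), so this is an illustration rather than a statement about any particular dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
t = np.arange(n)

# Two independent random walks, each with its own deterministic trend (assumed slopes).
x = 0.5 * t + np.cumsum(rng.normal(size=n))
y = 0.3 * t + np.cumsum(rng.normal(size=n))

# Correlation of the raw, non-stationary series: dominated by the shared dependence on time.
corr_levels = np.corrcoef(x, y)[0, 1]

# Correlation after first differencing: the increments are independent noise plus a constant.
corr_diff = np.corrcoef(np.diff(x), np.diff(y))[0, 1]

print(f"levels: {corr_levels:.3f}, differences: {corr_diff:.3f}")
```

The level correlation comes out close to 1 even though the noise driving the two series is independent, which is the "correlation to time" effect described in the question.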

How do I constrain the outputs of Gaussian Processes in PYMC?

So I have a very challenging MCMC run I would like to do in PyMC, which I have run several times before for much simpler analyses. However, my newest challenge requires me to combine many different Gaussian Processes in a very specific way, and I don't know enough about Gaussian processes in general or how they are implemented in PyMC to engineer the code I need.
Here is the problem I am trying to tackle:
The data I have is five time series (we'll call them A(t), B(t), C(t), D(t), and E(t)) , each measurement of which has Gaussian/Normal uncertainties. Each of these can be modeled as the product of one series-specific efficiency function and one underlying function shared between all five time series, so A(t) = a(t) * f(t), B(t) = b(t) * f(t), C(t) = c(t) * f(t), etc... I need to measure the posterior for f(t), or more specifically, the posterior of the integral of f(t) dt over a domain.
So I have read over some documentation about implementing Gaussian Processes in PyMC, but I have a few additional wrinkles with my efficiency functions that need to be addressed specifically before I can start coding up my model. Mainly -
1) I have no strong prior about the shape of the efficiency functions a(t), b(t), etc... So long as they vary smoothly there is no shape that is strongly forbidden.
2) These efficiency functions are physically bound to be between 0 and 1 for all times. So while I have no prior on the shape of the curve it has to fall between these bounds. I do have some prior about its typical value but since I need to marginalize over it I can't put too many other constraints on this.
Has anyone out there tackled a similar type of problem before, and what might be the most elegant way to guarantee that my efficiency priors are implemented in this complex MCMC run? I simply don't know enough about Gaussian Processes/Covariance functions to know how to force these constraints on the data.
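One common way to keep a smooth but otherwise unconstrained function inside (0, 1) is to place the Gaussian Process on a latent function g(t) and pass it through a logistic link, so that a(t) = 1 / (1 + exp(-g(t))). The sketch below draws one such curve from a GP prior using plain NumPy. It is not PyMC code, and the squared-exponential kernel, length scale, and time grid are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(t, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between all pairs of time points."""
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 50)                 # assumed time grid

# Small diagonal jitter keeps the Cholesky factorisation numerically stable.
K = rbf_kernel(t, length_scale=2.0) + 1e-6 * np.eye(t.size)
L = np.linalg.cholesky(K)

g = L @ rng.normal(size=t.size)                # latent GP draw, unbounded
efficiency = sigmoid(g)                        # smooth curve confined to (0, 1)

print(efficiency.min(), efficiency.max())
```

A prior belief about the typical efficiency can be expressed through the mean of the latent g(t): for example, a latent mean of 0 corresponds to a typical efficiency of 0.5.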
