I have a complex dynamical system which takes x1, x2, x3 as input and gives y1, y2, y3 as output. I don't have a mathematical model of the system. x(k) is the present input to the system and y(k) is the present output.
My objective is to find x1(k+1), x2(k+1), x3(k+1) that maximize y1(k+1) + y2(k+1) + y3(k+1). I can only access the present state (k) and past state (k-1) values.
Intuitively, y_i(k) = f(x1(k), x2(k), x3(k)) for i = 1, 2, 3.
The system only accepts inputs such that a < x1, x2, x3 < b.
Is there any online machine learning algorithm which can be applied to solve this problem?
I really appreciate any feedback.
Thanks and regards
Marcella
I'm writing my thesis about attention mechanisms. In the paragraph in which I explain the decoder of the Transformer, I wrote this:
The first sub-layer is called masked self-attention, in which the masking operation consists in preventing the decoder from paying attention to subsequent words.
That is to say, while training a transformer for translation purposes, it is possible to access the target translation; on the other hand, during the inference, that is the translation of new sentences, it is not possible to access the target translation. Therefore, when calculating the probabilities of the next word in the sequence, the network must not access that word. Otherwise, the translation task would be banal and the network would not learn to predict the translation correctly.
I don't know whether the previous part also contains something wrong, but my professor thinks I made mistakes in the following part:
To understand in a simple way the functioning of the masked self-attention level, let's go back to the example “Orlando Bloom loves Miranda Kerr” (x1 x2 x3 x4 x5).
If we consider the inputs as vectors x1, x2, x3, x4, x5 and we want to translate the word x3 corresponding to "loves", we need to make sure that the following words x4 and x5 do not influence the translation y3. To prevent this influence, masking sets the weights of x4 and x5 to zero. Then a normalization of the weights is performed so that the sum of the elements of each column in the matrix is equal to 1. The result is a matrix with normalized weights in each column.
Can someone please tell me where the mistakes are?
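For comparison, here is a minimal sketch of how the mask is usually described for the Transformer decoder: the scores of later positions are set to minus infinity before the softmax, and the softmax then normalizes each row (one row per query position, assuming queries index the rows). The score values and the function name are made up for illustration; this is not the thesis author's code.

```python
import math

def causal_attention_weights(scores):
    """Apply a causal mask and a row-wise softmax to a square score matrix.

    scores[i][j] is the raw score of query position i against key position j.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        # Future positions (j > i) get -inf, so they contribute exp(-inf) = 0.
        masked = [scores[i][j] if j <= i else -math.inf for j in range(n)]
        row_max = max(masked)  # finite, because position i itself is never masked
        exps = [math.exp(s - row_max) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical raw scores for the five tokens x1..x5 (all equal, for readability).
scores = [[0.0] * 5 for _ in range(5)]
weights = causal_attention_weights(scores)
for row in weights:
    print([round(w, 3) for w in row])
```

In the row for x3, the entries for x4 and x5 come out as exactly zero and the remaining three entries sum to 1, without a separate renormalization step.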
I am quite new to this platform and I am looking for help with my current project. I am having trouble writing a sum inside another sum in CPLEX.
To give a brief overview of my problem, here is a small part of my decision variables and my objective function:
dvar boolean y[Amount][Address][Floor][Lane];
minimize sum(i in Amount, j in Address, k in Floor, l in Lane) y[i][j][k][l];
The parameters give me no trouble, except for Address, which has the following form:
The general formulation is Address[i], and I have Address[1]=40, Address[2]=12, Address[3]=24, etc.
I need to use the Address[i] parameter in my decision variable and objective function. So I definitely need to change the Address part to Address[i], and I need another sum in the objective function. The following was my idea:
minimize sum(i in Amount, j in (sum(i in Address[i]), k in Floor, l in Lane) y[i][j][k][l];
But CPLEX does not accept this syntax. It reports a "syntax error, unexpected ','". The ',' is the one that comes after "j in (sum(i in Address[i])". I can clearly see that I cannot write my idea in this form, and I was wondering whether it is possible to have a sum inside another sum like this. I searched online but failed to find sufficient information about my situation.
So, is it possible to implement a sum in another sum function?
I am very sorry if this has been asked before, but I couldn't find anything sufficient. Thank you for your kind answers and advice.
Regards,
I'm a Prolog beginner, and I'm wondering how to generate normally distributed random numbers in Prolog.
What I know is that, using maybe from library(random), one can set up probabilities. But what about random distributions?
In general, languages provide you with a uniform distribution over 0 to 1. There are various algorithms for getting from that uniform distribution to another distribution, but this case is particularly common so there are a few ways to do it.
If you need a modest number of random values from a normal distribution, the Box-Muller transform is a very simple algorithm; it amounts to a little math on a few uniform random values:
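:- use_module(library(random)).

% Box-Muller transform: two uniform values in, two normal values out.
random_normal(N) :-
    random(U1), random(U2),
    Z0 is sqrt(-2 * log(U1)) * cos(2*pi*U2),
    Z1 is sqrt(-2 * log(U1)) * sin(2*pi*U2),
    (N = Z0 ; N = Z1).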
This algorithm consumes two uniform values and produces two normal values. The predicate yields both of them as solutions on backtracking. Other ways of doing this might be better for some applications. For instance, you could use asserta/1 and retract/1 to cache the second value and use it later without recomputing, though messing around in the dynamic store may be about as bad as doing the other work (you'd have to benchmark it). Here's the use:
?- random_normal(Z).
Z = -1.2418135230345024 ;
Z = -1.1135242997982466.
?- random_normal(Z).
Z = 0.6266801862581797 ;
Z = -0.4934840828548163.
?- random_normal(Z).
Z = 0.5525713772053663 ;
Z = -0.7118660644436128.
I'm not greatly confident of this but it may get you over the hump.
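The caching idea mentioned above is easier to see outside the dynamic store. Below is a small Python sketch of the same Box-Muller transform that keeps the second value for the next call. The class name NormalSampler and the seed are my own choices for illustration, not part of any Prolog library.

```python
import math
import random

class NormalSampler:
    """Standard normal samples via Box-Muller, caching the spare value."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._spare = None

    def sample(self):
        # Hand out the cached second value first, if there is one.
        if self._spare is not None:
            z, self._spare = self._spare, None
            return z
        u1 = 1.0 - self._rng.random()  # shift [0, 1) to (0, 1] so log(u1) is defined
        u2 = self._rng.random()
        radius = math.sqrt(-2.0 * math.log(u1))
        angle = 2.0 * math.pi * u2
        self._spare = radius * math.sin(angle)
        return radius * math.cos(angle)

sampler = NormalSampler(seed=42)
samples = [sampler.sample() for _ in range(100_000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, variance)  # should be close to 0 and 1 respectively
```

Every second call costs only a lookup. Whether that is worth it in Prolog is exactly the benchmarking question raised above.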
If you are using SWI-Prolog or SWISH then another option would be to use embedded R, which gives you a lot of flexibility with stats and probabilities.
http://swish.swi-prolog.org/example/Rserve.swinb
The R project provides statistical computing and data visualization. SWISH can access R through Rserve.
Integrative statistics with R:
Real is a C-based interface for connecting R to Prolog. See the documentation at doc/html/real.html for more information. There is also a paper [1] and a user's guide in doc/guide.pdf.
Real works on current versions of SWI and YAP. As of version 1.1 there is support for using Real on SWI web-servers.
I've looked and tried, but I can't find anything really helpful, so thank you in advance.
My problem is that I have a changing variable, "balance", which for the moment is 200. I need to use this equation to find how much money I should withdraw in a game, but I don't know how to write a Lua script that solves algebra.
The equation is: 200/(x+x^2+x^3+x^4+x^5)=0.00001001. How would I go about solving for x?
I have tried repeatedly adding .0000001 to x while 200/(x+x^2+x^3+x^4+x^5) doesn't equal 0.00001001, but it is very impractical and I haven't gotten it to work. This is the only way I can come up with at the moment. Any help would be appreciated.
This solution finds a zero of any continuous function (not only algebraic ones, and not only differentiable ones) and requires knowing a range that contains the root to be found.
local function find_zero(f, x_left, x_right, eps)
   eps = eps or 0.0000000001   -- precision
   local f_left, f_right = f(x_left), f(x_right)
   -- the function must change sign somewhere inside the range
   assert(x_left <= x_right and f_left * f_right <= 0, "Wrong range")
   while x_right - x_left > eps do
      local x_middle = (x_left + x_right) / 2
      local f_middle = f(x_middle)
      if f_middle * f_left > 0 then
         x_left, f_left = x_middle, f_middle
      else
         x_right, f_right = x_middle, f_middle
      end
   end
   return (x_left + x_right) / 2
end

local function my_func(x)
   return 200/(x+x^2+x^3+x^4+x^5) - 0.00001001
end

-- Assuming that the root is between 1 and 1000
local x = find_zero(my_func, 1.0, 1000.0)
print(x)   --> 28.643931367544
200/(x+x^2+x^3+x^4+x^5)=0.00001001 is equivalent to 200 = 0.00001001 * (x+x^2+x^3+x^4+x^5), so you have a polynomial equation to solve, and traditionally it is this form of the equation that people like to deal with.
If you want to stay in Lua, and the form of the equation is predictable enough that you can find a place where the right side is always less than the left (e.g. x = 0) and a place where the right side is always greater than the left (e.g. very large values of x), then you can use binary search - not terribly efficient, but certain and easy to code.
For general polynomial equations, one well-known method is https://en.wikipedia.org/wiki/Newton's_method. Given f(x) = 0 and a guess for x, a better guess might be x - f(x) / f'(x), where f'(x) is the derivative of f(x). There are a few pathological cases where this fails for various reasons, though, so again you probably want to know that your equation is reliably tractable.
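To make the update rule concrete, here is a short Python sketch of Newton's method applied to the polynomial form of this equation (the same loop ports to Lua almost line for line). The starting guess of 30, the tolerance and the iteration cap are assumptions of mine, not values from the question.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeat x <- x - f(x)/f'(x) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0.0:
            raise ZeroDivisionError("derivative vanished; pick another starting guess")
        step = f(x) / slope
        x -= step
        if abs(step) <= tol * max(1.0, abs(x)):
            return x
    raise RuntimeError("Newton's method did not converge")

BALANCE = 200.0
TARGET = 0.00001001

def f(x):
    # Polynomial form: TARGET * (x + x^2 + ... + x^5) - BALANCE = 0
    return TARGET * (x + x**2 + x**3 + x**4 + x**5) - BALANCE

def df(x):
    # Derivative of f with respect to x
    return TARGET * (1 + 2*x + 3*x**2 + 4*x**3 + 5*x**4)

root = newton(f, df, x0=30.0)  # 30.0 is an assumed starting guess
print(root)
```

For positive x this polynomial is increasing and convex, so starting above the root is a safe choice here. For other equations the pathological cases mentioned above still apply.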
Since you have Lua, you may be able to bring in native code that calls out to a maths library such as http://commons.apache.org/proper/commons-math/ (note that Apache Commons Math is a Java library, so calling it from Lua needs a bridge of some kind). It has a class called LaguerreSolver which will reasonably reliably solve polynomial equations for you, defending itself against the pathological cases. Most math libraries contain a lot more work than any single person is likely to put in for an individual problem, and are of correspondingly higher quality than a do-it-yourself approach such as the ones I describe above.
I am a Mechanical engineer with a computer scientist question. This is an example of what the equations I'm working with are like:
x = √((y-z)×2/r)
z = f×(L/D)×(x/2g)
f = something crazy with x in it
etc…(there are more equations with x in it)
The situation is this:
I need r to find x, but I need x to find z. I also need x to find f which is a part of finding z. So I guess a value for x, and then I use that value to find r and f. Then I go back and use the value I found for r and f to find x. I keep doing this until the guess and the calculated are the same.
My question is:
How do I get the computer to do this? I've been using Mathcad, but an example in another language like C++ is fine.
The very first thing you should do when faced with an iterative algorithm is write down on paper the sequence that will result from your idea:
E.g.:
x_0 = ..., f_0 = ..., r_0 = ...
x_1 = ..., f_1 = ..., r_1 = ...
...
x_n = ..., f_n = ..., r_n = ...
Now, you have an idea of what you should implement (even if you don't know how). If you don't manage to find a closed form expression for one of the x_i, r_i or whatever_i, you will need to solve one dimensional equations numerically. This will imply more work.
Now, for the implementation part: if you have never written a program, you should seriously consider asking someone in person who can help you (or hire an intern and have them write the code). We cannot help you start from scratch with, e.g., C programming, but we are willing to help you with specific problems that arise when you write the program.
Please note that your algorithm is not guaranteed to converge, even if you strongly think there is a unique solution. Solving non linear equations is a difficult subject.
It appears that Mathcad has many abstractions for iterative algorithms, without the need to implement them directly in a "lower level" language. Perhaps this question is better suited to the Mathcad forums at:
http://communities.ptc.com/index.jspa
If you are using Mathcad, it has the functionality built in. It is called a solve block.
Start with the keyword "given"
Given
define the guess values for all unknowns
x:=2
f:=3
r:=2
...
define your constraints
x = √((y-z)×2/r)
z = f×(L/D)×(x/2g)
f = something crazy with x in it
etc…(there are more equations with x in it)
calculate the solution
find(x, y, z, r, ...)=
Check Mathcad help or Quicksheets for examples of the exact syntax.
The simple answer to your question is this pseudo-code:
X = startingX;
lastF = Infinity;
F = 0;
tolerance = 1e-10;
while ((lastF - F)^2 > tolerance)
{
    lastF = F;
    X = ?;
    R = ?;
    F = FunctionOf(X,R);
}
This may not do what you expect at all. It may give a valid but nonsense answer or it may loop endlessly between alternate wrong answers.
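The substitution loop above can be sketched in Python with two extra safeguards: an iteration cap, so that it cannot loop endlessly, and a check that the iterate is still finite. Since the real equations are not given, the update rule x = cos(x) is only a stand-in that happens to converge; fixed_point and its parameters are illustrative names.

```python
import math

def fixed_point(update, x0, tol=1e-10, max_iter=200):
    """Substitute x <- update(x) until successive values agree within tol.

    Returns (x, iterations). Raises RuntimeError if the iteration budget is
    exhausted or the iterate stops being finite.
    """
    x = x0
    for iteration in range(1, max_iter + 1):
        x_new = update(x)
        if not math.isfinite(x_new):
            raise RuntimeError(f"iterate became non-finite at step {iteration}")
        if abs(x_new - x) <= tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError(f"no convergence within {max_iter} iterations")

# Stand-in update rule (an assumption): x = cos(x) has a unique fixed point.
x, steps = fixed_point(math.cos, x0=1.0)
print(x, steps)
```

A rule such as x <- 2x + 1 never settles, and with this sketch it ends in a RuntimeError rather than a silent endless loop.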
This is standard substitution to convergence. There are more advanced techniques like DIIS but I'm not sure you want to go there. I found this article while figuring out if I want to go there.
In general, it really pays to think about how you can transform your problem into an easier problem.
In my experience, it is better to pose your problem as a univariate bounded root-finding problem and use Brent's method if you can.
The next-worst option is multivariate minimization with something like BFGS.
Iterative solutions are horrible, but are more easily solved once you think of them as X2 = f(X1) where X is the input vector and you're trying to reduce the difference between X1 and X2.
As the commenters have noted, the mathematical aspects of your question are beyond the scope of the help you can expect here, and are even beyond the help you could be offered based on the detail you posted.
However, I think that even if you understood the mathematics thoroughly there are computer science aspects to your question that should be addressed.
When you write your code, try to organize it into functions that depend only upon the parameters you pass in. So write a subroutine that takes in values for y, z, and r and returns x. Write another that takes in f, L, D, x, and g and returns z. Now you have testable routines that you can check to make sure they are computing correctly. Check the input values inside the routines - for instance, in computing x you will get a divide-by-zero error if you pass in 0 for r. Think about how you want to handle this.
If you are going to solve this problem iteratively, you will need a method that decides, based on the results of one iteration, what the values for the next iteration will be. This should also be encapsulated in a subroutine. If you are using a language that allows only one value to be returned from a subroutine (as in the most common computational languages: C, C++, Java, C#), you need to package all your variables into some kind of data structure to return them. You could use an array of reals or doubles, but it is nicer to make an object, so that you can reference the variables by name rather than by position (less chance of error).
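As a rough illustration of the last two paragraphs, here is what small, testable routines and a named container might look like in Python. The formulas are the two posted in the question; reading "x/2g" as x / (2 g) is my assumption, and the names State, compute_x and compute_z, as well as the sample numbers, are made up for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class State:
    """Named container for the quantities carried between iterations."""
    x: float
    z: float
    f: float

def compute_x(y, z, r):
    """x = sqrt((y - z) * 2 / r), with the inputs checked first."""
    if r == 0:
        raise ValueError("r must be non-zero")
    radicand = (y - z) * 2 / r
    if radicand < 0:
        raise ValueError("(y - z) * 2 / r must be non-negative")
    return math.sqrt(radicand)

def compute_z(f, L, D, x, g):
    """z = f * (L / D) * (x / (2 * g)); the grouping of 2g is an assumption."""
    if D == 0 or g == 0:
        raise ValueError("D and g must be non-zero")
    return f * (L / D) * (x / (2 * g))

# Hypothetical numbers, only to show the routines being exercised in isolation.
state = State(x=2.0, z=0.0, f=0.02)
state.z = compute_z(state.f, L=100.0, D=0.5, x=state.x, g=9.81)
print(state)
print(compute_x(y=10.0, z=2.0, r=4.0))  # sqrt(8 * 2 / 4) = 2.0
```

Because each routine depends only on its arguments, each one can be checked by hand before it is wired into the iteration.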
Another aspect of iteration is knowing when to stop. Certainly you'll do so when you get a solution that converges. Make this decision into another subroutine. Now when you need to change the convergence criteria there is only one place in the code to go to. But you need to consider other reasons for stopping - what do you do if your solution starts diverging instead of converging? How many iterations will you allow the run to go before giving up?
Another aspect of iteration on a computer is round-off error. Mathematically, 10^40/10^38 is 100 and 10^20 + 1 > 10^20. These statements are not necessarily true in floating-point computations. Your calculations may need to take this into account, or you will end up with numbers that are garbage. This is an example of a cross-cutting concern that does not lend itself to encapsulation in a subroutine.
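A few lines of Python show the effect with ordinary double-precision floats, along with the usual remedy of comparing within a tolerance instead of testing for exact equality. The helper name converged and its tolerances are illustrative choices, not a standard.

```python
import math

big = 1e20
print(big + 1 == big)    # True: 1 is smaller than the spacing between doubles near 1e20
print(0.1 + 0.2 == 0.3)  # False: none of these values is exactly representable in binary
print(math.isclose(0.1 + 0.2, 0.3, rel_tol=1e-9))  # True

def converged(previous, current, rel_tol=1e-9, abs_tol=1e-12):
    """Tolerance-based stopping test instead of an exact equality check."""
    return math.isclose(previous, current, rel_tol=rel_tol, abs_tol=abs_tol)
```

This is why "keep going until the guess and the calculated value are the same" should in practice mean "until they agree to within a tolerance you have chosen".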
I would suggest that you go look at the Python language, and the pythonxy.com extensions. There are people in the associated forums that would be a good resource for helping you learn how to do iterative solving of a system of equations.