Assessing code for systems modeling and analysis - algorithm

I'm currently studying some practice coding questions for a systems modeling and analysis course. The code is as follows:
home
num_samples = 20;
agents = nan(1,num_samples); % Create a new empty array
for i = 1:num_samples
    agents(i) = rand(1); % random number between 0 and 1
end
for time = 1:1000
    for current = 1:num_samples
        x = 0.0;
        for other = 1:num_samples
            x = x + agents(other);
        end
        pivot = x/num_samples;
        rtmp = rand(1);
        if agents(current) < pivot
            agents(current) = agents(current) + rtmp;
        else
            agents(current) = agents(current) - rtmp;
        end
    end
end
I am expected to answer the following questions only by READING the code, not actually running it:
How do the agents affect one another?
What long-term behaviors will they show (or not show)?
Name one system this model could (roughly) represent.
Having some trouble figuring out what this code is actually doing. I think it's mainly the pivot variable that's throwing me off. By the time pivot is calculated, x represents the cumulative sum of the agents, so I don't really know what dividing it by its length (num_samples = 20) actually gives you. How the agents affect one another depends on what pivot is doing, I think, and that should certainly help me answer at least the first two questions, right?
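In case it helps to show where I've gotten so far: my best guess is that the inner loop over other just adds up every agent's value, so pivot would be the same as the vectorized line below, but I'm not sure I'm reading that right.

pivot = mean(agents); % my guess: should match x/num_samples after the summing loop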
Any insight would be very appreciated.

Related

How does one do Algebra in Lua?

I've looked and tried but I can't find anything really helpful, so thank you in advance.
My problem is I have a changing variable, "balance"; for the moment I have it represented as 200. I need to use this equation to find how much money I should withdraw in a game, but I don't know how to write a Lua script that solves algebra.
The equation is 200/(x+x^2+x^3+x^4+x^5) = 0.00001001. How would I set about solving for x?
I have tried adding 0.0000001 whenever 200/(x+x^2+x^3+x^4+x^5) doesn't equal 0.00001001, but it is very impractical and I haven't gotten it to work. This is the only way I can come up with at the moment. Any help would be appreciated.
This solution finds a zero of any continuous function (not only algebraic and not only differentiable); it requires knowing an interval that brackets the root.
local function find_zero(f, x_left, x_right, eps)
    eps = eps or 0.0000000001 -- precision
    local f_left, f_right = f(x_left), f(x_right)
    assert(x_left <= x_right and f_left * f_right <= 0, "Wrong interval")
    -- bisection: repeatedly halve the bracket until it is narrower than eps
    while x_right - x_left > eps do
        local x_middle = (x_left + x_right) / 2
        local f_middle = f(x_middle)
        if f_middle * f_left > 0 then
            x_left, f_left = x_middle, f_middle
        else
            x_right, f_right = x_middle, f_middle
        end
    end
    return (x_left + x_right) / 2
end

local function my_func(x)
    return 200/(x+x^2+x^3+x^4+x^5) - 0.00001001
end

-- Assuming that the root is between 1 and 1000
local x = find_zero(my_func, 1.0, 1000.0)
print(x) --> 28.643931367544
200/(x+x^2+x^3+x^4+x^5)=0.00001001 is equivalent to 200 = 0.00001001 * (x+x^2+x^3+x^4+x^5), so you have a polynomial equation to solve, and traditionally it is this form of the equation that people like to deal with.
If you want to stay in Lua, then if the form of the equation is predictable enough that you can find a place where the right side is always less than the left (e.g. x = 0) and a place where the right side is always greater than the left (e.g. very large values of x), then you can use binary search - not terribly efficient, but certain and easy to code.
For general polynomial equations, one well-known method is Newton's method (https://en.wikipedia.org/wiki/Newton's_method). Given f(x) = 0 and a guess for x, a better guess might be x - f(x) / f'(x), where f'(x) is the derivative of f(x). There are a few pathological cases where this fails for various reasons, though, so again you probably want to know that your equation is reliably tractable.
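To make that concrete for this particular equation, rewritten as f(x) = 0 as above (the derivative below is just the straightforward term-by-term differentiation of the rearranged polynomial):

f(x)   = 0.00001001*(x + x^2 + x^3 + x^4 + x^5) - 200
f'(x)  = 0.00001001*(1 + 2*x + 3*x^2 + 4*x^3 + 5*x^4)
x_next = x - f(x)/f'(x)

Repeating that update from a reasonable positive starting guess should home in on the same root (about 28.64) that the bisection code above finds, though, as noted, Newton's method can misbehave for unlucky starting points.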
Since you have Lua, you may be able to bring in C code that calls out to a maths library such as http://commons.apache.org/proper/commons-math/. It has a routine called LaguerreSolver() which will reasonably reliably solve polynomial equations for you, defending itself against all of the pathological cases. Most math libraries contain a lot more work than any single person is likely to put in for an individual problem, and are of correspondingly higher quality than a do-it-yourself approach such as the one I describe above.

How to save output of for loop operation in matlab

I have a matrix A which has a size of 54x100. For some specific condition I perform an operation on each row of A. I need to save the output of this for loop. I've tried the following but it did not work.
S = zeros(54,100);
for i = 1:54
    Ri = A(i,:);
    answer = mean(reshape(Ri,5,20),1);
    S(i) = answer;
end
Firstly, judging by your question I'd recommend working through some basic Matlab tutorials and the official documentation.
To actually help you with your issue though; you can do this:
%% Make up A (since I don't know what it actually is)
n = 54; m = 100;
A = randn(n,m); % N x m matrix of random numbers
%% Loop over each row of A
S = cell(n,1);
for j = 1:n;
Rj = A(j,:); % j'th row
answer = mean(reshape(Rj,5,20),1); % some operation
S{j} = answer; % store the answer in cell S
end
The problem was that your answer was not a single number (a 1x1 matrix) but a vector, so you got a dimension mismatch error. Above I'm putting the answers into a cell object of size n. The result of your operation on the j'th row can then be retrieved with S{j}.
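Alternatively, since in this particular case every answer happens to be a 1x20 row vector, you could (if you prefer a plain matrix) preallocate S to match and assign whole rows - a sketch under that assumption:

S = zeros(n, 20);   % one 1x20 result per row of A
for j = 1:n
    S(j,:) = mean(reshape(A(j,:), 5, 20), 1); % assign the whole row at once
end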
Also:
Do not use i as an iterator variable, since it also represents the imaginary unit.
Do not hard-code values; reference the existing ones instead. For example, here I referenced n in the for-loop declaration as opposed to just writing for j = 1:54, because otherwise, if I later fancied using my code for a 53x100 array, it would not work anymore.
When you post your code I recommend adding a minimal working example - a piece of code which people can just copy and paste into their Matlab (or whatever interpreter of whatever language) and run to reproduce your problem. Here you have not included anything which tells the code what A is, for example.
This is quite a good read in general and should help you in the future.

time series simulation and logical checking with Matlab or with other tools

1) I have time series data and signals (indicators) whose values change over time.
My question:
2) I need to do logical checking all the time, e.g. if signals 1 and 2 happened around the same time (were equal to a certain value, e.g. 1), then I need to know the exact time in order to check what happened next.
3) To complicate things, e.g. if signal 3 happened in some time range after signals 1 and 2 were equal to 1, I would like to check other things.
4) The time series is very long and I need to deal with it segment by segment.
Please advise how to write this without reinventing the wheel.
Is it recommended to write it in Matlab? Using a state machine? In C++? Using threads?
5) Does Matlab have a simulator ready for this kind of thing?
How do I define the logical conditions in an efficient way?
6) Can I use data mining tools for this?
I saw this list of tools:
Data Mining open source tools
Not sure where to start.
Thanks
The second and third questions could be handled like this in Matlab:
T = -range; % Initialise far in the past (assuming that t starts at 0).
for n = 1:length(t)
    if signal1(n) == 1 && signal2(n) == 1
        T = t(n); % remember the latest time signals 1 and 2 were both 1
    end
    if t(n) - T < range && signal3(n) == 1
        % Further conditions you want checked go here (they could also be
        % merged into the previous if statement), followed by whatever you
        % want executed when these conditions are met.
    end
end
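If the signals are already stored as full vectors, a vectorized sketch of the same kind of check (assuming signal1, signal2, signal3 and t are equal-length vectors and range is the allowed delay) could look like this:

% Times at which signal1 and signal2 are both 1
T12 = t(signal1 == 1 & signal2 == 1);

% For each occurrence of signal3, check whether it falls within 'range'
% after one of those times
T3 = t(signal3 == 1);
for k = 1:length(T3)
    if any(T3(k) - T12 >= 0 & T3(k) - T12 < range)
        % Conditions met: signal3 happened within 'range' after signals 1 and 2
    end
end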
Using a lower-level programming language like C++ would speed this up. And if the data is very long, it could also reduce memory use by loading one element of each array at a time.
Matlab has a simulator, called Simulink, but that is meant for more complicated things; here you only want to do something conditionally.

SVM training performance

I'm using SVMLib to train a simple SVM over the MNIST dataset. It contains 60,000 training examples. However, I have several performance issues: the training seems to be endless (after a few hours, I had to shut it down by hand, because it doesn't respond). My code is very simple; I just call ovrtrain on the dataset without specifying a kernel or any special constants:
function features = readFeatures(fileName)
    [fid, msg] = fopen(fileName, 'r', 'ieee-be');
    header = fread(fid, 4, "int32", 0, "ieee-be");
    if header(1) ~= 2051
        fprintf("Wrong magic number!");
    end
    M = header(2);
    rows = header(3);
    columns = header(4);
    features = fread(fid, [M, rows*columns], "uint8", 0, "ieee-be");
    fclose(fid);
    return;
endfunction

function labels = readLabels(fileName)
    [fid, msg] = fopen(fileName, 'r', 'ieee-be');
    header = fread(fid, 2, "int32", 0, "ieee-be");
    if header(1) ~= 2049
        fprintf("Wrong magic number!");
    end
    M = header(2);
    labels = fread(fid, [M, 1], "uint8", 0, "ieee-be");
    fclose(fid);
    return;
endfunction

labels = readLabels("train-labels.idx1-ubyte");
features = readFeatures("train-images.idx3-ubyte");
model = ovrtrain(labels, features, "-t 0"); % doesn't respond...
My question: is this normal? I'm running it on Ubuntu in a virtual machine. Should I wait longer?
I don't know whether you have found your answer yet, but let me tell you what I suspect about your situation. 60,000 examples is not a lot for a powerful trainer like LibSVM. Currently I am working on a training set of 6,000 examples and it takes 3 to 5 seconds to train. However, the parameter selection is important, and that is probably what is taking so long. If the number of unique features in your data set is too high, then for any example there will be lots of zero feature values for the features it does not have. If the tool applies data scaling to your training set, then most probably those many zero feature values will be scaled to some non-zero value, leaving you with an astronomical number of unique, non-zero feature values for each and every example. That makes it very hard for an SVM tool to dig in and extract efficient parameter values.
Long story short: if you have done enough research on SVM tools and understand what I mean, either assign parameter values in the training command before executing it, or find a way to decrease the number of unique features. If you haven't, go and download the latest version of LibSVM and read the README files as well as the FAQ on the tool's website.
If none of these is the case, then sorry for taking your time :) Good luck.
It might be an issue of convergence given the characteristics of your data.
Check the kernel you have selected by default and change it. Also, check the stopping criterion of the package. Additionally, if you are looking for a faster implementation, check MSVMpack, which is a parallel implementation of SVM.
Finally, feature selection is desirable in your case. You can end up with a good feature subset of almost half of what you have. In addition, you only need a portion of the data for training, e.g. 60-70% is sufficient.
First of all, 60k is a huge amount of data for training. Training that much data with a linear kernel will take a very long time unless you have a supercomputer. Also, you have selected a linear kernel function of degree 1. It's better to use a Gaussian or higher-degree polynomial kernel (degree 4, used with the same dataset, showed good training accuracy). Try adding the LIBSVM options -c (cost), -m (memory cache size) and -e (epsilon, the tolerance of the termination criterion, default 0.001). First run 1000 samples with a Gaussian / degree-4 polynomial kernel and compare the accuracy.
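As a rough sketch of that last suggestion (assuming the LibSVM Matlab interface plus the ovrtrain wrapper from the question, with labels and features loaded as above; the -g value is only a guess to start tuning from):

% Quick sanity check on a 1000-example subsample before touching all 60k
idx = randperm(size(features,1), 1000);
sub_labels   = double(labels(idx));
sub_features = double(features(idx,:)) / 255;  % scale pixel values to [0,1]

% RBF kernel (-t 2) with explicit cost, cache size and stopping tolerance
opts  = '-t 2 -c 1 -g 0.05 -m 1000 -e 0.001';
model = ovrtrain(sub_labels, sub_features, opts);

If that subsample trains in a reasonable time, you can scale up and tune -c and -g from there.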

Iterative solving for unknowns in a fluids problem

I am a mechanical engineer with a computer science question. This is an example of what the equations I'm working with are like:
x = √((y-z)×2/r)
z = f×(L/D)×(x/2g)
f = something crazy with x in it
etc…(there are more equations with x in it)
The situation is this:
I need r to find x, but I need x to find z. I also need x to find f, which is part of finding z. So I guess a value for x, and then I use that value to find r and f. Then I go back and use the values I found for r and f to find x. I keep doing this until the guessed value and the calculated value are the same.
My question is:
How do I get the computer to do this? I've been using mathcad, but an example in another language like C++ is fine.
The very first thing you should do when faced with an iterative algorithm is to write down on paper the sequence that will result from your idea:
E.g.:
x_0 = ..., f_0 = ..., r_0 = ...
x_1 = ..., f_1 = ..., r_1 = ...
...
x_n = ..., f_n = ..., r_n = ...
Now you have an idea of what you should implement (even if you don't know how). If you don't manage to find a closed-form expression for one of the x_i, r_i or whatever_i, you will need to solve one-dimensional equations numerically. That will mean more work.
Now, for the implementation part: if you have never written a program, you should seriously ask someone in person who can help you (or hire an intern and have them write the code). We cannot help you start from scratch with, e.g., C programming, but we are willing to help you with the specific problems that arise while you write the program.
Please note that your algorithm is not guaranteed to converge, even if you strongly think there is a unique solution. Solving nonlinear equations is a difficult subject.
It appears that Mathcad has many abstractions for iterative algorithms, without the need to actually implement them directly in a "lower level" language. Perhaps this question is better suited for the Mathcad forums at:
http://communities.ptc.com/index.jspa
If you are using Mathcad, it has this functionality built in. It is called a solve block.
Start with the keyword "given"
Given
define the guess values for all unknowns
x:=2
f:=3
r:=2
...
define your constraints
x = √((y-z)×2/r)
z = f×(L/D)×(x/2g)
f = something crazy with x in it
etc…(there are more equations with x in it)
calculate the solution
find(x, y, z, r, ...)=
Check Mathcad help or Quicksheets for examples of the exact syntax.
The simple answer to your question is this pseudo-code:
X = startingX;
lastF = Infinity;
F = 0;
tolerance = 1e-10;
while ((lastF - F)^2 > tolerance)
{
    lastF = F;
    X = ?;
    R = ?;
    F = FunctionOf(X,R);
}
This may not do what you expect at all. It may give a valid but nonsense answer or it may loop endlessly between alternate wrong answers.
This is standard substitution to convergence. There are more advanced techniques like DIIS but I'm not sure you want to go there. I found this article while figuring out if I want to go there.
In general, it really pays to think about how you can transform your problem into an easier problem.
In my experience it is better to pose your problem as a univariate bounded root-finding problem and use Brent's Method if you can.
The next (worse) option is multivariate minimization with something like BFGS.
Iterative solutions are horrible, but are more easily handled once you think of them as X2 = f(X1), where X is the vector of unknowns and you're trying to reduce the difference between X1 and X2.
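To make that concrete, here is a minimal Matlab-style sketch of such a successive-substitution loop; update_f, update_z and update_x are purely hypothetical placeholders standing in for the poster's real equations (with made-up values for y, L/D and r):

% Hypothetical placeholders for the real equations
update_f = @(x) 0.02 + 1./(10 + x);           % "something crazy with x in it"
update_z = @(x, f) f .* 500 .* x ./ (2*9.81); % z = f*(L/D)*(x/2g) with L/D = 500
update_x = @(z, r) sqrt(2*(100 - z) ./ r);    % x = sqrt(2*(y - z)/r) with y = 100

r   = 0.5;      % known input
x   = 1.0;      % initial guess
tol = 1e-10;
for iter = 1:1000
    f     = update_f(x);
    z     = update_z(x, f);
    x_new = update_x(z, r);
    if abs(x_new - x) < tol
        break;  % guess and calculated value agree
    end
    x = x_new;  % substitute the new value and go around again
end
fprintf('x converged to %.6f after %d iterations\n', x_new, iter);

With the real equations such a loop can just as easily oscillate or diverge, which is exactly the caveat raised in the other answers.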
As the commenters have noted, the mathematical aspects of your question are beyond the scope of the help you can expect here, and are even beyond the help you could be offered based on the detail you posted.
However, I think that even if you understood the mathematics thoroughly there are computer science aspects to your question that should be addressed.
When you write your code, try to organize it into functions that depend only upon the parameters you pass in. So write a subroutine that takes in values for y, z, and r and returns x. Make another that takes in f, L, D, and g and returns z. Now you have testable routines that you can check to make sure they are computing correctly. Validate the input values inside the routines - for instance, in computing x you will get a divide-by-zero error if you pass in 0 for r. Think about how you want to handle this.
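For instance, a Matlab-style sketch of one such routine (the names follow the equations in the question; the error handling is just one possible choice):

function x = calc_x(y, z, r)
    % x = sqrt(2*(y - z)/r); validate the inputs before dividing
    if r == 0
        error('calc_x:badInput', 'r must be non-zero');
    end
    x = sqrt(2*(y - z) / r);
end

Each routine like this can be exercised on its own with known values before it is wired into the iteration.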
If you are going to solve this problem iteratively you will need a method that will decide, based on the results of one iteration, what the values for the next iteration will be. This also should be encapsulated within a subroutine. Now, if you are using a language that allows only one value to be returned from a subroutine (which is the case for most common computation languages: C, C++, Java, C#), you need to package up all your variables into some kind of data structure to return them. You could use an array of reals or doubles, but it would be nicer to make an object so that you can reference the variables by name and not by position (less chance of error).
Another aspect of iteration is knowing when to stop. Certainly you'll do so when you get a solution that converges. Make this decision into another subroutine. Now when you need to change the convergence criteria there is only one place in the code to go to. But you need to consider other reasons for stopping - what do you do if your solution starts diverging instead of converging? How many iterations will you allow the run to go before giving up?
Another aspect of iteration on a computer is round-off error. Mathematically 10^40/10^38 is 100. Mathematically 10^20 + 1 > 10^20. These statements are not true in most computations. Your calculations may need to take this into account or you will end up with numbers that are garbage. This is an example of a cross-cutting concern that does not lend itself to encapsulation in a subroutine.
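For example, a quick check in Matlab (most languages behave the same way in double precision):

a = 1e20;
b = a + 1;
disp(a == b)   % prints 1: the added 1 is completely lost to round-off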
I would suggest that you go look at the Python language, and the pythonxy.com extensions. There are people in the associated forums that would be a good resource for helping you learn how to do iterative solving of a system of equations.
