time series simulation and logical checking with Matlab or with other tools - algorithm

1) I have time series data and signals (indicators) whose values change over time.
My question:
2) I need to do logical checking all the time, e.g. if signal 1 and signal 2 happened around the same time (both were equal to a certain value, e.g. 1), then I need to know the exact time in order to check what happened next.
3) To complicate things, if signal 3 happened within some time range after signal 1 and signal 2 were equal to 1, I would like to check other things.
4) The time series is very long and I need to deal with it segment by segment.
Please advise how to write it without reinventing the wheel.
Is it recommended to write it in Matlab? Using a state machine? In C++? Using threads?
5) Does Matlab have a simulator ready for this kind of thing?
How do I define the logical conditions in an efficient way?
6) Can I use data mining tools for this?
I saw this list of tools:
Data Mining open source tools
not sure where to start.
Thanks

The second and third questions could be handled like this in Matlab:
T = -range; % Time of the last trigger. Assuming that t starts at 0, this keeps the window closed until signals 1 and 2 fire.
for n = 1 : length(t)
    if signal1(n) == 1 && signal2(n) == 1
        T = t(n); % Remember when signal 1 and signal 2 were both equal to 1.
    end
    if t(n) - T < range && signal3(n) == 1
        if true % Replace 'true' with the conditions you want checked; they could also be put in the previous if statement.
            % Things you want to be executed if these conditions are met.
        end
    end
end
Using a lower-level programming language like C++ would likely make the loop run faster. And if the data is very long, it could also reduce memory use by loading one element of each array at a time instead of the whole series.
Matlab has a simulator, called Simulink, but that is meant for more complicated problems; since you only want to do something conditionally, a plain loop is enough.
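The same windowed check can be sketched in Python, in case a scripting language other than Matlab is an option. This is only an illustration under assumptions: the function name find_followups and its arguments are hypothetical, and the signals are assumed to be sampled at the same times t.

```python
def find_followups(t, signal1, signal2, signal3, window):
    """Return (trigger_time, followup_time) pairs.

    A trigger is a sample where signal1 and signal2 are both 1.
    A follow-up is a sample where signal3 is 1 less than `window`
    time units after the most recent trigger.
    """
    last_trigger = None  # no trigger seen yet
    hits = []
    # zip consumes its inputs lazily, so the arguments may be
    # iterators that read a long series segment by segment.
    for time, a, b, c in zip(t, signal1, signal2, signal3):
        if a == 1 and b == 1:
            last_trigger = time
        if last_trigger is not None and time - last_trigger < window and c == 1:
            hits.append((last_trigger, time))
    return hits


if __name__ == "__main__":
    t = [0, 1, 2, 3, 4, 5]
    s1 = [0, 1, 0, 0, 0, 0]
    s2 = [0, 1, 0, 0, 0, 0]
    s3 = [1, 0, 0, 1, 0, 1]
    print(find_followups(t, s1, s2, s3, window=3))
```

Keeping only the time of the last trigger is the same design as the variable T in the Matlab loop: no history needs to be stored, which matters when the series is very long.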

Related

Never ending 'for' loop prevents my RStudio notebook from being rendered into a .md file

I'm trying to calculate the Kolmogorov-Smirnov statistic in R. I have the following sample, which clearly comes from a random variable that follows a long-tailed distribution.
Download link
https://drive.google.com/file/d/1hIgqikX7p343zdyc-Goq34THUpsZA63n/view?usp=sharing
As you may know, the Kolmogorov-Smirnov statistic requires the calculation of the empirical cumulative distribution function and the presumed cumulative distribution function. For both calculations I take the following approach: first, I create a vector with the same length as the sample, and then I set each component of the vector to the empirical cdf (or presumed cdf) of the corresponding observation of the sample.
For the sake of illustration, I'll show you the code I wrote in order to calculate the empirical cdf.
I'm assuming that the data has been read and stored in a dataframe called data.
ecdf = vector("numeric", length(data$logueos))
for (i in 1:length(data$logueos)) {
  ecdf[i] = sum(data$logueos <= data$logueos[i]) / length(data$logueos)
}
The code I wrote for the calculation of the presumed cdf is analogous to the preceding one; the only difference is that I set each component of the pcdf vector equal to the formula $P(X \le t)$, where t is the corresponding observation of the sample, according to the distribution that I'm assuming.
The problem is that this 'for' loop never ends. If I force it to end by clicking RStudio's stop button it works: it makes the vector store what I want it to store. But, if I press Ctrl+Shift+k in order to render my notebook and preview it, the load gets stuck when trying to execute the first chunk encountered that contains one of those loops.
First of all, your loop is not endless. It will finish, eventually.
You start by initializing a vector with as many elements as the number of observations (1,245,888, which is a lot of iterations). This vector is initially full of zeros.
What your loop does is iterate, replacing each zero with the result of sum (data$logueos <= data$logueos[i])/length(data$logueos). Each iteration compares the whole sample against one observation, so the loop performs about 1,245,888 × 1,245,888 ≈ 1.55 × 10^12 comparisons in total. You can check that when you stop the execution, the first values of your vector are between 0 and 1 while the last values are still 0 (because the loop has not reached them yet).
So, you will have to wait longer.
To make the execution faster, you could consider loop parallelization: standard loops run sequentially, one iteration at a time, whereas a parallel loop runs several iterations at once (for example, 4 at a time, depending on your computer's capabilities). Here you'll find some information about it: https://nceas.github.io/oss-lessons/parallel-computing-in-r/parallel-computing-in-r.html
Then, my proposal to you:
if(!require(foreach)){install.packages("foreach")}; require(foreach)
if(!require(doParallel)){install.packages("doParallel")}; require(doParallel)
registerDoParallel(detectCores() - 1)
n = length(data$logueos)
ecdf = foreach(i=1:n, .combine = c) %dopar% {
  sum (data$logueos <= data$logueos[i])/n
}
The first two lines download (if needed) and load the foreach and doParallel libraries, which you need for parallelization.
detectCores() - 1 uses all the processors that your computer has except one (to avoid freezing your machine) for computing this loop, so it should run faster.
The registerDoParallel function is what tells foreach how many cores to use.
Note that the loop uses %dopar% rather than %do%, because %do% runs the iterations sequentially. The workers cannot write into a vector in your session, so each iteration returns its value and .combine = c collects the results into ecdf.
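Independently of parallelization, the number of comparisons itself can be reduced by sorting the sample once and locating each observation with a binary search. The following is a minimal sketch of that idea in Python rather than R; the name empirical_cdf is hypothetical, and it is an alternative technique, not the approach proposed in the answer.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return, for each observation x, the fraction of the sample <= x."""
    ordered = sorted(sample)  # one O(n log n) sort
    n = len(ordered)
    # bisect_right gives the number of elements <= x in the sorted copy,
    # found in O(log n) instead of scanning all n elements.
    return [bisect_right(ordered, x) / n for x in sample]


if __name__ == "__main__":
    print(empirical_cdf([3, 1, 2, 2]))
```

Each value is the same quantity the loop computes with sum(data$logueos <= data$logueos[i])/length(data$logueos), but the total work grows roughly as n log n rather than n squared.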

Make a previously unknown number of parallel operations. In VHDL

I'm working on a project for which I need to make calculations with vectors (orthogonalizing a matrix using the Gram-Schmidt method). The length of these vectors is unknown for now, so the program must be able to adapt to different lengths. One such calculation is computing a new vector (C) which is the result of adding A and B. Each element of the vectors is a fixed-point number.
I want C(i)=A(i)+B(i). For all the elements of the vector (for i=0 to N, where N is the vector length).
I can find 2 solutions for this but both present some problems:
1- I can declare in the entity, vectors whose length changes according to a generic and then just create a for loop which goes through all the vector.
for I in 0 to N loop
C(I)<=A(I)+B(I);
end loop;
The problem with this solution is that the execution would be sequential, and therefore slow. I'm not completely sure about this and I don't know how to check it, but I guess that the compiler is not smart enough to notice that it can be processed in parallel. In this application speed is a key factor.
2- I can declare vectors which are as long as the maximum possible length for the actual data and fill them with zeroes. Then I could just assign:
C(0)<=A(0)+B(0);
C(1)<=A(1)+B(1);
C(2)<=A(2)+B(2);
...
C(Nmax)<=A(Nmax)+B(Nmax);
This is not an elegant solution, and in this application N can be between 3 and 300, so it could be a complete waste of resources and tedious to program.
3- I want to find a third solution that creates a number (assigned by the generic) of combinational calculations following a template such as C(i)=A(i)+B(i). Is there any solution like this? It would effectively be a loop that is not executed sequentially but all at the same time.
I know that similar stuff can be done using CUDA but this project is actually a comparison between GPUs and FPGAs, so changing the platform is not a suitable solution either.
Thank you in advance
Edit: I have thought of another unsatisfactory solution, but I want to share it in case it is helpful for somebody else checking this in the future. Given that A and B have the same length, you can write them in a 1-D format, that is: A(normal)=[1001,1100,0011], A(1-D)=100111000011. The same would be done with B.
If you know beforehand that the sum of any two possible numbers can be expressed with the same number of bits, there will be no problems, because no addition carries into the neighbouring element. So with 4 unsigned bits you should make sure that in any possible case the numbers in A or B are !>0111 (not higher than 0111). You could just write C(1-D)=A(1-D)+B(1-D) and then just assign C(0)=C(1-D)(3 downto 0), C(1)=C(1-D)(7 downto 4) etc.
If you cannot make sure that the numbers are not higher than 0111 (in the 4 bit case) it won't work.
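The packing trick from the edit can be checked with ordinary integer arithmetic. The sketch below is written in Python purely to illustrate the bit manipulation, not as hardware code; the helper names pack, unpack and packed_add are hypothetical, and element 0 is assumed to occupy the lowest bits, matching C(0)=C(1-D)(3 downto 0).

```python
def pack(values, width):
    """Concatenate unsigned values into one integer, element 0 in the low bits."""
    word = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << width):
            raise ValueError("value does not fit in the given width")
        word |= v << (i * width)
    return word


def unpack(word, width, count):
    """Split an integer back into `count` fields of `width` bits each."""
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(count)]


def packed_add(a, b, width):
    """Add two vectors with a single wide addition of their packed forms."""
    return unpack(pack(a, width) + pack(b, width), width, len(a))


if __name__ == "__main__":
    # All elements <= 0111, so no sum carries into the next field.
    print(packed_add([1, 7, 3], [2, 7, 4], 4))
    # 9 + 8 does not fit in 4 bits: the carry corrupts the next field.
    print(packed_add([9, 1], [8, 1], 4))
```

The second call shows why the restriction matters: the overflow of the first field is added to the second one, so the result no longer equals the elementwise sum.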
You might be able to use the length attribute (for example A'length) to create a loop whose bounds depend on the size of your vector.
https://www.csee.umbc.edu/portal/help/VHDL/attribute.html
As mentioned in the comment to the question, the loop should be unrolled by the synthesis tool, so the additions are performed in parallel, as long as it is not synchronized to the clock.

Faster way of testing a condition in MATLAB

I need to run a great many tests of the form v<0, where v is a vector (a relatively short one). I am currently doing it with
all(v<0)
Is there a faster way?
Not sure which one will be faster (that may depend on the machine and Matlab version), but here are some alternatives to all(v<0):
~any(v>=0)
nnz(v>=0)==0 %// Or ~nnz(v>=0)
sum(v>=0)==0 %// Or ~sum(v>=0)
isempty(find(v>=0, 1)) %// Or isempty(find(v>=0))
I think the issue is that the comparison is evaluated on all elements of the array first, and only then is the result tested. That is, for the test "any(v<0)", I believe Matlab does the following:
Step 1: compute v<0 for every element of v
Step 2: search through the results of step 1 for a true value
So even if the first element of v is less than zero, the conditional was first computed for all elements, hence wasting a lot of time. I think this is also true for any of the alternative solutions offered above.
I don't know of a faster way to do it easily, but wish I did. In some cases, breaking the array v up into smaller chunks and testing incrementally could speed things up, particularly if the condition is common. For example:
function result = anyLessThanZero(v)
    w = v(:);
    result = true;
    for i=1:numel(w)
        if ( w(i) < 0 )
            return; % Stop at the first negative element.
        end
    end
    result = false;
end
but that can be very inefficient if the condition is rare. (If you were to really do this, there is probably a better way than I illustrate above to handle any condition, not just <0, but I show it this way to make it clear).
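For comparison, the early-exit idea can be sketched in Python, where all() applied to a generator stops at the first failing element. This is an illustration in a different language, not Matlab behaviour; the helper count_checked is hypothetical and only exists to show how many elements were actually examined.

```python
def all_negative(values):
    """True if every element is < 0; stops at the first element that is not."""
    return all(x < 0 for x in values)


def count_checked(values):
    """Return (result, number of elements examined) for the all-negative test."""
    checked = 0

    def probe():
        nonlocal checked
        for x in values:
            checked += 1
            yield x < 0

    result = all(probe())  # all() short-circuits on the first False
    return result, checked


if __name__ == "__main__":
    print(count_checked([-1, 2, -3, -4]))
    print(count_checked([-1, -2, -3]))
```

When the vector contains a non-negative element early on, only a prefix is examined; when every element is negative, the whole vector still has to be scanned.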

Attempt to "go back" without goto statement

The code examples are going to be in Lua, but the question is rather general; Lua is just an example.
for k=0,100 do
    ::again::
    local X = math.random(100)
    if X <= 30
    then
        -- do something
    else
        goto again
    end
end
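This code generates 101 pseudorandom numbers between 1 and 30 (the loop runs for k = 0 to 100, and math.random(100) returns an integer from 1 to 100). The numbers are drawn from 1-100, but the loop does not move on while the drawn number is larger than 30.
I am trying to do this task without the goto statement.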
for k=0,100 do
    local X = 100 -- may be put behind "for", in some cases, the matter is that we need an 'X' variable
    while X >= 30 do --IMPORTANT! it's the opposite operation of the "if" condition above!
        X = math.random(100)
    end
    -- do the same "something" as in the condition above
end
Instead, this program runs the random number generation until I get a desired value. In general, I put inside the while loop all the code that was between the start of the main loop and the condition in the first example.
Theoretically, it does the same as the first example, only without gotos. However, I'm not sure about it.
Main question: are these two programs equivalent? Do they do the same thing? If yes, which is faster (= more optimized)? If no, what's the difference?
It is bad practice to use Goto. Please see http://xkcd.com/292/
Anyway, I'm not much into Lua, but this looks simple enough;
For your first code: you start a loop that repeats 101 times (k runs from 0 to 100). In the loop you draw a random integer between 1 and 100. If this number is less than or equal to 30, you do something with it. If it is greater than 30, you throw it away and draw another random number. This continues until you have 101 random numbers, all of which are less than or equal to 30.
The second code says: start a loop from 0 to 100. Then you set X to 100. Then you start another loop with this condition: as long as X is greater than or equal to 30, keep randomizing X. Only when X is less than 30 will your code exit the inner loop and perform some action. When it has performed that action 101 times, the program ends.
So both codes do almost the same thing: the first accepts 30 while the second rejects it, and the condition would have to be X > 30 to match exactly. The first one also uses a goto, which is bad practice regardless of efficiency.
The second code uses loops, but is still not efficient: there are 2 levels of loops, and one is based on pseudo-random generation, which can be extremely inefficient (maybe the generator produces only numbers between 30 and 100 for a trillion iterations?). Then things get very slow. In practice each draw is accepted with a probability of roughly 0.3, so about 3.3 draws are needed per number on average. But this is also true for your first piece of code: it has a 'loop' that is based on pseudo-random number generation.
TL;DR: strictly speaking about efficiency, I do not see one of those being more efficient than the other. I could be wrong, but it seems the same thing is going on.
You can directly use math.random(lower, upper):
for k=0,100 do
    local X = math.random(1, 30) -- math.random(100) never returns 0, so the range starts at 1
end
This is even faster.
As I see it, these pieces of code do the same thing, but using goto is generally not the best choice (in any programming language). For Lua see details here
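The two approaches discussed in this thread, redrawing until the value is acceptable and drawing directly from the target range, can be compared in a short Python sketch. The function names draw_by_rejection and draw_directly are hypothetical, and Python is used only for illustration; random.Random.randint(a, b) includes both endpoints, like Lua's math.random(a, b).

```python
import random


def draw_by_rejection(rng, limit=30, upper=100):
    """Redraw from 1..upper until the value is <= limit (no goto needed)."""
    while True:
        x = rng.randint(1, upper)
        if x <= limit:
            return x


def draw_directly(rng, limit=30):
    """Draw from 1..limit in a single call."""
    return rng.randint(1, limit)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    rejected = [draw_by_rejection(rng) for _ in range(101)]
    direct = [draw_directly(rng) for _ in range(101)]
    print(min(rejected), max(rejected), min(direct), max(direct))
```

Both functions produce values uniformly distributed over 1 to 30; the direct version simply avoids the discarded draws.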

Do speeds of if statements in a repetitive loop affect overall performance?

If I have code that will take a while to execute, printing out results every iteration will slow down the program a lot. To still receive occasional output to check on the progress of the code, I might have:
if (i % 10000 == 0) {
# print progress here
}
Does checking the if statement on every iteration slow it down at all? Should I just print no output and wait? Would that make it noticeably faster?
Also, is it faster to do: (i % 10000 == 0) or (i == 10000)?
Is checking equality or modulus faster?
In the general case, it won't matter at all.
A slightly longer answer: it won't matter unless the loop runs millions of times and the rest of the loop body is actually cheaper than the if statement (for example, a single multiplication). In that case, you might see a slight performance drop.
Regarding (i % 10000 == 0) vs. (i == 10000), the latter is slightly faster, because it only compares, whereas the former does a (fairly costly) modulus and a comparison. Note, however, that the two are not equivalent: i == 10000 is true only once, whereas i % 10000 == 0 is true every 10000 iterations.
That said, neither an if statement nor a modulus computation will make any difference unless your loop takes up 90 % of the program's running time, which is usually the case only at school :). You probably spent a lot more time asking this question than you would have saved by not printing anything. For development and debugging, printing progress this way is not a bad way to go.
The golden rule for this kind of decision:
Write the most readable and explicit code you can imagine to do the
thing you want it to do. If you have a performance problem, look for
wrong data structures and algorithmic choices first. If you have done
all that and need a really quick program, profile it to see which
part takes the most time. After all that, you're allowed to make this kind
of low-level guess.
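The difference between the two conditions from the question is easy to see by listing the iterations at which each one fires. This is a small Python sketch with hypothetical helper names; it illustrates the behaviour of the conditions, not their speed.

```python
def report_points(total, every):
    """Iterations at which `i % every == 0` is true."""
    return [i for i in range(total) if i % every == 0]


def report_once(total, target):
    """Iterations at which `i == target` is true."""
    return [i for i in range(total) if i == target]


if __name__ == "__main__":
    print(report_points(25000, 10000))
    print(report_once(25000, 10000))
```

The modulus test gives periodic progress output, while the equality test reports a single time, so the choice between them depends on the output you want rather than on speed.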

Resources