Performance: ordering expression evaluation - is a tree (data structure) the only solution?

I have a problem at hand. At first sight it looks easy, and it is, but I am looking for another (perhaps simpler) solution:
Expressions:
V0
V1
V2
V3
V4
SumA = V1 + V2
SumB = SumA + V3
SumC = SumB + SumA
SumD = SumC + V0
As we can see here, the "base" variables are V0, V1, V2, V3 and V4 (the value of each one is returned from a DB query).
The user asks the software to return the result of V1 and SumC.
Solution that I know:
Find all necessary variables: V1, SumC, SumB, SumA, V3, V2
For performance, I want to process the math of each variable JUST ONE TIME.
This means that I need to order the expressions from "base expressions" to "top variables".
At this point the only solution I can see is of the "Tree (data structure)" type: get V1, V2 and V3,
then get SumA, then SumB, and only at last SumC.
Is there any other way to solve this problem?
The final objective is to use this algorithm with more complex variables and several "middle variables". Performance is critical, so I can't perform the same math operation more than once.

I am not sure I completely understand, but I think you are referring to common subexpression elimination [or something similar to it], which is a very common compiler optimization.
One common way of doing this optimization is using a graph [which is actually a DAG] of the expressions in the program, and adding iteratively new expressions. The "sources" in your DAG are all initial variables [V0,V1,V2,V3,V4 in your example]. You can "know" which expression is redundant if you already calculated it - and avoid recalculating it.
These lecture notes seem to be a decent, more detailed explanation [though I admit I did not read them all].
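To make the idea concrete, here is a minimal Python sketch of the DAG-plus-memoisation approach (the expression table and the fetch_base helper are invented for illustration; they stand in for the question's DB-backed variables):

# Each derived value lists the two names it depends on.
expressions = {
    "SumA": ("V1", "V2"),
    "SumB": ("SumA", "V3"),
    "SumC": ("SumB", "SumA"),
    "SumD": ("SumC", "V0"),
}

def fetch_base(name):
    # placeholder for the DB query that returns V0..V4
    return {"V0": 1, "V1": 2, "V2": 3, "V3": 4, "V4": 5}[name]

def evaluate(name, cache):
    if name in cache:                # already computed: reuse it
        return cache[name]
    if name not in expressions:      # base variable: hit the DB once
        value = fetch_base(name)
    else:                            # derived: evaluate operands first
        left, right = expressions[name]
        value = evaluate(left, cache) + evaluate(right, cache)
    cache[name] = value
    return value

cache = {}
print({name: evaluate(name, cache) for name in ["V1", "SumC"]})
# {'V1': 2, 'SumC': 14}; SumA and SumB were each computed exactly once

Only the expressions reachable from the requested names are evaluated, and the shared cache guarantees each one is computed a single time, which is exactly the behaviour described above.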

First of all, you need to build a tree with all expressions. Trees are the simplest data structure for this case.
Now let's assume you have these formulas:
SumA = v1 + v2
SumB = v1 + v2 + v3
SumC = ...
and the user asks for SumB (so you know how to calculate SumC, but you don't have to in order to make the user happy).
In Memory, this looks like so:
SumA = Add( v1, v2 )
SumB = Add( Add( v1, v2 ), v3 )
The next step is to define compare operators which tell whether two sub-trees are the same. Running those, you will notice that Add( v1, v2 ) appears twice, so you can optimize:
SumA = Add( v1, v2 )
SumB = Add( SumA, v3 )
This means you can achieve the result with the minimum number of calculations. The next step is to add caching to your operators: when someone asks for their value, they should cache it so the next getValue() call can return the last result.
That means evaluating either SumA or SumB will fill the cache for SumA. Since you never ask for the value of SumC, it's never calculated and therefore costs nothing.
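A minimal Python sketch of that scheme (the Var and Add classes are invented for illustration; they stand in for whatever operators your expressions really use). Structural equality lets you detect that Add(v1, v2) occurs twice and share one node for it, and each node caches its value on the first getValue() call:

class Var:
    def __init__(self, name, value):
        self.name, self.value = name, value
    def __eq__(self, other):
        return isinstance(other, Var) and self.name == other.name
    def __hash__(self):
        return hash(("Var", self.name))
    def getValue(self):
        return self.value

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right
        self._cache = None
    def __eq__(self, other):
        return (isinstance(other, Add)
                and self.left == other.left and self.right == other.right)
    def __hash__(self):
        return hash(("Add", self.left, self.right))
    def getValue(self):
        if self._cache is None:          # compute at most once
            self._cache = self.left.getValue() + self.right.getValue()
        return self._cache

v1, v2, v3 = Var("v1", 1), Var("v2", 2), Var("v3", 3)
sumA = Add(v1, v2)
sumB = Add(Add(v1, v2), v3)
# The inner Add(v1, v2) compares equal to sumA, so share the existing node:
if sumB.left == sumA:
    sumB = Add(sumA, v3)
print(sumB.getValue())                   # 6, and sumA's cache is now filled too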

Maybe you could simplify it into this and eliminate the middle step:
SumA = (V1 + V2)*2
SumC = V3 + SumA

The only way to speed this up further is to use serialisation (pipelining) at a level you can't reach programmatically unless you use your own hardware. Example:
Case A:
100 operations x 4 cycles each = 400 cycles
Case B:
First result takes 3 cycles; each subsequent one takes only 1 (serialisation, like a Ford factory line) - 102 cycles.
102 vs 400 cycles - roughly 4x the speed.
Modern CPUs can do this to some extent automatically, but it's pretty hard to measure it.
I've heard that ICC (the Intel C compiler) optimizes its assembly to exploit this as much as possible; maybe that's partially why it beats everything else on Intel CPUs :)

Related

How does the Random module get tested in OCaml?

OCaml has a Random module, and I am wondering how it tests itself for randomness. However, I don't have a clue what exactly they are doing. I understand it tries a chi-square test plus two more tests for dependencies. Here is the code for the testing part:
chi-square test
(* Return the sum of the squares of v[i0,i1[ *)
let rec sumsq v i0 i1 =
  if i0 >= i1 then 0.0
  else if i1 = i0 + 1 then Pervasives.float v.(i0) *. Pervasives.float v.(i0)
  else sumsq v i0 ((i0+i1)/2) +. sumsq v ((i0+i1)/2) i1
;;
let chisquare g n r =
  if n <= 10 * r then invalid_arg "chisquare";
  let f = Array.make r 0 in
  for i = 1 to n do
    let t = g r in
    f.(t) <- f.(t) + 1
  done;
  let t = sumsq f 0 r
  and r = Pervasives.float r
  and n = Pervasives.float n in
  let sr = 2.0 *. sqrt r in
  (r -. sr, (r *. t /. n) -. n, r +. sr)
;;
Q1: why do they write the sum of squares like that?
It seems to just sum up all the squares. Why not write it like:
let rec sumsq v i0 i1 =
  if i0 >= i1 then 0.0
  else Pervasives.float v.(i0) *. Pervasives.float v.(i0) +. (sumsq v (i0+1) i1)
Q2: why do they seem to use a different formula for chisquare?
From the chi-squared test wiki, the formula is chi^2 = sum_i (O_i - E_i)^2 / E_i, where O_i are the observed counts and E_i the expected counts.
But it seems they are using a different formula. What's going on behind the scenes?
The other two dependency tests
(* This is to test for linear dependencies between successive random numbers. *)
let st = ref 0;;
let init_diff r = st := int r;;
let diff r =
  let x1 = !st
  and x2 = int r
  in
  st := x2;
  if x1 >= x2 then
    x1 - x2
  else
    r + x1 - x2
;;

let st1 = ref 0
and st2 = ref 0
;;

(* This is to test for quadratic dependencies between successive random numbers. *)
let init_diff2 r = st1 := int r; st2 := int r;;
let diff2 r =
  let x1 = !st1
  and x2 = !st2
  and x3 = int r
  in
  st1 := x2;
  st2 := x3;
  (x3 - x2 - x2 + x1 + 2*r) mod r
;;
Q3: I don't really know these two tests; can someone enlighten me?
Q1:
It's a question of memory usage. You will notice that for large arrays, your implementation of sumsq will fail with "Stack overflow during evaluation" (on my laptop, it fails for r = 200000). This is because before adding Pervasives.float v.(i0) *. Pervasives.float v.(i0) to (sumsq v (i0+1) i1), you have to compute the latter. So it's not until you have computed the result of the last call of sumsq that you can start "going up the stack" and adding everything. Clearly, sumsq is going to be called r times in your case, so you will have to keep track of r calls.
By contrast, with their approach they only have to keep track of log(r) calls, because once sumsq has been computed for half the array, you only need the result of the corresponding call (you can forget about all the other calls that you had to make to compute it).
However, there are other ways of achieving this result and I'm not sure why they chose this one (maybe somebody will be able to tell ?). If you want to know more on the problems linked to recursion and memory, you should probably check the wikipedia article on tail-recursion. If you want to know more on the technique that they used here, you should check the wikipedia article on divide and conquer algorithms -- be careful though, because here we are talking about memory and the Wikipedia article will probably talk a lot about temporal complexity (speed).
Q2:
You should look more closely at both expressions. Here, all the E_i's are equal to n/r. If you replace this in the expression you gave, you will find the same expression that they use: (r *. t /. n) -. n. I didn't check the values of the bounds, but since you have a chi-squared distribution with roughly r (minus one or two) degrees of freedom, and r is quite large, it's not surprising to see them use this kind of confidence interval. The Wikipedia article you mentioned should help you figure out fairly easily which confidence interval they use exactly.
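To spell out that substitution (writing O_i for the observed count f.(i) and using E_i = n/r for every one of the r buckets):

sum_i (O_i - E_i)^2 / E_i
    = (r/n) * sum_i O_i^2  -  2 * sum_i O_i  +  sum_i (n/r)
    = (r/n) * t  -  2n  +  n
    = (r * t / n) - n

Here t = sum_i O_i^2 is exactly what sumsq computes, and sum_i O_i = n because every one of the n draws increments exactly one bucket.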
Good luck!
Edit: Oops, I forgot about Q3. I don't know these tests either, but I'm sure you should be able to find more about them by googling something like "linear dependency between consecutive numbers" or something. =)
Edit 2: In reply to Jackson Tale's June 29 question about the confidence interval:
They should indeed test it against the Chi-squared distribution -- or, rather, use the Chi-squared distribution to find a confidence interval. However, because of the central limit theorem, the Chi-squared distribution with k degrees of freedom converges to a normal law with mean k and variance 2k. A classical result is that the 95% confidence interval for the normal law is approximately [μ - 1.96 σ, μ + 1.96 σ], where μ is the mean and σ the standard deviation -- so that's roughly the mean ± twice the standard deviation. Here, the number of degrees of freedom is (I think) r - 1 ~ r (because r is large) so that's why I said I wasn't surprised by a confidence interval of the form [r - 2 sqrt(r), r + 2 sqrt(r)]. Nevertheless, now that I think about it I can't see why they don't use ± 2 sqrt(2 r)... But I might have missed something. And anyway, even if I was correct, since sqrt(2) > 1, they get a more stringent confidence interval, so I guess that's not really a problem. But they should document what they're doing a bit more... I mean, the tests that they're using are probably pretty standard so most likely most people reading their code will know what they're doing, but still...
Also, you should note that, as is often the case, this kind of test is not conclusive. Generally, you want to show that something has some kind of effect. So you formulate two hypotheses: the null hypothesis, "there is no effect", and the alternative hypothesis, "there is an effect". Then you show that, given your data, the probability that the null hypothesis holds is very low, and you conclude that the alternative hypothesis is (most likely) true, i.e. that there is some kind of effect. That is conclusive.

Here, what we would like to show is that the random number generator is good. So we don't want to show that the numbers it produces differ from some law, but that they conform to it. The only way to do that is to perform as many tests as possible showing that the numbers produced have the same properties as randomly generated ones. But the only conclusion we can draw is "we were not able to find a difference between the actual data and what we would have observed had they really been randomly generated". This is not a lack of rigor from the OCaml developers: people always do this (e.g. a lot of tests require normality, so before performing those tests you look for a test that would show that your variable is not normally distributed, and when you can't find one you say "oh well, the normality of this variable is probably sufficient for my subsequent tests to hold"), simply because there is no other way to do it.
Anyway, I'm no statistician and the considerations above are simply my two cents, so you should be careful. For instance, I'm sure there is a better reason why they're using this particular confidence interval. I also think you should be able to figure it out if you write everything down carefully to make sure about what they're doing exactly.

Optimize values in a tree where the tolerance is somewhere deep inside

I am currently doing some calculations with trees. Each node has 5 values I am trying to calculate, and a type deciding how these values are calculated. Some calculations can be pretty complicated algorithms. All calculations within a node depend solely on the values of its child nodes, so I am doing the calculations bottom-up. For each node type, a value depends on different values of the child nodes. I am mainly interested in the 5 values in the root node, which of course depend on all values in all other nodes. All this is working just fine. A node can only have 1 or 2 child nodes, and the tree is usually no deeper than 5 levels.
For some node types there is a tolerance, meaning some values there would not matter (I marked those with XX in my diagram). Sometimes some values would even be in a relation, like C = XX * A. Currently, these values are just set to some default values. Sometimes there is even a complicated relationship, like the multiple possible solutions of an algorithm such as Newton's method, depending on starting values.
Now there is a rating I can apply to the values of the root node. What I would like is to optimize this rating by adjusting the XX values deep within the tree. The calculations within each node can be any of many possible formulas, and the tolerance can follow one of many possible patterns, so I cannot just figure out some formula; I need an algorithm which is very flexible. I do not know of such an algorithm. Does anyone have an idea?
/Edit: To clarify, it is unclear how many values in the tree will be free. There is not just one XX; there may be any number of them (I guess 10 at most), so my first step would be identifying these values. Also, I will be doing this on many generated trees within a time window, so speed matters as well. Thanks :)
If you have 3 input values XX, YY and ZZ, you are searching a 3-dimensional space. What you are looking to do is apply an optimisation algorithm, or heuristic. Your choice of algorithm is key: it is a cost/benefit trade-off between your time and the computer's time. I am guessing that you just want to do this once.
Whatever method you use, you need to understand the problem, which means understanding how your rating changes with different input values. Some problems have a very nice minimum that is easy to find (e.g. using Newton's method), some don't.
I suggest starting simple. One of the most basic approaches is an iterative grid search. It's slow, but it works. You need to make sure that your iteration step is not too large, so that you don't miss some sweet spots.
best_result = inf
For XX = XXmin to XXmax
    For YY = YYmin to YYmax
        For ZZ = ZZmin to ZZmax
            result = GetRootNodeValue(XX, YY, ZZ)
            If result < best_result Then
                print result, XX, YY, ZZ
                best_result = result
            End If
        End For
    End For
End For
Below is another method; it is a stochastic optimisation method (it uses random points to converge on the best solution), and its results are reasonable for most conditions. I have used this successfully and it is good at converging to the minimum value. It is useful when there is no clear global minimum. You will have to configure the parameters for your problem.
' How long to search for; a larger value will result in a longer search time
max_depth = 20000

' Initial values
x0 = initial XX value
y0 = initial YY value
z0 = initial ZZ value

' These are the delta values, how far the values are allowed to range
dx = 5
dy = 5
dz = 5

' Set this to a large value (assuming the best result is a small number)
best_result = inf

' Loop for a long time
For i = 1 To max_depth
    ' New random values near the best result
    xx = x0 + dx * (Rnd() - 0.5) * (Rnd() - 0.5)
    yy = y0 + dy * (Rnd() - 0.5) * (Rnd() - 0.5)
    zz = z0 + dz * (Rnd() - 0.5) * (Rnd() - 0.5)
    ' Do the test
    result = GetRootNodeValue(xx, yy, zz)
    ' We have found the best solution so far
    If result < best_result Then
        x0 = xx
        y0 = yy
        z0 = zz
        best_result = result
    End If
    Print progress
Next i
There are many optimisation algorithms to choose from. Above are some very simple ones, but they may not be the best for your problem.
As another answer has pointed out, this looks like an optimization problem. You may consider using a genetic algorithm. Basically, you mimic the evolution process by "mating" different individuals (in your case trees) with different traits (in your case the values on the leaves) and letting them survive based on an objective function (in your case, what you obtain at the root node). The algorithm can be improved by adding mutations to your population (as in natural evolution).
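A very small genetic-algorithm sketch in Python; rate_tree, N_FREE and BOUNDS are hypothetical placeholders for your tree evaluation, the number of free XX values you identified, and their allowed ranges (lower rating = better in this sketch):

import random

N_FREE = 4                       # number of free XX values found in the tree (placeholder)
BOUNDS = (-10.0, 10.0)           # allowed range for each free value (placeholder)

def rate_tree(xs):
    # placeholder: write xs into the XX slots, evaluate the tree bottom-up,
    # and rate the 5 root values; smaller is better here
    return sum((x - 3.0) ** 2 for x in xs)

def random_individual():
    return [random.uniform(*BOUNDS) for _ in range(N_FREE)]

def mutate(xs, sigma=0.5):
    return [min(max(x + random.gauss(0, sigma), BOUNDS[0]), BOUNDS[1]) for x in xs]

def crossover(a, b):
    return [random.choice(pair) for pair in zip(a, b)]

population = [random_individual() for _ in range(40)]
for generation in range(200):
    population.sort(key=rate_tree)            # best (lowest rating) first
    parents = population[:10]                 # keep the fittest individuals
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(30)]
    population = parents + children

best = min(population, key=rate_tree)
print(best, rate_tree(best))

The elitism and mutation parameters here are arbitrary; the point is only the mate / mutate / select loop described above.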

Very slow MATLAB Jacket if statement

I encountered a very slow if-statement response using CUDA/Jacket in MATLAB (5 s vs 0.02 s for the same code that finds local maxima, using a simple for loop and an if condition).
Being new to GPU programming, I went reading, and when I saw a previous SO discussion about MATLAB if statements with CUDA, I felt something was missing.
You don't need CUDA to know that it is better to vectorize your code. However, there are cases where you will need an if statement anyway.
For example, I'd like to find whether a pixel of a 2D image (say m(a,b)) is the local maximum of its 8 nearest neighbors. In MATLAB, an easy way to do that is with 8 logical conditions in an if statement:
if m(a,b)>m(a-1,b-1) & m(a,b)>m(a,b-1) & m(a,b)>m(a+1,b-1) & ... etc. for all nearest neighbors
I'd appreciate if you have an idea how to resolve (or vectorize) this...
The problem with using multiple "if" statements (or any other conditional statements) is that for each of them the result is copied from the GPU to the host, and this can be costly.
The simplest way is to vectorize in the following manner.
window = m(a-1:a+1, b-1:b+1);
if all(window(:) <= m(a,b))
    % do something
end
This can be further optimized if you can show what the if/else branches are doing, i.e. please post the if/else code so we can see whether other optimizations are available (e.g. ways to remove the if condition entirely).
EDIT
With new information, here is what can be done.
for j = 1:length(y)
    a = x(j);
    b = y(j);
    window = d(a-1:a+1, b-1:b+1);
    condition = all(window(:) <= d(a,b));
    M(a, b) = condition + ~condition * M(a,b);
end
You can use gfor loop to make it even faster.
gfor j = 1:length(y)
    a = x(j);
    b = y(j);
    window = d(a-1:a+1, b-1:b+1);
    condition = all(window(:) <= d(a,b));
    M(a, b) = condition + ~condition * M(a,b);
gend
Using built-in functions
The easiest already optimized approach is probably to use the imregionalmax function,
maxinI = imregionalmax(I, CONN);
where CONN is the desired connectivity (in your case 8).
Note however that imregionalmax is part of the image processing toolbox.
Using the max function
If you're trying to see whether just that one pixel is the local maximum of its neighbors, you would probably do something like
if m(a,b) == max(max(m( (a-1) : (a+1), (b-1) : (b+1))))
Or, rather than taking the max twice, it may be faster in some cases to reshape:
if m(a,b) == max(reshape (m( (a-1) : (a+1), (b-1) : (b+1)), 9,1) )
Without the max function
Lastly if you want to avoid the max function altogether that is also possible in a more vectorized form than you have so far, namely
if all(reshape( m(a,b) >= m( (a-1) : (a+1), (b-1) : (b+1)), 9,1))
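For comparison only (this thread is about MATLAB/Jacket), the same idea can be sketched in Python/NumPy as a whole-image vectorisation, testing every interior pixel against its 8 neighbours with no per-pixel if at all:

import numpy as np

def local_maxima(m):
    # Boolean mask of interior pixels strictly greater than all 8 neighbours;
    # the one-pixel border is left False for simplicity.
    rows, cols = m.shape
    centre = m[1:-1, 1:-1]
    is_max = np.ones(centre.shape, dtype=bool)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            neighbour = m[1 + di:rows - 1 + di, 1 + dj:cols - 1 + dj]
            is_max &= centre > neighbour
    mask = np.zeros(m.shape, dtype=bool)
    mask[1:-1, 1:-1] = is_max
    return mask

print(local_maxima(np.array([[1, 1, 1], [1, 5, 1], [1, 1, 1]])))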

Efficient algorithm to find String overlaps

I won't go into the details of the problem I'm trying to solve, but it deals with a large string and involves finding overlapping intervals that exist in the string. I can only use one of the intervals that overlap, so I wanted to separate these intervals out and analyze them individually. I was wondering what algorithm to use to do this as efficiently as possible.
I must stress that speed is paramount here. I need to separate the intervals as quickly as possible. The algorithm that came to my mind was an Interval Tree, but I wasn't sure if that's the best that we can do.
Interval trees can be queried in O(log n) time, n being the number of intervals, and construction requires O(n log n) time, though I wanted to know if we can cut down on either.
Thanks!
Edit: I know the question is vague. I apologize for the confusion. I suggest that people look at the answer by Aaron Huran and the comments on the same. That should help clarify things a lot more.
Well, I was bored last night so I did this in Python. It's unnecessarily recursive (I just read The Little Schemer and think recursion is super neat right now), but it solves your problem and handles all the input I threw at it.
intervals = [(0,4), (5,13), (8,19), (10,12)]

def overlaps(x, y):
    x1, x2 = x
    y1, y2 = y
    return (
        (x1 <= y1 <= x2) or
        (x1 <= y2 <= x2) or
        (y1 <= x1 <= y2) or
        (y1 <= x2 <= y2)
    )

def find_overlaps(intervals, checklist=None, pending=None):
    if not intervals:
        return []
    interval = intervals.pop()
    if not checklist:
        return find_overlaps(intervals, [interval], [interval])
    check = checklist.pop()
    if overlaps(interval, check):
        pending = pending or []
        checklist.append(check)
        checklist.append(interval)
        return pending + [interval] + find_overlaps(intervals, checklist)
    else:
        intervals.append(interval)
        return find_overlaps(intervals, checklist)
Use like this:
>>> find_overlaps(intervals)
[(10, 12), (8, 19), (5, 13)]
Note that it returns all overlapping intervals in REVERSE order of their start point. Hopefully that's a minor issue. It only happens because I'm using append() and pop() on the list, which operate on the end of the list, rather than insert(0) and pop(0), which operate on the beginning.
This isn't perfect, but it is fast: the running time grows with the number of intervals (roughly linearly in the typical case), and the size of the actual string doesn't matter at all.
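For completeness, here is a different sketch (not part of the code above): sorting the intervals by start point and sweeping once groups every chain of mutually overlapping intervals in O(n log n), independent of the string length:

def overlap_groups(intervals):
    # Group intervals that overlap, directly or through a chain of overlaps.
    # Assumes each interval is a (start, end) pair with start <= end.
    if not intervals:
        return []
    ordered = sorted(intervals)
    groups = [[ordered[0]]]
    current_end = ordered[0][1]
    for start, end in ordered[1:]:
        if start <= current_end:         # overlaps the current group
            groups[-1].append((start, end))
        else:                            # gap found: start a new group
            groups.append([(start, end)])
        current_end = max(current_end, end)
    return groups

print(overlap_groups([(0, 4), (5, 13), (8, 19), (10, 12)]))
# [[(0, 4)], [(5, 13), (8, 19), (10, 12)]]

Each group can then be analysed on its own, e.g. to pick the single interval you are allowed to keep.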
You may want to try using Ukkonen's algorithm (see https://en.wikipedia.org/wiki/Ukkonen%27s_algorithm).
There is a free code version at http://biit.cs.ut.ee/~vilo/edu/2002-03/Tekstialgoritmid_I/Software/Loeng5_Suffix_Trees/Suffix_Trees/cs.haifa.ac.il/shlomo/suffix_tree/suffix_tree.c
You are looking to calculate the difference between the two strings right? What language are you trying to do this in?
Update:
Without any sort of criteria on how you will select which intervals to use, there is an enormous number of possible solutions.
One method would be to take the lowest starting number, grab its end.
Grab the next starting number that is higher than the previous interval's end. Get this interval's end and repeat.
So for 0-4, 5-13, 8-19, 10-12
You get: 0-4, 5-13 and ignore the others.
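That greedy selection can be sketched in a few lines of Python, assuming intervals are (start, end) pairs:

def pick_non_overlapping(intervals):
    # Take the interval with the lowest start, then repeatedly take the next
    # interval whose start lies past the end of the last one kept.
    chosen = []
    last_end = float("-inf")
    for start, end in sorted(intervals):
        if start > last_end:
            chosen.append((start, end))
            last_end = end
    return chosen

print(pick_non_overlapping([(0, 4), (5, 13), (8, 19), (10, 12)]))
# [(0, 4), (5, 13)], matching the example above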

Algorithm for multidimensional optimization / root-finding / something

I have five values, A, B, C, D and E.
Given the constraint A + B + C + D + E = 1, and five functions F(A), F(B), F(C), F(D), F(E), I need to solve for A through E such that F(A) = F(B) = F(C) = F(D) = F(E).
What's the best algorithm/approach to use for this? I don't care if I have to write it myself, I would just like to know where to look.
EDIT: These are nonlinear functions. Beyond that, they can't be characterized. Some of them may eventually be interpolated from a table of data.
There is no general answer to this question; a solver that can find the solution to any arbitrary equation does not exist. As Lance Roberts already says, you have to know more about the functions. Just a few examples:
If the functions are twice differentiable, and you can compute the first derivative, you might try a variant of Newton-Raphson
Have a look at the Lagrange Multiplier Method for implementing the constraint.
If the function F is continuous (which it probably is, if it is an interpolant), you could also try the Bisection Method, which is a lot like binary search.
Before you can solve the problem, you really need to know more about the function you're studying.
As others have already posted, we do need some more information on the functions. However, given that, we can still try to solve the following relaxation with a standard non-linear programming toolbox.
min k
s.t.
A + B + C + D + E = 1
F1(A) - k = 0
F2(B) - k = 0
F3(C) - k = 0
F4(D) - k = 0
F5(E) - k = 0
Now we can solve this in any manner we wish, such as a penalty method
min k + mu * sum_i (Fi(x_i) - k)^2
s.t.
A + B + C + D + E = 1
or a straightforward SQP or interior-point method.
With more details, I can help advise as to a good method.
m
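As an illustration of that relaxation, here is a sketch using SciPy's SLSQP solver; the five F functions below are invented placeholders for the real (possibly interpolated) ones:

import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-ins for F1..F5; replace with the real functions.
F = [lambda x, a=a: (x + 0.1 * a) ** 2 + a for a in range(5)]

def objective(z):
    return z[5]                          # z = [A, B, C, D, E, k]; minimise k

constraints = [
    {"type": "eq", "fun": lambda z: z[:5].sum() - 1.0},       # A + ... + E = 1
] + [
    {"type": "eq", "fun": lambda z, i=i: F[i](z[i]) - z[5]}   # Fi(x_i) = k
    for i in range(5)
]

z0 = np.array([0.2, 0.2, 0.2, 0.2, 0.2, 0.0])   # start at the uniform point
result = minimize(objective, z0, method="SLSQP", constraints=constraints)
print(result.x[:5], result.x[5])

The penalty-method variant above can be coded the same way by dropping the equality constraints on the Fi and adding mu * sum((Fi(x_i) - k)**2) to the objective.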
The functions are all monotonically increasing with their argument. Beyond that, they can't be characterized. The approach that worked turned out to be:
1) Start with A = B = C = D = E = 1/5
2) Compute F1(A) through F5(E), sum them, and recalculate A through E such that each function equals that sum divided by 5 (the average).
3) Rescale the new A through E so that they all sum to 1, and recompute F1 through F5.
4) Repeat until satisfied.
It converges surprisingly fast - just a few iterations. Of course, each iteration requires 5 root finds for step 2.
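A sketch of that iteration in Python; the monotonically increasing functions F below are invented placeholders, and each root find in step 2 is done by bisection (which is enough precisely because the functions are monotone):

# Hypothetical monotone functions standing in for F1..F5.
F = [lambda x, a=a: (a + 1.0) * x + x ** 3 for a in range(5)]

def invert(f, target, lo=0.0, hi=10.0, steps=60):
    # bisection: find x in [lo, hi] with f(x) close to target (f increasing)
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

x = [0.2] * 5                                          # step 1: A = B = ... = E = 1/5
for _ in range(50):
    avg = sum(F[i](x[i]) for i in range(5)) / 5.0      # step 2: average of F1..F5
    x = [invert(F[i], avg) for i in range(5)]          #         pull each Fi to the average
    total = sum(x)
    x = [v / total for v in x]                         # step 3: rescale so the sum is 1

print(x, [F[i](x[i]) for i in range(5)])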
One solution of the equations
A + B + C + D + E = 1
F(A) = F(B) = F(C) = F(D) = F(E)
is to take A, B, C, D and E all equal to 1/5. Not sure though whether that is what you want ...
Added after John's comment (thanks!)
Assuming the second equation should read F1(A) = F2(B) = F3(C) = F4(D) = F5(E), I'd use the Newton-Raphson method (see Martijn's answer). You can eliminate one variable by setting E = 1 - A - B - C - D. At every step of the iteration you need to solve a 4x4 system. The biggest problem is probably where to start the iteration. One possibility is to start at a random point, do some iterations, and if you're not getting anywhere, pick another random point and start again.
Keep in mind that if you really don't know anything about the function then there need not be a solution.
ALGENCAN (part of TANGO) is really nice. There are Python bindings, too.
http://www.ime.usp.br/~egbirgin/tango/codes.php - " general nonlinear programming that does not use matrix manipulations at all and, so, is able to solve extremely large problems with moderate computer time. The general algorithm is of Augmented Lagrangian type ... "
http://pypi.python.org/pypi/TANGO%20Project%20-%20ALGENCAN/1.0
Google OPTIF9 or ALLUNC. We use these for general optimization.
You could use a standard search technique, as the others mentioned. There are a few optimizations you can make use of while doing the search.
First of all, you only need to solve for A, B, C and D, because 1 - E = A + B + C + D.
Second, you have F(A) = F(B) = F(C) = F(D), so you can search for A. Once you get F(A), you can solve for B, C and D if that is possible. If it is not possible to solve the functions directly, you still need to search each variable, but now you have a limited range to search because A + B + C + D <= 1.
If your search is discrete and finite, the above optimizations should work reasonably well.
I would try Particle Swarm Optimization first. It is very easy to implement and tweak. See the Wiki page for it.
