Haskell: Slow infinite list for Verlet Integration - performance

I am using Haskell to make a Verlet integrator to model gravity. The integrator uses the first two positions of the object as seeds and generates the rest after this.
I thought a nice way of making this in Haskell would be to use an infinite list. However, the implementation runs very slowly for large numbers of time steps (Haskell, 1700 time steps: 12 seconds; Python, 1700 time steps: under 1 second).
Here is the relevant code for a 1d integrator that has similar performance:
verletStep dt acc xn xn1 = 2*xn1 - xn + (acc xn1)*dt*dt
verlet dt acc x0 x1 = x0 : x1 : next (verlet dt acc x0 x1)
  where
    next (xn : xs@(xn1:_)) = (verletStep dt acc xn xn1) : next xs
I also tried using zipWith to generate the infinite list but it has similar performance.
Why does this take so long? The garbage collection itself is around 5 seconds. Is there a nice way to make this run faster?

This definition...
verlet dt acc x0 x1 = x0 : x1 : next (verlet dt acc x0 x1)
  where
    next (xn : xs@(xn1:_)) = (verletStep dt acc xn xn1) : next xs
... leads to verlet dt acc x0 x1 being calculated many times unnecessarily, thus building a lot of unneeded lists. That can be seen by working out a time step by hand:
verlet dt acc x0 x1
x0 : x1 : next (verlet dt acc x0 x1)
x0 : x1 : next (x0 : x1 : next (verlet dt acc x0 x1))
x0 : x1 : (verletStep dt acc x0 x1) : next (x1 : next (verlet dt acc x0 x1))
The solution is to eliminate the unnecessary list-building:
verlet dt acc x0 x1 = x0 : x1 : x2 : drop 2 (verlet dt acc x1 x2)
  where
    x2 = verletStep dt acc x0 x1
drop 2 removes the first two elements of a list (in this case, x1 and x2, which we have already prepended). verlet is called recursively with the second position, x1, and the newly calculated third one, x2. (Compare with the original definition, in which verlet is called recursively with the same arguments. That should raise suspicion.)
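For comparison, the same recurrence can be written as a generator that carries only the last two positions from step to step. This is a minimal Python sketch, not the asker's original Python code; the names verlet_positions and first_five are made up for illustration.

```python
from itertools import islice


def verlet_step(dt, acc, xn, xn1):
    """One Verlet update: the next position from the previous two."""
    return 2 * xn1 - xn + acc(xn1) * dt * dt


def verlet_positions(dt, acc, x0, x1):
    """Yield x0, x1, x2, ... lazily, keeping only the last two positions."""
    prev, curr = x0, x1
    yield prev
    while True:
        yield curr
        # Advance the two-element window by one time step.
        prev, curr = curr, verlet_step(dt, acc, prev, curr)


# With zero acceleration the positions advance linearly from the two seeds.
first_five = list(islice(verlet_positions(1.0, lambda x: 0.0, 0.0, 1.0), 5))
```

Each step here depends only on the two previous values, which is the same property the corrected Haskell definition relies on when it recurses with x1 and x2.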

Related

WAM register allocation for same structures

Warren's Abstract Machine: A Tutorial Reconstruction states the following Variable Register Allocation rules:
Variable registers are allocated according to least available index.
Register X1 is always allocated to the outermost term.
A same register is allocated to all the occurrences of a given variable.
Further in the tutorial, while building program queries, the following example is given:
p(f(X), h(Y, f(a)), Y).
X1 = p(X2, X3, X4)
X2 = f(X5)
X3 = h(X4, X6)
X4 = Y
X5 = X
X6 = f(X7)
X7 = a
My doubt concerns the two occurrences of f: both are f/1 structures, but they have different arguments and therefore need to be built separately. What exactly counts as a variable in the WAM context: only a Prolog variable, or every term? How would the registers for the clause p(f(a), f(a)) be allocated:
X1 = p(X2, X2)
X2 = f(X3)
X3 = a
or
X1 = p(X2, X3)
X2 = f(X4)
X3 = f(X4)
X4 = a
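
The tutorial's listing for p(f(X), h(Y, f(a)), Y) can be reproduced by a breadth-first allocator in which only Prolog variables share a register, while every other subterm occurrence gets a fresh one. The Python sketch below assumes exactly that reading of the three rules. The term encoding (a string for a variable, a (functor, args) tuple for anything else) and the name allocate_registers are invented for illustration.

```python
def allocate_registers(term):
    """Return the 'Xi = ...' equations for a term, allocating breadth-first.

    Assumption: only variables (encoded as strings) reuse a register;
    each structure or constant occurrence gets the least unused index.
    """
    registers = [term]   # registers[i] is the term held in X(i+1)
    variable_register = {}
    equations = []
    index = 0
    while index < len(registers):
        current = registers[index]
        if isinstance(current, str):
            equations.append(f"X{index + 1} = {current}")
        else:
            functor, args = current
            arg_registers = []
            for arg in args:
                if isinstance(arg, str) and arg in variable_register:
                    arg_registers.append(variable_register[arg])
                    continue
                registers.append(arg)
                if isinstance(arg, str):
                    variable_register[arg] = len(registers)
                arg_registers.append(len(registers))
            inner = ", ".join(f"X{r}" for r in arg_registers)
            rhs = f"{functor}({inner})" if arg_registers else functor
            equations.append(f"X{index + 1} = {rhs}")
        index += 1
    return equations


a = ("a", [])
tutorial_term = ("p", [("f", ["X"]), ("h", ["Y", ("f", [a])]), "Y"])
tutorial_equations = allocate_registers(tutorial_term)
repeated_equations = allocate_registers(("p", [("f", [a]), ("f", [a])]))
```

Under this reading, p(f(a), f(a)) comes out as X1 = p(X2, X3), X2 = f(X4), X3 = f(X5), X4 = a, X5 = a: the two f(a) occurrences and the two a occurrences each get their own register, which matches neither candidate in the question exactly.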

pow(X,Y,Z) <=> Z = X^Y with add

Would it be possible to implement "pow" using only an "add" predicate (or just X is Y + Z)?
I wrote this:
pow(0,1,1).
pow(_,0,1).
pow(X,Y,Z) :- Y1 is Y - 1, pow(X,Y1,Z1), Z is Z1 * X.
But I also want to write it using only "+" (just for practice), so that 3^2 = 3 * 3 = 3 + 3 + 3.
You can write the multiplication (mul/3) in terms of addition, like this:
pow(_,0,1).
pow(X,Y,Z) :-
    Y > 0, % keeps this clause from overlapping the Y = 0 base case
    Y1 is Y - 1,
    pow(X,Y1,Z1),
    mul(Z1,X,Z). %% originally: Z is Z1 * X.
mul(0,_,0).
mul(I,A,R) :-
    I > 0,
    I1 is I-1,
    mul(I1,A,R1),
    R is R1 + A.
Usually a basic exercise is to write addition, multiplication, and power predicates with the Peano number representation. In that case addition is written with the successor functor.
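
The same layering (power built from multiplication, multiplication built from addition) can be sketched in Python. This is only an illustration of the idea, written with loops instead of recursion and assuming non-negative integers throughout; the function names are made up.

```python
def add(a, b):
    """The only arithmetic primitive the other functions rely on."""
    return a + b


def mul(count, value):
    """count * value for count >= 0, by adding value count times."""
    result = 0
    for _ in range(count):
        result = add(result, value)
    return result


def power(base, exponent):
    """base ** exponent for non-negative integers, by repeated mul."""
    result = 1
    for _ in range(exponent):
        # result plays the role of Z1 in mul(Z1, X, Z).
        result = mul(result, base)
    return result
```

For instance, power(3, 2) evaluates mul(1, 3) and then mul(3, 3), so 3^2 ends up as 3 + 3 + 3.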

Solve a system of linked equations with different moduli

Is there any algorithm to solve a system of equations in which each equation is taken under a different modulus?
For example, consider this system of equations:
(x1 + x2) % 2 = 0
(x2 + x3) % 2 = 0
(x1 + x2 + x3) % 3 = 2
One of the solutions of this system is:
x1 = 0
x2 = 2
x3 = 0
How could I arithmetically find this solution (without using a brute force algorithm)?
Thanks
You can rewrite these equations as
x1 + x2 = 2*n1
x2 + x3 = 2*n2
x1 + x2 + x3 = 3*n3 + 2
Now, this is a linear Diophantine equation problem for which there are solutions in the literature.
Example: http://www.wikihow.com/Solve-a-Linear-Diophantine-Equation
Also see: https://www.math.uwaterloo.ca/~wgilbert/Research/GilbertPathria.pdf
Algorithm: write each xi as a function of the nk.
In this case:
x3 = 3*n3 + 2 - 2*n1
x2 = 2*n2 - (3*n3 + 2 - 2*n1)
x1 = 2*n1 - (2*n2 - (3*n3 + 2 - 2*n1))
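Since there is no division on the right-hand side, any choice of integers (n1, n2, n3) gives a solution. For example, (n1, n2, n3) = (1, 1, 0) gives x1 = 0, x2 = 2, x3 = 0, the solution quoted in the question.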
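
The parameterization above can be checked mechanically. The following Python sketch plugs integer triples into the three expressions and verifies the original congruences; the function names are made up for illustration.

```python
from itertools import product


def solution_from_parameters(n1, n2, n3):
    """Back-substitute the free integers n1, n2, n3 into x1, x2, x3."""
    x3 = 3 * n3 + 2 - 2 * n1
    x2 = 2 * n2 - x3
    x1 = 2 * n1 - x2
    return x1, x2, x3


def satisfies_system(x1, x2, x3):
    """Check the three modular equations from the question."""
    return (
        (x1 + x2) % 2 == 0
        and (x2 + x3) % 2 == 0
        and (x1 + x2 + x3) % 3 == 2
    )


# Every triple in a small grid, negatives included, yields a valid solution.
all_valid = all(
    satisfies_system(*solution_from_parameters(n1, n2, n3))
    for n1, n2, n3 in product(range(-3, 4), repeat=3)
)
```

Note that some parameter choices give negative values of xi; if only non-negative solutions are wanted, the parameters have to be restricted accordingly.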
The first equation says that x1 and x2 are both even or both odd.
The second equation says the same about x2 and x3.
Hence x1, x2, and x3 are all even or all odd.
With the third equation, the question becomes: find three odd or three even numbers that sum to 3k + 2.
You can convert your system to a single modulus, the LCM (least common multiple) of the moduli. Find the LCM of all the equations' moduli, and multiply each equation by the LCM divided by its own modulus.

Merging two very large lists

Given a list of 2n - 1 elements that looks like this:
x1, x2, x3, ..., xn, y1, y2, y3, ..., y(n-1)
Convert it to:
x1, y1, x2, y2, x3, y3, ..., y(n-1), xn
I can use two iterators, one for each half of the list, and get the solution in O(n) time and O(n) space. But if n is very large, is there a way to do this with less space?
It feels like this can be done in O(1) space and O(n) time, but the algorithm is far from trivial. Basically, take an element that is out of place, say x2, look where it needs to be in the final arrangement, take out the element that is there (i.e. x3), and put x2 in its place.
Now look where x3 needs to go, and so on.
When the cycle is closed, take the next element that is out of place (if there is any).
Let's work through an example:
x1 x2 x3 y1 y2             x2 is out of place, so take it into temp storage
x1 -- x3 y1 y2   temp: x2  needs to go where x3 currently is
x1 -- x2 y1 y2   temp: x3  needs to go where y2 currently is
x1 -- x2 y1 x3   temp: y2  needs to go where y1 currently is
x1 -- x2 y2 x3   temp: y1  needs to go into the empty slot
x1 y1 x2 y2 x3             all elements in place -> finished
If the array indices start at 0, the final position of the element at k is given by
2k if k < n
2(k-n) + 1 if k >= n
The difficulty is finding an element of a cycle that has not been handled yet. For example, if n = 4, there are 3 cycles:
0 -> 0
1 -> 2 -> 4 -> 1
3 -> 6 -> 5 -> 3
I do not have an easy solution for that at the moment.
If you have one bit of storage available per array element it is trivial but then we are back to O(n) storage.
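
The "one bit per element" variant mentioned above can be sketched as follows. This Python version follows each cycle with a single temporary value and uses a list of booleans to find cycles that have not been handled, so it still needs O(n) extra storage and is not the O(1)-space algorithm. The names interleave_in_place and final_position are made up for illustration.

```python
def interleave_in_place(items):
    """Rearrange x1..xn, y1..y(n-1) into x1, y1, x2, y2, ..., xn in place."""
    total = len(items)
    n = (total + 1) // 2  # number of x elements in a list of 2n - 1 items

    def final_position(k):
        # 0-based index mapping: 2k for the x half, 2(k - n) + 1 for the y half.
        return 2 * k if k < n else 2 * (k - n) + 1

    visited = [False] * total  # the one bit of storage per element
    for start in range(total):
        if visited[start]:
            continue
        position = start
        temp = items[start]
        while True:
            visited[position] = True
            target = final_position(position)
            # Drop the carried element into place and pick up the displaced one.
            items[target], temp = temp, items[target]
            position = target
            if position == start:
                break
    return items


example = interleave_in_place(["x1", "x2", "x3", "y1", "y2"])
```

For the five-element example, the loop visits the cycle 1 -> 2 -> 4 -> 3 -> 1, the same sequence of moves as in the hand-worked table.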
In Python:
lst = 'x1 x2 x3 x4 x5 y1 y2 y3 y4 y5'.split()
lst
Out[9]: ['x1', 'x2', 'x3', 'x4', 'x5', 'y1', 'y2', 'y3', 'y4', 'y5']
out = sum((list(xy) for xy in zip(lst[:len(lst)//2], lst[len(lst)//2:])), [])
out
Out[11]: ['x1', 'y1', 'x2', 'y2', 'x3', 'y3', 'x4', 'y4', 'x5', 'y5']

Using min/max *within* an Integer Linear Program

I'm trying to set up a linear program in which the objective function adds extra weight to the max out of the decision variables multiplied by their respective coefficients.
With this in mind, is there a way to use min or max operators within the objective function of a linear program?
Example:
Minimize
(c1 * x1) + (c2 * x2) + (c3 * x3) + (c4 * max(c1*x1, c2*x2, c3*x3))
subject to
#some arbitrary integer constraints:
x1 >= ...
x1 + 2*x2 <= ...
x3 >= ...
x1 + x3 == ...
Note that (c4 * max(c1*x1, c2*x2, c3*x3)) is the "extra weight" term that I'm concerned about. We let c4 denote the "extra weight" coefficient. Also, note that x1, x2, and x3 are integers in this particular example.
I think the above might be outside the scope of what linear programming offers. However, perhaps there's a way to hack/reformat this into a valid linear program?
If this problem is completely out of the scope of linear programming, perhaps someone can recommend an optimization paradigm that is more suitable to this type of problem? (Anything that allows me to avoid manually enumerating and checking all possible solutions would be helpful.)
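Add an auxiliary variable, say x4, with the constraints:
x4 >= c1*x1
x4 >= c2*x2
x4 >= c3*x3
Objective += c4*x4
Since the problem is a minimization and c4 is assumed to be nonnegative, the solver pushes x4 down until it equals the largest of the three products, so c4*x4 stands in for the max term.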
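
To see that the auxiliary-variable form has the same optimum as the original max objective, here is a small brute-force check in Python. It is not a solver and only enumerates a tiny grid. The coefficients, bounds, and constraints are hypothetical values chosen for illustration, since the question leaves them unspecified, and c4 is assumed nonnegative.

```python
from itertools import product

# Hypothetical data; the question does not give concrete numbers.
c1, c2, c3, c4 = 2, 3, 1, 5   # c4 >= 0 is required for the trick to work
DOMAIN = range(0, 4)          # each of x1, x2, x3 ranges over 0..3
X4_DOMAIN = range(0, 10)      # covers the largest possible product, 3 * 3


def feasible(x1, x2, x3):
    """Made-up integer constraints standing in for the elided ones."""
    return x1 + 2 * x2 <= 6 and x1 + x3 == 3


def original_optimum():
    """Minimize the objective that contains the max term directly."""
    return min(
        c1 * x1 + c2 * x2 + c3 * x3 + c4 * max(c1 * x1, c2 * x2, c3 * x3)
        for x1, x2, x3 in product(DOMAIN, repeat=3)
        if feasible(x1, x2, x3)
    )


def linearized_optimum():
    """Minimize with x4 replacing the max term, plus the three x4 bounds."""
    return min(
        c1 * x1 + c2 * x2 + c3 * x3 + c4 * x4
        for x1, x2, x3 in product(DOMAIN, repeat=3)
        for x4 in X4_DOMAIN
        if feasible(x1, x2, x3)
        and x4 >= c1 * x1 and x4 >= c2 * x2 and x4 >= c3 * x3
    )


same_optimum = original_optimum() == linearized_optimum()
```

In a real model the three inequalities and the extra objective term would simply be handed to an integer programming solver; the enumeration here only confirms the equivalence on a toy instance.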
