Arbitrary precision with inaccurate calculation? - precision

I am interested in the numerical solution of ordinary differential equations (ODE's) with a precision of 25 to 30 significant digits (I mean finding 25 to 30 significant digits of the actual solution at each point).
My first guess was multiple precision or arbitrary precision arithmetics.
However according to this post there are multiple precision packages that do calculations without multiple precision (?!). This may not be as contradictory as it sounds. However it poses a severe problem.
So do there exist any multiple precision packages (in Fortran, C, C++, Python, Java or whatever) that can do reliable multiple precision computations?

Let's stay modest for demonstration purposes and aim for 20 correct digits in an implementation of the circle equations u'=v; v'=-u which give u=sin(t) for (u0,v0)=(0,1). Let's integrate this over the interval [0,1].
This has a Lipschitz constant of L=1, which makes arguing about step sizes easier. With the classical Runge-(Heun)-Kutta order 4 method, the global error is expected as some fraction of h^4, so to get about 20 digits after the dot, we need h=1e-5, which also means 100 000 steps over the interval. Each has some floating point errors in the last digit, so that over the full interval, possibly 5 digits will be unreliable. Thus the working precision should be 25 digits or more
from mpmath import mp
mp.dps = 25
def RK4(t0, y0, tf, N):
h = (tf-t0)/N
u,v = y0;
for n in range(N):
k1u, k1v = v, -u;
k2u, k2v = v+0.5*h*k1v, -(u+0.5*h*k1u)
k3u, k3v = v+0.5*h*k2v, -(u+0.5*h*k2u)
k4u, k4v = v+h*k3v, -(u+h*k3u)
u, v = u+h*(k1u+2*k2u+2*k3u+k4u)/6, v+h*(k1v+2*k2v+2*k3v+k4v)/6
return u,v
u,v = mp.mpf(0), mp.mpf(1)
t, dt = mp.mpf(0), mp.mpf(1)/10
for k in range(10):
u1,v1 = RK4(t,[u,v],t+dt,10**3)
u,v = RK4(t,[u,v],t+dt,10**4)
t = t+dt
# error estimate via extrapolation
# u = uexact + uerr, u1 = uexact + 10**4*uerr
uerr, verr = (u1-u)/9999, (v1-v)/9999
with mp.extradps(5): si, co = mp.sin(t), mp.cos(t)
print("t=%s"%t)
print("u=%30s, sin(t)=%30s, step error=%30s"%(u, si, uerr))
print("v=%30s, cos(t)=%30s, step error=%30s"%(v, co, verr))
giving a result list as follows
t=0.1
u= 0.09983341664682815230680593, sin(t)= 0.0998334166468281523068142, step error=-8.291772901773692681703852e-24
v= 0.9950041652780257660955634, cos(t)= 0.995004165278025766095562, step error=8.312005953404312840899088e-25
t=0.2
u= 0.1986693307950612154593963, sin(t)= 0.1986693307950612154594126, step error=-8.167351203006133449374218e-24
v= 0.9800665778412416311242006, cos(t)= 0.9800665778412416311241965, step error=1.654857583109387499562722e-24
t=0.3
u= 0.2955202066613395751052971, sin(t)= 0.2955202066613395751053207, step error= -7.9613619853238077687103e-24
v= 0.9553364891256060196423186, cos(t)= 0.9553364891256060196423102, step error= 2.46199127828304457142136e-24
t=0.4
u= 0.3894183423086504916662812, sin(t)= 0.3894183423086504916663118, step error=-7.675762243562770001241619e-24
v= 0.9210609940028850827985405, cos(t)= 0.9210609940028850827985267, step error=3.244534570708288054285545e-24
t=0.5
u= 0.479425538604203000273252, sin(t)= 0.4794255386042030002732879, step error=-7.313560501818737799847937e-24
v= 0.8775825618903727161163026, cos(t)= 0.8775825618903727161162816, step error=3.994583217701846864434659e-24
t=0.6
u= 0.5646424733950353572009052, sin(t)= 0.5646424733950353572009454, step error=-6.878230619815552707865177e-24
v= 0.8253356149096782972409814, cos(t)= 0.8253356149096782972409525, step error=4.704815938714571792025104e-24
t=0.7
u= 0.6442176872376910536725712, sin(t)= 0.6442176872376910536726143, step error=-6.374164848843117446876767e-24
v= 0.7648421872844884262558976, cos(t)= 0.76484218728448842625586, step error=5.368009690718806448531807e-24
t=0.8
u= 0.7173560908995227616271302, sin(t)= 0.7173560908995227616271746, step error=-5.806451504735070057940143e-24
v= 0.6967067093471654209207975, cos(t)= 0.69670670934716542092075, step error=5.977494663044775070749797e-24
t=0.9
u= 0.7833269096274833884613378, sin(t)= 0.7833269096274833884613823, step error=-5.180615155476414729415392e-24
v= 0.621609968270664456484775, cos(t)= 0.6216099682706644564847162, step error=6.527225370323768115169452e-24
t=1.0
u= 0.8414709848078965066524596, sin(t)= 0.8414709848078965066525023, step error=-4.503113625506337452188567e-24
v= 0.540302305868139717401006, cos(t)= 0.5403023058681397174009366, step error=7.011939642161084587215687e-24
One can see that the errors are restricted to the last 4 digits, as expected. The cumulative method error is about 1e-23 over each of the sub-intervals, thus in total less than 1e-22, about the same as the expected floating point noise, which in sum of the quasi-random rounding or arithmetic operations will influence at maximum 5 digits, on average about half of that.
Higher order methods will allow larger step sizes, thus less steps, which not only reduces the computation time but also the accumulation of rounding errors. Thus requiring less additional digits for the working precision, which also reduces (slightly) the computation time.
For heavy computations like this, one should use a compiled language. The strong typing and transitioning from garbage collection to active memory management will make the code look less easy to read, like in the code cited in How are the trigonometric functions tested in the GNU C Library?

Related

Conditional sampling of binary vectors (?)

I'm trying to find a name for my problem, so I don't have to re-invent wheel when coding an algorithm which solves it...
I have say 2,000 binary (row) vectors and I need to pick 500 from them. In the picked sample I do column sums and I want my sample to be as close as possible to a pre-defined distribution of the column sums. I'll be working with 20 to 60 columns.
A tiny example:
Out of the vectors:
110
010
011
110
100
I need to pick 2 to get column sums 2, 1, 0. The solution (exact in this case) would be
110
100
My ideas so far
one could maybe call this a binary multidimensional knapsack, but I did not find any algos for that
Linear Programming could help, but I'd need some step by step explanation as I got no experience with it
as exact solution is not always feasible, something like simulated annealing brute force could work well
a hacky way using constraint solvers comes to mind - first set the constraints tight and gradually loosen them until some solution is found - given that CSP should be much faster than ILP...?
My concrete, practical (if the approximation guarantee works out for you) suggestion would be to apply the maximum entropy method (in Chapter 7 of Boyd and Vandenberghe's book Convex Optimization; you can probably find several implementations with your favorite search engine) to find the maximum entropy probability distribution on row indexes such that (1) no row index is more likely than 1/500 (2) the expected value of the row vector chosen is 1/500th of the predefined distribution. Given this distribution, choose each row independently with probability 500 times its distribution likelihood, which will give you 500 rows on average. If you need exactly 500, repeat until you get exactly 500 (shouldn't take too many tries due to concentration bounds).
Firstly I will make some assumptions regarding this problem:
Regardless whether the column sum of the selected solution is over or under the target, it weighs the same.
The sum of the first, second, and third column are equally weighted in the solution (i.e. If there's a solution whereas the first column sum is off by 1, and another where the third column sum is off by 1, the solution are equally good).
The closest problem I can think of this problem is the Subset sum problem, which itself can be thought of a special case of Knapsack problem.
However both of these problem are NP-Complete. This means there are no polynomial time algorithm that can solve them, even though it is easy to verify the solution.
If I were you the two most arguably efficient solution of this problem are linear programming and machine learning.
Depending on how many columns you are optimising in this problem, with linear programming you can control how much finely tuned you want the solution, in exchange of time. You should read up on this, because this is fairly simple and efficient.
With Machine learning, you need a lot of data sets (the set of vectors and the set of solutions). You don't even need to specify what you want, a lot of machine learning algorithms can generally deduce what you want them to optimise based on your data set.
Both solution has pros and cons, you should decide which one to use yourself based on the circumstances and problem set.
This definitely can be modeled as (integer!) linear program (many problems can). Once you have it, you can use a program such as lpsolve to solve it.
We model vector i is selected as x_i which can be 0 or 1.
Then for each column c, we have a constraint:
sum of all (x_i * value of i in column c) = target for column c
Taking your example, in lp_solve this could look like:
min: ;
+x1 +x4 +x5 >= 2;
+x1 +x4 +x5 <= 2;
+x1 +x2 +x3 +x4 <= 1;
+x1 +x2 +x3 +x4 >= 1;
+x3 <= 0;
+x3 >= 0;
bin x1, x2, x3, x4, x5;
If you are fine with a heuristic based search approach, here is one.
Go over the list and find the minimum squared sum of the digit wise difference between each bit string and the goal. For example, if we are looking for 2, 1, 0, and we are scoring 0, 1, 0, we would do it in the following way:
Take the digit wise difference:
2, 0, 1
Square the digit wise difference:
4, 0, 1
Sum:
5
As a side note, squaring the difference when scoring is a common method when doing heuristic search. In your case, it makes sense because bit strings that have a 1 in as the first digit are a lot more interesting to us. In your case this simple algorithm would pick first 110, then 100, which would is the best solution.
In any case, there are some optimizations that could be made to this, I will post them here if this kind of approach is what you are looking for, but this is the core of the algorithm.
You have a given target binary vector. You want to select M vectors out of N that have the closest sum to the target. Let's say you use the eucilidean distance to measure if a selection is better than another.
If you want an exact sum, have a look at the k-sum problem which is a generalization of the 3SUM problem. The problem is harder than the subset sum problem, because you want an exact number of elements to add to a target value. There is a solution in O(N^(M/2)). lg N), but that means more than 2000^250 * 7.6 > 10^826 operations in your case (in the favorable case where vectors operations have a cost of 1).
First conclusion: do not try to get an exact result unless your vectors have some characteristics that may reduce the complexity.
Here's a hill climbing approach:
sort the vectors by number of 1's: 111... first, 000... last;
use the polynomial time approximate algorithm for the subset sum;
you have an approximate solution with K elements. Because of the order of elements (the big ones come first), K should be a little as possible:
if K >= M, you take the M first vectors of the solution and that's probably near the best you can do.
if K < M, you can remove the first vector and try to replace it with 2 or more vectors from the rest of the N vectors, using the same technique, until you have M vectors. To sumarize: split the big vectors into smaller ones until you reach the correct number of vectors.
Here's a proof of concept with numbers, in Python:
import random
def distance(x, y):
return abs(x-y)
def show(ls):
if len(ls) < 10:
return str(ls)
else:
return ", ".join(map(str, ls[:5]+("...",)+ls[-5:]))
def find(is_xs, target):
# see https://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution
S = [(0, ())] # we store indices along with values to get the path
for i, x in is_xs:
T = [(x + t, js + (i,)) for t, js in S]
U = sorted(S + T)
y, ks = U[0]
S = [(y, ks)]
for z, ls in U:
if z == target: # use the euclidean distance here if you want an approximation
return ls
if z != y and z < target:
y, ks = z, ls
S.append((z, ls))
ls = S[-1][1] # take the closest element to target
return ls
N = 2000
M = 500
target = 1000
xs = [random.randint(0, 10) for _ in range(N)]
print ("Take {} numbers out of {} to make a sum of {}", M, xs, target)
xs = sorted(xs, reverse = True)
is_xs = list(enumerate(xs))
print ("Sorted numbers: {}".format(show(tuple(is_xs))))
ls = find(is_xs, target)
print("FIRST TRY: {} elements ({}) -> {}".format(len(ls), show(ls), sum(x for i, x in is_xs if i in ls)))
splits = 0
while len(ls) < M:
first_x = xs[ls[0]]
js_ys = [(i, x) for i, x in is_xs if i not in ls and x != first_x]
replace = find(js_ys, first_x)
splits += 1
if len(replace) < 2 or len(replace) + len(ls) - 1 > M or sum(xs[i] for i in replace) != first_x:
print("Give up: can't replace {}.\nAdd the lowest elements.")
ls += tuple([i for i, x in is_xs if i not in ls][len(ls)-M:])
break
print ("Replace {} (={}) by {} (={})".format(ls[:1], first_x, replace, sum(xs[i] for i in replace)))
ls = tuple(sorted(ls[1:] + replace)) # use a heap?
print("{} elements ({}) -> {}".format(len(ls), show(ls), sum(x for i, x in is_xs if i in ls)))
print("AFTER {} splits, {} -> {}".format(splits, ls, sum(x for i, x in is_xs if i in ls)))
The result is obviously not guaranteed to be optimal.
Remarks:
Complexity: find has a polynomial time complexity (see the Wikipedia page) and is called at most M^2 times, hence the complexity remains polynomial. In practice, the process is reasonably fast (split calls have a small target).
Vectors: to ensure that you reach the target with the minimum of elements, you can improve the order of element. Your target is (t_1, ..., t_c): if you sort the t_js from max to min, you get the more importants columns first. You can sort the vectors: by number of 1s and then by the presence of a 1 in the most important columns. E.g. target = 4 8 6 => 1 1 1 > 0 1 1 > 1 1 0 > 1 0 1 > 0 1 0 > 0 0 1 > 1 0 0 > 0 0 0.
find (Vectors) if the current sum exceed the target in all the columns, then you're not connecting to the target (any vector you add to the current sum will bring you farther from the target): don't add the sum to S (z >= target case for numbers).
I propose a simple ad hoc algorithm, which, broadly speaking, is a kind of gradient descent algorithm. It seems to work relatively well for input vectors which have a distribution of 1s “similar” to the target sum vector, and probably also for all “nice” input vectors, as defined in a comment of yours. The solution is not exact, but the approximation seems good.
The distance between the sum vector of the output vectors and the target vector is taken to be Euclidean. To minimize it means minimizing the sum of the square differences off sum vector and target vector (the square root is not needed because it is monotonic). The algorithm does not guarantee to yield the sample that minimizes the distance from the target, but anyway makes a serious attempt at doing so, by always moving in some locally optimal direction.
The algorithm can be split into 3 parts.
First of all the first M candidate output vectors out of the N input vectors (e.g., N=2000, M=500) are put in a list, and the remaining vectors are put in another.
Then "approximately optimal" swaps between vectors in the two lists are done, until either the distance would not decrease any more, or a predefined maximum number of iterations is reached. An approximately optimal swap is one where removing the first vector from the list of output vectors causes a maximal decrease or minimal increase of the distance, and then, after the removal of the first vector, adding the second vector to the same list causes a maximal decrease of the distance. The whole swap is avoided if the net result is not a decrease of the distance.
Then, as a last phase, "optimal" swaps are done, again stopping on no decrease in distance or maximum number of iterations reached. Optimal swaps cause a maximal decrease of the distance, without requiring the removal of the first vector to be optimal in itself. To find an optimal swap all vector pairs have to be checked. This phase is much more expensive, being O(M(N-M)), while the previous "approximate" phase is O(M+(N-M))=O(N). Luckily, when entering this phase, most of the work has already been done by the previous phase.
from typing import List, Tuple
def get_sample(vects: List[Tuple[int]], target: Tuple[int], n_out: int,
max_approx_swaps: int = None, max_optimal_swaps: int = None,
verbose: bool = False) -> List[Tuple[int]]:
"""
Get a sample of the input vectors having a sum close to the target vector.
Closeness is measured in Euclidean metrics. The output is not guaranteed to be
optimal (minimum square distance from target), but a serious attempt is made.
The max_* parameters can be used to avoid too long execution times,
tune them to your needs by setting verbose to True, or leave them None (∞).
:param vects: the list of vectors (tuples) with the same number of "columns"
:param target: the target vector, with the same number of "columns"
:param n_out: the requested sample size
:param max_approx_swaps: the max number of approximately optimal vector swaps,
None means unlimited (default: None)
:param max_optimal_swaps: the max number of optimal vector swaps,
None means unlimited (default: None)
:param verbose: print some info if True (default: False)
:return: the sample of n_out vectors having a sum close to the target vector
"""
def square_distance(v1, v2):
return sum((e1 - e2) ** 2 for e1, e2 in zip(v1, v2))
n_vec = len(vects)
assert n_vec > 0
assert n_out > 0
n_rem = n_vec - n_out
assert n_rem > 0
output = vects[:n_out]
remain = vects[n_out:]
n_col = len(vects[0])
assert n_col == len(target) > 0
sumvect = (0,) * n_col
for outvect in output:
sumvect = tuple(map(int.__add__, sumvect, outvect))
sqdist = square_distance(sumvect, target)
if verbose:
print(f"sqdist = {sqdist:4} after"
f" picking the first {n_out} vectors out of {n_vec}")
if max_approx_swaps is None:
max_approx_swaps = sqdist
n_approx_swaps = 0
while sqdist and n_approx_swaps < max_approx_swaps:
# find the best vect to subtract (the square distance MAY increase)
sqdist_0 = None
index_0 = None
sumvect_0 = None
for index in range(n_out):
tmp_sumvect = tuple(map(int.__sub__, sumvect, output[index]))
tmp_sqdist = square_distance(tmp_sumvect, target)
if sqdist_0 is None or sqdist_0 > tmp_sqdist:
sqdist_0 = tmp_sqdist
index_0 = index
sumvect_0 = tmp_sumvect
# find the best vect to add,
# but only if there is a net decrease of the square distance
sqdist_1 = sqdist
index_1 = None
sumvect_1 = None
for index in range(n_rem):
tmp_sumvect = tuple(map(int.__add__, sumvect_0, remain[index]))
tmp_sqdist = square_distance(tmp_sumvect, target)
if sqdist_1 > tmp_sqdist:
sqdist_1 = tmp_sqdist
index_1 = index
sumvect_1 = tmp_sumvect
if sumvect_1:
tmp = output[index_0]
output[index_0] = remain[index_1]
remain[index_1] = tmp
sqdist = sqdist_1
sumvect = sumvect_1
n_approx_swaps += 1
else:
break
if verbose:
print(f"sqdist = {sqdist:4} after {n_approx_swaps}"
f" approximately optimal swap{'s'[n_approx_swaps == 1:]}")
diffvect = tuple(map(int.__sub__, sumvect, target))
if max_optimal_swaps is None:
max_optimal_swaps = sqdist
n_optimal_swaps = 0
while sqdist and n_optimal_swaps < max_optimal_swaps:
# find the best pair to swap,
# but only if the square distance decreases
best_sqdist = sqdist
best_diffvect = diffvect
best_pair = None
for i0 in range(M):
tmp_diffvect = tuple(map(int.__sub__, diffvect, output[i0]))
for i1 in range(n_rem):
new_diffvect = tuple(map(int.__add__, tmp_diffvect, remain[i1]))
new_sqdist = sum(d * d for d in new_diffvect)
if best_sqdist > new_sqdist:
best_sqdist = new_sqdist
best_diffvect = new_diffvect
best_pair = (i0, i1)
if best_pair:
tmp = output[best_pair[0]]
output[best_pair[0]] = remain[best_pair[1]]
remain[best_pair[1]] = tmp
sqdist = best_sqdist
diffvect = best_diffvect
n_optimal_swaps += 1
else:
break
if verbose:
print(f"sqdist = {sqdist:4} after {n_optimal_swaps}"
f" optimal swap{'s'[n_optimal_swaps == 1:]}")
return output
from random import randrange
C = 30 # number of columns
N = 2000 # total number of vectors
M = 500 # number of output vectors
F = 0.9 # fill factor of the target sum vector
T = int(M * F) # maximum value + 1 that can be appear in the target sum vector
A = 10000 # maximum number of approximately optimal swaps, may be None (∞)
B = 10 # maximum number of optimal swaps, may be None (unlimited)
target = tuple(randrange(T) for _ in range(C))
vects = [tuple(int(randrange(M) < t) for t in target) for _ in range(N)]
sample = get_sample(vects, target, M, A, B, True)
Typical output:
sqdist = 2639 after picking the first 500 vectors out of 2000
sqdist = 9 after 27 approximately optimal swaps
sqdist = 1 after 4 optimal swaps
P.S.: As it stands, this algorithm is not limited to binary input vectors, integer vectors would work too. Intuitively I suspect that the quality of the optimization could suffer, though. I suspect that this algorithm is more appropriate for binary vectors.
P.P.S.: Execution times with your kind of data are probably acceptable with standard CPython, but get better (like a couple of seconds, almost a factor of 10) with PyPy. To handle bigger sets of data, the algorithm would have to be translated to C or some other language, which should not be difficult at all.

Particle Dynamics

I am modelling a particle in 3D space.
{0} The particle starts at time t0 from a known position P0 with a velocity V0. The velocity is computed using its known previous position of P-1 at t-1.
{1} The particle is targeted to go to P1 at t1 with a known velocity of V1.
{..} The particle moves as fast as it can, without jerks (C1 continuous) bound by a set of constraints that clamp the acceleration along x, y and z independently. The maximum acceleration/deceleration along x, y and z are known and are Xa, Ya, and Za. The max rate of change of acceleration along x, y and z are defined by Xr, Yr, and Zr.
{n} After an unknown number of time steps it reaches Pn at some time (say tn) with a velocity of Vn.
{n+1} It moves to Pn+1 at tn+1.
The problem I have is to compute the minimum time for the transition from P0 to Pn and to generate the intermediate positions and velocity directions thereof. A secondary goal is to accelerate smoothly instead of applying acceleration that results in jerks.
Current Approach:
find the dimension {x, y or z} that will take the longest to align from start P0 to end Pn. This will be the critical dimension and will determine the total time. This is fairly straightforward and I can write something to this effect.
interpolate smoothly without jitters from P0 to Pn in all dimensions such that the velocity at Pn is as expected. I am not sure, how to approach this.
Any inputs/physics engines that already do this will be useful. It is a commercial project and I cannot put dependencies on large 3rd party libraries with restrictive licenses.
Note: Particle at P0 and Pn have little or no acceleration.
If I understand correctly, you have a point (P0, V0), with V0 = P0 - P-1, and a point (Pn, Vn), with Vn = Pn - Pn-1, and you want to find the fewest intermediate points by adjusting the acceleration at each time step.
Let's define the acceleration at ti: Ai = Vi - Vi-1, with abs(Ai) <= mA. Here, since the problem is axis-independant, abs is the member-wise absolute instead of the norm (or vector magnitude), and mA is the maximum acceleration vector, positive in each dimension. Let's also consider that Pn > P0 (member-wise).
From that, we get Vi = Vi-1 + Ai and so Pi = Pi-1 + Vi-1 + Ai.
If you need to go from some point to another in the fastest way possible, the obvious thing to do, whatever the initial velocity, is accelerate as much as possible until you reach the goal. However, since your problem is discrete and you have a terminal velocity Vn, using that method will probably lead too far and with a different terminal velocity.
However, you can do the same thing in reverse, starting from the end point. And if you start simultaneously from both points, you will make two paths crossing each other in each dimension (not necessarily crossing in 3D, but, in each dimension, the relative direction of both paths changes at some "crossing" point).
Let's take a one-dimensional example. (P0, V0) = (0, -2) and (Pn, Vn) = (35, -1), and mA = 1.
The first path, with Ai = mA, goes like this:
(0, -2) -> (-1, -1) -> (-1, 0) -> (0, 1) -> (2, 2) ->
(5, 3) -> (9, 4) -> (14, 5) -> (20, 6) -> (27, 7) -> ...
The second path, with Ai = -mA but in reverse, goes like this:
(35, -1) <- (36, 0) <- (36, 1) <- (35, 2) <- (33, 3) <-
(30, 4) <- (26, 5) <- (21, 6) <- (15, 7) <- ...
You can see the paths cross with the same velocity somewhere between 20 and 21. That gives you the fastest acceleration and deceleration parts of the path you need, but the two parts aren't connected. However, it's easy to connect them by finding the closest points of same velocity; let's call these points Pq and Pr. Here, Pq = (20, 6) and Pr = (21, 6). Since that velocity is calculated between current and previous points, take the point before Pq (Pq-1, or (14, 5) in the example) and the point Pr, and try connecting them.
If Pq >= Pr >= Pq - 2mA, then you can connect them directly by taking Pq-1 unchanged, and Pr with Vr = Pr - Pq-1.
Else, take Pq-2 and Pr-1 (where Vr-1 = Vr - mA, because it's in reverse) and try connecting those by adding intermediate points. Since these points have a velocity difference of mA, you can search only for intermediate points with the same velocity Vs such that Vq-2 <= Vs <= Vr-1.
If you still can't find a solution, then take Pq-3 and Pr-2 and repeat the process with more intermediate points.
In the example I took, Pq < Pr, so we have to try with Pq-2 = (9, 4) and Pr-1 = (26, 5). We can connect those with a sequence of 3 points, for example (9, 4) -> (13, 4) -> (17, 4) -> (21, 4) -> (26, 5).
In any case, this method will give you the smallest amount of intermediate points, meaning the fastest path between P0 and Pn.
If you then want to reduce jerk, then you can forget the points calculated previously and do an interpolation with the number of points you now know to be minimal.
After playing around with some ideas, I came up with another solution, more accurate and probably faster, if done correctly, than that of my previous answer. It is however quite complicated and requires quite a bit of maths, although not very complex maths. Moreover, this is a work in progress: I am still investigating some areas. Nonetheless, from what I've tried, it does already produce very good results.
The problem
Definitions and goal
Throughout this answer, p[n] refers to the position of the nth point, v[n] to its velocity, a[n] to its acceleration, and j[n] to its jerk (the derivative of acceleration). The velocity of the nth point depends only on its position and that of the previous point. Similarly for acceleration and jerk, but with the points velocity and acceleration, respectively.
We have a start point and an end point, respectively p[0] and p[n], both with associated velocities v[0] and v[n]. The goal is to place n-1 points in between, with an arbitrary n, such that, along the X, Y, and Z axes, the absolute values of acceleration and jerk at any of these points (and at p[n]) are below some limits, respectively aMaxX, aMaxY, and aMaxZ for acceleration, and jMaxX, jMaxY, and jMaxZ for jerk.
What we want to find is the values of p[i] for all i ∈ [1; n-1]. Because p[i] = p[i-1] + v[i], this is the same as finding v[i]. By the same reasoning, with v[i] = v[i-1] + a[i] and a[i] = a[i-1] + j[i], it is also the same as finding a[i] or j[i].
a[0] and a[n+1] are assumed to be zero.
Observations and simplifications
Because the problem's constraints are independant of the dimension, we can solve for each of the three dimensions separately, as long as the number of points obtained in each case is the same. Therefore, I am only going to solve the one-dimensional version of the problem, using aMax and jMax, irrespective of the axis.
*[WIP]* Determine the worst case to solve first, then solve the other ones, knowing the number of points.
The actual positions of the two given points are irrelevant, what matters is the relative distance between them, which we can define as P = p[n] - p[0]. Let's also define the ranges R = [1; n] and R* = [1; n+1].
Because of the discrete nature of the problem, we can obtain the following equations. Note that ∑{i∈R}(x[i]) is the sum of all x[i] for i∈R.
Ⓐ ∑{i∈R}(v[i]) = P
Ⓑ ∑{i∈R}(a[i]) = v[n] - v[0]
Ⓧ ∑{i∈R*}(j[i]) = 0
Ⓧ comes from the assumption that a[0] = a[n+1] = 0.
From Ⓐ and v[i] = v[i-1] + a[i], i∈R, we can deduce:
Ⓒ ∑{i∈R}((n+1-i)*a[i]) = P - n*v[0]
By the same logic, from Ⓑ, Ⓒ, and a[i] = a[i-1] + j[i], i∈R, we can deduce:
Ⓨ ∑{i∈R}((n+1-i)*j[i]) = v[n] - v[0]
Ⓩ ∑{i∈R}(T[n+1-i]*j[i]) = P - n*v[0]
Here, T[n] is the nth triangular number, defined by T[n] = n*(n+1)/2.
The equations Ⓧ, Ⓨ, and Ⓩ are the relevant ones for the next parts.
The approach
In order to minimize n, we can start with a small value of n (1, 2?) and find a solution. Then, if max{i∈R}(abs(a[i])) > aMax or max{i∈R}(abs(j[i])) > jMax, we can increment n and repeat the process.
*[WIP]* Find a lower bound for n to avoid unnecessary calculations from small values of n. Or estimate the correct value of n and pinpoint it by testing solutions.
Finding a solution requires finding the values of j[i] for all i∈R*. I have yet to find an optimal form for j[i], but defining j*[i], r[i] and s[i] such that
j[i] = j*[i] + r[i]v[0] + s[i]v[n]
works quite well.
*[WIP]* Find a better form for j[i]
By doing that, we transform our n-1 unknowns (j[i], i∈R, note that j[n+1] = -∑{i∈R}(j[i])) into 3(n-1) easier to find unknowns. Here are a few things we can deduce right now from Ⓧ, Ⓨ, and Ⓩ.
∑{i∈R*}(r[i]) = 0
∑{i∈R*}(s[i]) = 0
∑{i∈R}((n+1-i)*r[i]) = -1
∑{i∈R}((n+1-i)*s[i]) = 1
∑{i∈R}(T(n+1-i)*r[i]) = -n
∑{i∈R}(T(n+1-i)*s[i]) = 0
As a reminder, here are Ⓧ, Ⓨ, and Ⓩ.
Ⓧ ∑{i∈R*}(j[i]) + j[n+1] = 0
Ⓨ ∑{i∈R}((n+1-i)*j[i]) = v[n] - v[0]
Ⓩ ∑{i∈R}(T[n+1-i]*j[i]) = P - n*v[0]
The goal now is to find adequate special cases to help us determine these unknowns.
The special cases
v[0] = v[n] = 0
By playing with values of jerk, I observed that taking all of j[i], i∈R* as part of a parabola yields excellent results for minimizing both jerk and acceleration. Although it isn't the best possible fit, I haven't found better yet.
The intuition behind values of jerk coming from a parabola is that, if the values of position are to follow a polynomial, then its degree must be at least 5, and can be 5. This is easier to understand if you think about the values of velocity following a 4th degree polynomial. The constraints that v[0] and v[n] are set, a[0] = a[n+1] = 0, and that its integral over [0; n] must equal P, this polynomial must have a degree of at least 4. This holds for both continuous and dicrete cases. Finally, it seems that taking the smallest degree leads to a smoother jerk as well as making it easier to calculate.
Here is an example of a continuous case where the position is in purple, the velocity in blue, the acceleration in yellow and the jerk in red.
In case you want to play with this, here is how to define the position curve in terms of n, p[0], p[n], v[0], and v[n] (the other ones are simply derivatives).
a = (-3(v[n]+v[0]) + 6(p[n]-p[0])) / n^5
b = (n(7v[n]+8v[0]) - 15(p[n]-p[0])) / n^4
c = (-n(4v[n]+6v[0]) + 10(p[n]-p[0])) / n^3
p[x] = ax^5 + bx^4 + cx^3 + v[0]x + p[0]
If v[0] = v[n] = 0, then j[i] = j*[i], i∈R*. That means that the values j*[i] follow a quadratic polynomial. So we want to find α, β, and γ such that Ⓟ holds.
Ⓟ j*[i] = αi^2 + βi + γ, i∈R*
From Ⓧ, Ⓨ, and Ⓩ follow these equations.
α*∑{i∈R*}(i^2) + β*∑{i∈R*}(i) + c*∑{i∈R*}(1) = 0
α*∑{i∈R}((n+1-i)*i^2) + β*∑{i∈R}((n+1-i)*i) + c*∑{i∈R}(n+1-i) = 0
α*∑{i∈R}(T(n+1-i)*i^2) + β*∑{i∈R}(T(n+1-i)*i) + c*∑{i∈R}(T(n+1-i)) = P
Solving this system gives α, β, and γ, which can be used with Ⓟ to calculate j*[i], i∈R*. Note that j*[i] = j*[n+2-i], so only the upper half of the calculations need to be done.
v[0] = v[n] = 1/n
If v[0] = v[n] = 1/n, then j[i] = 0, i∈R*. This means that Ⓠ holds.
Ⓠ r[i] + s[i] = -n*j[i], i∈R*
v[0] = 0, j[i∈L] = J, j[h] = 0, j[i∈U] = -J
L and U are respectively the lower and upper halves of R*, and h is the value in between, if n+1 is odd. In other words:
if n is odd:
L = [1; (n+1)/2]
U = [(n+3)/2; n+1]
if n is even:
L = [1; n/2]
h = n/2+1
U = [n/2+2; n]
This special case corresponds to the maximum overall acceleration between p[0] and p[n] while minimizing abs(j[i]), i∈R*. Here, Ⓩ gives us the following equation.
∑{i∈R}(T[n+1-i]*j[i]) = P
∑{i∈L}(T[n+1-i])*j[1] + ∑{i∈U}(T[n+1-i])*j[n+1] = P
j[1] = P / [ ∑{i∈L}(T[n+1-i]) - ∑{i∈U}(T[n+1-i]) ]
This gives j[1], and so every j[i], i∈R*. We can then calculate v[n] using Ⓨ.
Putting the pieces together
Each special case gives us, for some values of v[0], v[n] and P, a relation of the form
αj*[i] + βr[i] + γs[i] = δ.
By treating three special cases (assuming they are not similar, meaning the do not give the same relation), we have a system of three equations that, once solved, gives the values of j*[i], r[i] and s[i] for all i∈R*.
As a result, we can calculate, for each value of n, values of j[i] depending on v[0], v[n] and P. They can be precalculated, which means testing them for any value of n can be very fast. Thereby, we can very quicklyt find a good estimate for the fewest amount of points needed in the trajectory, as well as a good approximation of the best trajectory possible, as long as we have precalculated values up to a sufficiently large value of n.
Answer
I suggest you to take following function :
X(n) = Xstart + Vxstart n+ (-6xstart+3Vxstart+6xend-3Vxend+c/2) n^2 + (8xstart+3Vxstart-8xend+5Vxend-c) n^3 + (-3Xstart-Vxstart+3xend-2Vxend+c/2) n^4
(for each coordinate X,Y,Z)
Here are some graphs of what this gives, I took c=3 for each samples:
For xstart=1, vstart=1, xend=3, vstart=-2, this gives :
X(n)= 1 + n + 16 n^2 -25 n^3 + 10 n^4
For xstart = -4, vstart =-4, xend = 4, vend = 0, this gives :
(-4 -4n +61n^2 -78n^3 + 29yn^4)
where c is a number from 0.1 to 5, it is up to you to decide, the higher c will be, the faster the function will go to that point (but it might have to turn back if c > 4). (See graphs below).
The polynomial comes from following calculation : where a=x0,b=v0,c=xe,d=v2,e=the magic constant
Explanation
Based on Nelfeal's answer, my idea was to try to solve the given problem with polynomials.
We can change the problem as to define a new Axis which goes in the P[last]-P[0], to have the problem reduced to dimension 1.
We can think about the problem in continuous mathematics instead of discrete mathematics (eg use functions instead of sequences), and go back to the discrete world which is just a special case of the continuous.
We can change the unit for time and space so that the time is 1 and the distance is 1, so that the problem is simplified to
Find a function 𝒇 which satisfies the following :
𝒇(0) = 0 and 𝒇(1) = 1
𝒇'(0) = 0 and 𝒇'(1) = 0
For x∈ℝ |𝒇''(x)| < c, where c is the max speed
We have
P(X) = ∑{i∈ℕ} Ai Xi
P'(X) = ∑{i∈ℕ} (i+1) Ai+1 Xi
P''(X) = ∑{i∈ℕ} (i+2)(i+1) Ai+2 Xi
We need :
P(0) = 0
P(1) = 1
P'(0) = 0
P'(1) = 0
-c <= P''(x) <= c
Thus it means :
a0 = 0 (from 1.)
a1 = 0 (from 3.)
P(1) = ∑{i∈ℕ} Ai = 1
P'(1) = ∑{i∈ℕ} (i+1) Ai = 0
P''(x) = ∑{i∈ℕ} (i+2)(i+1) Ai Xi in [-c,c]
The third equation is the most complex one, and can be simplified by saying that P(1) = c.
We will have c vary to see what changes.
After inverting a 3x3 matrix, we get following result :
P(x) = (c/2+6) x^2 - (c+8) x^3 + (c/2+3) x^4
For c=0.15, this gives :
For c=1, this gives:
For c=4, we see a bounce back :
If we take c from 0.1 to 6, we get following 3d graph :
Note that we have solved this for polynomals of degree 4, but you might do the same things to higher degrees (up to 10 if you want to) to get more possibilities in your functions.

Why the algorithm fails when increase the number precision? How can we decrease the sensitivity of the algorithm to the number precision?

I am using Newton Raphson +successive Substitute algorithm to perform flash calculation(chemical process simulation).
The algorithm can converge well when the input is in low precision like 0.1, but when the number precision is increased to 0.11111 or 0.99999. The algorithm will not converge.
When I am using the quasi newton method with BFGS update, the same problem occurs again. How can we decrease the sensitivity of the code to the numerical precision?
Here is a simple example using matlab to solve the Rachford-Rice equation. When the comp_overall=[0.9,1-0.9], it converges well. However, when the number precision increase to like[0.99999,1-0.99999]. It will not converge.
K=[0.053154011443159 34.234731216532658],
comp_overall= [0.99999 1- 0.99999], phi=0.5; %initial values
epsilon = 1.0;
iter1 = 1;
while (epsilon >=1.e-05)
rc=0.0;
drc=0.0;
for i=1:2
% Rachford-Rice Equation
rc = comp_overall(i)*(K(i)-1.0)/(1.0+phi*(K(i)-1.0))+rc;
% Derivative
drc = comp_overall(i)*(K(i)-1.0)^2/(1.0+phiK(i)-1.0))^2+drc;
end
% Newton-Raphson
phi1 = phi +0.01 (rc / drc);
epsilon = abs( (phi1-phi)/phi );
% Convergence
phi = phi1;
iter1=iter1+1;
end
The Newton–Raphson method relies on the function being differentiable between any two consecutive approximations. Depending on the choice of the initial value, this may not be the case for z₁ = 0.99999. Let's look at the graph of the Rachford-Rice function:
The root of this function is φ₀ ≈ –0.0300781429 and the nearest point of discontinuity is –1/(K₂-1) ≈ –0.0300890052. They are close enough for the Newton–Raphson method to overshoot, to jump over that discontinuity.
For example:
φ₁ = –0.025
f(φ₁) ≈ -0.9229770571
f'(φ₁) ≈ 1.2416569960
φ₂ = φ₁ + 0.01 * f(φ₁) / f'(φ₁) ≈ -0.0324334302
φ₂ lies to the left of the discontinuity, so the following steps will be away from, not towards the root.
φ₃ = -0.0358986759 < φ₂
What can be done about it:
When the algorithm fails to converge, repeat it with smaller steps. For example, start with the coefficient 0.01 (as it is now) and decrease it 10 times after every failure.
Detect overshoots. On each iteration check if there is a discontinuity point (–1/(Kᵢ-1)) between the current approximation and the previous one. When it happens, discard the current approximation, decrease the coefficient and continue.
Limit the scope of the search. Are solutions outside of [0, 1] physically meaningful? If not, you can stop once the approximated value falls out of that range.
Use different method. The function is monotonic on any interval between two consecutive discontinuity points, so you can perform binary search on each such interval. It will be both faster and more robust than the Newton–Raphson method.

how to avoid repeating linear regression procedures when adding new points

I know how to do to do regression on a set of N samples. However, my project is about doing the linear regression of the first 2, 3, 4 ... k, k+1, ... N samples respectively. Instead of repeating the same procedure when adding a new sample, is there a faster method that I can use the previous result (or intermediate results) to solve the regression after adding a new point? Thank you.
In linear least squares method coefficients of approximation line are calculated using these formulas:
a = (N * Sum(Xi*Yi) - Sum(Xi)*Sum(Yi)) / (n * Sum(Xi^2) - (Sum(Xi))^2)
b = (Sum(Yi) - a * Sum(Xi)) / N
So you can store the values of Nth sums
Sum(Xi*Yi)
Sum(Xi)
Sum(Yi)
Sum(Xi^2)
and update them at (N+1)th step.
Sum(Xi)[N+1] = Sum(Xi)[N] + X(N+1)
Sum(Xi*Yi)[N+1] = Sum(Xi*Yi)[N] + X(N+1)*Y(N+1)
and so on, and calculate new coefficients values.
Note: such algorithms are called 'running' or 'online' - see analog for std deviation

convert real number to radicals

Suppose I have a real number. I want to approximate it with something of the form a+sqrt(b) for integers a and b. But I don't know the values of a and b. Of course I would prefer to get a good approximation with small values of a and b. Let's leave it undefined for now what is meant by "good" and "small". Any sensible definitions of those terms will do.
Is there a sane way to find them? Something like the continued fraction algorithm for finding fractional approximations of decimals. For more on the fractions problem, see here.
EDIT: To clarify, it is an arbitrary real number. All I have are a bunch of its digits. So depending on how good of an approximation we want, a and b might or might not exist. Brute force is naturally not a particularly good algorithm. The best I can think of would be to start adding integers to my real, squaring the result, and seeing if I come close to an integer. Pretty much brute force, and not a particularly good algorithm. But if nothing better exists, that would itself be interesting to know.
EDIT: Obviously b has to be zero or positive. But a could be any integer.
No need for continued fractions; just calculate the square-root of all "small" values of b (up to whatever value you feel is still "small" enough), remove everything before the decimal point, and sort/store them all (along with the b that generated it).
Then when you need to approximate a real number, find the radical whose decimal-portion is closet to the real number's decimal-portion. This gives you b - choosing the correct a is then a simple matter of subtraction.
This is actually more of a math problem than a computer problem, but to answer the question I think you are right that you can use continued fractions. What you do is first represent the target number as a continued fraction. For example, if you want to approximate pi (3.14159265) then the CF is:
3: 7, 15, 1, 288, 1, 2, 1, 3, 1, 7, 4 ...
The next step is create a table of CFs for square roots, then you compare the values in the table to the fractional part of the target value (here: 7, 15, 1, 288, 1, 2, 1, 3, 1, 7, 4...). For example, let's say your table had square roots for 1-99 only. Then you would find the closest match would be sqrt(51) which has a CF of 7: 7,14 repeating. The 7,14 is the closest to pi's 7,15. Thus your answer would be:
sqrt(51)-4
As the closest approximation given a b < 100 which is off by 0.00016. If you allow larger b's then you could get a better approximation.
The advantage of using CFs is that it is faster than working in, say, doubles or using floating point. For example, in the above case you only have to compare two integers (7 and 15), and you can also use indexing to make finding the closest entry in the table very fast.
This can be done using mixed integer quadratic programming very efficiently (though there are no run-time guarantees as MIQP is NP-complete.)
Define:
d := the real number you wish to approximate
b, a := two integers such that a + sqrt(b) is as "close" to d as possible
r := (d - a)^2 - b, is the residual of the approximation
The goal is to minimize r. Setup your quadratic program as:
x := [ s b t ]
D := | 1 0 0 |
| 0 0 0 |
| 0 0 0 |
c := [0 -1 0]^T
with the constraint that s - t = f (where f is the fractional part of d)
and b,t are integers (s is not)
This is a convex (therefore optimally solvable) mixed integer quadratic program since D is positive semi-definite.
Once s,b,t are computed, simply derive the answer using b=b, s=d-a and t can be ignored.
Your problem may be NP-complete, it would be interesting to prove if so.
Some of the previous answers use methods that are of time or space complexity O(n), where n is the largest “small number” that will be accepted. By contrast, the following method is O(sqrt(n)) in time, and O(1) in space.
Suppose that positive real number r = x + y, where x=floor(r) and 0 ≤ y < 1. We want to approximate r by a number of the form a + √b. If x+y ≈ a+√b then x+y-a ≈ √b, so √b ≈ h+y for some integer offset h, and b ≈ (h+y)^2. To make b an integer, we want to minimize the fractional part of (h+y)^2 over all eligible h. There are at most √n eligible values of h. See following python code and sample output.
import math, random
def findb(y, rhi):
bestb = loerror = 1;
for r in range(2,rhi):
v = (r+y)**2
u = round(v)
err = abs(v-u)
if round(math.sqrt(u))**2 == u: continue
if err < loerror:
bestb, loerror = u, err
return bestb
#random.seed(123456) # set a seed if testing repetitively
f = [math.pi-3] + sorted([random.random() for i in range(24)])
print (' frac sqrt(b) error b')
for frac in f:
b = findb(frac, 12)
r = math.sqrt(b)
t = math.modf(r)[0] # Get fractional part of sqrt(b)
print ('{:9.5f} {:9.5f} {:11.7f} {:5.0f}'.format(frac, r, t-frac, b))
(Note 1: This code is in demo form; the parameters to findb() are y, the fractional part of r, and rhi, the square root of the largest small number. You may wish to change usage of parameters. Note 2: The
if round(math.sqrt(u))**2 == u: continue
line of code prevents findb() from returning perfect-square values of b, except for the value b=1, because no perfect square can improve upon the accuracy offered by b=1.)
Sample output follows. About a dozen lines have been elided in the middle. The first output line shows that this procedure yields b=51 to represent the fractional part of pi, which is the same value reported in some other answers.
frac sqrt(b) error b
0.14159 7.14143 -0.0001642 51
0.11975 4.12311 0.0033593 17
0.12230 4.12311 0.0008085 17
0.22150 9.21954 -0.0019586 85
0.22681 11.22497 -0.0018377 126
0.25946 2.23607 -0.0233893 5
0.30024 5.29150 -0.0087362 28
0.36772 8.36660 -0.0011170 70
0.42452 8.42615 0.0016309 71
...
0.93086 6.92820 -0.0026609 48
0.94677 8.94427 -0.0024960 80
0.96549 11.95826 -0.0072333 143
0.97693 11.95826 -0.0186723 143
With the following code added at the end of the program, the output shown below also appears. This shows closer approximations for the fractional part of pi.
frac, rhi = math.pi-3, 16
print (' frac sqrt(b) error b bMax')
while rhi < 1000:
b = findb(frac, rhi)
r = math.sqrt(b)
t = math.modf(r)[0] # Get fractional part of sqrt(b)
print ('{:11.7f} {:11.7f} {:13.9f} {:7.0f} {:7.0f}'.format(frac, r, t-frac, b,rhi**2))
rhi = 3*rhi/2
frac sqrt(b) error b bMax
0.1415927 7.1414284 -0.000164225 51 256
0.1415927 7.1414284 -0.000164225 51 576
0.1415927 7.1414284 -0.000164225 51 1296
0.1415927 7.1414284 -0.000164225 51 2916
0.1415927 7.1414284 -0.000164225 51 6561
0.1415927 120.1415831 -0.000009511 14434 14641
0.1415927 120.1415831 -0.000009511 14434 32761
0.1415927 233.1415879 -0.000004772 54355 73441
0.1415927 346.1415895 -0.000003127 119814 164836
0.1415927 572.1415909 -0.000001786 327346 370881
0.1415927 911.1415916 -0.000001023 830179 833569
I do not know if there is any kind of standard algorithm for this kind of problem, but it does intrigue me, so here is my attempt at developing an algorithm that finds the needed approximation.
Call the real number in question r. Then, first I assume that a can be negative, in that case we can reduce the problem and now only have to find a b such that the decimal part of sqrt(b) is a good approximation of the decimal part of r. Let us now write r as r = x.y with x being the integer and y the decimal part.
Now:
b = r^2
= (x.y)^2
= (x + .y)^2
= x^2 + 2 * x * .y + .y^2
= 2 * x * .y + .y^2 (mod 1)
We now only have to find an x such that 0 = .y^2 + 2 * x * .y (mod 1) (approximately).
Filling that x into the formulas above we get b and can then calculate a as a = r - b. (All of these calculations have to be carefully rounded of course.)
Now, for the time being I am not sure if there is a way to find this x without brute forcing it. But even then, one can simple use a simple loop to find an x good enough.
I am thinking of something like this(semi pseudo code):
max_diff_low = 0.01 // arbitrary accuracy
max_diff_high = 1 - max_diff_low
y = r % 1
v = y^2
addend = 2 * y
x = 0
while (v < max_diff_high && v > max_diff_low)
x++;
v = (v + addend) % 1
c = (x + y) ^ 2
b = round(c)
a = round(r - c)
Now, I think this algorithm is fairly efficient, while even allowing you to specify the wished accuracy of the approximation. One thing that could be done that would turn it into an O(1) algorithm is calculating all the x and putting them into a lookup table. If one only cares about the first three decimal digits of r(for example), the lookup table would only have 1000 values, which is only 4kb of memory(assuming that 32bit integers are used).
Hope this is helpful at all. If anyone finds anything wrong with the algorithm, please let me know in a comment and I will fix it.
EDIT:
Upon reflection I retract my claim of efficiency. There is in fact as far as I can tell no guarantee that the algorithm as outlined above will ever terminate, and even if it does, it might take a long time to find a very large x that solves the equation adequately.
One could maybe keep track of the best x found so far and relax the accuracy bounds over time to make sure the algorithm terminates quickly, at the possible cost of accuracy.
These problems are of course non-existent, if one simply pre-calculates a lookup table.

Resources