Searching a 3D array for closest point satisfying a certain predicate - algorithm

I'm looking for an enumeration algorithm to search through a 3D array "sphering" around a given starting point.
Given an array a of size NxNxN, where N is 2^k for some k, and a point p in that array. The algorithm I'm looking for should do the following: if a[p] satisfies a certain predicate, the algorithm stops and p is returned. Otherwise the next point q is checked, where q is another point in the array that is closest to p and hasn't been visited yet. If that doesn't match either, the next q' is checked, and so on, until in the worst case the whole array has been searched.
By "closest" here the perfect solution would be the point q that has the smallest Euclidean distance to p. As only discrete points have to be considered, perhaps some clever enumeration algorithm would make that possible. However, if this gets too complicated, the smallest Manhattan distance would be fine too. If there are several nearest points, it doesn't matter which one is considered next.
Is there already an algorithm that can be used for this task?

You can search for increasing squared distances, so you won't miss a point. This Python code should make it clear:
import math
import itertools

# Calculates all points at a certain squared distance from the origin.
# Coordinate constraint: z <= y <= x
def get_points_at_squared_euclidean_distance(d):
    result = []
    x = int(math.floor(math.sqrt(d)))
    while 0 <= x:
        y = x
        while 0 <= y:
            # binary search for an integer z with x*x + y*y + z*z == d
            target = d - x*x - y*y
            lower = 0
            upper = y + 1
            while lower < upper:
                middle = (lower + upper) // 2
                current = middle * middle
                if current == target:
                    result.append((x, y, middle))
                    break
                if current < target:
                    lower = middle + 1
                else:
                    upper = middle
            y -= 1
        x -= 1
    return result

# Creates all possible reflections of a point
def get_point_reflections(point):
    result = set()
    for p in itertools.permutations(point):
        for n in range(8):
            result.add((
                p[0] * (1 if n % 8 < 4 else -1),
                p[1] * (1 if n % 4 < 2 else -1),
                p[2] * (1 if n % 2 < 1 else -1),
            ))
    return sorted(result)

# Enumerates all points around a center, in increasing distance
def get_next_point_near(center):
    d = -1  # start below zero so that d == 0 (the center itself) is checked first
    points_at_d = []
    while True:
        while not points_at_d:
            d += 1
            points_at_d = get_points_at_squared_euclidean_distance(d)
        point = points_at_d.pop()
        for reflection in get_point_reflections(point):
            yield (
                center[0] + reflection[0],
                center[1] + reflection[1],
                center[2] + reflection[2],
            )

# The function you asked for
def get_nearest_point(center, predicate):
    for point in get_next_point_near(center):
        if predicate(point):
            return point

# Example usage
print(get_nearest_point((1, 2, 3), lambda p: sum(p) == 10))
Basically you consume points from the generator until one of them fulfills your predicate.

This is pseudocode for a simple algorithm that will search in increasing-radius spherical husks until it either finds a point or it runs out of array. Let us assume that condition returns either true or false and has access to the x, y, z coordinates being tested and the array itself, returning false (instead of exploding) for out-of-bounds coordinates:
def find_from_center(center, max_radius, condition) returns a point
    let radius = 0
    while radius < max_radius,
        let point = find_in_spherical_husk(center, radius, condition)
        if (point != null) return point
        radius++
    return null
The hard part is inside find_in_spherical_husk. We are interested in checking points such that
dist(center, p) >= radius AND dist(center, p) < radius+1
which will be our operating definition of husk. We could iterate over the whole 3D array in O(n^3) looking for those, but that would be really expensive in terms of time. A better pseudocode is the following:
def find_in_spherical_husk(center, radius, condition) returns a point
    let z = center.z - radius              // current slice height
    while z <= center.z + radius,
        // current circle radius; maxes out at the equator, shrinks toward the poles
        let r = sqrt(radius*radius - (z-center.z)*(z-center.z))
        let z_center = (center.x, center.y, z)
        let point = find_in_z_circle(z_center, r, condition)
        if (point != null) return point
        z++                                // prepare for the next z-slice circle
    return null
the idea here is to slice each husk into circles along the z-axis (any axis will do), and then look at each slice separately. If you were looking at the earth, and the poles were the z axis, you would be slicing from north to south. Finally, you would implement find_in_z_circle(z_center, r, condition) to look at the circumference of each of those circles. You can avoid some math there by using the Bresenham circle-drawing algorithm; but I assume that the savings are negligible compared with the cost of checking condition.
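As an illustration, here is a minimal Python sketch of find_in_z_circle under the conventions above (a condition callback taking the x, y, z coordinates is an assumption carried over from the pseudocode). It only tests the lattice points nearest the ideal circumference of one slice; a fuller version would cover the whole [radius, radius+1) band, for instance via Bresenham:
import math

def find_in_z_circle(z_center, r, condition):
    cx, cy, z = z_center
    ri = int(math.ceil(r))
    for dx in range(-ri, ri + 1):
        rem = r * r - dx * dx
        if rem < 0:
            continue
        dy = int(round(math.sqrt(rem)))       # nearest lattice offset on the circle
        for y in {cy - dy, cy + dy}:          # set avoids double-testing when dy == 0
            if condition(cx + dx, y, z):
                return (cx + dx, y, z)
    return None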


Is there a binary function f(x,y), where x,y are integers and the result is 0 or 1, such that the result-1 region of the 2D plane is "continuous" and "irregular" enough?

That is, if f(x,y)=1, then at least one of f(x-1,y), f(x+1,y), f(x,y-1), f(x,y+1) is also 1.
I'm thinking about a technique to define a game map, neither predefined nor randomly generated each time, but bound to a 2D binary function, so the map data never has to be saved to disk, and each time you enter the game the map is the same.
If 1 means land and 0 means ocean, I want the land to be continuous, all of it reachable, with no islands, and of course the map must be irregular enough.
I'm not good at maths; is it possible? Thanks.
What I need is only a simple function, no recursion; i.e. once x and y are given, the result comes out, depending on nothing but x and y.
Guaranteeing connectivity with local considerations only is a very strong constraint on what we can do. I agree with the comments that suggest traditional map generation from a fixed seed.
Nevertheless, to answer the question as framed, my first thought would be star-shaped land. This idea requires a continuous function f(θ) > 0 with period 2π. We take every point (x, y) such that hypot(x, y) < f(atan2(y, x)).
This works great if x and y are real numbers -- every (x, y) in the land is connected by a straight line segment back to the origin (0, 0), hence "star-shaped". Over the integers, we have to put an extra condition on f: the function log(f(θ)) should be Lipschitz continuous (can't wiggle too much).
(You can skip this paragraph.) Assume without loss of generality that x > 0 and y > 0 are integers. If (x, y) is land, then we need (x-1, y) or (x, y-1) to be land. On one hand, one of these squares is closer, which is good since we're using a threshold: min(hypot(y, x-1), hypot(y-1, x)) <= hypot(y, x) - (sqrt(2) - 1), which is tight for (x, y) = (1, 1). On the other hand, the angle changes. We've deviated from the line segment by distance at most 1/sqrt(2). Let r = hypot(x, y). The change in angle is at most 1/sqrt(2) / (r - 1/sqrt(2)), which since r >= sqrt(2) is at most 1/sqrt(2) / (r - r/2) == sqrt(2) / r. Therefore a Lipschitz constant of (sqrt(2) - 1) / sqrt(2) = 1 - 1/sqrt(2) should suffice (probably this can be tightened).
So far this is very abstract. The classic way to get a periodic function that doesn't wiggle too much is by adding sine waves (with varying phases). I've provided a Python implementation and sample output below. The land is not 100% guaranteed to be connected, but it should be extremely unlikely.
import math
import os
import random

def make_parameters(n=20):
    # amplitudes decay like 1/i so the radius function can't wiggle too much
    return [
        (random.random() / (i + 1), 2 * math.pi * random.random()) for i in range(n)
    ]

width = 100

def is_land(parameters, x, y):
    if (x, y) == (0, 0):
        return True
    theta = math.atan2(y, x)
    return math.hypot(x, y) < 0.1 * width * math.exp(
        sum(
            amp * math.sin((i + 1) * theta + phase)
            for i, (amp, phase) in enumerate(parameters)
        )
    )

def main():
    dir = "lands"
    os.makedirs(dir, exist_ok=True)  # don't fail if the directory already exists
    for i in range(10):
        for j in range(10):
            with open(os.path.join(dir, "%02d_%02d.pbm" % (i, j)), "w") as f:
                parameters = make_parameters()
                x0 = y0 = width // 2
                print("P1", file=f)
                print(width, width, file=f)
                for y in range(width):
                    print(
                        *(
                            int(is_land(parameters, x - x0, y - y0))
                            for x in range(width)
                        ),
                        file=f
                    )

if __name__ == "__main__":
    main()

Calculating normal to line towards given side

Given is a line (segment), defined by two vectors start(x,y) and end(x,y). I also have a point p(x,y), which lies in one of the two half-planes separated by the line (i.e. it is not exactly on the line).
How can I calculate the normal to the line that is facing towards the side in which p is?
Let:
A = (a,b) and B = (c,d) define the line segment
P = (p,q) be the other point.
Define:
dot( (p,q), (r,s) ) == p*r + q*s
Then the vector:
v = (c-a, d-b)
defines the direction along the line segment. Its perpendicular is:
u = (d-b, -(c-a)) = (d-b, a-c)
This can be seen by taking the dot product with v. To get the normal from the perpendicular, just divide by its length:
n = u/|u|, |u| = sqrt( dot(u,u) )
We now just need to know where P lies relative to the normal. If we take:
dir = dot( (P-A), n )
Then dir > 0 means n is in the same direction as P, whilst dir < 0 means it is in the opposite direction. Should dir == 0, then P is in fact on the extended line (not necessarily on the line segment itself).
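As a minimal sketch, the same recipe in Python (the function name is mine):
import math

def normal_towards(A, B, P):
    a, b = A
    c, d = B
    u = (d - b, a - c)                      # perpendicular to v = B - A
    length = math.hypot(u[0], u[1])
    n = (u[0] / length, u[1] / length)      # unit normal, sign still undetermined
    direction = (P[0] - a) * n[0] + (P[1] - b) * n[1]   # dot(P - A, n)
    return n if direction > 0 else (-n[0], -n[1])       # flip so n points towards P
For example, normal_towards((0, 0), (1, 0), (0.5, -1)) returns (0.0, -1.0), the downward normal on the point's side.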
First, determine which side of the line the point lies on, by taking the cross product of end-start and p-end:
z = (xend-xstart)*(yp-yend) - (yend-ystart)*(xp-xend)
If z>0, then the point is to the left of the line (as seen by a person standing at start and facing end). If z<0, then the point is to the right of the line.
Second, normalize the line segment:
S = end - start
k = S/|S|
Finally, if the point is to the left of the line, rotate k to the left:
(xk, yk) => (-yk, xk)
or if the point is to the right of the line, rotate k to the right:
(xk, yk) => (yk, -xk)
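The same three steps as a hedged Python sketch (the helper name is mine; a point exactly on the line falls to the "right" branch):
import math

def normal_towards_side(start, end, p):
    # which side is p on? sign of the 2D cross product of (end-start) and (p-end)
    z = (end[0] - start[0]) * (p[1] - end[1]) - (end[1] - start[1]) * (p[0] - end[0])
    sx, sy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(sx, sy)
    kx, ky = sx / length, sy / length       # normalized direction k
    return (-ky, kx) if z > 0 else (ky, -kx)   # rotate k left or right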
My math skills are a bit rusty, so I can't give you the exact calculations, but what you do is this (assuming 2D from your description):
First you calculate a normal n.
Then you calculate P', which is the perpendicular projection of your point P onto your line.
Basically, what you do is, you "create" another line and use your vector n from step 1 as the direction (y = p + x * n, where y, p and n are vectors, p is actually your p(x,y), and x is a real number), then you intersect this line with the first one, and the point where they intersect is P'.
Seeing you're from Austria, everyone else please forgive me for using one German word, I really don't know the English translation and couldn't find any. P' = Lotfußpunkt (the foot of the perpendicular).
Calculate P - P'. If it has the same sign as n in both components, n is the normal you're searching for. If it has the opposite sign, -n is the one you're searching for.
I hope the idea is clear even though I don't know all the technical terms in English.
For
start = (a,b)
end = (c,d)
p = (x,y)
Slope(startend) = (d - b) / (c - a)
Slope(norm) = -(c - a) / (d - b)
Norm line must include p = (x,y), so
ynorm = -((c - a) / (d - b)) * xnorm + (y + ((c - a) / (d - b)) * x)
y = mx + c
is the general equation of a line, where m is the slope and c is a constant (the intercept).
You have start and end. Let's call them (x1,y1) and (x2,y2) and the line joining them L1.
The slope of this line, m1, is (y2-y1)/(x2-x1). This line is perpendicular to the line you need, which we can call L2 with slope m2. The product of the slopes of two mutually perpendicular lines is -1. Hence,
m1*m2 = -1.
Hence you can calculate m2. Now you need to find the equation of the line L2. You have one point on the line, P(x,y). You can substitute in this manner:
y = m2*x + c.
This will give you c. Once you have the line equation, you can convert it to parametric form as shown here:
http://thejuniverse.org/PUBLIC/LinearAlgebra/LOLA/lines/index.html
The equation of the line is given as
A = start.y-end.y
B = end.x-start.x
C = start.x*end.y-start.y*end.x
A*x + B*y + C = 0
The signed distance d from a point p=(px,py) to the line is
d = (A*px+B*py+C)/sqrt(A^2+B^2)
If the value is positive then the point is at a counterclockwise rotation from the vector (start->end). If negative then it is at a clockwise rotation. So if (start->end) is pointing up, then a positive distance is to the left of the line.
Example
start = (8.04, -0.18)
end = (6.58, 1.72)
P = (2.82, 0.66)
A = (-0.18)-(1.72) = -1.9
B = (6.58)-(8.04) = -1.46
C = (8.04)*(1.72)-(-0.18)*(6.58) = 15.01
d = (A*(2.82)+B*(0.66)+C)/sqrt(A^2+B^2) = 3.63
This value of d is identical to the distance from the line to P (the length of the vector from the nearest point on the line to P).
N = (Ey - Sy, Sx - Ex) is perpendicular to the line (it is SE rotated by 90°, not normalized).
Then compute the sign of the dot product
N . SP = (Ey - Sy)(Px - Sx) + (Sx - Ex)(Py - Sy),
it will tell you on what side the normal is pointing.

random increasing sequence with O(1) access to any element?

I have an interesting math/CS problem. I need to sample from a possibly infinite random sequence of increasing values, X, with X(i) > X(i-1), with some distribution between them. You could think of this as the sum of a different sequence D of uniform random numbers in [0,d). This is easy to do if you start from the first one and go from there; you just add a random amount to the sum each time. But the catch is, I want to be able to get any element of the sequence in faster than O(n) time, ideally O(1), without storing the whole list. To be concrete, let's say I pick d=1, so one possibility for D (given a particular seed) and its associated X is:
D={.1, .5, .2, .9, .3, .3, .6 ...} // standard random sequence, elements in [0,1)
X={.1, .6, .8, 1.7, 2.0, 2.3, 2.9, ...} // increasing random values; partial sum of D
(I don't really care about D, I'm just showing one conceptual way to construct X, my sequence of interest.) Now I want to be able to compute the value of X[1] or X[1000] or X[1000000] equally fast, without storing all the values of X or D. Can anyone point me to some clever algorithm or a way to think about this?
(Yes, what I'm looking for is random access into a random sequence -- with two different meanings of random. Makes it hard to google for!)
Since D is pseudorandom, there’s a space-time tradeoff possible:
O(sqrt(n))-time retrievals using O(sqrt(n)) storage locations (or,
in general, O(n**alpha)-time retrievals using O(n**(1-alpha))
storage locations). Assume zero-based indexing and that
X[n] = D[0] + D[1] + ... + D[n-1]. Compute and store
Y[s] = X[s**2]
for all s**2 <= n in the range of interest. To look up X[n], let
s = floor(sqrt(n)) and return
Y[s] + D[s**2] + D[s**2+1] + ... + D[n-1].
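A rough Python sketch of this checkpoint scheme; the increment function D here is a hypothetical stand-in for whatever seeded pseudorandom generator produces the sequence:
import hashlib
import math

def D(i, seed=0):
    # hypothetical reproducible increment in [0, 1), derived from (seed, i)
    h = hashlib.sha256(("%d:%d" % (seed, i)).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2.0**64

class SqrtCheckpoints:
    def __init__(self, max_n):
        # store Y[s] = X[s**2] for all s**2 <= max_n
        self.Y = [0.0]
        total, i, s = 0.0, 0, 1
        while s * s <= max_n:
            while i < s * s:
                total += D(i)
                i += 1
            self.Y.append(total)
            s += 1

    def X(self, n):
        s = math.isqrt(n)
        # Y[s] already covers D[0..s**2-1]; at most about 2*sqrt(n) terms remain
        return self.Y[s] + sum(D(i) for i in range(s * s, n))
With max_n = 10**6, a lookup then touches at most roughly 2000 increments instead of a million.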
EDIT: here's the start of an approach based on the following idea.
Let Dist(1) be the uniform distribution on [0, d) and let Dist(k) for k > 1 be the distribution of the sum of k independent samples from Dist(1). We need fast, deterministic methods to (i) pseudorandomly sample Dist(2**p) and (ii) given that X and Y are distributed as Dist(2**p), pseudorandomly sample X conditioned on the outcome of X + Y.
Now imagine that the D array constitutes the leaves of a complete binary tree of size 2**q. The values at interior nodes are the sums of the values at their two children. The naive way is to fill the D array directly, but then it takes a long time to compute the root entry. The way I'm proposing is to sample the root from Dist(2**q). Then, sample one child according to Dist(2**(q-1)) given the root's value. This determines the value of the other, since the sum is fixed. Work recursively down the tree. In this way, we look up tree values in time O(q).
Here's an implementation for Gaussian D. I'm not sure it's working properly.
import hashlib
import math

def random_oracle(seed):
    # deterministic pseudorandom float in [0, 1), derived from the seed
    h = hashlib.sha512()
    h.update(str(seed).encode())
    x = 0.0
    for b in h.digest():
        x = (x + b) / 256.0
    return x

def sample_gaussian(variance, seed):
    # Box-Muller transform driven by the random oracle
    u0 = random_oracle(2 * seed)
    u1 = random_oracle(2 * seed + 1)
    return math.sqrt(-2.0 * variance * math.log(1.0 - u0)) * math.cos(2.0 * math.pi * u1)

def sample_children(sum_outcome, sum_variance, seed):
    # Given the sum of two iid Gaussian children, sample their difference;
    # for independent equal halves the difference has the same variance as
    # the sum and is independent of it.
    difference_outcome = sample_gaussian(sum_variance, seed)
    return ((sum_outcome + difference_outcome) / 2.0,
            (sum_outcome - difference_outcome) / 2.0)

def sample_X(height, i):
    assert 0 <= i <= 2 ** height
    total = 0.0
    z = sample_gaussian(2 ** height, 0)  # root: sum of 2**height unit-variance leaves
    seed = 1
    for j in range(height, 0, -1):
        x, y = sample_children(z, 2 ** j, seed)
        assert abs((x + y) - z) <= 1e-9
        seed *= 2
        if i >= 2 ** (j - 1):
            # descend into the right child; the left subtree's sum is settled
            i -= 2 ** (j - 1)
            total += x
            z = y
            seed += 1
        else:
            z = x
    return total

def test(height):
    X = [sample_X(height, i) for i in range(2 ** height + 1)]
    D = [X[i + 1] - X[i] for i in range(2 ** height)]
    mean = sum(D) / len(D)
    variance = sum((d - mean) ** 2 for d in D) / (len(D) - 1)
    print(mean, math.sqrt(variance))
    D.sort()
    with open('data', 'w') as f:
        for d in D:
            print(d, file=f)

if __name__ == '__main__':
    test(10)
If you do not record the values in X that you have previously generated, there is no way to guarantee that the elements of X you generate on the fly will be in increasing order. Furthermore, it seems there is no way to avoid O(n) worst-case time per query if you don't know how to quickly generate the CDF of the sum of the first m random variables in D for any choice of m.
If you want the ith value X(i) from a particular realization, I can't see how you could do this without generating the sequence up to i. Perhaps somebody else can come up with something clever.
Would you be willing to accept a value which is plausible in the sense that it has the same distribution as the X(i)'s you would observe across multiple realizations of the X process? If so, it should be pretty easy. X(i) will be asymptotically normally distributed with mean i/2 (since it's the sum of the Dk's for k=1,...,i, the D's are Uniform(0,1), and the expected value of a D is 1/2) and variance i/12 (since the variance of a D is 1/12 and the variance of a sum of independent random variables is the sum of their variances).
Because of the asymptotic aspect, I'd pick some threshold value for i to switch over from direct summing to using the normal. For example, if you use i = 12 as your threshold, you would use actual summing of uniforms for values of i from 1 to 11, and generate a Normal(i/2, sqrt(i/12)) value for i >= 12. That's an O(1) algorithm since the total work is bounded by your threshold, and the results produced will be distributionally representative of what you would see if you actually went through the summing.
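A sketch of that threshold scheme in Python; the per-index seeding is my assumption about how to make repeated lookups deterministic:
import random

THRESHOLD = 12  # switch-over point suggested above

def X(i, seed=0):
    if i < THRESHOLD:
        # exact: sum i uniforms from a deterministic stream
        rng = random.Random(seed)
        return sum(rng.uniform(0, 1) for _ in range(i))
    # approximate: X(i) is asymptotically Normal(mean = i/2, sd = sqrt(i/12))
    rng = random.Random("%d:%d" % (seed, i))  # independent stream per index
    return rng.normalvariate(i / 2, (i / 12) ** 0.5)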

Find first root of a black box function, or any negative value of same function

I have a black box function, f(x) and a range of values for x.
I need to find the lowest value of x for which f(x) = 0.
I know that for the start of the range of x, f(x) > 0, and if I had a value for which f(x) < 0 I could use regula falsi, or similar root-finding methods, to try to determine where f(x) = 0.
I know f(x) is continuous, and should only have 0,1 or 2 roots for the range in question, but it might have a local minimum.
f(x) is somewhat computationally expensive, and I'll have to find this first root a lot.
I was thinking of some kind of hill climbing with a degree of randomness to avoid any local minima, but then how do you know if there was no minimum less than zero or if you just haven't found it yet? I think the function shouldn't have more than two minimum points, but I can't be certain enough of that to rely on it.
If it helps, x in this case represents a time, and f(x) represents the distance between a ship and a body in orbit (moon/planet) at that time. I need the first point where they are a certain distance from each other.
My method will sound pretty complicated, but in the end the computation time of the method will be far smaller than the distance calculations (evaluation of your f(x)). Also, there are quite many implementations of it already written up in existing libraries.
So what I would do:
approximate f(x) with a Chebychev polynomial
find the real roots of that polynomial
If any are found, use those roots as initial estimates in a more precise rootfinder (if needed)
Given the nature of your function (smooth, continuous, otherwise well-behaved) and the information that there's 0,1 or 2 roots, a good Chebychev polynomial can already be found with 3 evaluations of f(x).
Then find the eigenvalues of the companion matrix of the Chebychev coefficients; these correspond to the roots of the Chebychev polynomial.
If all are imaginary, there's 0 roots.
If there are some real roots, check if two are equal (that "rare" case you spoke of).
Otherwise, all real eigenvalues are roots; the lowest one of which is the root you seek.
Then use Newton-Raphson to refine (if necessary, or use a better Chebychev polynomial). Derivatives of f can be approximated using central differences
f'(x) = ( f(x+h) - f(x-h) ) / (2*h)   (for small h)
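For instance, as a small Python helper (names are mine):
def fprime(f, x, h=1e-6):
    # central-difference approximation of the derivative of f at x
    return (f(x + h) - f(x - h)) / (2 * h)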
I have an implementation of the Chebychev routines in Matlab/Octave (given below). Use like this:
R = FindRealRoots(@f, x_min, x_max, 5, true, true);
with [x_min, x_max] your range in x, 5 the number of points to use for finding the polynomial (the higher, the more accurate; this equals the number of function evaluations needed), and the last true will make a plot of the actual function and the Chebychev approximation to it (mainly for testing purposes).
Now, the implementation:
% FINDREALROOTS Find approximations to all real roots of any function
% on an interval [a, b].
%
% USAGE:
% Roots = FindRealRoots(funfcn, a, b, n, vectorized, make_plot)
%
% FINDREALROOTS() approximates all the real roots of the function 'funfcn'
% in the interval [a,b]. It does so by finding the roots of an [n]-th degree
% Chebyshev polynomial approximation, via the eigenvalues of the associated
% companion matrix.
%
% When the argument [vectorized] is [true], FINDREALROOTS() will evaluate
% the function 'funfcn' at all [n] required points in the interval
% simultaneously. Otherwise, it will use ARRAYFUN() to calculate the [n]
% function values one-by-one. [vectorized] defaults to [false].
%
% When the argument [make_plot] is true, FINDREALROOTS() plots the
% original function and the Chebyshev approximation, and shows any roots on
% the given interval. Also [make_plot] defaults to [false].
%
% All [Roots] (if any) will be sorted.
%
% First version 26th May 2007 by Stephen Morris,
% Nightingale-EOS Ltd., St. Asaph, Wales.
%
% Modified 14/Nov (Rody Oldenhuis)
%
% See also roots, eig.
function Roots = FindRealRoots(funfcn, a, b, n, vectorized, make_plot)

    % parse input and initialize.
    inarg = nargin;
    if n <= 2, n = 3; end                   % minimum [n] is 3
    if (inarg < 5), vectorized = false; end % default: function isn't vectorized
    if (inarg < 6), make_plot = false; end  % default: don't make plot

    % some convenient variables
    bma = (b-a)/2;  bpa = (b+a)/2;  Roots = [];

    % Obtain the Chebyshev coefficients for the function
    %
    % Based on the routine given in Numerical Recipes (3rd) section 5.8;
    % calculates the Chebyshev coefficients necessary to approximate some
    % function over the interval [a,b]

    % initialize
    c = zeros(1,n);  k = (1:n)';  y = cos(pi*((1:n)-1/2)/n);

    % evaluate function on Chebychev nodes
    if vectorized
        f = feval(funfcn, (y*bma)+bpa);
    else
        f = arrayfun(@(x) feval(funfcn,x), (y*bma)+bpa);
    end

    % compute the coefficients
    for j=1:n, c(j) = (f(:).'*(cos((pi*(j-1))*((k-0.5)/n))))*(2-(j==1))/n; end

    % coefficients may be [NaN] if [inf]
    % ??? TODO - it is of course possible for c(n) to be zero...
    if any(~isfinite(c(:))) || (c(n) == 0), return; end

    % Define [A] as the Frobenius-Chebyshev companion matrix. This is based
    % on the form given by J.P. Boyd, Appl. Num. Math. 56 pp.1077-1091 (2006).
    one = ones(n-3,1);
    A = diag([one/2; 0],-1) + diag([1; one/2],+1);
    A(end,:) = -c(1:n-1)/2/c(n);
    A(end,end-1) = A(end,end-1) + 0.5;

    % Now we have the companion matrix, we can find its eigenvalues using the
    % MATLAB built-in function. We're only interested in the real elements of
    % the matrix:
    eigvals = eig(A);  realvals = eigvals(imag(eigvals)==0);

    % if there aren't any real roots, return
    if isempty(realvals), return; end

    % Of course these are the roots scaled to the canonical interval [-1,1]. We
    % need to map them back onto the interval [a, b]; we widen the interval just
    % a tiny bit to make sure that we don't miss any that are right on the
    % boundaries.
    rangevals = nonzeros(realvals(abs(realvals) <= 1+1e-5));

    % also sort the roots
    Roots = sort(rangevals*bma + bpa);

    % As a sanity check we'll plot out the original function and its Chebyshev
    % approximation: if they don't match then we know to call the routine again
    % with a larger 'n'.
    if make_plot
        % simple grid
        grid = linspace(a, b, max(25,n));
        % evaluate function
        if vectorized
            fungrid = feval(funfcn, grid);
        else
            fungrid = arrayfun(@(x) feval(funfcn,x), grid);
        end
        % corresponding Chebychev-grid (more complicated but essentially the same)
        y = (2.*grid-a-b)./(b-a);  d = zeros(1,length(grid));  dd = d;
        for j = length(c):-1:2, sv = d; d = (2*y.*d)-dd+c(j); dd = sv; end
        chebgrid = (y.*d)-dd+c(1);
        % Now make plot
        figure(1), clf, hold on
        plot(grid, fungrid, 'color', 'r');
        line(grid, chebgrid, 'color', 'b');
        line(grid, zeros(1,length(grid)), 'linestyle', '--')
        legend('function', 'interpolation')
    end % make plot

end % FindRealRoots
You could use the secant method which is a discrete version of Newton's method.
The root is estimated by calculating the line between two points (= the secant) and its crossing of the X axis.
Your function has only 0, 1 or 2 roots, so it can be done with an algorithm that doesn't by itself guarantee finding the first root:
Find one root using Newton's method or some other method. If no root can be found, this algorithm also gives up.
Let the found root be r and the beginning of the range of x be x0. Let d = (r - x0)/2.
While d > 0, calculate f(r-d). If f(r-d) > 0, halve d (d := d/2) and loop. If f(r-d) <= 0, escape the loop.
If the loop finished with d = 0, report r as the first root. If d > 0, find a root between x0 and r-d using any other method and report it.
I assumed two prerequisite conditions.
f(x) takes x of floating point numbers
At each root of f(x), the graph of f(x) crosses the x-axis; there are no touching roots like x = 0 in f(x) = x^2.
Using condition 2, you can prove that if there is no point with f(r-d) < 0, then f(x) > 0 for all x with x0 < x < r.
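A rough Python sketch of this procedure, where find_any_root and find_root_between are placeholders for the unspecified solvers mentioned above (e.g. Newton's method and bisection):
def first_root(f, x0, find_any_root, find_root_between):
    r = find_any_root(f)              # any root in the range; None if nothing found
    if r is None:
        return None
    d = (r - x0) / 2
    while d > 0:
        if f(r - d) <= 0:
            # sign change between x0 and r - d, so an earlier root exists there
            return find_root_between(f, x0, r - d)
        d /= 2                        # f(r - d) > 0: probe closer to r
    return r                          # no earlier sign change: r is the first root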
You could make a small change to the uniroot.all function from the R library rootSolve.
uniroot.all <- function (f, interval, lower = min(interval),
                         upper = max(interval), tol = .Machine$double.eps^0.2,
                         maxiter = 1000, n = 100, nroots = -1, ...) {

  ## error checking as in uniroot...
  if (!missing(interval) && length(interval) != 2)
    stop("'interval' must be a vector of length 2")
  if (!is.numeric(lower) || !is.numeric(upper) || lower >= upper)
    stop("lower < upper is not fulfilled")

  ## subdivide interval in n subintervals and estimate the function values
  xseq <- seq(lower, upper, len = n + 1)
  mod  <- f(xseq, ...)

  ## some function values may already be 0
  Equi <- xseq[which(mod == 0)]

  ss <- mod[1:n] * mod[2:(n+1)]  # intervals where function values change sign
  ii <- which(ss < 0)
  for (i in ii) {
    Equi <- c(Equi, uniroot(f, lower = xseq[i], upper = xseq[i+1], ...)$root)
    if (length(Equi) == nroots) {
      return(Equi)
    }
  }
  return(Equi)
}
And run it like this:
uniroot.all(f = your_function, interval = c(start, stop), nroots = 1)

Randomly Generate a set of numbers of n length totaling x

I'm working on a project for fun and I need an algorithm to do as follows:
Generate a list of numbers of length n which add up to x
I would settle for list of integers, but ideally, I would like to be left with a set of floating point numbers.
I would be very surprised if this problem wasn't heavily studied, but I'm not sure what to look for.
I've tackled similar problems in the past, but this one is decidedly different in nature. Previously, I generated different combinations of a list of numbers that add up to x. I'm sure I could simply brute-force this problem, but that hardly seems like the ideal solution.
Anyone have any idea what this may be called, or how to approach it? Thanks all!
Edit: To clarify, I mean that the list should be length N while the numbers themselves can be of any size.
Edit 2: Sorry for my improper use of 'set'; I was using it as a catch-all term for a list or an array. I understand that it was causing confusion, my apologies.
This is how to do it in Python
import random

def random_values_with_prescribed_sum(n, total):
    x = [random.random() for i in range(n)]
    k = total / sum(x)
    return [v * k for v in x]
Basically you pick n random numbers, compute their sum and compute a scale factor so that the sum will be what you want it to be.
Note that this approach will not produce "uniform" slices, i.e. the distribution you will get will tend to be more "egalitarian" than it should be if it was picked at random among all distribution with the given sum.
To see the reason, you can picture what the algorithm does in the case of two numbers with a prescribed sum (e.g. 1):
The point P is a generic point obtained by picking two random numbers, and it is uniform inside the square [0,1]x[0,1]. The point Q is obtained by scaling P so that the sum is 1, i.e. projecting P from the origin onto the segment from (0,1) to (1,0). Points close to the center of that segment have a higher probability: the exact center (0.5, 0.5) is reached by projecting any point on the diagonal (0,0)-(1,1), while the corner (0,1) is reached only by projecting points from the edge (0,0)-(0,1)... and the diagonal's length is sqrt(2)=1.4142..., while the square's side is only 1.0.
Actually, you need to generate a partition of x into n parts. This is usually done in the following way: a partition of x into n non-negative parts can be represented by reserving n + x - 1 free places, putting n - 1 borders at arbitrary places, and stones in the rest. The stone groups add up to x, thus the number of possible partitions is the binomial coefficient C(n + x - 1, n - 1).
So your algorithm could be as follows: choose an arbitrary (n-1)-subset of an (n + x - 1)-set; it uniquely determines a partition of x into n parts.
In Knuth's TAOCP, chapter 3.4.2 discusses random sampling. See Algorithm S there.
Algorithm S: (choose n arbitrary records from a total of N)
1. t = 0, m = 0;
2. u = random, uniformly distributed on (0, 1);
3. if (N - t)*u >= n - m, skip the t-th record and increase t by 1; otherwise include the t-th record in the sample, and increase m and t by 1;
4. if m < n, return to step 2; otherwise the algorithm is finished.
The solution for non-integers is algorithmically trivial: you just select arbitrary n numbers that don't sum up to 0, and normalize them by their sum.
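A small Python sketch of the stars-and-bars construction just described (the function name is mine):
import random

def random_partition(x, n):
    # choose n - 1 border positions among n + x - 1 places; the gap sizes
    # between consecutive borders (and the two ends) are the n parts
    borders = sorted(random.sample(range(n + x - 1), n - 1))
    parts, prev = [], -1
    for b in borders:
        parts.append(b - prev - 1)
        prev = b
    parts.append(n + x - 2 - prev)  # stones after the last border
    return parts

# e.g. random_partition(10, 4) might give [3, 0, 5, 2]; the parts always sum to x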
If you want to sample uniformly in the region of N-1-dimensional space defined by x1 + x2 + ... + xN = x, then you're looking at a special case of sampling from a Dirichlet distribution. The sampling procedure is a little more involved than generating uniform deviates for the xi. Here's one way to do it, in Python:
import random
# N values with prescribed sum x (N and x as in the text above),
# sampled uniformly on the simplex
xs = [random.gammavariate(1, 1) for a in range(N)]
s = sum(xs)
xs = [x * v / s for v in xs]
If you don't care too much about the sampling properties of your results, you can just generate uniform deviates and correct their sum afterwards.
Here is a version of the above algorithm in Javascript
function getRandomArbitrary(min, max) {
    return Math.random() * (max - min) + min;
};

function getRandomArray(min, max, n) {
    var arr = [];
    for (var i = 0, l = n; i < l; i++) {
        arr.push(getRandomArbitrary(min, max));
    };
    return arr;
};

function randomValuesPrescribedSum(min, max, n, total) {
    var arr = getRandomArray(min, max, n);
    var sum = arr.reduce(function(pv, cv) { return pv + cv; }, 0);
    var k = total / sum;
    var delays = arr.map(function(x) { return k * x; });
    return delays;
};
You can call it with
var myarray = randomValuesPrescribedSum(0,1,3,3);
And then check it with
var sum = myarray.reduce(function(pv, cv) { return pv + cv;},0);
This code does a reasonable job. I think it produces a different distribution than 6502's answer, but I am not sure which is better or more natural. Certainly his code is clearer/nicer.
import random

def parts(total_sum, num_parts):
    # split [0, 1] at random points; the gap widths, scaled by total_sum,
    # are the parts
    points = [random.random() for i in range(num_parts - 1)]
    points.append(0)
    points.append(1)
    points.sort()
    ret = []
    for i in range(1, len(points)):
        ret.append((points[i] - points[i-1]) * total_sum)
    return ret

def test(total_sum, num_parts):
    ans = parts(total_sum, num_parts)
    assert abs(sum(ans) - total_sum) < 1e-7
    print(ans)

test(5.5, 3)
test(10, 1)
test(10, 5)
In Python:
a: create a list of (random #'s 0 to 1) times total; append 0 and total to the list
b: sort the list, measure the distance between each element
c: round the list elements
import random
import time

TOTAL = 15
PARTS = 4
PLACES = 3

def random_sum_split(parts, total, places):
    a = [0, total] + [random.random() * total for i in range(parts - 1)]
    a.sort()
    b = [(a[i] - a[i-1]) for i in range(1, parts + 1)]
    if places is None:
        return b
    else:
        # round all but the last part, then make the last one absorb the
        # rounding error so the total stays exact
        b.pop()
        c = [round(x, places) for x in b]
        c.append(round(total - sum(c), places))
        return c

def tick():
    # note: info and log come from the environment this was originally
    # written for; replace them with your own tick counter and logger
    if info.tick == 1:
        start = time.time()
        alpha = random_sum_split(PARTS, TOTAL, PLACES)
        end = time.time()
        log('alpha: %s' % alpha)
        log('total: %.7f' % sum(alpha))
        log('parts: %s' % PARTS)
        log('places: %s' % PLACES)
        log('elapsed: %.7f' % (end - start))
yields:
[2014-06-13 01:00:00] alpha: [0.154, 3.617, 6.075, 5.154]
[2014-06-13 01:00:00] total: 15.0000000
[2014-06-13 01:00:00] parts: 4
[2014-06-13 01:00:00] places: 3
[2014-06-13 01:00:00] elapsed: 0.0005839
To the best of my knowledge, this distribution is uniform.
