Computing mid in Interpolation Search? - algorithm

I came to know that Interpolation Search is a modification of Binary Search where in binary search the input is divided into two equal halves in each iteration by computing
mid = (low + high) / 2
and in Interpolation search the mid is computed as
mid = low + (key - arr[low]) * ((high - low) / (arr[high] - arr[low]))
Now I need to understand this formula of calculating mid in interpolation search.
Ref: https://en.wikipedia.org/wiki/Interpolation_search#Sample_implementation

You can think of array arr as a function f that acts on index and return a value, which is monotone (because array is sorted). So you have your initial data f(low) = m and f(high) = M. Now you can interpolate your function f with a straight line, which is quite reasonable to do because your f is monotone and you have only 2 points.
So your interpolation should be line (linear function) that pass throw points (low, m) and (high, M). This is it's equation
(y - f(low))/(f(high) - f(low)) = (x - low)/(high - low)
So y here is the element of search space and x is from domain (x is index of array). So if your function f would be the same as it's interpolation, then index of your key would be:
x = low + (high - low)*(key - f(low))/(f(high) - f(low))
So, assuming that your function f is close to it's interpolation, you should just check the value f at x to see if it is the goal. Otherwise you just shrink your interpolation interval.

An extention to #JustAnotherCurious answer
the equation perposed by him was based on Interpolation Formula (equ of a line)
Now, thinkf() as function that takes an index and return its y axis value,
y1 = f(low)
y2 = f(high)
x1 = low
x2 = high
(y - f(low)) = [(f(high) - f(low)) / (high - low)] * (x - low);
OR
x = low + [(high - low) * (y - f(low))] / (f(high) - f(low))
here: y = key
we are looking for position(value) of x.

Related

Writing a vector sum in MATLAB

Suppose I have a function phi(x1,x2)=k1*x1+k2*x2 which I have evaluated over a grid where the grid is a square having boundaries at -100 and 100 in both x1 and x2 axis with some step size say h=0.1. Now I want to calculate this sum over the grid with which I'm struggling:
What I was trying :
clear all
close all
clc
D=1; h=0.1;
D1 = -100;
D2 = 100;
X = D1 : h : D2;
Y = D1 : h : D2;
[x1, x2] = meshgrid(X, Y);
k1=2;k2=2;
phi = k1.*x1 + k2.*x2;
figure(1)
surf(X,Y,phi)
m1=-500:500;
m2=-500:500;
[M1,M2,X1,X2]=ndgrid(m1,m2,X,Y)
sys=#(m1,m2,X,Y) (k1*h*m1+k2*h*m2).*exp((-([X Y]-h*[m1 m2]).^2)./(h^2*D))
sum1=sum(sys(M1,M2,X1,X2))
Matlab says error in ndgrid, any idea how I should code this?
MATLAB shows:
Error using repmat
Requested 10001x1001x2001x2001 (298649.5GB) array exceeds maximum array size preference. Creation of arrays greater
than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference
panel for more information.
Error in ndgrid (line 72)
varargout{i} = repmat(x,s);
Error in new_try1 (line 16)
[M1,M2,X1,X2]=ndgrid(m1,m2,X,Y)
Judging by your comments and your code, it appears as though you don't fully understand what the equation is asking you to compute.
To obtain the value M(x1,x2) at some given (x1,x2), you have to compute that sum over Z2. Of course, using a numerical toolbox such as MATLAB, you could only ever hope to compute over some finite range of Z2. In this case, since (x1,x2) covers the range [-100,100] x [-100,100], and h=0.1, it follows that mh covers the range [-1000, 1000] x [-1000, 1000]. Example: m = (-1000, -1000) gives you mh = (-100, -100), which is the bottom-left corner of your domain. So really, phi(mh) is just phi(x1,x2) evaluated on all of your discretised points.
As an aside, since you need to compute |x-hm|^2, you can treat x = x1 + i x2 as a complex number to make use of MATLAB's abs function. If you were strictly working with vectors, you would have to use norm, which is OK too, but a bit more verbose. Thus, for some given x=(x10, x20), you would compute x-hm over the entire discretised plane as (x10 - x1) + i (x20 - x2).
Finally, you can compute 1 term of M at a time:
D=1; h=0.1;
D1 = -100;
D2 = 100;
X = (D1 : h : D2); % X is in rows (dim 2)
Y = (D1 : h : D2)'; % Y is in columns (dim 1)
k1=2;k2=2;
phi = k1*X + k2*Y;
M = zeros(length(Y), length(X));
for j = 1:length(X)
for i = 1:length(Y)
% treat (x - hm) as a complex number
x_hm = (X(j)-X) + 1i*(Y(i)-Y); % this computes x-hm for all m
M(i,j) = 1/(pi*D) * sum(sum(phi .* exp(-abs(x_hm).^2/(h^2*D)), 1), 2);
end
end
By the way, this computation takes quite a long time. You can consider either increasing h, reducing D1 and D2, or changing all three of them.

Searching a 3D array for closest point satisfying a certain predicate

I'm looking for an enumeration algorithm to search through a 3D array "sphering" around a given starting point.
Given an array a of size NxNxN where each N is 2^k for some k, and a point p in that array. The algorithm I'm looking for should do the following: If a[p] satisfies a certain predicate, the algorithm stops and p is returned. Otherwise the next point q is checked, where q is another point in the array that is the closest to p and hasn't been visited yet. If that doesn't match either, the next q'is checked an so on until in the worst case the whole array has been searched.
By "closest" here the perfect solution would be the point q that has the smallest Euclidean distance to p. As only discrete points have to be considered, perhaps some clever enumeration algorithm woukd make that possible. However, if this gets too complicated, the smallest Manhattan distance would be fine too. If there are several nearest points, it doesn't matter which one should be considered next.
Is there already an algorithm that can be used for this task?
You can search for increasing squared distances, so you won't miss a point. This python code should make it clear:
import math
import itertools
# Calculates all points at a certain distance.
# Coordinate constraint: z <= y <= x
def get_points_at_squared_euclidean_distance(d):
result = []
x = int(math.floor(math.sqrt(d)))
while 0 <= x:
y = x
while 0 <= y:
target = d - x*x - y*y
lower = 0
upper = y + 1
while lower < upper:
middle = (lower + upper) / 2
current = middle * middle
if current == target:
result.append((x, y, middle))
break
if current < target:
lower = middle + 1
else:
upper = middle
y -= 1
x -= 1
return result
# Creates all possible reflections of a point
def get_point_reflections(point):
result = set()
for p in itertools.permutations(point):
for n in range(8):
result.add((
p[0] * (1 if n % 8 < 4 else -1),
p[1] * (1 if n % 4 < 2 else -1),
p[2] * (1 if n % 2 < 1 else -1),
))
return sorted(result)
# Enumerates all points around a center, in increasing distance
def get_next_point_near(center):
d = 0
points_at_d = []
while True:
while not points_at_d:
d += 1
points_at_d = get_points_at_squared_euclidean_distance(d)
point = points_at_d.pop()
for reflection in get_point_reflections(point):
yield (
center[0] + reflection[0],
center[1] + reflection[1],
center[2] + reflection[2],
)
# The function you asked for
def get_nearest_point(center, predicate):
for point in get_next_point_near(center):
if predicate(point):
return point
# Example usage
print get_nearest_point((1,2,3), lambda p: sum(p) == 10)
Basically you consume points from the generator until one of them fulfills your predicate.
This is pseudocode for a simple algorithm that will search in increasing-radius spherical husks until it either finds a point or it runs out of array. Let us assume that condition returns either true or false and has access to the x, y, z coordinates being tested and the array itself, returning false (instead of exploding) for out-of-bounds coordinates:
def find_from_center(center, max_radius, condition) returns a point
let radius = 0
while radius < max_radius,
let point = find_in_spherical_husk(center, radius, condition)
if (point != null) return point
radius ++
return null
the hard part is inside find_in_spherical_husk. We are interested in checking out points such that
dist(center, p) >= radius AND dist(center, p) < radius+1
which will be our operating definition of husk. We could iterate over the whole 3D array in O(n^3) looking for those, but that would be really expensive in terms of time. A better pseudocode is the following:
def find_in_spherical_husk(center, radius, condition)
let z = center.z - radius // current slice height
let r = 0 // current circle radius; maxes at equator, then decreases
while z <= center + radius,
let z_center = (z, center.x, point.y)
let point = find_in_z_circle(z_center, r)
if (point != null) return point
// prepare for next z-sliced cirle
z ++
r = sqrt(radius*radius - (z-center.z)*(z-center.z))
the idea here is to slice each husk into circles along the z-axis (any axis will do), and then look at each slice separately. If you were looking at the earth, and the poles were the z axis, you would be slicing from north to south. Finally, you would implement find_in_z_circle(z_center, r, condition) to look at the circumference of each of those circles. You can avoid some math there by using the Bresenham circle-drawing algorithm; but I assume that the savings are negligible compared with the cost of checking condition.

Formula for calculating distance with decaying velocity

I have a moving graphic whose velocity decays geometrically every frame. I want to find the initial velocity that will make the graphic travel a desired distance in a given number of frames.
Using these variables:
v initial velocity
r rate
d distance
I can come up with d = v * (r0 + r1 + r2 + ...)
So if I want to find the v to travel 200 pixels in 3 frames with a decay rate of 90%, I would adapt to:
d = 200
r = .9
v = d / (r0 + r1 + r2)
That doesn't translate well to code, since I have to edit the expression if the number of frames changes. The only solution I can think of is this (in no specific language):
r = .9
numFrames = 3
d = 200
sum = 1
for (i = 1; i < numFrames; i++) {
sum = sum + power(r, i);
}
v = d / sum;
Is there a better way to do this without using a loop?
(I wouldn't be surprised if there is a mistake in there somewhere... today is just one of those days..)
What you have here is a geometric sequence. See the link:
http://www.mathsisfun.com/algebra/sequences-sums-geometric.html
To find the sum of a geometric sequence, you use this formula:
sum = a * ((1 - r^n) / (1 - r))
Since you are looking for a, the initial velocity, move the terms around:
a = sum * ((1-r) / (1 - r^n))
In Java:
int distanceInPixels = SOME_INTEGER;
int decayRate = SOME_DECIMAl;
int numberOfFrames = SOME_INTEGER;
int initialVelocity; //this is what we need to find
initialVelocity = distanceinPixel * ((1-decayRate) / (1-Math.pow(decayRate, NumberOfFrames)));
Using this formula you can get any one of the four variables if you know the values of the other three. Enjoy!
According to http://mikestoolbox.com/powersum.html, you should be able to reduce your for loop to:
F(x) = (x^n - 1)/(x-1)

random increasing sequence with O(1) access to any element?

I have an interesting math/CS problem. I need to sample from a possibly infinite random sequence of increasing values, X, with X(i) > X(i-1), with some distribution between them. You could think of this as the sum of a different sequence D of uniform random numbers in [0,d). This is easy to do if you start from the first one and go from there; you just add a random amount to the sum each time. But the catch is, I want to be able to get any element of the sequence in faster than O(n) time, ideally O(1), without storing the whole list. To be concrete, let's say I pick d=1, so one possibility for D (given a particular seed) and its associated X is:
D={.1, .5, .2, .9, .3, .3, .6 ...} // standard random sequence, elements in [0,1)
X={.1, .6, .8, 1.7, 2.0, 2.3, 2.9, ...} // increasing random values; partial sum of D
(I don't really care about D, I'm just showing one conceptual way to construct X, my sequence of interest.) Now I want to be able to compute the value of X[1] or X[1000] or X[1000000] equally fast, without storing all the values of X or D. Can anyone point me to some clever algorithm or a way to think about this?
(Yes, what I'm looking for is random access into a random sequence -- with two different meanings of random. Makes it hard to google for!)
Since D is pseudorandom, there’s a space-time tradeoff possible:
O(sqrt(n))-time retrievals using O(sqrt(n)) storage locations (or,
in general, O(n**alpha)-time retrievals using O(n**(1-alpha))
storage locations). Assume zero-based indexing and that
X[n] = D[0] + D[1] + ... + D[n-1]. Compute and store
Y[s] = X[s**2]
for all s**2 <= n in the range of interest. To look up X[n], let
s = floor(sqrt(n)) and return
Y[s] + D[s**2] + D[s**2+1] + ... + D[n-1].
EDIT: here's the start of an approach based on the following idea.
Let Dist(1) be the uniform distribution on [0, d) and let Dist(k) for k > 1 be the distribution of the sum of k independent samples from Dist(1). We need fast, deterministic methods to (i) pseudorandomly sample Dist(2**p) and (ii) given that X and Y are distributed as Dist(2**p), pseudorandomly sample X conditioned on the outcome of X + Y.
Now imagine that the D array constitutes the leaves of a complete binary tree of size 2**q. The values at interior nodes are the sums of the values at their two children. The naive way is to fill the D array directly, but then it takes a long time to compute the root entry. The way I'm proposing is to sample the root from Dist(2**q). Then, sample one child according to Dist(2**(q-1)) given the root's value. This determines the value of the other, since the sum is fixed. Work recursively down the tree. In this way, we look up tree values in time O(q).
Here's an implementation for Gaussian D. I'm not sure it's working properly.
import hashlib, math
def random_oracle(seed):
h = hashlib.sha512()
h.update(str(seed).encode())
x = 0.0
for b in h.digest():
x = ((x + b) / 256.0)
return x
def sample_gaussian(variance, seed):
u0 = random_oracle((2 * seed))
u1 = random_oracle(((2 * seed) + 1))
return (math.sqrt((((- 2.0) * variance) * math.log((1.0 - u0)))) * math.cos(((2.0 * math.pi) * u1)))
def sample_children(sum_outcome, sum_variance, seed):
difference_outcome = sample_gaussian(sum_variance, seed)
return (((sum_outcome + difference_outcome) / 2.0), ((sum_outcome - difference_outcome) / 2.0))
def sample_X(height, i):
assert (0 <= i <= (2 ** height))
total = 0.0
z = sample_gaussian((2 ** height), 0)
seed = 1
for j in range(height, 0, (- 1)):
(x, y) = sample_children(z, (2 ** j), seed)
assert (abs(((x + y) - z)) <= 1e-09)
seed *= 2
if (i >= (2 ** (j - 1))):
i -= (2 ** (j - 1))
total += x
z = y
seed += 1
else:
z = x
return total
def test(height):
X = [sample_X(height, i) for i in range(((2 ** height) + 1))]
D = [(X[(i + 1)] - X[i]) for i in range((2 ** height))]
mean = (sum(D) / len(D))
variance = (sum((((d - mean) ** 2) for d in D)) / (len(D) - 1))
print(mean, math.sqrt(variance))
D.sort()
with open('data', 'w') as f:
for d in D:
print(d, file=f)
if (__name__ == '__main__'):
test(10)
If you do not record the values in X, and if you do not remember the values in X you have previously generate, there is no way to guarantee that the elements in X you generate (on the fly) will be in increasing order. It furthermore seems like there is no way to avoid O(n) time worst-case per query, if you don't know how to quickly generate the CDF for the sum of the first m random variables in D for any choice of m.
If you want the ith value X(i) from a particular realization, I can't see how you could do this without generating the sequence up to i. Perhaps somebody else can come up with something clever.
Would you be willing to accept a value which is plausible in the sense that it has the same distribution as the X(i)'s you would observe across multiple realizations of the X process? If so, it should be pretty easy. X(i) will be asymptotically normally distributed with mean i/2 (since it's the sum of the Dk's for k=1,...,i, the D's are Uniform(0,1), and the expected value of a D is 1/2) and variance i/12 (since the variance of a D is 1/12 and the variance of a sum of independent random variables is the sum of their variances).
Because of the asymptotic aspect, I'd pick some threshold value for i to switch over from direct summing to using the normal. For example, if you use i = 12 as your threshold you would use actual summing of uniforms for values of i from 1 to 11, and generate a Normal(i/2, sqrt(i/12)) value for i >. That's an O(1) algorithm since the total work is bounded by your threshold, and the results produced will be distributionally representative of what you would see if you actually went through the summing.

shoot projectile (straight trajectory) at moving target in 3 dimensions

I already googled for the problem but only found either 2D solutions or formulas that didn't work for me (found this formula that looks nice: http://www.ogre3d.org/forums/viewtopic.php?f=10&t=55796 but seems not to be correct).
I have given:
Vec3 cannonPos;
Vec3 targetPos;
Vec3 targetVelocityVec;
float bulletSpeed;
what i'm looking for is time t such that
targetPos+t*targetVelocityVec
is the intersectionpoint where to aim the cannon to and shoot.
I'm looking for a simple, inexpensive formula for t (by simple i just mean not making many unnecessary vectorspace transformations and the like)
thanks!
The real problem is finding out where in space that the bullet can intersect the targets path. The bullet speed is constant, so in a certain amount of time it will travel the same distance regardless of the direction in which we fire it. This means that it's position after time t will always lie on a sphere. Here's an ugly illustration in 2d:
This sphere can be expressed mathematically as:
(x-x_b0)^2 + (y-y_b0)^2 + (z-z_b0)^2 = (bulletSpeed * t)^2 (eq 1)
x_b0, y_b0 and z_b0 denote the position of the cannon. You can find the time t by solving this equation for t using the equation provided in your question:
targetPos+t*targetVelocityVec (eq 2)
(eq 2) is a vector equation and can be decomposed into three separate equations:
x = x_t0 + t * v_x
y = y_t0 + t * v_y
z = z_t0 + t * v_z
These three equations can be inserted into (eq 1):
(x_t0 + t * v_x - x_b0)^2 + (y_t0 + t * v_y - y_b0)^2 + (z_t0 + t * v_z - z_b0)^2 = (bulletSpeed * t)^2
This equation contains only known variables and can be solved for t. By assigning the constant part of the quadratic subexpressions to constants we can simplify the calculation:
c_1 = x_t0 - x_b0
c_2 = y_t0 - y_b0
c_3 = z_t0 - z_b0
(v_b = bulletSpeed)
(t * v_x + c_1)^2 + (t * v_y + c_2)^2 + (t * v_z + c_3)^2 = (v_b * t)^2
Rearrange it as a standard quadratic equation:
(v_x^2+v_y^2+v_z^2-v_b^2)t^2 + 2*(v_x*c_1+v_y*c_2+v_z*c_3)t + (c_1^2+c_2^2+c_3^2) = 0
This is easily solvable using the standard formula. It can result in zero, one or two solutions. Zero solutions (not counting complex solutions) means that there's no possible way for the bullet to reach the target. One solution will probably happen very rarely, when the target trajectory intersects with the very edge of the sphere. Two solutions will be the most common scenario. A negative solution means that you can't hit the target, since you would need to fire the bullet into the past. These are all conditions you'll have to check for.
When you've solved the equation you can find the position of t by putting it back into (eq 2). In pseudo code:
# setup all needed variables
c_1 = x_t0 - x_b0
c_2 = y_t0 - y_b0
c_3 = z_t0 - z_b0
v_b = bulletSpeed
# ... and so on
a = v_x^2+v_y^2+v_z^2-v_b^2
b = 2*(v_x*c_1+v_y*c_2+v_z*c_3)
c = c_1^2+c_2^2+c_3^2
if b^2 < 4*a*c:
# no real solutions
raise error
p = -b/(2*a)
q = sqrt(b^2 - 4*a*c)/(2*a)
t1 = p-q
t2 = p+q
if t1 < 0 and t2 < 0:
# no positive solutions, all possible trajectories are in the past
raise error
# we want to hit it at the earliest possible time
if t1 > t2: t = t2
else: t = t1
# calculate point of collision
x = x_t0 + t * v_x
y = y_t0 + t * v_y
z = z_t0 + t * v_z

Resources