Numerically stable algorithm for online updating vector sum

Given a vector v, I want to keep track of the sum of its elements in a variable sum_v. Each element v_i of the vector v is the dot product of a weight vector w_i with another vector d_i, so every time some d_i changes, so does v. I have been updating sum_v by adjusting it according to the change in v_i whenever d_i changes. Unfortunately, small numerical errors quickly add up.
What efficient techniques can I use to prevent this?
Edit: Right now, my algorithm takes constant time to update sum_v whenever d_i changes. I'd like to stay at or below O(log n), where n is the length of v.

One solution is to build a complete binary tree such that the leaves each represent an element of v, and each parent holds the sum of its children. Changing an element of v then takes logarithmic time to propagate the change up to sum_v, and the result is numerically stable with respect to cancelling deltas, although not with respect to cancellation between neighbouring elements of v.
It is an interesting problem to find a way that is numerically stable with respect to both.
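A minimal sketch of the tree described above (an iterative segment tree in Python; the names and values here are illustrative, not the asker's actual data):

```python
class SumTree:
    """Complete binary tree: leaves hold the elements of v, each parent holds
    the sum of its children, and the root holds sum_v. Updating one element
    touches only O(log n) nodes, so cancelling deltas are never folded into a
    single running total."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0.0] * (2 * self.n)
        self.tree[self.n:] = values
        for i in range(self.n - 1, 0, -1):        # build parents bottom-up
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, i, value):
        """Set v[i] = value and refresh its O(log n) ancestors."""
        i += self.n
        self.tree[i] = value
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def total(self):
        return self.tree[1]                       # the root is sum_v
```

Whenever some d_i changes, recompute v_i and call update(i, v_i); total() then returns sum_v recomputed along the tree, so errors do not accumulate across updates.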

least cost path, destination unknown

Question
How would one go about finding a least-cost path when the destination is unknown, but the number of edges traversed is a fixed value? Is there a specific name for this problem, or for an algorithm to solve it?
Note that maybe the term "walk" is more appropriate than "path", I'm not sure.
Explanation
Say you have a weighted graph, and you start at vertex V1. The goal is to find a path of length N (where N is the number of edges traversed; the path can cross the same edge multiple times and can revisit vertices) that has the smallest cost. This process would need to be repeated for all possible starting vertices.
As a concrete example, consider a turn-based game where there are rooms connected by corridors. Each corridor has a cost associated with it, and your final score is lowered by an amount equal to each cost 'paid'. It takes 1 turn to traverse a corridor, and the game lasts 10 turns. You can stay in a room (self-loop), but staying put has a cost associated with it too. If you know the cost of all corridors (and of staying put in each room; i.e., you know the weighted graph), what is the optimal (highest-scoring) path to take for a 10-turn (or N-turn) game? You can revisit rooms and corridors.
Possible Approach (likely to fail)
I was originally thinking of using Dijkstra's algorithm to find the least-cost paths between all pairs of vertices, and then for each starting vertex selecting the LCPs of length N. However, I realized that this might not give the LCP of length N for a given starting vertex. For example, Dijkstra's LCP between V1 and V2 might have length < N, and Dijkstra's might have excluded an unnecessary but low-cost edge which, if included, would have made the path length equal N.
It's an interesting fact that if A is the adjacency (weight) matrix and you compute A^k using addition and min in place of the usual multiply and sum of normal matrix multiplication, then A^k[i,j] is the cost of the cheapest walk from node i to node j with exactly k edges. Now the trick is to use repeated squaring, so that A^k needs only O(log k) matrix multiplications.
If you need the path in addition to the minimum length, you must track where the result of each min operation came from.
For your purposes, you want the location of the min of each row of the result matrix and corresponding path.
This is a good algorithm if the graph is dense. If it's sparse, then doing one breadth-first search per node to depth k will be faster.
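A hedged Python sketch of the min-plus matrix trick described above (path reconstruction omitted; W[i][j] is assumed to hold the edge weight, or infinity if there is no edge):

```python
import math

def min_plus_mult(A, B):
    """Matrix 'multiplication' with + in place of * and min in place of sum."""
    n = len(A)
    C = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            if A[i][k] == math.inf:
                continue
            for j in range(n):
                if A[i][k] + B[k][j] < C[i][j]:
                    C[i][j] = A[i][k] + B[k][j]
    return C

def cheapest_k_edge_walks(W, k):
    """Return M with M[i][j] = minimum cost of a walk from i to j using
    exactly k edges, via repeated squaring (O(log k) min-plus products)."""
    n = len(W)
    # Identity of the min-plus semiring: 0 on the diagonal, inf elsewhere.
    result = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    power = [row[:] for row in W]
    while k > 0:
        if k & 1:
            result = min_plus_mult(result, power)
        power = min_plus_mult(power, power)
        k >>= 1
    return result
```

For the question as posed, the cheapest N-edge walk starting at vertex i is then the minimum entry of row i of cheapest_k_edge_walks(W, N).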

Efficiently checking which of a large collection of nodes are close together?

I'm currently interested in generating random geometric graphs. For my particular problem, we randomly place node v in the unit square, and add an edge from v to node u if they have Euclidean distance <= D, where D=D(u,n) varies with u and the number of nodes n in the graph.
Important points:
It is costly to compute D, so I'd like to minimize the number of calls to this function.
The vast majority of the time, when v is added, edges uv will be added to only a small number of nodes u (usually 0 or 1).
Question: What is an efficient method for checking which vertices u are "close enough" to v?
The brute force algorithm is to compute and compare dist(v,u) and D(u,n) for all extant nodes u. This requires O(n^2) calls to D.
I feel we should be able to do much better than this. Perhaps some kind of binning would work. We could divide the space up into bins, then for each vertex u store a list of bins where a newly placed vertex v could result in the edge uv. If v ends up placed outside of u's list of bins (which should happen most of the time), then it's too far away and we don't need to compute D. This is somewhat of an off-the-top-of-my-head suggestion, and I don't know if it would work well (e.g., there would be overhead in computing sufficiently close bins, which might be too costly), so I'm after feedback.
Based on your description of the problem, I would choose an R-tree as your data structure.
It allows for very fast searching by drastically narrowing the set of vertices you need to run D against. However, a worst-case insertion requires O(n) time; thankfully, you're quite unlikely to hit that case with a typical data set.
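As a hedged sketch, using the third-party Python rtree package (a libspatialindex binding); the constant r_max is an assumed upper bound on D(u,n), and D and dist stand in for the questioner's own functions:

```python
from rtree import index

idx = index.Index()     # R-tree over the points inserted so far
points = []             # points[i] = (x, y) of vertex i

def add_vertex(v, D, n, dist, r_max):
    """Insert v and return existing vertices u with dist(v, u) <= D(u, n).
    D is only called for the candidates inside the query box."""
    x, y = v
    box = (x - r_max, y - r_max, x + r_max, y + r_max)
    close = [u for u in idx.intersection(box)
             if dist(v, points[u]) <= D(points[u], n)]
    i = len(points)
    points.append(v)
    idx.insert(i, (x, y, x, y))   # store the point as a degenerate rectangle
    return close
```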
I would probably just use a binning approach.
Say we cut the unit square into m x m subsquares (each having side length 1/m, of course). Since you place your vertices uniformly at random (or so I assumed), every square will contain n / m^2 vertices on average.
Depending on A1, A2, m and n, you can probably determine the maximum radius you need to check. Say it's less than 1/m (the side of a subsquare). Then, after inserting v, you would need to check the square in which it landed, plus all adjacent squares. In any case, this is a constant number of squares, so for every insertion you'll need to check O(n / m^2) other vertices on average.
I don't know the best value for m (as said, that depends on A1 and A2), but say it would be sqrt(n), then your entire algorithm could run in O(n) expected time.
EDIT
A small addition: you could keep track of vertices with many neighbours (i.e., with a high radius that extends over multiple squares) and check them for every inserted vertex. There should be only a few, so that's no problem.
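A minimal Python sketch of this binning scheme, under the assumption that the maximum radius never exceeds one cell width (so only the 3 x 3 block of cells around v needs checking); D and dist are stand-ins for the questioner's functions:

```python
from collections import defaultdict

def cell_of(p, m):
    """Index of the subsquare containing point p in an m x m grid over [0,1]^2."""
    return (min(int(p[0] * m), m - 1), min(int(p[1] * m), m - 1))

def insert_and_find_edges(v, grid, m, n, D, dist):
    """Insert v into grid (a defaultdict(list) keyed by cell index) and return
    the existing vertices u with dist(v, u) <= D(u, n)."""
    cx, cy = cell_of(v, m)
    close = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for u in grid.get((cx + dx, cy + dy), []):
                if dist(v, u) <= D(u, n):   # D is called only for nearby candidates
                    close.append(u)
    grid[(cx, cy)].append(v)
    return close
```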

Finding kth largest path sum in a Binary Search Tree without additional storage

Is there any way to find the kth largest of all possible full (root-to-leaf) path sums in a binary search tree without using any additional storage, such as an array? Initially I thought I could enumerate the path sums from the right while advancing a pointer, taking each new sum as the minimum of it and the previous sum (with the sum initially infinite), and breaking out when the count reaches k. However, I just discovered that while the leaf values are naturally sorted from right to left, the path sums need not be, so this method won't work. How can I do this?
If you can calculate the k-smallest then you can calculate the k-largest with the same algorithm. All you need to do is do things right-to-left instead of left-to-right.

Algorithm to find special point k in O(n log n) time

Give an n log n time lower bound for an algorithm to check if a set of points has a special point k.
k is defined as:
for a set A of points, k is special if for every point m in A there is a point q in A such that k is the midpoint of the line segment mq. Such a k does not have to belong to A.
For example, the set of four points (1,0), (0,1), (1,1), (0,0) has the special point k = (0.5, 0.5).
I was totally poker-faced when they asked me this; nothing came to my mind. I guess it needs some strong geometric background.
O(n log n) solution (I'm still not clear why you're looking for a lower bound solution. You might as well just do an exhaustive check, and then just run an n log n loop to make sure of the lower bound. Not very difficult. I think you must mean upper bound):
Find the only valid candidate point by averaging all the points. I.e. summing their co-ordinates and dividing by the number of points. If such a k exists, this is it. If no such k exists, we'll find that the point found is invalid in the final step.
Create a new array (set) of points, where we shift our axes so they centre on the point k. I.e. if k = (x_k, y_k), a point (x, y) becomes (x - x_k, y - y_k). Sort the points according to the ratio x/y and the norm sqrt(x^2 + y^2). As the next step shows, it doesn't matter how this sort is done, i.e. which is the main criterion and which the secondary.
We could search for each point's complement, or better, simply traverse the array and verify that every two adjacent points are indeed complements. I.e. if this is a solution then every two complementary points in this new array are of the form (x,y) and (-x,-y) since we re-centered our axes, which means they have the same ratio ("gradient") and norm, and after the sort, must be adjacent.
If k is not valid, then there is a point we will arrive at in this traversal whose neighbour is not of the complementary form ==> there is no such k.
Time =
O(n) for finding the candidate k +
O(n) for building the new array, since each new point can be calculated in O(1) +
O(n log n) for the sort +
O(n) for the verifying traversal
= O(n log n)
I'd say you just compute the center of mass (having removed duplicates first) and check if it is your k. Probably the only thing that causes it to be O(n log n) would be searching for a point at a specified location.
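A hedged Python sketch combining both answers: the centroid is the only candidate, and symmetry about it is verified after an O(n log n) sort. The verification here pairs the i-th smallest point with the i-th largest (a slight variation of the ratio-and-norm sort above, chosen to avoid division); the exact comparisons assume integer or rational coordinates, so use a tolerance for floats.

```python
def find_special_point(points):
    """points: list of (x, y) pairs. Return the special point k, or None."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    pts = sorted(points)                       # lexicographic sort, O(n log n)
    for i in range((n + 1) // 2):
        p, q = pts[i], pts[n - 1 - i]
        # p and q must be mirror images about the centroid (sx/n, sy/n),
        # i.e. n * (p + q) == 2 * (sx, sy) componentwise.
        if n * (p[0] + q[0]) != 2 * sx or n * (p[1] + q[1]) != 2 * sy:
            return None
    return (sx / n, sy / n)

# Example from the question:
# find_special_point([(1, 0), (0, 1), (1, 1), (0, 0)])  ->  (0.5, 0.5)
```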

Finding subset of disjunctive intervals with maximal weights

I am looking for an algorithm I could use to solve this, not the code. I wondered about using linear programming with relaxation, but maybe there are more efficient ways to solve this?
The problem
I have a set of intervals with weights. Intervals can overlap. I need to find the maximal sum of weights over a subset of pairwise disjoint intervals.
Example
Intervals with weights:
|--3--| |---1-----|
|----2--| |----5----|
Answer: 8
I have an exact O(n log n) DP algorithm in mind. Since this is homework, here is a clue:
Sort the intervals by right edge position as Saeed suggests, then number them up from 1. Define f(i) to be the highest weight attainable by using only intervals that do not extend to the right of interval i's right edge.
EDIT: Clue 2: Calculate each f(i) in increasing order of i. Keep in mind that each interval will either be present or absent. To calculate the score for the "present" case, you'll need to hunt for the "rightmost" interval that is compatible with interval i, which will require a binary search through the solutions you've already computed.
That was a biggie, not sure I can give more clues without totally spelling it out ;)
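For reference (this does spell it out), a minimal Python sketch of the recurrence in the clues, assuming each interval is a (start, end, weight) tuple and that an interval whose start equals another's end counts as compatible:

```python
import bisect

def max_disjoint_weight(intervals):
    """intervals: list of (start, end, weight). Returns the best total weight
    of a pairwise disjoint subset, in O(n log n) time."""
    iv = sorted(intervals, key=lambda t: t[1])      # sort by right edge
    ends = [t[1] for t in iv]
    f = [0] * (len(iv) + 1)                         # f[0]: empty prefix
    for i in range(1, len(iv) + 1):
        start, _end, weight = iv[i - 1]
        # Rightmost earlier interval whose right edge is <= this one's start.
        j = bisect.bisect_right(ends, start, 0, i - 1)
        f[i] = max(f[i - 1],                        # skip interval i
                   f[j] + weight)                   # take interval i
    return f[len(iv)]

# With coordinates made up to match the picture above (weights 3, 1, 2, 5):
# max_disjoint_weight([(0, 6, 3), (10, 20, 1), (2, 11, 2), (13, 24, 5)])  ->  8
```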
If there were no weights it would be easy: you could use a greedy algorithm, sorting the intervals by their end times and at each step taking the interval with the smallest possible end time.
But in your case I think it's NP-complete (I should think about it), though you can use a similar greedy heuristic: value each interval by weight/length, and each time take one of the possible intervals in that sorted order. You can also use simulated annealing, meaning each time you take the best interval by the above value with probability P (with P near 1) or select another interval with probability 1 - P; you can run this in a loop n times to find a good answer.
Here's an idea:
Consider the following graph: create a node for each interval. If interval I1 and interval I2 do not overlap and I1 comes before I2, add a directed edge from node I1 to node I2. Note this graph is acyclic. Each node has a cost equal to the weight of the corresponding interval.
Now, the idea is to find the longest path in this graph, which can be found in polynomial time for acyclic graphs (using dynamic programming, for example). The problem is that the costs are in the nodes, not in the edges. Here is a trick: split each node v into v' and v''. All edges entering v will now enter v' and all edges leaving v will now leave v''. Then, add an edge from v' to v'' with the node's cost, in this case the weight of the interval. All the other edges will have cost 0.
Well, if I'm not mistaken the longest path in this graph will correspond to the set of disjoint intervals with maximum sum.
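A small Python sketch of that idea, with one simplification: processing the intervals in topological order (sorted by right edge) lets the longest-path DP work directly with node weights, so the v'/v'' split is only needed if your longest-path routine insists on edge weights. The (start, end, weight) tuple format is an assumption; this version is O(n^2).

```python
def longest_path_weight(intervals):
    """intervals: list of (start, end, weight). Returns the weight of the
    heaviest path in the DAG described above, i.e. the best disjoint subset."""
    iv = sorted(intervals, key=lambda t: t[1])      # a topological order
    best = [0] * len(iv)                            # best path ending at node i
    for i, (start_i, _end_i, w_i) in enumerate(iv):
        into = max((best[j] for j in range(i) if iv[j][1] <= start_i), default=0)
        best[i] = into + w_i
    return max(best, default=0)
```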
You could formulate this problem as a general IP (integer programming) problem with binary variables indicating whether an interval is selected or not. The objective function will then be a weighted linear combination of the variables. You would then need appropriate constraints to enforce disjointness amongst the intervals... That should suffice given the homework tag.
Also, just because a problem can be formulated as an integer program (solving which is NP-hard), it does not mean that the problem class itself is NP-hard. So, as Ulrich points out, there may be a polynomially-solvable formulation/algorithm, such as formulating/solving the problem as a linear program.
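For concreteness, the formulation hinted at above looks roughly like this (x_i is a binary variable saying whether interval i is selected, w_i is its weight, and the pairwise constraints run over every pair of overlapping intervals):

maximize    sum_i w_i * x_i
subject to  x_i + x_j <= 1   for every pair (i, j) of overlapping intervals
            x_i in {0, 1}    for every interval i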
A correct end-to-end solution is explained here: http://tkramesh.wordpress.com/2011/02/03/dynamic-programming-1-weighted-interval-scheduling/
