Fréchet distance in O(n) - algorithm

I have seen in a number of articles that computing the Fréchet distance takes O(n^2) time,
where the two paths are represented as arrays Q and P of n points each.
What if I start from Q[0], P[0], check all the possible next steps, and choose the minimal one:
STP_i,j = min(|Q[i] - P[j+1]|, |Q[i+1] - P[j+1]|, |Q[i+1] - P[j]|)
and then change i and j accordingly?
That way I would get the answer in O(n).
Am I wrong?

Consider the following example:
Take the points marked in black as the start of each line. In the first step, your algorithm would advance one point along both lines. However, the Fréchet distance in this case is the distance between the first red point and the third blue point; since your algorithm has already moved away from the first point, it will give you a larger value.
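To make this kind of counterexample concrete, here is a minimal Python sketch. It assumes the discrete Fréchet distance (only the vertices are coupled) and uses 1D points, so the distance is an absolute difference. The function names and the sample arrays are illustrative and do not come from the original posts. The first function is the usual O(n^2) dynamic program over couplings; the second is one reading of the greedy rule proposed in the question.

```python
def discrete_frechet(p, q):
    """O(len(p) * len(q)) dynamic program for the discrete Frechet distance."""
    n, m = len(p), len(q)
    # ca[i][j] = smallest achievable "longest leash" for prefixes p[:i+1], q[:j+1]
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = abs(p[i] - q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


def greedy_frechet(p, q):
    """Greedy walk: always take the locally cheapest of the three next steps."""
    n, m = len(p), len(q)
    i = j = 0
    worst = abs(p[0] - q[0])
    while i < n - 1 or j < m - 1:
        candidates = []
        if j + 1 < m:
            candidates.append((abs(p[i] - q[j + 1]), i, j + 1))
        if i + 1 < n and j + 1 < m:
            candidates.append((abs(p[i + 1] - q[j + 1]), i + 1, j + 1))
        if i + 1 < n:
            candidates.append((abs(p[i + 1] - q[j]), i + 1, j))
        d, i, j = min(candidates)
        worst = max(worst, d)
    return worst


# Hypothetical 1D input on which the two disagree.
P = [0, 3, 3, 3]
Q = [1, 3, -2, 2]
print(discrete_frechet(P, Q), greedy_frechet(P, Q))
```

On this input the greedy walk takes the free diagonal step to (1, 1) and is later forced to pair the point 3 with -2, which costs 5. The dynamic program keeps P at its first point until Q has passed -2 and returns 3.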

Related

Minimum Path to Travel (N-1) 1D Points

In order to travel to (N-1) points, one would usually think of an MST or the Travelling Salesman problem, but I found an equation, which I would like explained, that calculates the result in O(1) for 1D points.
min( min(abs(b[0]-a) + b[n-2]-b[0], abs(b[n-2]-a) + b[n-2]-b[0]), min( abs(b[1]-a) + b[n-1]-b[1], abs(b[n-1]-a) + b[n-1] - b[1]))
Where b[] is the array of the given points and a is the starting location.
The source of the problem and the equation are from Codeforces:
http://codeforces.com/contest/709/status/B
I would appreciate any help explaining this mathematical maneuver.
Well, looking at the problem definition, it does not mention that the input points are sorted in any order.
The approach you gave works only if you first sort the input points; then yes, finding the minimum path is O(1) by evaluating this equation. The total runtime complexity, though, will be O(nlogn).
Explanation of the equation:
Think of it as all the points are in a row (after sorting). You are given a starting point and you need to visit n-1 points, meaning that you need to visit all points except for only one. From this we know that minimum path will be one of the two:
All points except for the first one
All points except for the last one
The equation you gave calculates exactly that.
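As a rough illustration of the explanation above, the equation can be written out in Python as follows. The function name min_travel and the helper sweep are made up for this sketch. It sorts the points first and treats a single point as needing no travel, which is an assumption about the n = 1 case.

```python
def min_travel(points, a):
    """Minimum distance to visit all but one of the points, starting at a."""
    b = sorted(points)
    n = len(b)
    if n == 1:
        return 0  # assumption: nothing has to be visited

    def sweep(lo, hi):
        # Walk to the nearer end of b[lo..hi], then traverse the whole range.
        return min(abs(b[lo] - a), abs(b[hi] - a)) + (b[hi] - b[lo])

    # Either skip the last point (range 0..n-2) or the first (range 1..n-1).
    return min(sweep(0, n - 2), sweep(1, n - 1))


print(min_travel([1, 7, 12], 10))
```

With the points 1, 7, 12 and a start at 10, skipping the first point costs |12 - 10| + (12 - 7) = 7, which is cheaper than skipping the last one.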

Picking the "spread" from the points on a line

I'm facing an algorithmic problem described as follows: Given a line from 0 to N (really big N), a list of X points on said line, and a number Z (0<=Z<=X) pick Z points from X to maximize the distance between two closest points. The brute-force solution in O(n^2) doesn't seem that difficult but I'm looking for something more sophisticated that can be done in O(n log n) time. Any clues, solutions, advice is very appreciated.
Edit: Answering the question in the first post: it is the minimal distance (between the two closest points) that has to be maximized.
One easy approach is O(XlogN).
First, sort the points.
Next observe that if you already know the minimum distance (call it d) between the points, it's O(X) to see if there's a way of picking Z points all of which are at least distance d apart: take the left-most element, then the next that's at least distance d away, then the next that's at least distance d away from that, and so on. If by the time you've got to the end of the array you have at least Z points, then you have a solution, and if you don't, there is no solution.
Now, you can use a binary search on [0, N] to find the largest d with a solution.
The sort is O(XlogX), the binary search takes O(logN) trials, and each is O(X). Overall, that's O(XlogX + XlogN), but since N >= X that simplifies to O(XlogN).
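The approach above can be sketched in Python roughly as follows. This assumes integer coordinates and 2 <= Z <= X; the names max_min_spacing and feasible are illustrative. The search range is narrowed from [0, N] to [0, max - min], which does not change the result.

```python
def max_min_spacing(points, z):
    """Largest d such that z of the points can be chosen pairwise >= d apart."""
    pts = sorted(points)
    if not 2 <= z <= len(pts):
        raise ValueError("this sketch assumes 2 <= z <= number of points")

    def feasible(d):
        # Greedy check: take the left-most point, then every next point
        # that is at least d away from the last one taken.
        count, last = 1, pts[0]
        for x in pts[1:]:
            if x - last >= d:
                count += 1
                last = x
        return count >= z

    lo, hi = 0, pts[-1] - pts[0]
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if feasible(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


print(max_min_spacing([1, 2, 8, 4, 9], 3))
```

For the points 1, 2, 4, 8, 9 and Z = 3, a spacing of 3 is achievable (for example 1, 4, 8) while a spacing of 4 is not.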

anagram string edit distance algorithm/code?

There are two strings S and P that are anagrams of each other. There are two basic operations:
Swap two adjacent letters, e.g. swap "A" and "C" in BCCAB; the cost is 1.
Swap the first letter and the last letter of the string; the cost is 1.
Question: Design an efficient algorithm that minimizes the cost of changing S into P.
I tried a greedy algorithm, but I found counterexamples, so I think it is incorrect. I know the famous DP problem edit distance, but I could not work out the recurrence for this one.
Can anyone help? An idea and pseudocode would be great.
I wonder if http://en.wikipedia.org/wiki/A*_search_algorithm would count as efficient? For a heuristic, look for the smallest distance each character has to go, treating the string as a circle, and divide the sum of these distances by two. On the circle, each character needs to participate in enough swaps to move it, one step at a time, to its destination, and each swap affects only two characters, so this heuristic should be a lower bound to the number of swaps required.
Without the ends-swap the answer is simple: you have to get the first and last letter right, and there's no way to "save" by doing it later; hence for the word a_i where 0 <= i < n you'd "bubble" the correct a_0 and a_{n-1} into place, then repeat for the word a_i where 1 <= i < n-1 until you're left with 0 or 1 letters.
With the ends-swap option, you're left with a much harder problem, since there are two directions from which each letter can arrive at its correct place. You'd basically have a bipartite graph between the source and target word, and you'd want to find a matching that minimizes the sum of distances. Even that is not really an algorithm, since each swap moves two of the letters, not just one.
Bottom line is, you may have to do a search, but at least you can bound the search with the no-ends-swap distance.
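For short strings, the "do a search" option can be sketched as a plain breadth-first search over strings. This is not the A* variant with the circular heuristic mentioned above, and it has no pruning bound; it is an exhaustive baseline whose state space grows factorially, useful mainly for checking other ideas on small inputs. The function name min_swap_cost is made up for this sketch.

```python
from collections import deque


def min_swap_cost(s, p):
    """Fewest adjacent swaps / first-last swaps turning s into p (BFS)."""
    if sorted(s) != sorted(p):
        raise ValueError("s and p must be anagrams")
    if s == p:
        return 0
    n = len(s)
    # All allowed swaps: neighbours, plus the two ends of the string.
    moves = [(i, i + 1) for i in range(n - 1)]
    if n > 2:
        moves.append((0, n - 1))
    seen = {s}
    queue = deque([(s, 0)])
    while queue:
        cur, cost = queue.popleft()
        for i, j in moves:
            chars = list(cur)
            chars[i], chars[j] = chars[j], chars[i]
            nxt = "".join(chars)
            if nxt == p:
                return cost + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, cost + 1))
    raise AssertionError("unreachable: any anagram can be reached by swaps")


print(min_swap_cost("abcd", "dcba"))
```

Reversing "abcd" takes two operations here (swap the ends, then swap the middle pair), whereas adjacent swaps alone would need six.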

Partition a set into k groups with minimum number of moves

You have a set of n objects for which integer positions are given. A group of objects is a set of objects at the same position (not necessarily all the objects at that position: there might be multiple groups at a single position). The objects can be moved to the left or right, and the goal is to move these objects so as to form k groups, and to do so with the minimum distance moved.
For example:
With initial positions at [4,4,7], and k = 3: the minimum cost is 0.
[4,4,7] and k = 2: minimum cost is 0
[1,2,5,7] and k = 2: minimum cost is 1 + 2 = 3
I've been trying to use a greedy approach (by calculating which move would be shortest) but that wouldn't work because every move involves two elements which could be moved either way. I haven't been able to formulate a dynamic programming approach as yet but I'm working on it.
This problem is a one-dimensional instance of the k-medians problem, which can be stated as follows. Given a set of points x_1...x_n, partition these points into k sets S_1...S_k and choose k locations y_1...y_k in a way that minimizes the sum over all x_i of |x_i - y_f(i)|, where y_f(i) is the location corresponding to the set to which x_i is assigned.
Due to the fact that the median is the population minimizer for absolute distance (i.e. L_1 norm), it follows that each location y_j will be the median of the elements x in the corresponding set S_j (hence the name k-medians). Since you are looking at integer values, there is the technicality that if S_j contains an even number of elements, the median might not be an integer, but in such cases choosing either the next integer above or below the median will give the same sum of absolute distances.
The standard heuristic for solving k-medians (and the related and more common k-means problem) is iterative, but this is not guaranteed to produce an optimal or even good solution. Solving the k-medians problem for general metric spaces is NP-hard, and finding efficient approximations for k-medians is an open research problem. Googling "k-medians approximation", for example, will lead to a bunch of papers giving approximation schemes.
http://www.cis.upenn.edu/~sudipto/mypapers/kmedian_jcss.pdf
http://graphics.stanford.edu/courses/cs468-06-winter/Papers/arr-clustering.pdf
In one dimension things become easier, and you can use a dynamic programming approach. A DP solution to the related one-dimensional k-means problem is described in this paper, and the source code in R is available here. See the paper for details, but the idea is essentially the same as what @SajalJain proposed, and can easily be adapted to solve the k-medians problem rather than k-means. For j<=k and m<=n let D(j,m) denote the cost of an optimal j-medians solution to x_1...x_m, where the x_i are assumed to be in sorted order. We have the recurrence
D(j,m) = min_q ( D(j-1,q) + Cost(x_{q+1},...,x_m) )
where q ranges from j-1 to m-1 and Cost is equal to the sum of absolute distances from the median. With a naive O(n) implementation of Cost, this would yield an O(n^3k) DP solution to the whole problem. However, this can be improved to O(n^2k) due to the fact that the Cost can be updated in constant time rather than computed from scratch every time, using the fact that, for a sorted sequence:
Cost(x_1,...,x_h) = Cost(x_2,...,x_h) + median(x_1...x_h)-x_1 if h is odd
Cost(x_1,...,x_h) = Cost(x_2,...,x_h) + median(x_2...x_h)-x_1 if h is even
See the writeup for more details. Except for the fact that the update of the Cost function is different, the implementation will be the same for k-medians as for k-means.
http://journal.r-project.org/archive/2011-2/RJournal_2011-2_Wang+Song.pdf
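A minimal Python sketch of this recurrence is given below. It swaps in prefix sums to evaluate Cost in constant time, rather than the incremental median update described above, and still runs in O(n^2 k). The names k_medians_1d and cost are illustrative, and the sketch assumes 1 <= k <= n.

```python
def k_medians_1d(positions, k):
    """Minimum total movement to form k groups (contiguous in sorted order)."""
    x = sorted(positions)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("this sketch assumes 1 <= k <= n")

    prefix = [0]
    for v in x:
        prefix.append(prefix[-1] + v)

    def cost(l, r):
        # Sum of |x[i] - median| over x[l..r] (inclusive), via prefix sums.
        mid = (l + r) // 2
        left = x[mid] * (mid - l) - (prefix[mid] - prefix[l])
        right = (prefix[r + 1] - prefix[mid + 1]) - x[mid] * (r - mid)
        return left + right

    inf = float("inf")
    # d[j][m] = optimal cost of splitting the first m points into j groups
    d = [[inf] * (n + 1) for _ in range(k + 1)]
    d[0][0] = 0
    for j in range(1, k + 1):
        for m in range(j, n + 1):
            d[j][m] = min(d[j - 1][q] + cost(q, m - 1) for q in range(j - 1, m))
    return d[k][n]


print(k_medians_1d([1, 2, 5, 7], 2))
```

On the examples from the question this gives 0 for [4,4,7] with k = 3 or k = 2, and 3 for [1,2,5,7] with k = 2.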
As I understand it, the problem is:
We have n points on a line.
We want to place k positions on the line; I call them destinations.
Move each of the n points to one of the k destinations so that the sum of distances is minimal. I call this sum the total cost.
Destinations can overlap.
An obvious fact is that each point should move to the closer of its nearest destination on the left and its nearest destination on the right.
Another important fact is that every destination can be placed on one of the points, because we can slide a destination left or right along the line until it reaches a point without increasing the total cost.
Given these facts, consider the following DP solution:
DP[i][j] is the minimum total cost for the first i points when we may use only j destinations and must put a destination on the i-th point.
To calculate DP[i][j], fix the last destination before the i-th point (there are up to i choices). For each choice (say the k-th point), calculate the cost of moving the points between the k-th point and the i-th point to the nearer of those two destinations, add DP[k][j - 1], and take the minimum over all k.
The calculation of the initial states (e.g. j = 1) and the final answer is left as an exercise!
Task 0 - sort the positions of the objects in non-decreasing order.
Let us define the 'center' of an object as the position to which it is shifted.
Now we have two observations:
For N positions the 'center' is a median of these N positions (the median, not the mean, minimizes the sum of absolute distances). Example: let 1,3,6,10 be the positions. Either middle element, 3 or 6, is a median; taking 6 as the center gives a cost of 5+3+0+4 = 12, and 3 gives the same. This gives us the position with minimum cost of moving when all elements need to be grouped into 1 group.
Let N positions be grouped into K groups "optimally". When N+1 th object is added, then it will disturb only the K th group, i.e, first K-1 groups will remain unchanged.
From these observations, we build a dynamic programming approach.
Let Cost[i][k] and Center[i][k] be two 2D arrays.
Cost[i][k] = minimum cost when first 'i' objects are partitioned into 'k' groups
Center[i][k] stores the center of the 'i-th' object when Cost[i][k] is computed.
Let {L} be the elements from i-L,i-L+1,..i-1 which have the same center.
(Center[i-L][k] = Center[i-L+1][k] = ... = Center[i-1][k]) These are the only objects that need to be considered in the computation for i-th element (from observation 2)
Now
Cost[i][k] will be
min(Cost[i-1][k-1] , Cost[i-L-1][k-1] + computecost(i-L, i-L+1, ... ,i))
Update Center[i-L ... i][k]
computecost() can be found trivially by finding the center (from observation 1)
Time Complexity:
Sorting O(NlogN)
Total Cost Computation Matrix = Total elements * Computecost = O(NK * N)
Total = O(NlogN + N*NK) = O(N*NK)
Let's look at k=1.
For k=1 and n odd, all points should move to the center point. For k=1 and n even, all points should move to either of the center points or any spot between them. By 'center' I mean in terms of number of points to either side, i.e. the median.
You can see this because if you select a target spot, x, with more points to its right than to its left, then a new target 1 to the right of x would result in a cost reduction (unless there is exactly one more point to the right than to the left and the target spot is a point, in which case n is even and the target is on/between the two center points).
If your points are already sorted, this is an O(1) operation. If not, I believe it's O(n) (via an order statistic algorithm).
Once you've found the spot that all points are moving to, it's O(n) to find the cost.
Thus regardless of whether the points are sorted or not, this is O(n).
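For the k=1 case, a small Python sketch might look like this. It sorts the points for simplicity, so it runs in O(n log n) rather than the O(n) achievable with a selection algorithm; the function name single_group_cost is made up for the example. For even n it picks the upper of the two center points, which costs the same as any spot between them.

```python
def single_group_cost(positions):
    """Return (target, cost) for moving every point to a median position."""
    x = sorted(positions)
    target = x[len(x) // 2]  # a median; upper middle element when n is even
    cost = sum(abs(v - target) for v in x)
    return target, cost


print(single_group_cost([1, 2, 5, 7]))
```

For [1,2,5,7] the target is 5 and the cost is 4 + 3 + 0 + 2 = 9; choosing 2 instead gives the same total.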

Revisit: 2D Array Sorted Along X and Y Axis

So, this is a common interview question. There's already a topic up, which I have read, but it's dead, and no answer was ever accepted. On top of that, my interests lie in a slightly more constrained form of the question, with a couple practical applications.
Given a two dimensional array such that:
Elements are unique.
Elements are sorted along the x-axis and the y-axis.
Neither sort predominates, so neither sort is a secondary sorting parameter.
As a result, the diagonal is also sorted.
All of the sorts can be thought of as moving in the same direction. That is to say that they are all ascending, or that they are all descending.
Technically, I think as long as you have a >/=/< comparator, any total ordering should work.
Elements are numeric types, with a single-cycle comparator.
Thus, memory operations are the dominating factor in a big-O analysis.
How do you find an element? Only worst case analysis matters.
Solutions I am aware of:
A variety of approaches that are:
O(nlog(n)), where you approach each row separately.
O(nlog(n)) with strong best and average performance.
One that is O(n+m):
Start in a non-extreme corner, which we will assume is the bottom right (with the y-axis pointing up, so that values increase to the right and upward).
Let the target be J. Cur Pos is M.
If M is greater than J, move left.
If M is less than J, move up.
If you can do neither, you are done, and J is not present.
If M is equal to J, you are done.
Originally found elsewhere, most recently stolen from here.
And I believe I've seen one with a worst case of O(n+m) but a best case of nearly O(log(n)).
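The O(n+m) walk listed above can be sketched in Python as follows. The sketch assumes the usual matrix orientation instead: rows ascend left to right and columns ascend top to bottom, so the non-extreme starting corner is the top right and "move up" becomes "move down". The name staircase_search and the sample matrix are illustrative.

```python
def staircase_search(matrix, target):
    """Return (row, col) of target in a row- and column-sorted matrix, or None."""
    if not matrix or not matrix[0]:
        return None
    row, col = 0, len(matrix[0]) - 1  # top-right: largest in its row, smallest in its column
    while row < len(matrix) and col >= 0:
        cur = matrix[row][col]
        if cur == target:
            return (row, col)
        if cur > target:
            col -= 1  # everything below in this column is even larger
        else:
            row += 1  # everything to the left in this row is even smaller
    return None  # walked off the matrix: target is absent


grid = [
    [1, 2, 4],
    [3, 5, 7],
    [6, 8, 9],
]
print(staircase_search(grid, 6), staircase_search(grid, 10))
```

Each step discards one full row or column, so the loop runs at most n+m times.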
What I am curious about:
Right now, I have proved to my satisfaction that a naive partitioning attack always devolves to nlog(n). Partitioning attacks in general appear to have an optimal worst case of O(n+m), and most do not terminate early when the element is absent. As a result, I was also wondering whether an interpolation probe might be better than a binary probe, and so it occurred to me that one might think of this as a set intersection problem with a weak interaction between sets. My mind immediately turned to Baeza-Yates intersection, but I haven't had time to draft an adaptation of that approach. However, given my suspicion that the optimality of an O(N+M) worst case is provable, I thought I'd just go ahead and ask here, to see if anyone could bash together a counter-argument, or pull together a recurrence relation for interpolation search.
Here's a proof that it has to be at least Omega(min(n,m)). Let n >= m. Then consider the matrix which has all 0s at (i,j) where i+j < m, all 2s where i+j >= m, except for a single (i,j) with i+j = m which has a 1. This is a valid input matrix, and there are m possible placements for the 1. No query into the array (other than the actual location of the 1) can distinguish among those m possible placements. So you'll have to check all m locations in the worst case, and at least m/2 expected locations for any randomized algorithm.
One of your assumptions was that matrix elements have to be unique, and I didn't do that. It is easy to fix, however, because you just pick a big number X=n*m, replace all 0s with unique numbers less than X, all 2s with unique numbers greater than X, and 1 with X.
And because it is also Omega(lg n) (counting argument), it is Omega(m + lg n) where n>=m.
An optimal O(m+n) solution is to start at the top-left corner, which has the minimal value. Move diagonally downwards to the right until you hit an element whose value is >= the value of the given element. If the element's value is equal to that of the given element, return found as true.
Otherwise, from here we can proceed in two ways.
Strategy 1:
Move up in the column and search for the given element until we reach the end. If found, return found as true
Move left in the row and search for the given element until we reach the end. If found, return found as true
return found as false
Strategy 2:
Let i denote the row index and j denote the column index of the diagonal element we have stopped at. (Here, we have i = j, BTW). Let k = 1.
Repeat the steps below while i-k >= 0
Search if a[i-k][j] is equal to the given element. if yes, return found as true.
Search if a[i][j-k] is equal to the given element. if yes, return found as true.
Increment k
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11

Resources