The items a-d are to be paired with items 0-3 in such a way that the total distance over all item pairs is minimized. For example, this matrix could describe the distance between each item in the first group and each item in its counterpart group:
[[2, 2, 4, 9],
[4, 7, 1, 1],
[3, 3, 8, 3],
[6, 1, 7, 8]]
This is supposed to mean that the distance 'a' -> '0' is 2, from 'a' -> '1' is 2, from 'a' -> '2' is 4, 'a' -> '3' is 9. From 'b' -> '0' it is 4 and so on.
Is there an algorithm that can match each letter with a digit, so that the total distance is minimized? E.g.:
[('a', 1), ('b', 3), ('c', 0), ('d', 2)]
This would be a legal solution with total distance 2 + 1 + 3 + 7 = 13. Brute-forcing all possible combinations is not feasible, since the real-world groups have many more than four items in them.
This is a classic optimization task for bipartite graphs and can be solved with the Hungarian algorithm/method.
This can be solved by treating it as an instance of a weighted bipartite matching problem. The idea is to treat the elements a-d and 0-3 as nodes in a graph, where each lettered node is connected to each numbered node with an edge whose weight is specified by the matrix. Once you have this graph, you want to find a set of edges matching letters to numbers in a way where each node is only connected to at most one edge. Such a set of edges is called a matching, and since you want to minimize the distance you are looking for a minimum-cost matching.
As yi_H points out, this problem is well-studied and has many good polynomial-time algorithms. The Hungarian Algorithm is perhaps the most famous algorithm for the problem, but others have been invented since then that are asymptotically (or practically) faster.
This problem is worth remembering, since it arises in many circumstances. Any time you need to assign items in one group to items in another, check whether you can reduce the problem to bipartite matching. If so, you've almost certainly found a fast solution to the initial problem.
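For illustration, here is a minimal sketch of the Hungarian method in its common O(n^3) "potentials" formulation, applied to the matrix from the question. The function name `min_cost_assignment` is made up for this example, and the code assumes the matrix has at least as many columns as rows. In practice a library routine such as SciPy's `scipy.optimize.linear_sum_assignment` is probably the safer choice.

```python
def min_cost_assignment(cost):
    """Return (total, assignment), where assignment[row] is the chosen column.

    Sketch of the Hungarian method with row/column potentials.
    Assumes len(cost) <= len(cost[0]).
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)      # row potentials (1-based)
    v = [0] * (m + 1)      # column potentials (1-based)
    match = [0] * (m + 1)  # match[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)    # previous column on the augmenting path

    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0 = match[j0]
            delta, j1 = INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so one more edge becomes tight.
            for j in range(m + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1

    assignment = [None] * n
    for j in range(1, m + 1):
        if match[j]:
            assignment[match[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


cost = [[2, 2, 4, 9],
        [4, 7, 1, 1],
        [3, 3, 8, 3],
        [6, 1, 7, 8]]
total, assignment = min_cost_assignment(cost)
pairs = list(zip("abcd", assignment))
print(total, pairs)  # 7 [('a', 0), ('b', 2), ('c', 3), ('d', 1)]
```

Here the optimum is 7, compared with 13 for the legal but non-optimal pairing given in the question.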
Problem:
There are N cubes and M numbers. Each side of a cube has a number from 1 to M. You can stack one cube on another if their touching sides have the same number (the top side of the bottom cube and the bottom side of the top cube have the same number). Find the highest tower of cubes.
Input: number N of cubes and number M.
Example:
INPUT: N=5, M=6. Now we generate 5 random cubes with 6 sides = <1,M>.
[2, 4, 3, 1, 4, 1]
[5, 1, 6, 6, 2, 5]
[2, 5, 3, 1, 1, 6]
[3, 5, 6, 1, 3, 4]
[2, 4, 4, 5, 5, 5]
How you interpret a single array of 6 numbers is up to you. Opposite sides of a cube might be index and 5-index (for the first cube, the side opposite the 4 at index 1 would be the 4 at index 4). Opposite sides might also be index and index+1 if index%2==0, or index and index-1 if index%2==1. I used the second one.
Now let's say the first cube is our current tower. Depending on the rotation, the top color might be one of 1, 2, 3, 4. If 1 is the color on top, we can stack the second, third or fourth cube on top of it. All of them have color 1 on one of their sides. The third cube even has two sides with color 1, so we can stack it in two different ways.
I won't analyse it to the end because this post would be too long. The final answer for these cubes (max height of the tower) is 5.
My current solution (you can SKIP this part):
Now I'm just building the tower recursively. Each function has this subproblem to solve: find highest tower given the top color of current tower and current unused cubes (or current used cubes). This way I can memoize and store results for tuple(top color of tower, array of used cubes). Despite memoization I think that in the worst case (for small M) this solution has to store M*(2^N) values (and this many cases to solve).
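As a rough sketch, the memoized recursion described above could look like this. The name `tallest_tower` is made up for the example, used cubes are stored as a bitmask, and opposite sides are assumed to be index and index^1 (the second interpretation mentioned earlier).

```python
from functools import lru_cache


def tallest_tower(cubes):
    """Height of the tallest tower; state is (top color, bitmask of used cubes)."""

    @lru_cache(maxsize=None)
    def best(top, used):
        result = 0
        for i, cube in enumerate(cubes):
            if used & (1 << i):
                continue  # cube i is already in the tower
            for face in range(6):
                # 'face' becomes the bottom side; face ^ 1 is the opposite (top) side.
                if top is None or cube[face] == top:
                    result = max(result, 1 + best(cube[face ^ 1], used | (1 << i)))
        return result

    # top=None means the tower is still empty, so any face may go on the bottom.
    return best(None, 0)


cubes = [
    [2, 4, 3, 1, 4, 1],
    [5, 1, 6, 6, 2, 5],
    [2, 5, 3, 1, 1, 6],
    [3, 5, 6, 1, 3, 4],
    [2, 4, 4, 5, 5, 5],
]
print(tallest_tower(cubes))  # 5
```

The cache can hold up to about M * 2^N states, which matches the worst-case estimate above.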
What I'm looking for:
I'm looking for something that would help me solve this efficiently for small M. I know that there is the tile stacking problem (which uses dynamic programming) and the tower of cubes problem (which uses DAG longest path), but I don't see how those solutions apply to my problem.
You won't find a polynomial-time solution. If you did, we'd be able to solve the decision variant of the longest path problem (which is NP-complete) in polynomial time. The reduction is as follows: for every edge in an undirected graph G, create a cube with opposing faces (u, v), where u and v are unique identifiers for the vertices of the edge. For the remaining 4 faces, assign globally unique identifiers. Solve for the tallest cube tower; this tower's height is the length of the longest path of G. Return whether that length equals the queried value (yes/no).
However, you could still solve it in something like O(M^3*(N/2)!*log(N)) time (I think that bound is a bit loose, but it's close). Use divide and conquer with memoization. Find all longest paths using cubes [0, N) beginning with a value B in range [0, M) and ending with a value E in range [0, M), for all possible B and E. To compute this, recurse, partitioning the cubes evenly in every possible way. Keep recursing until you hit the bottom (just one cube). Then begin merging them (by combining cube stacks that end in X with those beginning with X, for all X in [0, M)). Once that's all done, at the topmost level just take the max of all the tower heights.
I'm looking for an algorithm that addresses the LCS problem for two strings with the following conditions:
Each string consists of English characters and each character has a weight. For example:
sequence 1 (S1): "ABBCD" with weights [1, 2, 4, 1, 3]
sequence 2 (S2): "TBDC" with weights [7, 5, 1, 2]
Suppose that MW(s, S) is defined as the maximum weight of the sub-sequence s in string S with respect to the associated weights. The heaviest common sub-sequence (HCS) is defined as:
HCS = argmax over all common sub-sequences s of min(MW(s, S1), MW(s, S2))
The algorithm output should be the indexes of HCS in both strings and the weight. In this case, the indexes will be:
I_S1 = [2, 4] --> MW("BD", "ABBCD") = 7
I_S2 = [1, 2] --> MW("BD", "TBDC") = 6
Therefore HCS = "BD", and weight = min(MW(s, S1), MW(s, S2)) = 6.
The table that you need to build has this structure:
for each position in sequence 1
    for each position in sequence 2
        for each extreme pair of (weight1, weight2)
            (last_position1, last_position2)
Here an extreme pair is one that is not dominated: it is not possible to find another common subsequence up to that point whose weight in sequence 1 and weight in sequence 2 are both >= and at least one is >.
There may be multiple extreme pairs, where one sequence is higher than the other.
The rule is that at the (i, -1) or (-1, j) positions, the only extreme pair is the empty subsequence with weights (0, 0). At any other position we merge the extreme pairs for (i-1, j) and (i, j-1). Then, if seq1[i] = seq2[j], add the options where you went to (i-1, j-1) and then included i and j in the respective subsequences. (So add weight1[i] and weight2[j] to the weights, then do a merge.)
For that merge you can sort all of the extreme pairs from the previous points by weight1 descending (breaking ties by weight2 descending), then throw away every pair whose weight2 is less than or equal to the best weight2 already seen earlier in the sorted order.
When you reach the end you can find the extreme pair with the highest min, and that is your answer. You can then walk the data structure back to find the subsequences in question.
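A minimal sketch of this table, under a few simplifying assumptions: the names `pareto_front` and `heaviest_common_subsequence` are invented for the example, prefixes are indexed from 0 (so the empty prefix plays the role of position -1), and each extreme pair carries its index lists directly instead of back-pointers, which is simpler but uses more memory.

```python
def pareto_front(entries):
    """Keep only entries whose (weight1, weight2) is not dominated by another."""
    front = []
    best_w2 = float("-inf")
    # Sort by weight1 descending, ties by weight2 descending, then sweep.
    for entry in sorted(entries, key=lambda e: (e[0], e[1]), reverse=True):
        if entry[1] > best_w2:
            front.append(entry)
            best_w2 = entry[1]
    return front


def heaviest_common_subsequence(s1, w1, s2, w2):
    """Return (weight, indexes in s1, indexes in s2) of the best common subsequence."""
    n, m = len(s1), len(s2)
    # Each entry: (weight in s1, weight in s2, indexes in s1, indexes in s2).
    empty = [(0, 0, (), ())]
    table = [[empty] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = table[i - 1][j] + table[i][j - 1]
            if s1[i - 1] == s2[j - 1]:
                # Extend every extreme pair of the diagonal cell with this match.
                candidates = candidates + [
                    (a + w1[i - 1], b + w2[j - 1], p + (i - 1,), q + (j - 1,))
                    for a, b, p, q in table[i - 1][j - 1]
                ]
            table[i][j] = pareto_front(candidates)
    best = max(table[n][m], key=lambda e: min(e[0], e[1]))
    return min(best[0], best[1]), list(best[2]), list(best[3])


result = heaviest_common_subsequence("ABBCD", [1, 2, 4, 1, 3], "TBDC", [7, 5, 1, 2])
print(result)  # (6, [2, 4], [1, 2])
```

On the example from the question, the final cell holds the extreme pairs (7, 6) for "BD" and (5, 7) for "BC", and the first one has the higher minimum.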
This Wikipedia page explains the Floyd-Warshall algorithm for finding the shortest paths between nodes in a graph. The Wikipedia page uses the graph on the left of the image as a starting graph (prior to the first iteration, when k = 0) and then shows the remaining iterations (k = 1 etc.), but it doesn't explain the significance of the numbers between the nodes or how those numbers are calculated. For example, in the starting graph when k = 0, why is there a -2 on the edge between 1 and 3, and why is there a 3 on the edge between 2 and 3? How are those calculated?
Furthermore, when k = 2, the wikipedia page says,
The path [4,2,3] is not considered, because [2,1,3] is the shortest
path encountered so far from 2 to 3.
Why is [2,1,3] shorter than [4,2,3]?
The numbers on the edges are just weights. It's a part of the input. The algorithm doesn't compute them.
[2, 1, 3] is not shorter than [4, 2, 3]. It's shorter than [2, 3], though. That's the only thing that matters.
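To make that concrete, here is a small sketch in which the edge weights are passed in as input and the algorithm only combines them. The weights -2 (1 to 3) and 3 (2 to 3) come from the question; the other three edges (2 to 1 with weight 4, 3 to 4 with weight 2, 4 to 2 with weight -1) are my assumption about the rest of the example graph, so check them against the page.

```python
INF = float("inf")


def floyd_warshall(n, edges):
    """edges maps (u, v), with 1-based node labels, to the given edge weight."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        dist[u - 1][v - 1] = w  # weights are copied from the input, not computed
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Allow node k as an intermediate stop if that shortens i -> j.
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


# Edges other than (1, 3) and (2, 3) are assumed for illustration.
edges = {(1, 3): -2, (2, 3): 3, (2, 1): 4, (3, 4): 2, (4, 2): -1}
dist = floyd_warshall(4, edges)
print(dist[1][2])  # shortest distance from node 2 to node 3: 2
```

With these weights, the direct edge [2, 3] costs 3 while [2, 1, 3] costs 4 + (-2) = 2, which is why the path through node 1 replaces the direct edge.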
I recently thought of this problem, and I thought of an "instinctive" greedy solution but I can't prove its optimality.
You are given N integers, V1, V2, ..., VN and K sets (K < N).
You need to find a way of partitioning the integers into the sets, so that the minimum difference between any two elements in the same set is maximized.
For example, when the integers are 1, 5, 6, 8, 8 and you have 2 sets, an optimal way of partitioning the integers would be
{1, 6, 8}
{5, 8}
So the minimum difference is between 6 and 8, which is 2.
This arrangement is not unique, for example
{1, 5, 8}
{6, 8}
Also gives a minimum difference of 2.
I was wondering whether I can use a greedy algorithm to solve this.
I would sort the integers first, and then put V1, V(1+K), V(1+2K), ... together, then V2, V(2+K), V(2+2K), ... together, and so on.
Is there a proof for the optimality of this solution, or a counterexample where this does not work?
Thanks.
Yes, it's optimal. We'll show that if a difference D appears using your process, then for any arrangement of the numbers there's a pair of numbers in the same set which differ by at most D.
To prove it, consider adding the sorted numbers one by one to the K sets. Let's call the sorted numbers x[i]. Suppose we're adding x[n] to one of the sets. The largest value in that set is x[n-K], with x[n]-x[n-K] = D for some D.
Now, the set x[n-K], x[n-K+1], ..., x[n] is a set of K+1 numbers, all of which differ from each other by at most D (since x[n]-x[n-K] = D and the numbers are sorted).
By the pigeonhole principle, two of these K+1 numbers must fall in the same set no matter how you arrange them, so the maximum minimum distance must be at most D.
This proves that if a distance D appears in your process, then the maximum minimum distance achievable is at most D.
Let D_min be the smallest difference between two numbers in the same set using your process. Then we've shown that the maximum minimum distance achievable is <= D_min, but also D_min <= maximum minimum distance (since D_min is a minimum distance) which shows that D_min is the maximum minimum distance.
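The greedy process from the question is short enough to sketch directly. The name `greedy_partition` is made up for this example, and the function returns both the sets and the smallest difference inside any set (D_min above).

```python
def greedy_partition(values, k):
    """Deal the sorted values round-robin into k sets.

    Returns (sets, smallest difference between two elements of the same set).
    The difference is None if no set ends up with two elements.
    """
    ordered = sorted(values)
    # Set i receives ordered[i], ordered[i + k], ordered[i + 2k], ...
    sets = [ordered[i::k] for i in range(k)]
    # Within a sorted set, the closest pair is always a pair of neighbours.
    min_gap = min(
        (b - a for group in sets for a, b in zip(group, group[1:])),
        default=None,
    )
    return sets, min_gap


print(greedy_partition([1, 5, 6, 8, 8], 2))  # ([[1, 6, 8], [5, 8]], 2)
```

This reproduces the first arrangement given in the question, with a minimum difference of 2.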
I'm looking for an efficient algorithm (not necessarily code) for solving the following question:
Given n positive and negative numbers that sum to zero, we would like to find a starting index that causes the cumulative sum to hit zero as many times as possible.
It doesn't have to be done in a specific manner, but the important thing here is efficiency: we want the algorithm/idea to do this in less than quadratic time complexity.
An example:
Given the numbers: 2, -1, 3, 1, -3, -2:
If we start summing with 2 (the first index), the sum will be zero only once (at the end of the summation), but starting with -1 will yield zero twice during the summation.
The given numbers may have more than one "best index", but we would like to find at least one of these indexes.
I've tried doing it with binary search, but didn't make much progress- so any hints/help will be appreciated.
You can compute prefix sums. In terms of prefix sums, zeros are positions that have the same value of a prefix sum as the start position. So the problem is reduced to finding the most frequent element in the array of prefix sums. It can be solved efficiently using sorting or hash tables.
Here is an example:
Input: {2, -1, 3, 1, -3, -2}
Prefix sums: {0, 2, 1, 4, 5, 2, 0}
The most frequent element is 2 (the final 0 is the same point in the cycle as the initial 0, so 0 only counts once). The first occurrence of 2 is right after the first element. Thus, starting from the second element yields an optimal answer.