Minimizing difference of min & max column sums of a matrix - algorithm

Given a matrix (non-square) of positive integers, where all elements on the same row are permutable, the problem is to minimize the difference between the maximum and minimum sums of columns.
For example,
9 5 7 5 7 9
9 3 4 ~> 9 4 3
10 5 9 5 10 9
---------- ----------
28 13 20 19 21 21
28-13= 15 21-19= 2
where the answer is 2.
I tried to naively sort it (combining min & max values on adjacent rows), which gives correct results for small matrices like 3x3, but what can work for bigger data (up to ~30x30)? Is there a generic solution I couldn't apply?

This problem is, unfortunately, NP-hard via a reduction from the partition problem. In the partition problem, you're given a list of numbers and want to determine whether there's a way to split those numbers into two disjoint groups so that the sums of those numbers are equal.
You can encode an instance of the partition problem as an instance of your problem as follows: create an n × 2 matrix where each row contains one of the numbers from the set and a 0. For example, given the set {1, 3, 5, 7, 9}, you'd make the matrix
1 0
3 0
5 0
7 0
9 0
If you think about it, permuting the rows here will give back two columns that can be thought of as the two sets that you've partitioned the numbers into. If the minimum difference between the columns is 0, then the set can be partitioned into two equal subsets. If it isn't, then the set can't be partitioned. Minimizing the difference between the min and the max column, therefore, would therefore allow you to solve the partition problem as a special case.
Since this problem is NP-hard, then unless P = NP, you won't be able to find any simple polynomial-time algorithms for solving it. You may be able to develop some heuristics to beat brute-force solutions, though.
Hope this helps!

Related

What is the correct approach to solve this matrix

We are given following matrix
5 7 9
7 8 4
4 2 9
We need to find maximum sum row or column and then we need to subtract 1 from each element of that row or column and then we need to repeat this operation for 3 times.
I will try to explain.
The matrix is n*n and the increment process is repeated for k times.
An o(n^2+k×log(n)) algorithm is possible.
If the sum of the biggest row/columns is a row so:
The row sum is increased by n.
All columns sum is increased by 1.
The two rules apply for columns as well.
For rule one store all rows/columns sum in 2 AVL Trees
(or every other data structure that support o(log(n)) insert and remove)
For rule two store rows/columns number of operations. (just two integers)
Now take the max of both trees where the two integers play a role for difference of the data staractures. Change it and change the other it and insert back.

Select optimal pairings of elements

I have following problem e.g:
Given a bucket with symbols
1 1 2 3 3 4
And book of recipes to create pairs e.g:
12 13 24
Select from bucket optimal pairing, leaving as little as possible symbols in the bucket. So using the examplary values above the optimal pairing would be:
13 13 24 Which would use all the symbols given.
Naive picking from the bucket could result in something like:
12 13 Leaving the 3 and 4 unmatched. 3 and 4 cannot be matched because the book does not contain a recipe for that particular connection
Notes:
Real problem consits on average of: 500 elements in bucket in about 30 kind of symbols.
We've tried to implement the solution using the bruteforce algorithm, however I am afraid that even our grandchildren will not live long enough to see the result :).
There is no limit to the size of recipe book, it could even have every possible in the bucket. Pair made of the same element twice is not allowed.
The answer is not required to empty the bucket completely. Its just about getting the most pairs out of the bucket. Its okay to leave some in the bucket. It would be best to look for the optimal solution, however close approximation is also good enough.
I will appreciate an answer that proposes/gives hint to an algorithm to solve the problem.
Examples:
Bucket:
1 1 2 2 2 2 3 3 3 4 5 6 7 8 8
Recipe book:
12 34 15 68
Optimal result (one of possible):
{1 2} {1 2} {3 4} {6 8}
Leftover:
2 2 3 3 5 7 8
This problem is essentially the maximum matching problem with the small twist that you're allowed to have duplicate objects. Here's one way to solve this problem assuming you have a solver for maximum matching:
Create a node for each number in the input list.
For each recipe, for each pair of numbers matching that recipe, add an edge between the nodes for those numbers.
Run a maximum matching algorithm and return the pairs reported that way.
There are a good number of off-the-shelf maximum matching algorithms you can use, and if you need to code one up yourself, consider Edmonds' Blossom Algorithm, which is reasonably efficient and less tricky to code up than other approaches.
First generate all possibles pairs of symbols and store them with the indices of each symbol , so if you have n symbols , then n*(n+1)/2 pairs are going to be generated (max case n=500 then 125250 pairs are going to be generated ).
Ex : bucket with symbols 1 1 3
Then pairs are going to be generated are (11,1,2)(13,1,3)(13,2,3).
General format ( a[i]a[j], i, j ).
Now lets loop over generated pairs and delete pairs that doesn't exist in the book of recipes, so now we have at most 30 pairs .
Next lets build a graph such that the nodes are our generated pairs, and each 2 nodes are connected if the indices of the 2 pairs are different (using 2 nested loops over our pairs ) .
Finally we can perform BFS or DFS and find the longest graph between all generated graphs , which has the answer to our problem.
If you want c++/Java implementation ,please don't hesitate to ask.

Generate multiple sequences of numbers with unique values at each index

I have a row with numbers 1:n. I'm looking to add a second row also with the numbers 1:n but these should be in a random order while satisfying the following:
No positions have the same number in both rows
No combination of numbers occurs twice
For example, in the following
Row 1: 1 2 3 4 5 6 7 ...
Row 2: 3 6 15 8 13 12 7 ...
the number 7 occurs at the same position in both rows 1 and 2 (namely position 7; thereby not satisfying rule 1)
while in the following
Row 1: 1 2 3 4 5 6 7 ...
Row 2: 3 7 15 8 13 12 2 ...
the combination of 2+7 appears twice (in positions 2 and 7; thereby not satisfying rule 2).
It would perhaps be possible – but unnecessarily time-consuming – to do this by hand (at least up until a reasonable number), but there must be quite an elegant solution for this in MATLAB.
This problem is called a derangment of a permutation.
Use the function randperm, in order to find a random permutation of your data.
x = [1 2 3 4 5 6 7];
y = randperm(x);
Then, you can check that the sequence is legal. If not, do it again and again..
You have a probability of about 0.3 each time to succeed, which means that you need roughly 10/3 times to try until you find it.
Therefore you will find the answer really quickly.
Alternatively, you can use this algorithm to create a random derangment.
Edit
If you want to have only cycles of size > 2, this is a generalization of the problem.
In it is written that the probability
in that case is smaller, but big enough to find it in a fixed amount of steps. So the same approach is still valid.
This is fairly straightforward. Create a random permutation of the nodes, but interpret the list as follows: Interpret it as a random walk around the nodes, and if node 'b' appears after node 'a', it means that node 'b' appears below node 'a' in the lists:
So if your initial random permutation is
3 2 5 1 4
Then the walk in this case is 3 -> 2 -> 5 -> 1 -> 4 and you creates the rows as follows:
Row 1: 1 2 3 4 5
Row 2: 4 5 2 3 1
This random walk will satisfy both conditions.
But do you wish to allow more than one cycle in your network? I know you don't want two people to have each other's hat. But what about 7 people, where 3 of them have each other's hats and the other 4 have each other's hats? Is this acceptable and/or desirable?
Andrey has already pointed you to randperm and the rejection-sampling-like approach. After generating a permutation p, an easy way to check whether it has fixed point is any(p==1:n). An easy way to check whether it contains cycles of length 2 is any(p(p)==1:n).
So this gets permutations p of 1:n fulfilling your requirements:
p=[];
while (isempty(p))
p=randperm(n);
if any(p==1:n), p=[];
elseif any(p(p)==1:n), p=[];
end
end
Surrounding this with a for loop and for each counting the iterations of the while loop, it seems that one needs to generate on average 4.5 permutations for every "valid" one (and 6.2 if cycles of length three are not allowed, either). Very interesting.

Finding the maximum sum from a matrix

There is a matrix with m rows and n columns. The task is to find the maximum sum choosing a single element from each row and column. I came up with a solution, which finds the maximum from the whole matrix and then sets that row and column as zero adds it to the sum and proceeds with finding the next max. This repeats m times.
But the problem with this approach was if there is repetitive elements. I'll try to explain it with an example. Here is the matrix..
3 6 5 3
9 4 9 2
8 1 4 3
4 7 2 5
Now, if I follow the above method.. sum will be 9 + 7 + 5 + 3 whereas it should be 9 + 8 + 7 + 3. How to solve this problem.. I'm stuck
Update: The columns are cost of seats which can be assigned to a person and the rows are number of persons. We want to assign them in such a way, so that we get the max cost.
Isn't this just http://en.wikipedia.org/wiki/Assignment_problem, typically solved by http://en.wikipedia.org/wiki/Hungarian_algorithm? Obviously, you want a maximum rather than a minimum, but surely you can achieve that by maximising for costs that are -(the real cost) or, if you are worried about -ve costs, (Max cost in matrix) - (real cost).
Your algorithm is wrong - consider a slight change in the matrix, where the second 9 is 8:
3 6 5 3
9 4 8 2
8 1 4 3
4 7 2 5
Then your algorithm has no problem in finding 9, 7, 5, 3 (no duplicates), but the correct answer is still 8, 8, 7, 3.
You can brute force it, by trying all combinations. One good way to put that into code is to use a recursive function, that solves the problem for any matrix:
In it you would iterate through say the first row and call the function for the submatrix obtained by deleting the correspondent row and column. Then you would pick the highest sum.
That, of course, will be too slow for large matrixes. The complexity is O(N!).

Finding good heuristic for A* search

I'm trying to find the optimal solution for a little puzzle game called Twiddle (an applet with the game can be found here). The game has a 3x3 matrix with the number from 1 to 9. The goal is to bring the numbers in the correct order using the minimum amount of moves. In each move you can rotate a 2x2 square either clockwise or counterclockwise.
I.e. if you have this state
6 3 9
8 7 5
1 2 4
and you rotate the upper left 2x2 square clockwise you get
8 6 9
7 3 5
1 2 4
I'm using a A* search to find the optimal solution. My f() is simply the number of rotations needed. My heuristic function already leads to the optimal solution (if I modify it, see the notice a t the end) but I don't think it's the best one you can find. My current heuristic takes each corner, looks at the number at the corner and calculates the manhatten distance to the position this number will have in the solved state (which gives me the number of rotation needed to bring the number to this postion) and sums all these values. I.e. You take the above example:
6 3 9
8 7 5
1 2 4
and this end state
1 2 3
4 5 6
7 8 9
then the heuristic does the following
6 is currently at index 0 and should by at index 5: 3 rotations needed
9 is currently at index 2 and should by at index 8: 2 rotations needed
1 is currently at index 6 and should by at index 0: 2 rotations needed
4 is currently at index 8 and should by at index 3: 3 rotations needed
h = 3 + 2 + 2 + 3 = 10
Additionally, if h is 0, but the state is not completely ordered, than h = 1.
But there is the problem, that you rotate 4 elements at once. So there a rare cases where you can do two (ore more) of theses estimated rotations in one move. This means theses heuristic overestimates the distance to the solution.
My current workaround is, to simply excluded one of the corners from the calculation which solves this problem at least for my test-cases. I've done no research if really solves the problem or if this heuristic still overestimates in some edge-cases.
So my question is: What is the best heuristic you can come up with?
(Disclaimer: This is for a university project, so this is a bit of homework. But I'm free to use any resource if can come up with, so it's okay to ask you guys. Also I will credit Stackoverflow for helping me ;) )
Simplicity is often most effective. Consider the nine digits (in the rows-first order) as forming a single integer. The solution is represented by the smallest possible integer i(g) = 123456789. Hence I suggest the following heuristic h(s) = i(s) - i(g). For your example, h(s) = 639875124 - 123456789.
You can get an admissible (i.e., not overestimating) heuristic from your approach by taking all numbers into account, and dividing by 4 and rounding up to the next integer.
To improve the heuristic, you could look at pairs of numbers. If e.g. in the top left the numbers 1 and 2 are swapped, you need at least 3 rotations to fix them both up, which is a better value than 1+1 from considering them separately. In the end, you still need to divide by 4. You can pair up numbers arbitrarily, or even try all pairs and find the best division into pairs.
All elements should be taken into account when calculating distance, not just corner elements. Imagine that all corner elements 1, 3, 7, 9 are at their home, but all other are not.
It could be argued that those elements that are neighbors in the final state should tend to become closer during each step, so neighboring distance can also be part of heuristic, but probably with weaker influence than distance of elements to their final state.

Resources