Finding the maximum sum from a matrix - algorithm

There is a matrix with m rows and n columns. The task is to find the maximum sum obtainable by choosing a single element from each row and each column. I came up with a solution that finds the maximum of the whole matrix, adds it to the sum, sets that row and column to zero, and then proceeds to find the next maximum. This repeats m times.
But this approach breaks down when there are repeated elements. I'll try to explain it with an example. Here is the matrix:
3 6 5 3
9 4 9 2
8 1 4 3
4 7 2 5
Now, if I follow the above method and pick the first 9, the sum will be 9 + 7 + 5 + 3 = 24, whereas a better choice such as 9 + 8 + 7 + 3 gives 27. How can I solve this problem? I'm stuck.
Update: The columns are the costs of seats that can be assigned to a person, and the rows are the persons. We want to assign the seats so that we get the maximum total cost.

Isn't this just the assignment problem (http://en.wikipedia.org/wiki/Assignment_problem), typically solved by the Hungarian algorithm (http://en.wikipedia.org/wiki/Hungarian_algorithm)? You want a maximum rather than a minimum, but you can achieve that by minimising over costs of -(the real cost) or, if you are worried about negative costs, (max cost in matrix) - (real cost).
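The cost transformation can be sketched as follows. This is only an illustration: `min_cost_assignment` is a hypothetical brute-force stand-in for a real Hungarian-algorithm solver, and the code assumes a square matrix.

```python
from itertools import permutations


def min_cost_assignment(cost):
    """Hypothetical stand-in for a Hungarian solver: try every column order
    and return the one with the smallest total cost (square matrix assumed)."""
    n = len(cost)
    return min(
        permutations(range(n)),
        key=lambda cols: sum(cost[r][c] for r, c in enumerate(cols)),
    )


def max_sum_assignment(matrix):
    """Turn the maximisation into a minimisation via (max cost) - (real cost)."""
    biggest = max(max(row) for row in matrix)
    flipped = [[biggest - value for value in row] for row in matrix]
    cols = min_cost_assignment(flipped)  # cols[r] is the column chosen for row r
    return sum(matrix[r][c] for r, c in enumerate(cols)), cols


matrix = [
    [3, 6, 5, 3],
    [9, 4, 9, 2],
    [8, 1, 4, 3],
    [4, 7, 2, 5],
]
best_sum, best_cols = max_sum_assignment(matrix)
```

Because every entry of the flipped matrix is non-negative, it can be handed to any minimising solver that does not accept negative costs.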

Your algorithm is wrong - consider a slight change in the matrix, where the second 9 is 8:
3 6 5 3
9 4 8 2
8 1 4 3
4 7 2 5
Then your algorithm finds 9, 7, 5, 3 (no duplicates, sum 24), but 8, 8, 7, 3 (sum 26) is still better, and the optimum is 6, 8, 8, 5 (sum 27).
You can brute force it by trying all combinations. One good way to put that into code is to use a recursive function that solves the problem for any matrix.
In it, you iterate through, say, the first row and call the function on the submatrix obtained by deleting the corresponding row and column. Then you pick the highest sum.
That, of course, will be too slow for large matrices. The complexity is O(N!).
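A minimal sketch of that recursion, assuming the matrix is a list of rows with no more rows than columns (the name `best_sum` is just for illustration):

```python
def best_sum(matrix):
    """Maximum sum taking one element per row, each from a different column.

    Assumes len(matrix) <= number of columns. Runs in factorial time.
    """
    if not matrix:
        return 0
    first, rest = matrix[0], matrix[1:]
    best = float("-inf")
    for col, value in enumerate(first):
        # Delete the first row (by recursing on `rest`) and the chosen column.
        sub = [row[:col] + row[col + 1:] for row in rest]
        best = max(best, value + best_sum(sub))
    return best
```

For a 4x4 matrix this explores 4! = 24 selections, which is instant. The factorial growth makes it impractical beyond roughly ten rows.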

Related

Minimizing difference of min & max column sums of a matrix

Given a matrix (non-square) of positive integers, where all elements on the same row are permutable, the problem is to minimize the difference between the maximum and minimum sums of columns.
For example,
 9  5  7          5  7  9
 9  3  4    ~>    9  4  3
10  5  9          5 10  9
----------       ----------
28 13 20         19 21 21
28-13 = 15       21-19 = 2
where the answer is 2.
I tried to naively sort it (combining min & max values on adjacent rows), which gives correct results for small matrices like 3x3, but what can work for bigger data (up to ~30x30)? Is there a generic solution I couldn't apply?
This problem is, unfortunately, NP-hard via a reduction from the partition problem. In the partition problem, you're given a list of numbers and want to determine whether there's a way to split those numbers into two disjoint groups so that the sums of those numbers are equal.
You can encode an instance of the partition problem as an instance of your problem as follows: create an n × 2 matrix where each row contains one of the numbers from the set and a 0. For example, given the set {1, 3, 5, 7, 9}, you'd make the matrix
1 0
3 0
5 0
7 0
9 0
If you think about it, permuting the elements within each row gives two columns that can be thought of as the two sets into which you've partitioned the numbers. If the minimum difference between the column sums is 0, then the set can be partitioned into two subsets of equal sum. If it isn't, then the set can't be partitioned. Minimizing the difference between the min and the max column sums would therefore allow you to solve the partition problem as a special case.
Since this problem is NP-hard, then unless P = NP, you won't be able to find any simple polynomial-time algorithms for solving it. You may be able to develop some heuristics to beat brute-force solutions, though.
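For very small matrices, an exhaustive search is still feasible and gives a reference answer for checking heuristics. The sketch below is only that: `min_column_spread` is an illustrative name, and the search grows exponentially with the number of rows, so it will not scale to 30x30.

```python
from itertools import permutations, product


def min_column_spread(matrix):
    """Smallest (max column sum - min column sum) over all per-row permutations.

    The first row is kept fixed: reordering whole columns never changes
    the spread, so permuting it as well would only repeat work.
    """
    first, rest = matrix[0], matrix[1:]
    best = None
    # set(...) drops duplicate arrangements when a row has repeated values.
    for choice in product(*(set(permutations(row)) for row in rest)):
        sums = [sum(column) for column in zip(first, *choice)]
        spread = max(sums) - min(sums)
        if best is None or spread < best:
            best = spread
    return best


example = [[9, 5, 7], [9, 3, 4], [10, 5, 9]]
partition_instance = [[1, 0], [3, 0], [5, 0], [7, 0], [9, 0]]
```

Running it on `partition_instance` returns a non-zero spread, which matches the reduction: the numbers sum to 25, so they cannot be split into two equal halves.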
Hope this helps!

What is the number of swaps required in selection sort in each case?

I believe that selection sort has the following behavior:
Best case: No swaps required as all elements are properly arranged
Worst case: n-1 swaps required, i.e. one swap for each pass, and there are n-1 passes, where n is the number of elements in the array
Average case: I have not been able to work this out. What is the procedure for finding it?
Is the above information correct?
This source says the number of swaps in the best case is O(n):
http://ocw.utm.my/file.php/31/Module/ocwChp5SelectionSort.pdf
Each iteration of selection sort consists of scanning across the array, finding the minimum element that hasn't already been placed yet, then swapping it to the appropriate position. In a naive implementation of selection sort, this means that there will always be n - 1 swaps made regardless of distribution of elements in the input array.
If you want to minimize the number of swaps, though, you can implement selection sort so that it doesn't perform a swap in the case where the element to be moved is already in the right place. If you add in this restriction, then you're correct that zero swaps would be made in the best case. (I'm not sure whether it's worthwhile to modify selection sort this way, since swaps are pretty fast in most cases).
Really, it depends on the implementation. You could potentially have a weird implementation of selection sort that constantly swaps the candidate minimum element to its tentative final spot on each iteration, which would dramatically increase the number of swaps in the worst case. I'm not sure why you'd do this, though. It's little details like this that account for why your explanation seems at odds with what you've found online: depending on how the code is put together, the number of swaps can be different.
The best-case and worst-case running times of selection sort are both Θ(n^2). This is because, regardless of how the elements are initially arranged, on the i-th iteration of the main for loop the algorithm always inspects each of the remaining n-i elements to find the smallest one.
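To make the difference between the two variants concrete, here is a small sketch that counts swaps. The `skip_in_place` flag is an illustrative parameter, not part of any standard API.

```python
def selection_sort_swaps(items, skip_in_place=True):
    """Return (sorted_list, number_of_swaps) for a simple selection sort.

    With skip_in_place=False the naive variant swaps on every pass,
    even when the minimum is already in position.
    """
    a = list(items)
    swaps = 0
    for i in range(len(a) - 1):
        # Index of the smallest element in the unsorted tail a[i:].
        smallest = min(range(i, len(a)), key=a.__getitem__)
        if smallest != i or not skip_in_place:
            a[i], a[smallest] = a[smallest], a[i]
            swaps += 1
    return a, swaps
```

On an already sorted input of four elements the naive variant still reports 3 swaps, while the skipping variant reports 0.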
Selection sort is the algorithm that takes the minimum number of swaps, and in the best case it takes zero swaps, when the input is already sorted, like 1,2,3,4. But the more pertinent question is: what is the worst-case number of swaps in selection sort, and for which input does it occur?
Answer: The worst-case number of swaps is n-1. It does not occur for reverse-ordered input: 6,5,3,2,1 takes only floor(n/2) = 2 swaps, because each swap puts two elements into place. The worst case occurs when every swap fixes only one element, for example a sorted array rotated by one position, such as 2 3 4 5 6 7 8 1. An alternately increasing and decreasing ("sine wave") input, like crests and troughs, comes close:
7 6 8 5 9 4 10 3 - this input of eight (8) elements requires 6 swaps
3 6 8 5 9 4 10 7 (1)
3 4 8 5 9 6 10 7 (2)
3 4 5 8 9 6 10 7 (3)
3 4 5 6 9 8 10 7 (4)
3 4 5 6 7 8 10 9 (5)
3 4 5 6 7 8 10 9 (6) no swap, 8 is already in place
3 4 5 6 7 8 9 10 (7)
Hence the worst-case number of swaps in selection sort is n-1 and the best case is 0. For a random input the average is n - H_n swaps, where H_n = 1 + 1/2 + ... + 1/n, since the number of swaps equals n minus the number of cycles in the input permutation.

Generate multiple sequences of numbers with unique values at each index

I have a row with numbers 1:n. I'm looking to add a second row also with the numbers 1:n but these should be in a random order while satisfying the following:
No positions have the same number in both rows
No combination of numbers occurs twice
For example, in the following
Row 1: 1 2 3 4 5 6 7 ...
Row 2: 3 6 15 8 13 12 7 ...
the number 7 occurs at the same position in both rows 1 and 2 (namely position 7; thereby not satisfying rule 1)
while in the following
Row 1: 1 2 3 4 5 6 7 ...
Row 2: 3 7 15 8 13 12 2 ...
the combination of 2+7 appears twice (in positions 2 and 7; thereby not satisfying rule 2).
It would perhaps be possible – but unnecessarily time-consuming – to do this by hand (at least up until a reasonable number), but there must be quite an elegant solution for this in MATLAB.
A permutation with no fixed points is called a derangement.
Use the function randperm to generate a random permutation of your data.
x = [1 2 3 4 5 6 7];
y = x(randperm(numel(x)));
Then, you can check that the sequence is legal. If it is not, try again.
You have a probability of about 1/e ≈ 0.37 of succeeding each time, which means that you need roughly e ≈ 2.7 tries on average until you find one.
Therefore you will find the answer really quickly.
Alternatively, you can use this algorithm to create a random derangement.
Edit
If you want to have only cycles of size > 2, this is a generalization of the problem.
The probability of success in that case is smaller, but still large enough to find a valid permutation within a small expected number of attempts. So the same approach is still valid.
This is fairly straightforward. Create a random permutation of the nodes and interpret it as a closed random walk around the nodes: if node 'b' appears directly after node 'a' in the walk, then 'b' goes below 'a' in the rows, and the last node wraps around to the first.
So if your initial random permutation is
3 2 5 1 4
then the walk is 3 -> 2 -> 5 -> 1 -> 4 -> 3 and you create the rows as follows:
Row 1: 1 2 3 4 5
Row 2: 4 5 2 3 1
This random walk will satisfy both conditions.
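The walk construction can be sketched like this. It is a hedged illustration in Python rather than MATLAB, the function names are made up for the example, and it only ever produces permutations that form one single cycle, so it does not sample uniformly from all valid second rows. It assumes n >= 3.

```python
import random


def second_row_from_walk(walk):
    """Place under each node the node that follows it in the closed walk."""
    n = len(walk)
    row2 = [0] * n
    for i, node in enumerate(walk):
        # The successor of the last node wraps around to the first node.
        row2[node - 1] = walk[(i + 1) % n]
    return row2


def random_second_row(n):
    """Shuffle 1..n into a random walk and derive the second row from it."""
    walk = list(range(1, n + 1))
    random.shuffle(walk)
    return second_row_from_walk(walk)
```

Calling `second_row_from_walk([3, 2, 5, 1, 4])` reproduces the second row of the example above.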
But do you wish to allow more than one cycle in your network? I know you don't want two people to have each other's hat. But what about 7 people, where 3 of them have each other's hats and the other 4 have each other's hats? Is this acceptable and/or desirable?
Andrey has already pointed you to randperm and the rejection-sampling-like approach. After generating a permutation p, an easy way to check whether it has a fixed point is any(p==1:n). An easy way to check whether it contains cycles of length 2 is any(p(p)==1:n).
So this gets permutations p of 1:n fulfilling your requirements:
p = [];
while isempty(p)
    p = randperm(n);
    if any(p == 1:n)           % reject: fixed point
        p = [];
    elseif any(p(p) == 1:n)    % reject: cycle of length 2
        p = [];
    end
end
Surrounding this with a for loop and counting the iterations of the while loop each time, it seems that one needs to generate on average 4.5 permutations for every "valid" one (and 6.2 if cycles of length three are not allowed either). This agrees with the asymptotic success probabilities e^(-3/2) ≈ 0.22 and e^(-11/6) ≈ 0.16. Very interesting.

Finding good heuristic for A* search

I'm trying to find the optimal solution for a little puzzle game called Twiddle (an applet with the game can be found here). The game has a 3x3 matrix with the numbers from 1 to 9. The goal is to bring the numbers into the correct order using the minimum number of moves. In each move you can rotate a 2x2 square either clockwise or counterclockwise.
I.e. if you have this state
6 3 9
8 7 5
1 2 4
and you rotate the upper left 2x2 square clockwise you get
8 6 9
7 3 5
1 2 4
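As a sketch of the move itself, assuming the board is stored as a row-major tuple of nine numbers (the representation and the name `rotate` are assumptions for illustration):

```python
def rotate(state, row, col, clockwise=True):
    """Rotate the 2x2 block whose top-left cell is (row, col), each 0 or 1.

    `state` is a row-major tuple of 9 numbers; a new tuple is returned.
    """
    tl = 3 * row + col                    # top-left index of the block
    tr, bl, br = tl + 1, tl + 3, tl + 4   # top-right, bottom-left, bottom-right
    s = list(state)
    if clockwise:
        s[tl], s[tr], s[br], s[bl] = state[bl], state[tl], state[tr], state[br]
    else:
        s[tl], s[tr], s[br], s[bl] = state[tr], state[br], state[bl], state[tl]
    return tuple(s)


start = (6, 3, 9,
         8, 7, 5,
         1, 2, 4)
after = rotate(start, 0, 0, clockwise=True)
```

An immutable tuple is used so that states can be stored directly in the open and closed sets of the search.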
I'm using an A* search to find the optimal solution. My f() is simply the number of rotations needed. My heuristic function already leads to the optimal solution (if I modify it, see the note at the end), but I don't think it's the best one you can find. My current heuristic takes each corner, looks at the number at that corner, calculates the Manhattan distance to the position this number will have in the solved state (which gives me the number of rotations needed to bring the number to this position), and sums all these values. For example, take the state above:
6 3 9
8 7 5
1 2 4
and this end state
1 2 3
4 5 6
7 8 9
then the heuristic does the following
6 is currently at index 0 and should be at index 5: 3 rotations needed
9 is currently at index 2 and should be at index 8: 2 rotations needed
1 is currently at index 6 and should be at index 0: 2 rotations needed
4 is currently at index 8 and should be at index 3: 3 rotations needed
h = 3 + 2 + 2 + 3 = 10
Additionally, if h is 0 but the state is not completely ordered, then h = 1.
But there is the problem that you rotate 4 elements at once. So there are rare cases where you can do two (or more) of these estimated rotations in one move. This means this heuristic overestimates the distance to the solution.
My current workaround is to simply exclude one of the corners from the calculation, which solves this problem at least for my test cases. I have not checked whether this really solves the problem or whether the heuristic still overestimates in some edge cases.
So my question is: What is the best heuristic you can come up with?
(Disclaimer: This is for a university project, so this is a bit of homework. But I'm free to use any resource I can come up with, so it's okay to ask you guys. Also I will credit Stackoverflow for helping me ;) )
Simplicity is often most effective. Consider the nine digits (in the rows-first order) as forming a single integer. The solution is represented by the smallest possible integer i(g) = 123456789. Hence I suggest the following heuristic h(s) = i(s) - i(g). For your example, h(s) = 639875124 - 123456789.
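You can get an admissible (i.e., not overestimating) heuristic from your approach by taking all numbers into account, dividing the summed Manhattan distance by 4, and rounding up to the next integer. This works because one rotation moves exactly four numbers by one step each, so it can reduce the sum by at most 4.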
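A minimal sketch of that heuristic, assuming the state is a row-major sequence of the numbers 1 to 9 and the goal is 1 2 3 / 4 5 6 / 7 8 9 (the function names are illustrative):

```python
def manhattan_sum(state):
    """Sum of Manhattan distances of all nine numbers to their goal cells."""
    total = 0
    for index, value in enumerate(state):
        goal = value - 1  # number v belongs at row-major index v - 1
        total += abs(index // 3 - goal // 3) + abs(index % 3 - goal % 3)
    return total


def heuristic(state):
    """Manhattan sum divided by 4, rounded up (integer ceiling)."""
    return (manhattan_sum(state) + 3) // 4


example = (6, 3, 9,
           8, 7, 5,
           1, 2, 4)
```

For `example` the Manhattan sum over all nine numbers is 18, so the estimate is 5 rotations.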
To improve the heuristic, you could look at pairs of numbers. If e.g. in the top left the numbers 1 and 2 are swapped, you need at least 3 rotations to fix them both up, which is a better value than 1+1 from considering them separately. In the end, you still need to divide by 4. You can pair up numbers arbitrarily, or even try all pairs and find the best division into pairs.
All elements should be taken into account when calculating the distance, not just the corner elements. Imagine that all corner elements 1, 3, 7, 9 are at their home positions, but all the others are not.
It could be argued that those elements that are neighbors in the final state should tend to become closer during each step, so neighboring distance can also be part of heuristic, but probably with weaker influence than distance of elements to their final state.

Adding waypoints to A* graph search

I have the ability to calculate the best route between a start and end point using A*. Right now, I am including waypoints between my start and end points by applying A* to the pairs in all permutations of my points.
Example:
I want to get from point 1 to point 4. Additionally, I want to pass through points 2 and 3.
I calculate the permutations of (1, 2, 3, 4):
1 2 3 4
1 2 4 3
1 3 2 4
1 3 4 2
1 4 2 3
1 4 3 2
2 1 3 4
2 1 4 3
2 3 1 4
2 3 4 1
2 4 1 3
2 4 3 1
3 1 2 4
3 1 4 2
3 2 1 4
3 2 4 1
3 4 1 2
3 4 2 1
4 1 2 3
4 1 3 2
4 2 1 3
4 2 3 1
4 3 1 2
4 3 2 1
Then, for each permutation, I calculate the A* route from the first to the second, then append it to the route from the second to the third, then the third to the fourth.
When I have this calculated for each permutation, I sort the routes by distance and return the shortest.
Obviously, this works but involves a lot of calculation and totally collapses when I have 6 waypoints (permutations of 8 items is 40320 :-))
Is there a better way to do this?
First of all, you should store all intermediate calculations. Once you have calculated the route from 1 to 2, you should never recalculate it, just look it up in a table.
Second, if your graph is undirected, a route from 2 to 1 has exactly the same distance as a route from 1 to 2, so you should not recalculate it either.
And finally, in any case you will have an algorithm that is exponential in the number of points you need to pass. This is very similar to the traveling salesman problem, and it will be exactly this problem if you include all available points. The problem is NP-hard, i.e. no polynomial-time algorithm is known, and all known exact algorithms take time exponential in the number of waypoints.
So if you have a lot of points that you must pass, exponential blow-up is inevitable.
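The caching advice can be sketched as below. `route_length` is a hypothetical callable standing in for your A* search, and the sketch assumes the start and end points are fixed (as in "from point 1 to point 4"), so only the intermediate waypoints are permuted.

```python
from itertools import permutations


def best_waypoint_order(start, end, waypoints, route_length):
    """Try every order of the waypoints, computing each leg at most once.

    route_length(a, b) is assumed to return the A* distance from a to b.
    """
    cache = {}

    def dist(a, b):
        # Look the leg up in the table and run the search only on a miss.
        if (a, b) not in cache:
            cache[(a, b)] = route_length(a, b)
        return cache[(a, b)]

    best_order, best_total = None, float("inf")
    for middle in permutations(waypoints):
        order = (start, *middle, end)
        total = sum(dist(a, b) for a, b in zip(order, order[1:]))
        if total < best_total:
            best_order, best_total = order, total
    return best_order, best_total
```

With k waypoints this still loops over k! orders, but it runs the expensive search only once per ordered pair of points. For an undirected graph the cache key could be made order-independent to halve that again.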
As a previous answer mentioned, this problem is the NP-complete Traveling Salesperson Problem.
There is a better method than the one you use. The state-of-the-art TSP solver is Georgia Tech's Concorde solver. If you can't simply use their freely available program in your own or use their API, I can describe the basic techniques they use.
To solve the TSP, they start with a local-search heuristic called the Lin-Kernighan heuristic to generate an upper bound. Then they use branch-and-cut on a mixed integer programming formulation of the TSP. This means they write a series of linear and integer constraints which, when solved, give you the optimal tour of the TSP. Their inner loop calls a linear programming solver such as QSopt or CPLEX to get a lower bound.
As I mentioned, this is the state of the art, so if you're looking for a better way to solve the TSP than what you're doing, this is the best. They can handle over 10,000 cities in a few seconds, especially on the symmetric, planar TSP (which I suspect is the variant you're working on).
If the number of waypoints you need to eventually handle is small, say on the order of 10 to 15, then you may be able to do a branch-and-bound search using the minimum spanning tree heuristic. This is a textbook exercise in many introductory AI courses. With more waypoints than that, the running time of the algorithm will probably outlive you, and you will have to use Concorde instead.

Resources