Highest possible sum across 2D array

Highest possible sum across 2D array - algorithm

What is the best way to find the highest possible sum across a 2D integer array? You can't repeat columns and rows. Eg.
1 3 6
4 5 2
3 1 3
Max sum: 3+5+6=14
I know there is a method called the Hungarian algorithm, but that seems to be more suitable for finding minimum sum.

Yes, you can use the hungarian algorithm.
You need to modify the search criteria to look for largest sum instead of the smallest on. You also need to run Bellman-Ford instead of Dijkstra for the search component (because Dijkstra can't compute maximum sum path).
You can't run into a constantly increasing loop because the selected nodes are already paired using their maximum value, so any change would yield a lower total sum. The algorithm will chose to rearrange the connections if the loss from the already connected nodes is less than the gain from the newly connected one. You don't need to worry about it.

Related

How to optimize the order of elements based on similarity coefficients?

I have to reorder a sequence of elements based on the similarity between each other (expressed by a coefficient) so that each element is the most similar possible to each of its neighbors. I have to find an algorithm rather than a code.
Example with 10 elements and similarity coefficients calculated for each pair of the elements below :
The excel file can be find here : https://1drv.ms/x/s!AtmZN4-kjgrPms99fqgaDwAS_F4uYw
What I have tried :
Find a pair with the highest coefficient. In the example : 0.98 for T3 (left-end) and T5 (right-end)
Find maximum coefficient between the left-end and the remaining elements
Find maximum coefficient between the right-end and the remaining elements
Take the maximum between 2. and 3.
If maximum is 2. add on the left the element corresponding to the maximum coefficient for the left-end. Else, add on the right the element corresponding to the maximum coefficient for the right-end
Repeat points 2 - 6 until no elements left.
Here is the result :
The result isn't bad. One of the disadvantages I see is that 0.99>0.98 is considered in the same way as 0.99>0.01.
The second option I thought about was maximizing the sum of coefficients between all neighbors, but don't really know where to start from. Especially if there are significantly more than 10 elements. More, it could result in a more "flat" order where while having better similarities overall some extremely similar elements could be placed far from each other.
Being really new to this kind of problems I am pretty sure this should be a rather standard issue with existing solutions. Could you please point to those?
Thank you!

After researching I have found that my problem can be seen as the "Travelling Salesman Problem" (TSP). More here : https://en.wikipedia.org/wiki/Travelling_salesman_problem
To apply it you can see "elements" in my example as "cities" in TSP and (1-Similarity coefficient) as "distances".

What is the difference between clustering and matching?

What is the difference between clustering and matching?
For example: There's a pool of four elements and in the one scenario I want to generate pairs. What I do is I measure the distance of each element to each other which yields a 2x2 matrix. Then the matching algorithm finds the two pairings with the lowest or highest weighted sum.
What is a clustering algorithm doing? When I demand a cluster number of two then the result is the same, or not?

Specifying the number of elements in a cluster (pairs for example) doesn't make much sense. If you have been looking at k-means (k-medoids), the k actually indicates how many clusters will be created in total. So, if you have 4 elements and use k = 2, you can get one cluster with 1 element and another cluster with 3 elements, depending on the data you have. Anyway, clustering on 4 elements doesn't make sense.

select equal sized number groups from a sequence of random numbers

say i have a list of 100 numbers, i want to split them into 5 groups which have the sum within each group closest to the mean of the numbers.
the simplest solution is to sort the hundred numbers and take the max number and keep adding the smallest numbers until the sum goes beyond the avg.
obviously that is not going to bring the best results. i guess we could use BFS or DFS or some other search algo. like A* to get the best result.
does anyone have a simple solution out there? pseudo code is good enough. thanks!

This sounds like a variation on the knapsack problem and if I'm interpreting you correctly, it may be the multiple knapsack problem. Can't you come up with an easy problem? :)

An efficient algorithm (solution) that can be used is a variation of the Best Fit of bin packing algorithm. However, we would have to be applying a variation in which we have specifically 5 different groups of numbers that we want rather than looking for using the smallest number of groups.
The algorithm begins by finding the mean of the list of all 100 numbers. This mean will be used as the max capacity for all 5 of the groups (bins) we are trying to fit numbers into. We then find the largest number in our list of 100 numbers that doesn’t exceed the max capacity of our group and assign it to our first group. (We can find this in log(n) time because we can use a self-balancing binary search tree). We keep track of the how filled our current group is. We then find the next largest number that fits into our current group until we have either reached max capacity or there are no other numbers that will allow this group to reach max capacity. In either of these cases we move on to the next group and repeat our algorithm with the numbers left in our list of numbers. Once we move on from a group we must also keep track of the current sum of that group. We continue until we have reached the highest capacity for all 5 groups. If there are any numbers left we place them in the groups that have the lowest total sum (because we kept track of these sums as we went).
This is in fact a greedy algorithm with a Θ(nlogn) running time due to the nature of bin packing.

From an interview: Removing rows and columns in an n×n matrix to maximize the sum of remaining values

Given an n×n matrix of real numbers. You are allowed to erase any number (from 0 to n) of rows and any number (from 0 to n) of columns, and after that the sum of the remaining entries is computed. Come up with an algorithm which finds out which rows and columns to erase in order to maximize that sum.

The problem is NP-hard. (So you should not expect a polynomial-time algorithm for solving this problem. There could still be (non-polynomial time) algorithms that are slightly better than brute-force, though.) The idea behind the proof of NP-hardness is that if we could solve this problem, then we could solve the the clique problem in a general graph. (The maximum-clique problem is to find the largest set of pairwise connected vertices in a graph.)
Specifically, given any graph with n vertices, let's form the matrix A with entries a[i][j] as follows:
a[i][j] = 1 for i == j (the diagonal entries)
a[i][j] = 0 if the edge (i,j) is present in the graph (and i≠j)
a[i][j] = -n-1 if the edge (i,j) is not present in the graph.
Now suppose we solve the problem of removing some rows and columns (or equivalently, keeping some rows and columns) so that the sum of the entries in the matrix is maximized. Then the answer gives the maximum clique in the graph:
Claim: In any optimal solution, there is no row i and column j kept for which the edge (i,j) is not present in the graph. Proof: Since a[i][j] = -n-1 and the sum of all the positive entries is at most n, picking (i,j) would lead to a negative sum. (Note that deleting all rows and columns would give a better sum, of 0.)
Claim: In (some) optimal solution, the set of rows and columns kept is the same. This is because starting with any optimal solution, we can simply remove all rows i for which column i has not been kept, and vice-versa. Note that since the only positive entries are the diagonal ones, we do not decrease the sum (and by the previous claim, we do not increase it either).
All of which means that if the graph has a maximum clique of size k, then our matrix problem has a solution with sum k, and vice-versa. Therefore, if we could solve our initial problem in polynomial time, then the clique problem would also be solved in polynomial time. This proves that the initial problem is NP-hard. (Actually, it is easy to see that the decision version of the initial problem — is there a way of removing some rows and columns so that the sum is at least k — is in NP, so the (decision version of the) initial problem is actually NP-complete.)

Well the brute force method goes something like this:
For n rows there are 2n subsets.
For n columns there are 2n subsets.
For an n x n matrix there are 22n subsets.
0 elements is a valid subset but obviously if you have 0 rows or 0 columns the total is 0 so there are really 22n-2+1 subsets but that's no different.
So you can work out each combination by brute force as an O(an) algorithm. Fast. :)
It would be quicker to work out what the maximum possible value is and you do that by adding up all the positive numbers in the grid. If those numbers happen to form a valid sub-matrix (meaning you can create that set by removing rows and/or columns) then there's your answer.
Implicit in this is that if none of the numbers are negative then the complete matrix is, by definition, the answer.
Also, knowing what the highest possible maximum is possibly allows you to shortcut the brute force evaluation since if you get any combination equal to that maximum then that is your answer and you can stop checking.
Also if all the numbers are non-positive, the answer is the maximum value as you can reduce the matrix to a 1 x 1 matrix with that 1 value in it, by definition.
Here's an idea: construct 2n-1 n x m matrices where 1 <= m <= n. Process them one after the other. For each n x m matrix you can calculate:
The highest possible maximum sum (as per above); and
Whether no numbers are positive allowing you to shortcut the answer.
if (1) is below the currently calculate highest maximum sum then you can discard this n x m matrix. If (2) is true then you just need a simple comparison to the current highest maximum sum.
This is generally referred to as a pruning technique.
What's more you can start by saying that the highest number in the n x n matrix is the starting highest maximum sum since obviously it can be a 1 x 1 matrix.
I'm sure you could tweak this into a (slightly more) efficient recursive tree-based search algorithm with the above tests effectively allowing you to eliminate (hopefully many) unnecessary searches.

We can improve on Cletus's generalized brute-force solution by modelling this as a directed graph. The initial matrix is the start node of the graph; its leaves are all the matrices missing one row or column, and so forth. It's a graph rather than a tree, because the node for the matrix without both the first column and row will have two parents - the nodes with just the first column or row missing.
We can optimize our solution by turning the graph into a tree: There's never any point exploring a submatrix with a column or row deleted that comes before the one we deleted to get to the current node, as that submatrix will be arrived at anyway.
This is still a brute-force search, of course - but we've eliminated the duplicate cases where we remove the same rows in different orders.
Here's an example implementation in Python:
def maximize_sum(m):
frontier = [(m, 0, False)]
best = None
best_score = 0
while frontier:
current, startidx, cols_done = frontier.pop()
score = matrix_sum(current)
if score > best_score or not best:
best = current
best_score = score
w, h = matrix_size(current)
if not cols_done:
for x in range(startidx, w):
frontier.append((delete_column(current, x), x, False))
startidx = 0
for y in range(startidx, h):
frontier.append((delete_row(current, y), y, True))
return best_score, best
And here's the output on 280Z28's example matrix:
>>> m = ((1, 1, 3), (1, -89, 101), (1, 102, -99))
>>> maximize_sum(m)
(106, [(1, 3), (1, 101)])

Since nobody asked for an efficient algorithm, use brute force: generate every possible matrix that can be created by removing rows and/or columns from the original matrix, choose the best one. A slightly more efficent version, which most likely can be proved to still be correct, is to generate only those variants where the removed rows and columns contain at least one negative value.

To try it in a simple way:
We need the valid subset of the set of entries {A00, A01, A02, ..., A0n, A10, ...,Ann} which max. sum.
First compute all subsets (the power set).
A valid subset is a member of the power set that for each two contained entries Aij and A(i+x)(j+y), contains also the elements A(i+x)j and Ai(j+y) (which are the remaining corners of the rectangle spanned by Aij and A(i+x)(j+y)).
Aij ...
. .
. .
... A(i+x)(j+y)
By that you can eliminate the invalid ones from the power set and find the one with the biggest sum in the remaining.
I'm sure it can be improved by improving an algorithm for power set generation in order to generate only valid subsets and by that avoiding step 2 (adjusting the power set).

I think there are some angles of attack that might improve upon brute force.
memoization, since there are many distinct sequences of edits that will arrive at the same submatrix.
dynamic programming. Because the search space of matrices is highly redundant, my intuition is that there would be a DP formulation that can save a lot of repeated work
I think there's a heuristic approach, but I can't quite nail it down:
if there's one negative number, you can either take the matrix as it is, remove the column of the negative number, or remove its row; I don't think any other "moves" result in a higher sum. For two negative numbers, your options are: remove neither, remove one, remove the other, or remove both (where the act of removal is either by axing the row or the column).
Now suppose the matrix has only one positive number and the rest are all <=0. You clearly want to remove everything but the positive entry. For a matrix with only 2 positive entries and the rest <= 0, the options are: do nothing, whittle down to one, whittle down to the other, or whittle down to both (resulting in a 1x2, 2x1, or 2x2 matrix).
In general this last option falls apart (imagine a matrix with 50 positives & 50 negatives), but depending on your data (few negatives or few positives) it could provide a shortcut.

Create an n-by-1 vector RowSums, and an n-by-1 vector ColumnSums. Initialize them to the row and column sums of the original matrix. O(n²)
If any row or column has a negative sum, remove edit: the one with the minimum such and update the sums in the other direction to reflect their new values. O(n)
Stop when no row or column has a sum less than zero.
This is an iterative variation improving on another answer. It operates in O(n²) time, but fails for some cases mentioned in other answers, which is the complexity limit for this problem (there are n² entries in the matrix, and to even find the minimum you have to examine each cell once).
Edit: The following matrix has no negative rows or columns, but is also not maximized, and my algorithm doesn't catch it.
1 1 3 goal 1 3
1 -89 101 ===> 1 101
1 102 -99
The following matrix does have negative rows and columns, but my algorithm selects the wrong ones for removal.
-5 1 -5 goal 1
1 1 1 ===> 1
-10 2 -10 2
mine
===> 1 1 1

Compute the sum of each row and column. This can be done in O(m) (where m = n^2)
While there are rows or columns that sum to negative remove the row or column that has the lowest sum that is less than zero. Then recompute the sum of each row/column.
The general idea is that as long as there is a row or a column that sums to nevative, removing it will result in a greater overall value. You need to remove them one at a time and recompute because in removing that one row/column you are affecting the sums of the other rows/columns and they may or may not have negative sums any more.
This will produce an optimally maximum result. Runtime is O(mn) or O(n^3)

I cannot really produce an algorithm on top of my head, but to me it 'smells' like dynamic programming, if it serves as a start point.

Big Edit: I honestly don't think there's a way to assess a matrix and determine it is maximized, unless it is completely positive.
Maybe it needs to branch, and fathom all elimination paths. You never no when a costly elimination will enable a number of better eliminations later. We can short circuit if it's found the theoretical maximum, but other than any algorithm would have to be able to step forward and back. I've adapted my original solution to achieve this behaviour with recursion.
Double Secret Edit: It would also make great strides to reduce to complexity if each iteration didn't need to find all negative elements. Considering that they don't change much between calls, it makes more sense to just pass their positions to the next iteration.
Takes a matrix, the list of current negative elements in the matrix, and the theoretical maximum of the initial matrix. Returns the matrix's maximum sum and the list of moves required to get there. In my mind move list contains a list of moves denoting the row/column removed from the result of the previous operation.
Ie: r1,r1
Would translate
-1 1 0 1 1 1
-4 1 -4 5 7 1
1 2 4 ===>
5 7 1
Return if sum of matrix is the theoretical maximum
Find the positions of all negative elements unless an empty set was passed in.
Compute sum of matrix and store it along side an empty move list.
For negative each element:
Calculate the sum of that element's row and column.
clone the matrix and eliminate which ever collection has the minimum sum (row/column) from that clone, note that action as a move list.
clone the list of negative elements and remove any that are effected by the action taken in the previous step.
Recursively call this algorithm providing the cloned matrix, the updated negative element list and the theoretical maximum. Append the moves list returned to the move list for the action that produced the matrix passed to the recursive call.
If the returned value of the recursive call is greater than the stored sum, replace it and store the returned move list.
Return the stored sum and move list.
I'm not sure if it's better or worse than the brute force method, but it handles all the test cases now. Even those where the maximum contains negative values.

This is an optimization problem and can be solved approximately by an iterative algorithm based on simulated annealing:
Notation: C is number of columns.
For J iterations:
Look at each column and compute the absolute benefit of toggling it (turn it off if it's currently on or turn it on if it's currently off). That gives you C values, e.g. -3, 1, 4. A greedy deterministic solution would just pick the last action (toggle the last column to get a benefit of 4) because it locally improves the objective. But that might lock us into a local optimum. Instead, we probabilistically pick one of the three actions, with probabilities proportional to the benefits. To do this, transform them into a probability distribution by putting them through a Sigmoid function and normalizing. (Or use exp() instead of sigmoid()?) So for -3, 1, 4 you get 0.05, 0.73, 0.98 from the sigmoid and 0.03, 0.42, 0.56 after normalizing. Now pick the action according to the probability distribution, e.g. toggle the last column with probability 0.56, toggle the second column with probability 0.42, or toggle the first column with the tiny probability 0.03.
Do the same procedure for the rows, resulting in toggling one of the rows.
Iterate for J iterations until convergence.
We may also, in early iterations, make each of these probability distributions more uniform, so that we don't get locked into bad decisions early on. So we'd raise the unnormalized probabilities to a power 1/T, where T is high in early iterations and is slowly decreased until it approaches 0. For example, 0.05, 0.73, 0.98 from above, raised to 1/10 results in 0.74, 0.97, 1.0, which after normalization is 0.27, 0.36, 0.37 (so it's much more uniform than the original 0.05, 0.73, 0.98).

It's clearly NP-Complete (as outlined above). Given this, if I had to propose the best algorithm I could for the problem:
Try some iterations of quadratic integer programming, formulating the problem as: SUM_ij a_ij x_i y_j, with the x_i and y_j variables constrained to be either 0 or 1. For some matrices I think this will find a solution quickly, for the hardest cases it would be no better than brute force (and not much would be).
In parallel (and using most of the CPU), use a approximate search algorithm to generate increasingly better solutions. Simulating Annealing was suggested in another answer, but having done research on similar combinatorial optimisation problems, my experience is that tabu search would find good solutions faster. This is probably close to optimal in terms of wandering between distinct "potentially better" solutions in the shortest time, if you use the trick of incrementally updating the costs of single changes (see my paper "Graph domination, tabu search and the football pool problem").
Use the best solution so far from the second above to steer the first by avoiding searching possibilities that have lower bounds worse than it.
Obviously this isn't guaranteed to find the maximal solution. But, it generally would when this is feasible, and it would provide a very good locally maximal solution otherwise. If someone had a practical situation requiring such optimisation, this is the solution that I'd think would work best.
Stopping at identifying that a problem is likely to be NP-Complete will not look good in a job interview! (Unless the job is in complexity theory, but even then I wouldn't.) You need to suggest good approaches - that is the point of a question like this. To see what you can come up with under pressure, because the real world often requires tackling such things.

yes, it's NP-complete problem.
It's hard to easily find the best sub-matrix,but we can easily to find some better sub-matrix.
Assume that we give m random points in the matrix as "feeds". then let them to automatically extend by the rules like :
if add one new row or column to the feed-matrix, ensure that the sum will be incrementive.
,then we can compare m sub-matrix to find the best one.

Let's say n = 10.
Brute force (all possible sets of rows x all possible sets of columns) takes
2^10 * 2^10 =~ 1,000,000 nodes.
My first approach was to consider this a tree search, and use
the sum of positive entries is an upper bound for every node in the subtree
as a pruning method. Combined with a greedy algorithm to cheaply generate good initial bounds, this yielded answers in about 80,000 nodes on average.
but there is a better way ! i later realised that
Fix some choice of rows X.
Working out the optimal columns for this set of rows is now trivial (keep a column if its sum of its entries in the rows X is positive, otherwise discard it).
So we can just brute force over all possible choices of rows; this takes 2^10 = 1024 nodes.
Adding the pruning method brought this down to 600 nodes on average.
Keeping 'column-sums' and incrementally updating them when traversing the tree of row-sets should allow the calculations (sum of matrix etc) at each node to be O(n) instead of O(n^2). Giving a total complexity of O(n * 2^n)

For slightly less than optimal solution, I think this is a PTIME, PSPACE complexity issue.
The GREEDY algorithm could run as follows:
Load the matrix into memory and compute row totals. After that run the main loop,
1) Delete the smallest row,
2) Subtract the newly omitted values from the old row totals
--> Break when there are no more negative rows.
Point two is a subtle detail: subtracted two rows/columns has time complexity n.
While re-summing all but two columns has n^2 time complexity!

Take each row and each column and compute the sum. For a 2x2 matrix this will be:
2 1
3 -10
Row(0) = 3
Row(1) = -7
Col(0) = 5
Col(1) = -9
Compose a new matrix
Cost to take row Cost to take column
3 5
-7 -9
Take out whatever you need to, then start again.
You just look for negative values on the new matrix. Those are values that actually substract from the overall matrix value. It terminates when there're no more negative "SUMS" values to take out (therefore all columns and rows SUM something to the final result)
In an nxn matrix that would be O(n^2)Log(n) I think

function pruneMatrix(matrix) {
max = -inf;
bestRowBitField = null;
bestColBitField = null;
for(rowBitField=0; rowBitField<2^matrix.height; rowBitField++) {
for (colBitField=0; colBitField<2^matrix.width; colBitField++) {
sum = calcSum(matrix, rowBitField, colBitField);
if (sum > max) {
max = sum;
bestRowBitField = rowBitField;
bestColBitField = colBitField;
}
}
}
return removeFieldsFromMatrix(bestRowBitField, bestColBitField);
}
function calcSumForCombination(matrix, rowBitField, colBitField) {
sum = 0;
for(i=0; i<matrix.height; i++) {
for(j=0; j<matrix.width; j++) {
if (rowBitField & 1<<i && colBitField & 1<<j) {
sum += matrix[i][j];
}
}
}
return sum;
}

Sort numbers by sum algorithm

I have a language-agnostic question about an algorithm.
This comes from a (probably simple) programming challenge I read. The problem is, I'm too stupid to figure it out, and curious enough that it is bugging me.
The goal is to sort a list of integers to ascending order by swapping the positions of numbers in the list. Each time you swap two numbers, you have to add their sum to a running total. The challenge is to produce the sorted list with the smallest possible running total.
Examples:
3 2 1 - 4
1 8 9 7 6 - 41
8 4 5 3 2 7 - 34
Though you are free to just give the answer if you want, if you'd rather offer a "hint" in the right direction (if such a thing is possible), I would prefer that.

Only read the first two paragraph is you just want a hint. There is a an efficient solution to this (unless I made a mistake of course). First sort the list. Now we can write the original list as a list of products of disjoint cycles.
For example 5,3,4,2,1 has two cycles, (5,1) and (3,4,2). The cycle can be thought of as starting at 3, 4 is in 3's spot, 2 is in 4's spot, and 4 is in 3's. spot. The end goal is 1,2,3,4,5 or (1)(2)(3)(4)(5), five disjoint cycles.
If we switch two elements from different cycles, say 1 and 3 then we get: 5,1,4,2,3 and in cycle notation (1,5,3,4,2). The two cycles are joined into one cycle, this is the opposite of what we want to do.
If we switch two elements from the same cycle, say 3 and 4 then we get: 5,4,3,2,1 in cycle notation (5,1)(2,4)(3). The one cycle is split into two smaller cycles. This gets us closer to the goal of all cycles of length 1. Notice that any switch of two elements in the same cycle splits the cycle into two cycles.
If we can figure out the optimal algorithm for switching one cycle we can apply that for all cycles and get an optimal algorithm for the entire sort. One algorithm is to take the minimum element in the cycle and switch it with the the whose position it is in. So for (3,4,2) we would switch 2 with 4. This leaves us with a cycle of length 1 (the element just switched into the correct position) and a cycle of size one smaller than before. We can then apply the rule again. This algorithm switches the smallest element cycle length -1 times and every other element once.
To transform a cycle of length n into cycles of length 1 takes n - 1 operations. Each element must be operated on at least once (think about each element to be sorted, it has to be moved to its correct position). The algorithm I proposed operates on each element once, which all algorithms must do, then every other operation was done on the minimal element. No algorithm can do better.
This algorithm takes O(n log n) to sort then O(n) to mess with cycles. Solving one cycle takes O(cycle length), the total length of all cycles is n so cost of the cycle operations is O(n). The final run time is O(n log n).

I'm assuming memory is free and you can simulate the sort before performing it on the real objects.
One approach (that is likely not the fastest) is to maintain a priority queue. Each node in the queue is keyed by the swap cost to get there and it contains the current item ordering and the sequence of steps to achieve that ordering. For example, initially it would contain a 0-cost node with the original data ordering and no steps.
Run a loop that dequeues the lowest-cost queue item, and enqueues all possible single-swap steps starting at that point. Keep running the loop until the head of the queue has a sorted list.

I did a few attempts at solving one of the examples by hand:
1 8 9 7 6
6 8 9 7 1 (+6+1=7)
6 8 1 7 9 (7+1+9=17)
6 8 7 1 9 (17+1+7=25)
6 1 7 8 9 (25+1+8=34)
1 6 7 8 9 (34+1+6=41)
Since you needed to displace the 1, it seems that you may have to do an exhaustive search to complete the problem - the details of which were already posted by another user. Note that you will encounter problems if the dataset is large when doing this method.
If the problem allows for "close" answers, you can simply make a greedy algorithm that puts the largest item into position - either doing so directly, or by swapping the smallest element into that slot first.

Comparisons and traversals apparently come for free, you can pre-calculate the "distance" a number must travel (and effectively the final sort order). The puzzle is the swap algorithm.
Minimizing overall swaps is obviously important.
Minimizing swaps of larger numbers is also important.
I'm pretty sure an optimal swap process cannot be guaranteed by evaluating each ordering in a stateless fashion, although you might frequently come close (not the challenge).

I think there is no trivial solution to this problem, and my approach is likely no better than the priority queue approach.
Find the smallest number, N.
Any pairs of numbers that occupy each others' desired locations should be swapped, except for N.
Assemble (by brute force) a collection of every set of numbers that can be mutually swapped into their desired locations, such that the cost of sorting the set amongst itself is less than the cost of swapping every element of the set with N.
These sets will comprise a number of cycles. Swap within those cycles in such a way that the smallest number is swapped twice.
Swap all remaining numbers, which comprise a cycle including N, using N as a placeholder.

As a hint, this reeks of dynamic programming; that might not be precise enough a hint to help, but I'd rather start with too little!

You are charged by the number of swaps, not by the number of comparisons. Nor did you mention being charged for keeping other records.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio