Better results in set partition than by differencing


The partition problem is known to be NP-hard. Depending on the particular instance, we can try dynamic programming or a heuristic such as differencing (also known as the Karmarkar-Karp algorithm).
The latter seems very useful for instances with big numbers (which make dynamic programming intractable), but it does not always find the optimum. What is an efficient way to find a better solution (random search, tabu search, other approximations)?
PS: The question has some story behind it. There is a challenge, Johnny Goes Shopping, available at SPOJ since July 2004. So far the challenge has been solved by 1087 users, but only 11 of them scored better than a correct Karmarkar-Karp implementation (with current scoring, Karmarkar-Karp gives 11.796614 points). How can one do better? (Answers supported by an accepted submission are most wanted, but please do not reveal your code.)
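For readers who have not seen differencing before, here is a minimal Python sketch of the heuristic the question refers to. It is not the code behind any SPOJ submission; the function name `karmarkar_karp` is made up for illustration, and it only computes the difference between the two subset sums, not the subsets themselves.

```python
import heapq

def karmarkar_karp(values):
    """Return the partition difference found by largest-differencing."""
    # heapq is a min-heap, so store negated values to pop the largest first.
    heap = [-v for v in values]
    heapq.heapify(heap)
    while len(heap) > 1:
        largest = -heapq.heappop(heap)
        second = -heapq.heappop(heap)
        # Committing the two largest numbers to opposite subsets
        # leaves only their difference to be placed.
        heapq.heappush(heap, -(largest - second))
    return -heap[0] if heap else 0

print(karmarkar_karp([8, 7, 6, 5, 4]))  # prints 2, although {8, 7} vs {6, 5, 4} gives 0
```

The example input shows the weakness the question is about: the heuristic is fast, but it can miss a perfect partition.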

There are many papers describing various advanced algorithms for set partitioning. Here are only two of them:
"A complete anytime algorithm for number partitioning" by Richard E. Korf.
"An efficient fully polynomial approximation scheme for the Subset-Sum Problem" by Hans Kellerer et al.
Honestly, I don't know which of them gives a more efficient solution. Probably neither of these advanced algorithms is needed to solve that SPOJ problem. Korf's paper is still very useful. The algorithms described there are very simple to understand and implement. He also gives an overview of several even simpler algorithms (in section 2). So if you want to know the details of the Horowitz-Sahni or Schroeppel-Shamir methods (mentioned below), you can find them in Korf's paper. He also writes (in section 8) that stochastic approaches do not guarantee good enough solutions. So it is unlikely that you will get significant improvements with something like hill climbing, simulated annealing, or tabu search.
I tried several simple algorithms and their combinations to solve partitioning problems with size up to 10000, maximum value up to 10^14, and a time limit of 4 sec. They were tested on random, uniformly distributed numbers, and an optimal solution was found for every problem instance I tried. For some problem instances optimality is guaranteed by the algorithm; for others it is not 100% guaranteed, but the probability of getting a sub-optimal solution is very small.
For sizes up to 4 (green area to the left) Karmarkar-Karp algorithm always gives optimal result.
For sizes up to 54 a brute force algorithm is fast enough (red area). There is a choice between the Horowitz-Sahni and Schroeppel-Shamir algorithms. I used Horowitz-Sahni because it seems more efficient for the given limits. Schroeppel-Shamir uses much less memory (everything fits in L2 cache), so it may be preferable when other CPU cores perform memory-intensive tasks, when partitioning with multiple threads, or when solving bigger problems under a looser time limit (where Horowitz-Sahni just runs out of memory).
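To make the brute force idea concrete, here is a simplified meet-in-the-middle sketch in Python. It is only in the spirit of Horowitz-Sahni: it sorts one half's subset sums and binary-searches them, rather than generating both lists in sorted order as the original method does. The names `subset_sums` and `min_partition_difference` are my own, and non-negative integer inputs are assumed.

```python
from bisect import bisect_left

def subset_sums(values):
    """All 2^len(values) subset sums, in no particular order."""
    sums = [0]
    for v in values:
        sums += [s + v for s in sums]
    return sums

def min_partition_difference(values):
    """Exact minimum |sum(A) - sum(B)| over all two-way partitions."""
    total = sum(values)
    half = len(values) // 2
    left = subset_sums(values[:half])
    right = sorted(subset_sums(values[half:]))
    best = total
    for s in left:
        # One subset sums to s + r, so the difference is |total - 2*(s + r)|.
        want = total - 2 * s
        i = bisect_left(right, want // 2)
        # The best r is one of the two neighbours of want / 2.
        for j in (i - 1, i):
            if 0 <= j < len(right):
                best = min(best, abs(want - 2 * right[j]))
    return best
```

Memory grows as 2^(n/2), which is why a size limit like the one quoted above matters.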
When size multiplied by the sum of all values is less than 5*10^9 (blue area), the dynamic programming approach is applicable. The border between the brute force and dynamic programming areas on the diagram shows where each algorithm performs better.
The green area to the right is the place where the Karmarkar-Karp algorithm gives an optimal result with almost 100% probability. Here there are so many perfect partitioning options (with delta 0 or 1) that Karmarkar-Karp almost certainly finds one of them. It is possible to invent a data set where Karmarkar-Karp always gives a sub-optimal result, for example {17 13 10 10 10 ...}. If you multiply this by some large number, neither KK nor DP would be able to find the optimal solution. Fortunately such data sets are very unlikely in practice. But a problem setter could add such a data set to make the contest more difficult. In this case you can choose some advanced algorithm for better results (but only for the grey and right green areas on the diagram).
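The dynamic programming approach can be sketched in a few lines of Python by using one big integer as a bitset of reachable subset sums. This is an illustrative sketch, not the C++ code from this answer; `dp_partition_difference` is a hypothetical name and non-negative integer inputs are assumed.

```python
def dp_partition_difference(values):
    """Exact minimum partition difference via a subset-sum bitset."""
    total = sum(values)
    reachable = 1  # bit k is set when some subset sums to k; bit 0 is the empty subset
    for v in values:
        reachable |= reachable << v
    # The best subset sum is the reachable one closest to total / 2.
    for s in range(total // 2, -1, -1):
        if (reachable >> s) & 1:
            return total - 2 * s
```

The bitset has one bit per possible sum and is updated once per value, so the work is proportional to size times sum, which is the quantity the limit above is stated in.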
I tried 2 ways to implement Karmarkar-Karp algorithm's priority queue: with max heap and with sorted array. Sorted array option appears to be slightly faster with linear search and significantly faster with binary search.
Yellow area is the place where you can choose between guaranteed optimal result (with DP) or just optimal result with high probability (with Karmarkar-Karp).
Finally, there is the grey area, where none of the simple algorithms gives an optimal result by itself. Here we could use Karmarkar-Karp to pre-process the data until it is small enough for either Horowitz-Sahni or dynamic programming. In this place there are also many perfect partitioning options, but fewer than in the green area, so Karmarkar-Karp by itself could sometimes miss a proper partitioning. Update: As noted by @mhum, it is not necessary to implement the dynamic programming algorithm to make things work. Horowitz-Sahni with Karmarkar-Karp pre-processing is enough. But it is essential for the Horowitz-Sahni algorithm to work on sizes up to 54 within the said time limit to (almost) guarantee optimal partitioning. So C++ or another language with a good optimizing compiler, and a fast computer, are preferable.
Here is how I combined Karmarkar-Karp with other algorithms:
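template<bool Preprocess = false>
i64 kk(const vector<i64>& values, i64 sum, Log& log)
{
    log.name("Karmarkar-Karp");
    // The sorted values live in the upper half of pq; the lower half
    // is spare room for shifting elements left when a difference is inserted.
    vector<i64> pq(values.size() * 2);
    copy(begin(values), end(values), begin(pq) + values.size());
    sort(begin(pq) + values.size(), end(pq));
    auto first = end(pq);                    // one past the largest element
    auto last = begin(pq) + values.size();   // the smallest element
    while (first - last > 1)
    {
        // Hand the remaining numbers over to an exact algorithm
        // as soon as the problem is small enough for it.
        if (Preprocess && first - last <= kHSLimit)
        {
            hs(last, first, sum, log);
            return 0;
        }
        if (Preprocess && static_cast<double>(first - last) * sum <= kDPLimit)
        {
            dp(last, first, sum, log);
            return 0;
        }
        // Replace the two largest numbers with their difference.
        const auto diff = *(first - 1) - *(first - 2);
        sum -= *(first - 2) * 2;
        first -= 2;
        // Binary search for the insertion point, then shift the
        // smaller elements one slot to the left to make room.
        const auto place = lower_bound(last, first, diff);
        --last;
        copy(last + 1, place, last);
        *(place - 1) = diff;
    }
    const auto result = (first - last)? *last: 0;
    log(result);
    return result;
}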
Link to full C++11 implementation. This program only determines the difference between partition sums; it does not report the partitions themselves. Warning: if you want to run it on a computer with less than 1 GB of free memory, decrease the kHSLimit constant.
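The same combination can be sketched compactly in Python. This is not a translation of the C++ program: `kk_then_exact`, `exact_difference` and the `threshold` parameter are hypothetical names, and the exact step is a plain set-based subset-sum enumeration standing in for Horowitz-Sahni, so it only suits small thresholds.

```python
from bisect import insort

def exact_difference(values):
    """Exact minimum partition difference by enumerating subset sums."""
    total = sum(values)
    sums = {0}
    for v in values:
        sums |= {s + v for s in sums}
    return min(abs(total - 2 * s) for s in sums)

def kk_then_exact(values, threshold=16):
    """Difference with Karmarkar-Karp, then solve the remainder exactly."""
    pq = sorted(values)
    # Keep at least one number so the two pops below are always valid.
    while len(pq) > max(threshold, 1):
        largest = pq.pop()
        second = pq.pop()
        insort(pq, largest - second)  # binary-search insertion keeps pq sorted
    return exact_difference(pq)

print(kk_then_exact([17, 13, 10, 10, 10], threshold=5))  # prints 0 (no differencing needed)
print(kk_then_exact([17, 13, 10, 10, 10], threshold=4))  # prints 6 (first KK step already loses the optimum)
```

The two calls illustrate why a larger threshold helps: every differencing step commits two numbers to opposite subsets, and the exact step cannot undo that commitment.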

For whatever it's worth, a straightforward, unoptimized Python implementation of the "complete Karmarkar Karp" (CKK) search procedure in [Korf88] -- modified only slightly to bail out of the search after a given time limit (say, 4.95 seconds) and return the best solution found so far -- is sufficient to score 14.204234 on the SPOJ problem, beating the score for Karmarkar-Karp. As of this writing, this is #3 on the rankings (see Edit #2 below)
A slightly more readable presentation of Korf's CKK algorithm can be found in [Mert99].
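To give an idea of what such a search looks like, here is a minimal recursive sketch of CKK with a time limit. It is not my submission; the name `complete_karmarkar_karp` and the `time_limit` parameter are made up for illustration, and it only includes the simplest pruning rule. Since it recurses once per remaining number, it is only suitable for short lists as written; long inputs would need an explicit stack.

```python
import time
from bisect import insort

def complete_karmarkar_karp(values, time_limit=1.0):
    """Anytime search: returns the best partition difference found in time."""
    if not values:
        return 0
    deadline = time.monotonic() + time_limit
    total = sum(values)
    best = total          # everything in one subset is always feasible
    perfect = total % 2   # no partition can beat a difference of 0 or 1

    def search(nums):     # nums is sorted ascending and non-empty
        nonlocal best
        if best == perfect or time.monotonic() > deadline:
            return
        largest = nums[-1]
        rest = sum(nums) - largest
        if largest >= rest:
            # The largest number dominates: put everything else opposite it.
            best = min(best, largest - rest)
            return
        second = nums[-2]
        remaining = nums[:-2]
        # First branch: the two largest go to different subsets (the KK move).
        child = remaining[:]
        insort(child, largest - second)
        search(child)
        # Second branch: the two largest go to the same subset.
        child = remaining[:]
        insort(child, largest + second)
        search(child)

    search(sorted(values))
    return best
```

The first leaf reached is exactly the plain Karmarkar-Karp solution, so the result can never be worse than KK no matter how short the time limit is.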
EDIT #2 - I've implemented Evgeny Kluev's hybrid heuristic of applying Karmarkar-Karp until the list of numbers is below some threshold and then switching over to the exact Horowitz-Sahni subset enumeration method [HS74] (a concise description may be found in [Korf88]). As suspected, my Python implementation required lowering the switchover threshold versus his C++ implementation. With some trial and error, I found that a threshold of 37 was the maximum that allowed my program to finish within the time limit. Yet, even at that lower threshold, I was able to achieve a score of 15.265633, good enough for second place.
I further attempted to incorporate this hybrid KK/HS method into the CKK tree search, basically by using HS as a very aggressive and expensive pruning strategy. In plain CKK, I was unable to find a switchover threshold that even matched the KK/HS method. However, using the ILDS (see below) search strategy for CKK and HS (with a threshold of 25) to prune, I was able to yield a very small gain over the previous score, up to 15.272802. It probably should not be surprising that CKK+ILDS would outperform plain CKK in this context since it would, by design, provide a greater diversity of inputs to the HS phase.
EDIT #1 -
I've tried two further refinements to the base CKK algorithm:
"Improved Limited Discrepancy Search" (ILDS) [Korf96] This is an alternative to the natural DFS ordering of paths within the search tree. It has a tendency to explore more diverse solutions earlier on than regular Depth-First Search.
"Speeding up 2-Way Number Partitioning" [Cerq12] This generalizes one of the pruning criteria in CKK from nodes within 4 levels of the leaf nodes to nodes within 5, 6, and 7 levels above leaf nodes.
In my test cases, both of these refinements generally provided noticeable benefits over the original CKK in reducing the number of nodes explored (in the case of the latter) and in arriving at better solutions sooner (in the case of the former). However, within the confines of the SPOJ problem structure, neither of these was sufficient to improve my score.
Given the idiosyncratic nature of this SPOJ problem (i.e.: 5-second time limit and only one specific and undisclosed problem instance), it is hard to give advice on what may actually improve the score*. For example, should we continue to pursue alternate search ordering strategies (e.g.: many of the papers by Wheeler Ruml listed here)? Or should we try incorporating some form of local improvement heuristic to solutions found by CKK in order to help pruning? Or maybe we should abandon CKK-based approaches altogether and try for a dynamic programming approach? How about a PTAS? Without knowing more about the specific shape of the instance used in the SPOJ problem, it's very difficult to guess at what kind of approach would yield the most benefit. Each one has its strengths and weaknesses, depending on the specific properties of a given instance.
* Aside from simply running the same thing faster, say, by implementing in C++ instead of Python.
References
[Cerq12] Cerquides, Jesús, and Pedro Meseguer. "Speeding Up 2-way Number Partitioning." ECAI. 2012, doi:10.3233/978-1-61499-098-7-223
[HS74] Horowitz, Ellis, and Sartaj Sahni. "Computing partitions with applications to the knapsack problem." Journal of the ACM (JACM) 21.2 (1974): 277-292.
[Korf88] Korf, Richard E. (1998), "A complete anytime algorithm for number partitioning", Artificial Intelligence 106 (2): 181–203, doi:10.1016/S0004-3702(98)00086-1
[Korf96] Korf, Richard E. "Improved limited discrepancy search." AAAI/IAAI, Vol. 1. 1996.
[Mert99] Mertens, Stephan (1999), A complete anytime algorithm for balanced number partitioning, arXiv:cs/9903011

EDIT Here's an implementation that starts with Karmarkar-Karp differencing and then tries to optimize the resulting partitions.
The only optimizations that time allows are giving 1 item from one partition to the other and swapping 1 for 1 between both partitions.
My implementation of Karmarkar-Karp at the beginning must be inaccurate, since the resulting score with just Karmarkar-Karp is 2.711483, not the 11.796614 points cited by the OP. The score goes to 7.718049 when the optimizations are used.
SPOILER WARNING C# submission code follows
using System;
using System.Collections.Generic;
using System.Linq;
public class Test
{
// some comparers to lazily avoid using a proper max-heap implementation
public class Index0 : IComparer<long[]>
{
public int Compare(long[] x, long[] y)
{
if(x[0] == y[0]) return 0;
return x[0] < y[0] ? -1 : 1;
}
public static Index0 Inst = new Index0();
}
public class Index1 : IComparer<long[]>
{
public int Compare(long[] x, long[] y)
{
if(x[1] == y[1]) return 0;
return x[1] < y[1] ? -1 : 1;
}
}
public static void Main()
{
// load the data
var start = DateTime.Now;
var list = new List<long[]>();
int size = int.Parse(Console.ReadLine());
for(int i=1; i<=size; i++) {
var tuple = new long[]{ long.Parse(Console.ReadLine()), i };
list.Add(tuple);
}
list.Sort((x, y) => { if(x[0] == y[0]) return 0; return x[0] < y[0] ? -1 : 1; });
// Karmarkar-Karp differences
List<long[]> diffs = new List<long[]>();
while(list.Count > 1) {
// get max
var b = list[list.Count - 1];
list.RemoveAt(list.Count - 1);
// get max
var a = list[list.Count - 1];
list.RemoveAt(list.Count - 1);
// (b - a)
var diff = b[0] - a[0];
var tuple = new long[]{ diff, -1 };
diffs.Add(new long[] { a[0], b[0], diff, a[1], b[1] });
// insert (b - a) back in
var fnd = list.BinarySearch(tuple, new Index0());
list.Insert(fnd < 0 ? ~fnd : fnd, tuple);
}
var approx = list[0];
list.Clear();
// set up partitions
var listA = new List<long[]>();
var listB = new List<long[]>();
long sumA = 0;
long sumB = 0;
// Karmarkar-Karp rebuild partitions from differences
bool toggle = false;
for(int i=diffs.Count-1; i>=0; i--) {
var inB = listB.BinarySearch(new long[]{diffs[i][2]}, Index0.Inst);
var inA = listA.BinarySearch(new long[]{diffs[i][2]}, Index0.Inst);
if(inB >= 0 && inA >= 0) {
toggle = !toggle;
}
if(toggle == false) {
if(inB >= 0) {
listB.RemoveAt(inB);
}else if(inA >= 0) {
listA.RemoveAt(inA);
}
var tb = new long[]{diffs[i][1], diffs[i][4]};
var ta = new long[]{diffs[i][0], diffs[i][3]};
var fb = listB.BinarySearch(tb, Index0.Inst);
var fa = listA.BinarySearch(ta, Index0.Inst);
listB.Insert(fb < 0 ? ~fb : fb, tb);
listA.Insert(fa < 0 ? ~fa : fa, ta);
} else {
if(inA >= 0) {
listA.RemoveAt(inA);
}else if(inB >= 0) {
listB.RemoveAt(inB);
}
var tb = new long[]{diffs[i][1], diffs[i][4]};
var ta = new long[]{diffs[i][0], diffs[i][3]};
var fb = listA.BinarySearch(tb, Index0.Inst);
var fa = listB.BinarySearch(ta, Index0.Inst);
listA.Insert(fb < 0 ? ~fb : fb, tb);
listB.Insert(fa < 0 ? ~fa : fa, ta);
}
}
listA.ForEach(a => sumA += a[0]);
listB.ForEach(b => sumB += b[0]);
// optimize our partitions with give/take 1 or swap 1 for 1
bool change = false;
while(DateTime.Now.Subtract(start).TotalSeconds < 4.8) {
change = false;
// give one from A to B
for(int i=0; i<listA.Count; i++) {
var a = listA[i];
if(Math.Abs(sumA - sumB) > Math.Abs((sumA - a[0]) - (sumB + a[0]))) {
var fb = listB.BinarySearch(a, Index0.Inst);
listB.Insert(fb < 0 ? ~fb : fb, a);
listA.RemoveAt(i);
i--;
sumA -= a[0];
sumB += a[0];
change = true;
} else {break;}
}
// give one from B to A
for(int i=0; i<listB.Count; i++) {
var b = listB[i];
if(Math.Abs(sumA - sumB) > Math.Abs((sumA + b[0]) - (sumB - b[0]))) {
var fa = listA.BinarySearch(b, Index0.Inst);
listA.Insert(fa < 0 ? ~fa : fa, b);
listB.RemoveAt(i);
i--;
sumA += b[0];
sumB -= b[0];
change = true;
} else {break;}
}
// swap 1 for 1
for(int i=0; i<listA.Count; i++) {
var a = listA[i];
for(int j=0; j<listB.Count; j++) {
var b = listB[j];
if(Math.Abs(sumA - sumB) > Math.Abs((sumA - a[0] + b[0]) - (sumB -b[0] + a[0]))) {
listA.RemoveAt(i);
listB.RemoveAt(j);
var fa = listA.BinarySearch(b, Index0.Inst);
var fb = listB.BinarySearch(a, Index0.Inst);
listA.Insert(fa < 0 ? ~fa : fa, b);
listB.Insert(fb < 0 ? ~fb : fb, a);
sumA = sumA - a[0] + b[0];
sumB = sumB - b[0] + a[0];
change = true;
break;
}
}
}
//
if(change == false) { break; }
}
/*
// further optimization with 2 for 1 swaps
while(DateTime.Now.Subtract(start).TotalSeconds < 4.8) {
change = false;
// trade 2 for 1
for(int i=0; i<listA.Count >> 1; i++) {
var a1 = listA[i];
var a2 = listA[listA.Count - 1 - i];
for(int j=0; j<listB.Count; j++) {
var b = listB[j];
if(Math.Abs(sumA - sumB) > Math.Abs((sumA - a1[0] - a2[0] + b[0]) - (sumB - b[0] + a1[0] + a2[0]))) {
listA.RemoveAt(listA.Count - 1 - i);
listA.RemoveAt(i);
listB.RemoveAt(j);
var fa = listA.BinarySearch(b, Index0.Inst);
var fb1 = listB.BinarySearch(a1, Index0.Inst);
var fb2 = listB.BinarySearch(a2, Index0.Inst);
listA.Insert(fa < 0 ? ~fa : fa, b);
listB.Insert(fb1 < 0 ? ~fb1 : fb1, a1);
listB.Insert(fb2 < 0 ? ~fb2 : fb2, a2);
sumA = sumA - a1[0] - a2[0] + b[0];
sumB = sumB - b[0] + a1[0] + a2[0];
change = true;
break;
}
}
}
//
if(DateTime.Now.Subtract(start).TotalSeconds > 4.8) { break; }
// trade 2 for 1
for(int i=0; i<listB.Count >> 1; i++) {
var b1 = listB[i];
var b2 = listB[listB.Count - 1 - i];
for(int j=0; j<listA.Count; j++) {
var a = listA[j];
if(Math.Abs(sumA - sumB) > Math.Abs((sumA - a[0] + b1[0] + b2[0]) - (sumB - b1[0] - b2[0] + a[0]))) {
listB.RemoveAt(listB.Count - 1 - i);
listB.RemoveAt(i);
listA.RemoveAt(j);
var fa1 = listA.BinarySearch(b1, Index0.Inst);
var fa2 = listA.BinarySearch(b2, Index0.Inst);
var fb = listB.BinarySearch(a, Index0.Inst);
listA.Insert(fa1 < 0 ? ~fa1 : fa1, b1);
listA.Insert(fa2 < 0 ? ~fa2 : fa2, b2);
listB.Insert(fb < 0 ? ~fb : fb, a);
sumA = sumA - a[0] + b1[0] + b2[0];
sumB = sumB - b1[0] - b2[0] + a[0];
change = true;
break;
}
}
}
//
if(change == false) { break; }
}
*/
// output the correct ordered values
listA.Sort(new Index1());
foreach(var t in listA) {
Console.WriteLine(t[1]);
}
// DEBUG/TESTING
//Console.WriteLine(approx[0]);
//foreach(var t in listA) Console.Write(": " + t[0] + "," + t[1]);
//Console.WriteLine();
//foreach(var t in listB) Console.Write(": " + t[0] + "," + t[1]);
}
}

Related

Efficient algorithm to search for an element in a rectangular Young Tableau [duplicate]

I was recently given this interview question and I'm curious what a good solution to it would be.
Say I'm given a 2d array where all the numbers in the array are in increasing order from left to right and top to bottom. What is the best way to search and determine if a target number is in the array?
Now, my first inclination is to utilize a binary search since my data is sorted. I can determine if a number is in a single row in O(log N) time. However, it is the 2 directions that throw me off.
Another solution I thought might work is to start somewhere in the middle. If the middle value is less than my target, then I can be sure it is in the left square portion of the matrix from the middle. I then move diagonally and check again, reducing the size of the square that the target could potentially be in until I have homed in on the target number.
Does anyone have any good ideas on solving this problem?
Example array:
Sorted left to right, top to bottom.
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11

Here's a simple approach:
1. Start at the bottom-left corner.
2. If the target is less than that value, it must be above us, so move up one.
3. Otherwise we know that the target can't be in that column, so move right one.
4. Goto 2.
For an NxM array, this runs in O(N+M). I think it would be difficult to do better. :)
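A minimal Python sketch of these steps, with the equality test and the bounds checks that the description leaves implicit. The function name `saddleback_search` is mine, and the grid is assumed to be a non-ragged list of rows sorted left to right and top to bottom.

```python
def saddleback_search(grid, target):
    """Return (row, col) of target, or None if it is absent."""
    if not grid or not grid[0]:
        return None
    row, col = len(grid) - 1, 0          # bottom-left corner
    while row >= 0 and col < len(grid[0]):
        value = grid[row][col]
        if value == target:
            return row, col
        if target < value:
            row -= 1                     # rest of this row is even larger
        else:
            col += 1                     # rest of this column is even smaller
    return None

grid = [
    [1, 2, 4, 5, 6],
    [2, 3, 5, 7, 8],
    [4, 6, 8, 9, 10],
    [5, 8, 9, 10, 11],
]
print(saddleback_search(grid, 7))   # prints (1, 3)
print(saddleback_search(grid, 12))  # prints None
```

Each iteration discards one row or one column, which is where the O(N+M) bound comes from.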
Edit: Lots of good discussion. I was talking about the general case above; clearly, if N or M are small, you could use a binary search approach to do this in something approaching logarithmic time.
Here are some details, for those who are curious:
History
This simple algorithm is called a Saddleback Search. It's been around for a while, and it is optimal when N == M. Some references:
David Gries, The Science of Programming. Springer-Verlag, 1989.
Edsger Dijkstra, The Saddleback Search. Note EWD-934, 1985.
However, when N < M, intuition suggests that binary search should be able to do better than O(N+M): For example, when N == 1, a pure binary search will run in logarithmic rather than linear time.
Worst-case bound
Richard Bird examined this intuition that binary search could improve the Saddleback algorithm in a 2006 paper:
Richard S. Bird, Improving Saddleback Search: A Lesson in Algorithm Design, in Mathematics of Program Construction, pp. 82--89, volume 4014, 2006.
Using a rather unusual conversational technique, Bird shows us that for N <= M, this problem has a lower bound of Ω(N * log(M/N)). This bound makes sense, as it gives us linear performance when N == M and logarithmic performance when N == 1.
Algorithms for rectangular arrays
One approach that uses a row-by-row binary search looks like this:
1. Start with a rectangular array where N < M. Let's say N is rows and M is columns.
2. Do a binary search on the middle row for value. If we find it, we're done.
3. Otherwise we've found an adjacent pair of numbers s and g, where s < value < g.
4. The rectangle of numbers above and to the left of s is less than value, so we can eliminate it.
5. The rectangle below and to the right of g is greater than value, so we can eliminate it.
6. Go to step (2) for each of the two remaining rectangles.
In terms of worst-case complexity, this algorithm does log(M) work to eliminate half the possible solutions, and then recursively calls itself twice on two smaller problems. We do have to repeat a smaller version of that log(M) work for every row, but if the number of rows is small compared to the number of columns, then being able to eliminate all of those columns in logarithmic time starts to become worthwhile.
This gives the algorithm a complexity of T(N,M) = log(M) + 2 * T(N/2, M/2), which Bird shows to be O(N * log(M/N)).
Another approach posted by Craig Gidney describes an algorithm similar to the approach above: it examines a row at a time using a step size of M/N. His analysis shows that this results in O(N * log(M/N)) performance as well.
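The row-by-row binary search can be sketched in Python as follows. This is my own illustration of the idea, not the code that was benchmarked; `search_rows` is a hypothetical name, and the grid is assumed to be a non-ragged list of sorted rows whose columns are sorted too.

```python
from bisect import bisect_left

def search_rows(grid, target):
    """Return (row, col) of some occurrence of target, or None."""
    def go(top, bottom, left, right):
        # Rows top..bottom inclusive, columns left..right-1.
        if top > bottom or left >= right:
            return None
        mid = (top + bottom) // 2
        row = grid[mid]
        col = bisect_left(row, target, left, right)
        if col < right and row[col] == target:
            return mid, col
        # Columns before col hold smaller values in this row, so nothing
        # above and to the left of them can match: search the upper right.
        # Columns from col on hold larger values, so nothing below and to
        # the right of them can match: search the lower left.
        return go(top, mid - 1, col, right) or go(mid + 1, bottom, left, col)

    if not grid:
        return None
    return go(0, len(grid) - 1, 0, len(grid[0]))
```

Each call spends one binary search on the middle row and then recurses on two rectangles that together cover about half of the current one.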
Performance Comparison
Big-O analysis is all well and good, but how well do these approaches work in practice? The chart below examines four algorithms for increasingly "square" arrays:
(The "naive" algorithm simply searches every element of the array. The "recursive" algorithm is described above. The "hybrid" algorithm is an implementation of Gidney's algorithm. For each array size, performance was measured by timing each algorithm over a fixed set of 1,000,000 randomly-generated arrays.)
Some notable points:
As expected, the "binary search" algorithms offer the best performance on rectangular arrays and the Saddleback algorithm works the best on square arrays.
The Saddleback algorithm performs worse than the "naive" algorithm for 1-d arrays, presumably because it does multiple comparisons on each item.
The performance hit that the "binary search" algorithms take on square arrays is presumably due to the overhead of running repeated binary searches.
Summary
Clever use of binary search can provide O(N * log(M/N)) performance for both rectangular and square arrays. The O(N + M) "saddleback" algorithm is much simpler, but suffers from performance degradation as arrays become increasingly rectangular.

This problem takes Θ(b lg(t)) time, where b = min(w,h) and t = max(w,h)/b. I discuss the solution in this blog post.
Lower bound
An adversary can force an algorithm to make Ω(b lg(t)) queries, by restricting itself to the main diagonal:
Legend: white cells are smaller items, gray cells are larger items, yellow cells are smaller-or-equal items and orange cells are larger-or-equal items. The adversary forces the solution to be whichever yellow or orange cell the algorithm queries last.
Notice that there are b independent sorted lists of size t, requiring Ω(b lg(t)) queries to completely eliminate.
Algorithm
1. (Assume without loss of generality that w >= h)
2. Compare the target item against the cell t to the left of the top right corner of the valid area
   - If the cell's item matches, return the current position.
   - If the cell's item is less than the target item, eliminate the remaining t cells in the row with a binary search. If a matching item is found while doing this, return with its position.
   - Otherwise the cell's item is more than the target item, eliminating t short columns.
3. If there's no valid area left, return failure
4. Goto step 2
Finding an item:
Determining an item doesn't exist:
Legend: white cells are smaller items, gray cells are larger items, and the green cell is an equal item.
Analysis
There are b*t short columns to eliminate. There are b long rows to eliminate. Eliminating a long row costs O(lg(t)) time. Eliminating t short columns costs O(1) time.
In the worst case we'll have to eliminate every column and every row, taking time O(lg(t)*b + b*t*1/t) = O(b lg(t)).
Note that I'm assuming lg clamps to a result above 1 (i.e. lg(x) = log_2(max(2,x))). That's why when w=h, meaning t=1, we get the expected bound of O(b lg(1)) = O(b) = O(w+h).
Code
public static Tuple<int, int> TryFindItemInSortedMatrix<T>(this IReadOnlyList<IReadOnlyList<T>> grid, T item, IComparer<T> comparer = null) {
if (grid == null) throw new ArgumentNullException("grid");
comparer = comparer ?? Comparer<T>.Default;
// check size
var width = grid.Count;
if (width == 0) return null;
var height = grid[0].Count;
if (height < width) {
var result = grid.LazyTranspose().TryFindItemInSortedMatrix(item, comparer);
if (result == null) return null;
return Tuple.Create(result.Item2, result.Item1);
}
// search
var minCol = 0;
var maxRow = height - 1;
var t = height / width;
while (minCol < width && maxRow >= 0) {
// query the item in the minimum column, t above the maximum row
var luckyRow = Math.Max(maxRow - t, 0);
var cmpItemVsLucky = comparer.Compare(item, grid[minCol][luckyRow]);
if (cmpItemVsLucky == 0) return Tuple.Create(minCol, luckyRow);
// did we eliminate t rows from the bottom?
if (cmpItemVsLucky < 0) {
maxRow = luckyRow - 1;
continue;
}
// we eliminated most of the current minimum column
// spend lg(t) time eliminating rest of column
var minRowInCol = luckyRow + 1;
var maxRowInCol = maxRow;
while (minRowInCol <= maxRowInCol) {
var mid = minRowInCol + (maxRowInCol - minRowInCol + 1) / 2;
var cmpItemVsMid = comparer.Compare(item, grid[minCol][mid]);
if (cmpItemVsMid == 0) return Tuple.Create(minCol, mid);
if (cmpItemVsMid > 0) {
minRowInCol = mid + 1;
} else {
maxRowInCol = mid - 1;
maxRow = mid - 1;
}
}
minCol += 1;
}
return null;
}

I would use the divide-and-conquer strategy for this problem, similar to what you suggested, but the details are a bit different.
This will be a recursive search on subranges of the matrix.
At each step, pick an element in the middle of the range. If the value found is what you are seeking, then you're done.
Otherwise, if the value found is less than the value that you are seeking, then you know that it is not in the quadrant above and to the left of your current position. So recursively search the two subranges: everything (exclusively) below the current position, and everything (exclusively) to the right that is at or above the current position.
Otherwise, (the value found is greater than the value that you are seeking) you know that it is not in the quadrant below and to the right of your current position. So recursively search the two subranges: everything (exclusively) to the left of the current position, and everything (exclusively) above the current position that is on the current column or a column to the right.
And ba-da-bing, you found it.
Note that each recursive call only deals with the current subrange only, not (for example) ALL rows above the current position. Just those in the current subrange.
Here's some pseudocode for you:
bool numberSearch(int[][] arr, int value, int minX, int maxX, int minY, int maxY)
    if (minX > maxX or minY > maxY)
        return false // Empty subrange, nothing to search.
    if (minX == maxX and minY == maxY and arr[minX,minY] != value)
        return false
    if (arr[minX,minY] > value) return false; // Early exits if the value can't be in
    if (arr[maxX,maxY] < value) return false; // this subrange at all.
    int nextX = (minX + maxX) / 2
    int nextY = (minY + maxY) / 2
    if (arr[nextX,nextY] == value)
    {
        print nextX,nextY
        return true
    }
    else if (arr[nextX,nextY] < value)
    {
        // Everything below, then everything to the right that is at or above.
        if (numberSearch(arr, value, minX, maxX, nextY + 1, maxY))
            return true
        return numberSearch(arr, value, nextX + 1, maxX, minY, nextY)
    }
    else
    {
        // Everything to the left, then everything above on this column or to the right.
        if (numberSearch(arr, value, minX, nextX - 1, minY, maxY))
            return true
        return numberSearch(arr, value, nextX, maxX, minY, nextY - 1)
    }

The two main answers given so far seem to be the O(N+M) "ZigZag method" and the arguably O(log N) Binary Search method. I thought I'd do some testing comparing the two methods with some various setups. Here are the details:
The array is N x N square in every test, with N varying from 125 to 8000 (the largest my JVM heap could handle). For each array size, I picked a random place in the array to put a single 2. I then put a 3 everywhere possible (to the right and below of the 2) and then filled the rest of the array with 1. Some of the earlier commenters seemed to think this type of setup would yield worst case run time for both algorithms. For each array size, I picked 100 different random locations for the 2 (search target) and ran the test. I recorded avg run time and worst case run time for each algorithm. Because it was happening too fast to get good ms readings in Java, and because I don't trust Java's nanoTime(), I repeated each test 1000 times just to add a uniform bias factor to all the times. Here are the results:
ZigZag beat binary in every test for both avg and worst case times, however, they are all within an order of magnitude of each other more or less.
Here is the Java code:
public class SearchSortedArray2D {
static boolean findZigZag(int[][] a, int t) {
int i = 0;
int j = a.length - 1;
while (i <= a.length - 1 && j >= 0) {
if (a[i][j] == t) return true;
else if (a[i][j] < t) i++;
else j--;
}
return false;
}
static boolean findBinarySearch(int[][] a, int t) {
return findBinarySearch(a, t, 0, 0, a.length - 1, a.length - 1);
}
static boolean findBinarySearch(int[][] a, int t,
int r1, int c1, int r2, int c2) {
if (r1 > r2 || c1 > c2) return false;
if (r1 == r2 && c1 == c2 && a[r1][c1] != t) return false;
if (a[r1][c1] > t) return false;
if (a[r2][c2] < t) return false;
int rm = (r1 + r2) / 2;
int cm = (c1 + c2) / 2;
if (a[rm][cm] == t) return true;
else if (a[rm][cm] > t) {
boolean b1 = findBinarySearch(a, t, r1, c1, r2, cm - 1);
boolean b2 = findBinarySearch(a, t, r1, cm, rm - 1, c2);
return (b1 || b2);
} else {
boolean b1 = findBinarySearch(a, t, r1, cm + 1, rm, c2);
boolean b2 = findBinarySearch(a, t, rm + 1, c1, r2, c2);
return (b1 || b2);
}
}
static void randomizeArray(int[][] a, int N) {
int ri = (int) (Math.random() * N);
int rj = (int) (Math.random() * N);
a[ri][rj] = 2;
for (int i = 0; i < N; i++) {
for (int j = 0; j < N; j++) {
if (i == ri && j == rj) continue;
else if (i > ri || j > rj) a[i][j] = 3;
else a[i][j] = 1;
}
}
}
public static void main(String[] args) {
int N = 8000;
int[][] a = new int[N][N];
int randoms = 100;
int repeats = 1000;
long start, end, duration;
long zigMin = Integer.MAX_VALUE, zigMax = Integer.MIN_VALUE;
long binMin = Integer.MAX_VALUE, binMax = Integer.MIN_VALUE;
long zigSum = 0, zigAvg;
long binSum = 0, binAvg;
for (int k = 0; k < randoms; k++) {
randomizeArray(a, N);
start = System.currentTimeMillis();
for (int i = 0; i < repeats; i++) findZigZag(a, 2);
end = System.currentTimeMillis();
duration = end - start;
zigSum += duration;
zigMin = Math.min(zigMin, duration);
zigMax = Math.max(zigMax, duration);
start = System.currentTimeMillis();
for (int i = 0; i < repeats; i++) findBinarySearch(a, 2);
end = System.currentTimeMillis();
duration = end - start;
binSum += duration;
binMin = Math.min(binMin, duration);
binMax = Math.max(binMax, duration);
}
zigAvg = zigSum / randoms;
binAvg = binSum / randoms;
System.out.println(findZigZag(a, 2) ?
"Found via zigzag method. " : "ERROR. ");
//System.out.println("min search time: " + zigMin + "ms");
System.out.println("max search time: " + zigMax + "ms");
System.out.println("avg search time: " + zigAvg + "ms");
System.out.println();
System.out.println(findBinarySearch(a, 2) ?
"Found via binary search method. " : "ERROR. ");
//System.out.println("min search time: " + binMin + "ms");
System.out.println("max search time: " + binMax + "ms");
System.out.println("avg search time: " + binAvg + "ms");
}
}

This is a short proof of the lower bound on the problem.
You cannot do it better than linear time (in terms of array dimensions, not the number of elements). In the array below, each of the elements marked as * can be either 5 or 6 (independently of other ones). So if your target value is 6 (or 5) the algorithm needs to examine all of them.
1 2 3 4 *
2 3 4 * 7
3 4 * 7 8
4 * 7 8 9
* 7 8 9 10
Of course this expands to bigger arrays as well. This means that this answer is optimal.
Update: As pointed out by Jeffrey L Whitledge, it is only optimal as the asymptotic lower bound on running time vs input data size (treated as a single variable). Running time treated as two-variable function on both array dimensions can be improved.
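The construction can be checked mechanically. The Python sketch below builds the n x n version of the array (my own generalization of the 5 x 5 picture: values i+j+1 above the anti-diagonal, i+j+2 below it, and n or n+1 on it) and verifies that every independent choice for the starred cells leaves the rows and columns sorted. The names `adversary_grid` and `is_sorted_grid` are made up for this illustration.

```python
from itertools import product

def adversary_grid(n, stars):
    """Build the grid; stars[i] is the value on the anti-diagonal in row i."""
    grid = []
    for i in range(n):
        row = []
        for j in range(n):
            if i + j < n - 1:
                row.append(i + j + 1)   # above the anti-diagonal
            elif i + j == n - 1:
                row.append(stars[i])    # the cell marked * in the text
            else:
                row.append(i + j + 2)   # below the anti-diagonal
        grid.append(row)
    return grid

def is_sorted_grid(grid):
    """True when every row and every column is non-decreasing."""
    n = len(grid)
    rows_ok = all(grid[i][j] <= grid[i][j + 1] for i in range(n) for j in range(n - 1))
    cols_ok = all(grid[i][j] <= grid[i + 1][j] for i in range(n - 1) for j in range(n))
    return rows_ok and cols_ok

n = 5
print(all(is_sorted_grid(adversary_grid(n, s))
          for s in product((n, n + 1), repeat=n)))  # prints True
```

Since all 2^n assignments are valid inputs and they differ only on the anti-diagonal, an algorithm that skips one starred cell cannot tell whether the target is there.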

I think this is the answer, and it works for any kind of sorted matrix:
bool findNum(int arr[][ARR_MAX],int xmin, int xmax, int ymin,int ymax,int key)
{
if (xmin > xmax || ymin > ymax || xmax < xmin || ymax < ymin) return false;
if ((xmin == xmax) && (ymin == ymax) && (arr[xmin][ymin] != key)) return false;
if (arr[xmin][ymin] > key || arr[xmax][ymax] < key) return false;
if (arr[xmin][ymin] == key || arr[xmax][ymax] == key) return true;
int xnew = (xmin + xmax)/2;
int ynew = (ymin + ymax)/2;
if (arr[xnew][ynew] == key) return true;
if (arr[xnew][ynew] < key)
{
if (findNum(arr,xnew+1,xmax,ymin,ymax,key))
return true;
return (findNum(arr,xmin,xmax,ynew+1,ymax,key));
} else {
if (findNum(arr,xmin,xnew-1,ymin,ymax,key))
return true;
return (findNum(arr,xmin,xmax,ymin,ynew-1,key));
}
}

Interesting question. Consider this idea - create one boundary where all the numbers are greater than your target and another where all the numbers are less than your target. If anything is left in between the two, that's your target.
If I'm looking for 3 in your example, I read across the first row until I hit 4, then look for the smallest adjacent number (including diagonals) greater than 3:
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
Now I do the same for those numbers less than 3:
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11
Now I ask, is anything inside the two boundaries? If yes, it must be 3. If no, then there is no 3. Sort of indirect since I don't actually find the number, I just deduce that it must be there. This has the added bonus of counting ALL the 3's.
I tried this on some examples and it seems to work OK.

Binary search through the diagonal of the array is the best option.
We can find out whether the element is less than or equal to the elements in the diagonal.

I've been asking this question in interviews for the better part of a decade and I think there's only been one person who has been able to come up with an optimal algorithm.
My solution has always been:
Binary search the middle diagonal, which is the diagonal running down and right, containing the item at (rows.count/2, columns.count/2).
If the target number is found, return true.
Otherwise, two numbers (u and v) will have been found such that u is smaller than the target, v is larger than the target, and v is one right and one down from u.
Recursively search the sub-matrix to the right of u and top of v and the one to the bottom of u and left of v.
I believe this is a strict improvement over the algorithm given by Nate here, since searching the diagonal often allows a reduction of over half the search space (if the matrix is close to square), whereas searching a row or column always results in an elimination of exactly half.
Here's the code in (probably not terribly Swifty) Swift:
import Cocoa
class Solution {
func searchMatrix(_ matrix: [[Int]], _ target: Int) -> Bool {
if (matrix.isEmpty || matrix[0].isEmpty) {
return false
}
return _searchMatrix(matrix, 0..<matrix.count, 0..<matrix[0].count, target)
}
func _searchMatrix(_ matrix: [[Int]], _ rows: Range<Int>, _ columns: Range<Int>, _ target: Int) -> Bool {
if (rows.count == 0 || columns.count == 0) {
return false
}
if (rows.count == 1) {
return _binarySearch(matrix, rows.lowerBound, columns, target, true)
}
if (columns.count == 1) {
return _binarySearch(matrix, columns.lowerBound, rows, target, false)
}
var lowerInflection = (-1, -1)
var upperInflection = (Int.max, Int.max)
var currentRows = rows
var currentColumns = columns
while (currentRows.count > 0 && currentColumns.count > 0 && upperInflection.0 > lowerInflection.0+1) {
let rowMidpoint = (currentRows.upperBound + currentRows.lowerBound) / 2
let columnMidpoint = (currentColumns.upperBound + currentColumns.lowerBound) / 2
let value = matrix[rowMidpoint][columnMidpoint]
if (value == target) {
return true
}
if (value > target) {
upperInflection = (rowMidpoint, columnMidpoint)
currentRows = currentRows.lowerBound..<rowMidpoint
currentColumns = currentColumns.lowerBound..<columnMidpoint
} else {
lowerInflection = (rowMidpoint, columnMidpoint)
currentRows = rowMidpoint+1..<currentRows.upperBound
currentColumns = columnMidpoint+1..<currentColumns.upperBound
}
}
if (lowerInflection.0 == -1) {
lowerInflection = (upperInflection.0-1, upperInflection.1-1)
} else if (upperInflection.0 == Int.max) {
upperInflection = (lowerInflection.0+1, lowerInflection.1+1)
}
return _searchMatrix(matrix, rows.lowerBound..<lowerInflection.0+1, upperInflection.1..<columns.upperBound, target) || _searchMatrix(matrix, upperInflection.0..<rows.upperBound, columns.lowerBound..<lowerInflection.1+1, target)
}
func _binarySearch(_ matrix: [[Int]], _ rowOrColumn: Int, _ range: Range<Int>, _ target: Int, _ searchRow : Bool) -> Bool {
if (range.isEmpty) {
return false
}
let midpoint = (range.upperBound + range.lowerBound) / 2
let value = (searchRow ? matrix[rowOrColumn][midpoint] : matrix[midpoint][rowOrColumn])
if (value == target) {
return true
}
if (value > target) {
return _binarySearch(matrix, rowOrColumn, range.lowerBound..<midpoint, target, searchRow)
} else {
return _binarySearch(matrix, rowOrColumn, midpoint+1..<range.upperBound, target, searchRow)
}
}
}

A. Do a binary search on those lines where the target number might be on.
B. Make it a graph : Look for the number by taking always the smallest unvisited neighbour node and backtracking when a too big number is found

Binary search would be the best approach, IMO. Starting at 1/2 x, 1/2 y will cut it in half, i.e. a 5x5 square would be something like x == 2 / y == 3. I rounded one value down and one value up to better zero in on the direction of the targeted value.
For clarity the next iteration would give you something like x == 1 / y == 2 OR x == 3 / y == 5

Well, to begin with, let us assume we are using a square.
1 2 3
2 3 4
3 4 5
1. Searching a square
I would use a binary search on the diagonal. The goal is to locate the smallest number that is not strictly lower than the target number.
Say I am looking for 4 for example, then I would end up locating 5 at (2,2).
Then, I am assured that if 4 is in the table, it is at a position either (x,2) or (2,x) with x in [0,2]. Well, that's just 2 binary searches.
The complexity is not daunting: O(log(N)) (3 binary searches on ranges of length N)
2. Searching a rectangle, naive approach
Of course, it gets a bit more complicated when N and M differ (with a rectangle), consider this degenerate case:
1 2 3 4 5 6 7 8
2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17
And let's say I am looking for 9... The diagonal approach is still good, but the definition of diagonal changes. Here my diagonal is [1, (5 or 6), 17]. Let's say I picked up [1,5,17], then I know that if 9 is in the table it is either in the subpart:
            5  6  7  8
            6  7  8  9
10 11 12 13 14 15 16
This gives us 2 rectangles:
5 6 7 8    10 11 12 13 14 15 16
6 7 8 9
So we can recurse! Probably beginning with the one with fewer elements (though in this case it kills us).
I should point out that if one of the dimensions is less than 3, we cannot apply the diagonal method and must use a binary search. Here it would mean:
Apply binary search on 10 11 12 13 14 15 16, not found
Apply binary search on 5 6 7 8, not found
Apply binary search on 6 7 8 9, not found
It's tricky because to get good performance you might want to differentiate between several cases, depending on the general shape....
3. Searching a rectangle, brutal approach
It would be much easier if we dealt with a square... so let's just square things up.
1  2  3  4  5  6  7  8
2  3  4  5  6  7  8  9
10 11 12 13 14 15 16 17
17 .  .  .  .  .  .  17
.                    .
.                    .
.                    .
17 .  .  .  .  .  .  17
We now have a square.
Of course, we will probably NOT actually create those rows, we could simply emulate them.
def get(x,y):
    if x < N and y < M: return table[x][y]
    else: return table[N-1][M-1] # the max
so it behaves like a square without occupying more memory (at the cost of speed, probably, depending on cache... oh well :p)

EDIT:
I misunderstood the question. As the comments point out this only works in the more restricted case.
In a language like C that stores data in row-major order, simply treat it as a 1D array of size n * m and use a binary search.
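For what it's worth, a minimal Python sketch of that idea looks like this. It is only valid in the restricted case mentioned above, where every row starts after the previous row ends; `search_flattened` is a name made up for this example, and the index arithmetic stands in for the pointer trick you would use in C.

```python
def search_flattened(matrix, target):
    """Binary search over a matrix viewed as one sorted 1D array.

    Assumes the restricted case: rows are sorted and each row's first
    element is not smaller than the previous row's last element.
    Returns (row, col) or None.
    """
    if not matrix or not matrix[0]:
        return None
    rows, cols = len(matrix), len(matrix[0])
    lo, hi = 0, rows * cols - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        r, c = divmod(mid, cols)     # map the 1D index back to 2D
        value = matrix[r][c]
        if value == target:
            return (r, c)
        if value < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

grid = [[1, 3, 5],
        [7, 9, 11],
        [13, 15, 17]]
hit = search_flattened(grid, 9)
miss = search_flattened(grid, 4)
```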

I have a recursive Divide & Conquer Solution.
The basic idea for one step: we know that the upper-left element (LU) is the smallest and the bottom-right element (RB) is the largest, so the target number N must satisfy N >= LU and N <= RB.
If N == LU or N == RB, the element is found; stop and return its position/index.
If (N >= LU and N <= RB) is false, N is not there; abort.
If (N >= LU and N <= RB) is true, logically divide the 2D array into 4 equal 2D sub-arrays
and then apply the same step to all four sub-arrays.
I believe the algorithm is correct; I have implemented it on my friend's PC.
Complexity: every 4 comparisons can be used to reduce the total number of elements to one-fourth in the worst case,
so my complexity comes to 1 + 4 x lg(n) + 4.
But I really expected this to work in O(n).
I think something is wrong somewhere in my calculation of the complexity; please correct me if so.
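A rough Python sketch of the steps described above, assuming a matrix whose rows and columns are both sorted. The function name `find` and the inclusive index bounds are choices made for this illustration, not the poster's actual code.

```python
def find(matrix, target, r0, r1, c0, c1):
    """Search rows r0..r1 and columns c0..c1 (inclusive); return (row, col) or None."""
    if r0 > r1 or c0 > c1:
        return None
    lu, rb = matrix[r0][c0], matrix[r1][c1]
    if target < lu or target > rb:
        return None                      # target cannot lie in this block
    if target == lu:
        return (r0, c0)
    if target == rb:
        return (r1, c1)
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    quadrants = ((r0, rm, c0, cm), (r0, rm, cm + 1, c1),
                 (rm + 1, r1, c0, cm), (rm + 1, r1, cm + 1, c1))
    for block in quadrants:
        hit = find(matrix, target, *block)
        if hit is not None:
            return hit
    return None

grid = [[1, 2, 4, 5, 6],
        [2, 3, 5, 7, 8],
        [4, 6, 8, 9, 10],
        [5, 8, 9, 10, 11]]
found = find(grid, 7, 0, 3, 0, 4)
absent = find(grid, 12, 0, 3, 0, 4)
```

Note that more than one quadrant can pass the range check, so the recursion does not shrink to a single quarter at each step. That is why the running time is not logarithmic in general.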

The optimal solution is to start at the top-left corner, that has minimal value. Move diagonally downwards to the right until you hit an element whose value >= value of the given element. If the element's value is equal to that of the given element, return found as true.
Otherwise, from here we can proceed in two ways.
Strategy 1:
Move up in the column and search for the given element until we reach the end. If found, return found as true
Move left in the row and search for the given element until we reach the end. If found, return found as true
return found as false
Strategy 2:
Let i denote the row index and j denote the column index of the diagonal element we have stopped at. (Here, we have i = j, BTW). Let k = 1.
Repeat the below steps until i-k >= 0
Search if a[i-k][j] is equal to the given element. if yes, return found as true.
Search if a[i][j-k] is equal to the given element. if yes, return found as true.
Increment k
1 2 4 5 6
2 3 5 7 8
4 6 8 9 10
5 8 9 10 11

public boolean searchSortedMatrix(int arr[][] , int key , int minX , int maxX , int minY , int maxY){
// base case for recursion
if(minX > maxX || minY > maxY)
return false ;
// early fails
// array not properly intialized
if(arr==null || arr.length==0)
return false ;
// arr[0][0]> key return false
if(arr[minX][minY]>key)
return false ;
// arr[maxX][maxY]<key return false
if(arr[maxX][maxY]<key)
return false ;
//int temp1 = minX ;
//int temp2 = minY ;
int midX = (minX+maxX)/2 ;
//if(temp1==midX){midX+=1 ;}
int midY = (minY+maxY)/2 ;
//if(temp2==midY){midY+=1 ;}
// arr[midX][midY] = key ? then value found
if(arr[midX][midY] == key)
return true ;
// alas ! i have to keep looking
// arr[midX][midY] < key ? search right quad and bottom matrix ;
if(arr[midX][midY] < key){
if( searchSortedMatrix(arr ,key , minX,maxX , midY+1 , maxY))
return true ;
// search bottom half of matrix
if( searchSortedMatrix(arr ,key , midX+1,maxX , minY , maxY))
return true ;
}
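// arr[midX][midY] > key ? the key can still be in the rows above or in the columns to the left ;
else {
if( searchSortedMatrix(arr , key , minX,midX-1 , minY , maxY))
return true ;
return(searchSortedMatrix(arr , key , midX,maxX , minY,midY-1));
}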
return false ;
}

I suggest, store all characters in a 2D list. then find index of required element if it exists in list.
If not present print appropriate message else print row and column as:
row = (index/total_columns) and column = (index%total_columns -1)
This will incur only the binary search time in a list.
Please suggest any corrections. :)

If O(M log(N)) solution is ok for an MxN array -
template <size_t n>
struct MN * get(int a[][n], int k, int M, int N){
struct MN *result = new MN;
result->m = -1;
result->n = -1;
/* Do a binary search on each row since rows (and columns too) are sorted. */
for(int i = 0; i < M; i++){
int lo = 0; int hi = N - 1;
while(lo <= hi){
int mid = lo + (hi-lo)/2;
if(k < a[i][mid]) hi = mid - 1;
else if (k > a[i][mid]) lo = mid + 1;
else{
result->m = i;
result->n = mid;
return result;
}
}
}
return result;
}
Working C++ demo.
Please do let me know if this wouldn't work or if there is a bug in it.

class Solution {
public boolean searchMatrix(int[][] matrix, int target) {
if(matrix == null)
return false;
int i=0;
int j=0;
int m = matrix.length;
int n = matrix[0].length;
boolean found = false;
while(i<m && !found){
while(j<n && !found){
if(matrix[i][j] == target)
found = true;
if(matrix[i][j] < target)
j++;
else
break;
}
i++;
j=0;
}
return found;
}}
129 / 129 test cases passed.
Status: Accepted
Runtime: 39 ms
Memory Usage: 55 MB

Given a square matrix as follows:
[ a b c ]
[ d e f ]
[ i j k ]
We know that a < c, d < f, i < k. What we don't know is whether d < c or d > c, etc. We have guarantees only in 1-dimension.
Looking at the end elements (c,f,k), we can do a sort of filter: is N < c ? search() : next(). Thus, we have n iterations over the rows, with each row taking either O( log( n ) ) for binary search or O( 1 ) if filtered out.
Let me give an example where N = j,
1) Check row 1. j < c? (no, go next)
2) Check row 2. j < f? (yes, bin search gets nothing)
3) Check row 3. j < k? (yes, bin search finds it)
Try again with N = q,
1) Check row 1. q < c? (no, go next)
2) Check row 2. q < f? (no, go next)
3) Check row 3. q < k? (no, go next)
There is probably a better solution out there but this is easy to explain.. :)
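The filter-then-search idea above can be sketched in Python roughly as follows. It assumes only that each row is sorted; `search_rows` is a hypothetical name, and the filter here also checks the first element of each row, which goes slightly beyond the last-element check described above.

```python
from bisect import bisect_left

def search_rows(matrix, target):
    """Binary-search each row whose [first, last] range can contain target."""
    for i, row in enumerate(matrix):
        if not row or target < row[0] or target > row[-1]:
            continue                      # O(1) filter: skip this row
        j = bisect_left(row, target)      # O(log n) search inside the row
        if j < len(row) and row[j] == target:
            return (i, j)
    return None

rows_only = [[1, 4, 9],
             [2, 5, 7],
             [3, 6, 8]]
where = search_rows(rows_only, 6)
```

With n rows of length n this is O(n log n) in the worst case, since no row may be filtered out.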

As this is an interview question, it would seem to lead towards a discussion of Parallel programming and Map-reduce algorithms.
See http://code.google.com/intl/de/edu/parallel/mapreduce-tutorial.html

How to find trend (growth/decrease/stationarity) of a data series

I am trying to extract the OEE trend of a manufacturing machine. I already have a dataset of OEE calculated more or less every 30 seconds for each manufacturing machine and stored in a database.
What I want to do is to extract a subset of the dataset (say, the last 30 minutes) and state whether the OEE has grown, decreased or been stable (within a certain threshold). My task is NOT to forecast what the next value of OEE will be, but just to know if it has decreased (desired return value: -1), grown (desired return value: +1) or been stable (desired return value: 0) based on the dataset. I am using Java 8 in my project.
Here is an example of dataset:
71.37
71.37
70.91
70.30
70.30
70.42
70.42
69.77
69.77
69.29
68.92
68.92
68.61
68.61
68.91
68.91
68.50
68.71
69.27
69.26
69.89
69.85
69.98
69.93
69.39
68.97
69.03
From this dataset it is possible to state that the OEE has been decreasing (of course based on a threshold), thus the algorithm would return -1.
I have been searching on the web unsuccessfully. I have found this, or this github project, or this stackoverflow question. However, all of those are (more or less) complex forecasting algorithms. I am searching for a much simpler solution. Any help is appreciated.

You could go for a
sliding average of the last n values.
Or a
sliding median of the last n values.
It highly depends on your application what is appropriate. But both these are very simple to implement and in a lot of cases more than good enough.
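As a rough illustration (in Python rather than Java, just to keep it short), one way to turn a sliding average into the -1/0/+1 answer is to compare the average of the first window with the average of the last window. The window size of 10 and the threshold of 0.5 are assumptions picked for the sample data in the question, not recommended values.

```python
from statistics import mean

def trend(values, window=10, threshold=0.5):
    """Return -1, 0 or +1 by comparing the moving average at both ends."""
    if len(values) < window:
        raise ValueError("need at least `window` values")
    first = mean(values[:window])     # average at the start of the period
    last = mean(values[-window:])     # average at the end of the period
    diff = last - first
    if diff > threshold:
        return 1
    if diff < -threshold:
        return -1
    return 0

oee = [71.37, 71.37, 70.91, 70.30, 70.30, 70.42, 70.42, 69.77, 69.77,
       69.29, 68.92, 68.92, 68.61, 68.61, 68.91, 68.91, 68.50, 68.71,
       69.27, 69.26, 69.89, 69.85, 69.98, 69.93, 69.39, 68.97, 69.03]
direction = trend(oee)   # first-window mean 70.392, last-window mean 69.428
```

Swapping `mean` for `statistics.median` gives the sliding-median variant, which is less sensitive to single outliers.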

As you know from math, one would use d/dt, which more or less is using the step differences.
A trend should have some weight.
import java.util.stream.DoubleStream;
import java.util.stream.IntStream;

class Trend {
    int direction;
    double probability;
}

Trend trend(double[] lastData) {
    // Forward differences: positive when the series grows.
    double[] deltas = new double[lastData.length - 1];
    for (int i = 0; i < deltas.length; ++i) {
        deltas[i] = lastData[i + 1] - lastData[i];
    }
    // Trend based on two parts:
    int parts = 2;
    int splitN = (deltas.length + 1) / parts;
    int i = 0;
    int[] trends = new int[parts];
    for (int j = 0; j < parts; ++j) {
        int n = Math.min(splitN, deltas.length - i);
        double partAvg = DoubleStream.of(deltas).skip(i).limit(n).sum() / n;
        trends[j] = tendency(partAvg);
        i += n; // move on to the next part
    }
    Trend result = new Trend();
    result.direction = trends[parts - 1];
    double avg = IntStream.of(trends).average().orElse((double) result.direction);
    result.probability = ((result.direction - avg) + 1) / 2;
    return result;
}

int tendency(double sum) {
    final double EPS = 0.0001;
    return sum < -EPS ? -1 : sum > EPS ? 1 : 0;
}
This is not very sophisticated. For more elaborate treatment a math forum might be useful.

Improving next_permutation algorithm

I have the following homework:
We have N jobs with durations t1, t2, ..., tN and deadlines d1, d2, ..., dN. If a job isn't finished by its deadline, the corresponding penalty b1, b2, ..., bN is charged. In what order should the jobs be done so that the total penalty is minimal?
I've written this code so far and it's working but I want to improve it by skipping unnecessary permutations. For example, I know that the jobs in order:
1 2 3 4 5 - will give me 100 points of penalty and if I change the order let's say like this:
2 1 ..... - it gives me instantly 120 penalty and from this moment I know I don't have to check all of the rest permutations which start with 2 1, I have to skip them somehow.
Here's the code:
int finalPenalty = -1;
bool z = true;
while(next_permutation(jobs.begin(), jobs.end(), compare) || z)
{
int time = 0;
int penalty = 0;
z = false;
for (int i = 0; i < verseNumber; i++)
{
if (penalty > finalPenalty && finalPenalty >= 0)
break;
time += jobs[i].duration;
if (time > jobs[i].deadline)
penalty += jobs[i].penalty;
}
if (finalPenalty < 0 || penalty < finalPenalty)
{
sortedJobs = jobs;
finalPenalty = penalty;
}
if (finalPenalty == 0)
break;
}
I think I should do this somewhere here:
if (penalty > finalPenalty && finalPenalty >= 0)
break;
But I'm not sure how to do this. It skips me one permutation here if the penalty is already higher, but it doesn't skip everything and it still does next_permutation. Any ideas?
EDIT:
I'm using vector and my job structure looks like this:
struct job
{
int ID;
int duration;
int deadline;
int penalty;
};
ID is given automatically when reading from file and the rest is read from file (for example: ID = 1, duration = 5, deadline = 10, penalty = 10)

If you are planning to use next_permutation function provided by STL, there is not much you can do.
Say the last k digits are redundant to check. If you use the next_permutation function, a simple yet inefficient strategy is to call next_permutation k! times (i.e. the number of permutations of those last k elements) and just skip computing their penalties, as you know they will be higher. (k! assumes there are no repetitions; if you have repetitions, you would need to take extra measures to compute that count.) This would cost you O(k!n) operations in the worst case, as next_permutation has linear time complexity.
Let's consider how we can improve this. A sound strategy may be, once an inefficient setting is found, before calling next_permutation again, ordering those k digits in descending order so that the next call would effectively skip the inefficient portion of permutations that need not be checked. Consider the following example.
Say our method found that 1 2 3 4 5 has a penalty of 100. Then, while later computing 2 1 3 4 5, if our method finds that the penalty is already higher than 100 after computing only 2 1, it could just sort 3 4 5 in descending order using sort along with your custom comparison mechanism, and skip the rest of the loop, arriving at another next_permutation call. Starting from 2 1 5 4 3, that call yields 2 3 1 4 5, the next sequence to continue with.
Let's consider how much skipping costs. This method requires sorting those k digits and calling next_permutation, which has an overall time complexity of O(klogk + n). This is a huge improvement over the previous method which has O(k!n).
See below for a crude implementation of the method I propose as an improvement over your existing code. I had to use type auto as you did not provide the exact type for jobs. I also sorted then reversed those k digits, as you did not provide your comparison function and I wanted to emphasize that what I was doing was reversing the ascending order.
int finalPenalty = -1;
bool z = true;
while(next_permutation(jobs.begin(), jobs.end(), compare) || z)
{
int time = 0;
int penalty = 0;
z = false;
auto it = jobs.begin();
for (int i = 0; i < verseNumber; i++)
{
time += jobs[i].duration;
if (time > jobs[i].deadline)
{
penalty += jobs[i].penalty;
if(finalPenalty >= 0 && penalty > finalPenalty)
{
it++; // only the remaining jobs need to be sorted in reverse
sort(it, jobs.end(), compare);
reverse(it, jobs.end());
break;
}
}
it++;
}
if (finalPenalty < 0 || penalty < finalPenalty)
{
sortedJobs = jobs;
finalPenalty = penalty;
}
if (finalPenalty == 0)
break;
}
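To see the skipping trick in isolation, here is a small Python sketch on plain integers. Python has no built-in next_permutation, so the helper below is a hand-written equivalent of the C++ one; `skip_prefix` is a name invented for this example.

```python
def next_permutation(seq):
    """In-place lexicographic successor, mimicking C++ std::next_permutation."""
    i = len(seq) - 2
    while i >= 0 and seq[i] >= seq[i + 1]:
        i -= 1
    if i < 0:
        seq.reverse()                 # wrapped around to the first permutation
        return False
    j = len(seq) - 1
    while seq[j] <= seq[i]:
        j -= 1
    seq[i], seq[j] = seq[j], seq[i]
    seq[i + 1:] = reversed(seq[i + 1:])
    return True

def skip_prefix(seq, prefix_len):
    """Jump past every remaining permutation that starts with seq[:prefix_len]."""
    # Descending suffix = last permutation with this prefix.
    seq[prefix_len:] = sorted(seq[prefix_len:], reverse=True)
    return next_permutation(seq)

order = [2, 1, 3, 4, 5]
skip_prefix(order, 2)     # the prefix "2 1" is already too expensive
```

After the call, `order` holds 2 3 1 4 5, so none of the six permutations beginning with 2 1 is visited again.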

Optimized TSP Algorithms

I am interested in ways to improve or come up with algorithms that are able to solve the Travelling salesman problem for about n = 100 to 200 cities.
The wikipedia link I gave lists various optimizations, but it does so at a pretty high level, and I don't know how to go about actually implementing them in code.
There are industrial strength solvers out there, such as Concorde, but those are way too complex for what I want, and the classic solutions that flood the searches for TSP all present randomized algorithms or the classic backtracking or dynamic programming algorithms that only work for about 20 cities.
So, does anyone know how to implement a simple (by simple I mean that an implementation doesn't take more than 100-200 lines of code) TSP solver that works in reasonable time (a few seconds) for at least 100 cities? I am only interested in exact solutions.
You may assume that the input will be randomly generated, so I don't care for inputs that are aimed specifically at breaking a certain algorithm.

200 lines and no libraries is a tough constraint. The advanced solvers use branch and bound with the Held–Karp relaxation, and I'm not sure if even the most basic version of that would fit into 200 normal lines. Nevertheless, here's an outline.
Held Karp
One way to write TSP as an integer program is as follows (Dantzig, Fulkerson, Johnson). For all edges $e$, the constant $w_e$ denotes the length of edge $e$, and the variable $x_e$ is 1 if edge $e$ is on the tour and 0 otherwise. For all subsets $S$ of vertices, $\partial(S)$ denotes the edges connecting a vertex in $S$ with a vertex not in $S$.
minimize $\sum_{e} w_e x_e$
subject to
1. for all vertices $v$, $\sum_{e \in \partial(\{v\})} x_e = 2$
2. for all nonempty proper subsets $S$ of vertices, $\sum_{e \in \partial(S)} x_e \ge 2$
3. for all edges $e$ in $E$, $x_e \in \{0, 1\}$
Condition 1 ensures that the set of edges is a collection of tours. Condition 2 ensures that there's only one. (Otherwise, let S be the set of vertices visited by one of the tours.) The Held–Karp relaxation is obtained by replacing condition 3 with the following.
3. for all edges $e$ in $E$, $0 \le x_e \le 1$
Held–Karp is a linear program but it has an exponential number of constraints. One way to solve it is to introduce Lagrange multipliers and then do subgradient optimization. That boils down to a loop that computes a minimum spanning tree and then updates some vectors, but the details are sort of involved. Besides "Held–Karp" and "subgradient (descent|optimization)", "1-tree" is another useful search term.
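To give a feel for that loop, here is a hedged Python sketch of the 1-tree bound with a very plain subgradient update. The function names, the initial step size of 1.0 and the 0.95 decay are assumptions made for this illustration and are not tuned; any choice of multipliers `pi` still yields a valid lower bound.

```python
import math

def one_tree(cost, pi):
    """Minimum 1-tree under adjusted costs cost[i][j] + pi[i] + pi[j].

    Vertex 0 is special: build an MST on vertices 1..n-1 (Prim), then
    attach vertex 0 by its two cheapest edges. Needs n >= 3.
    Returns (adjusted weight, list of vertex degrees).
    """
    n = len(cost)
    adj = lambda i, j: cost[i][j] + pi[i] + pi[j]
    degree = [0] * n
    total = 0.0
    in_tree = [False] * n
    in_tree[1] = True
    best = [adj(1, v) for v in range(n)]   # cheapest link into the tree
    parent = [1] * n
    for _ in range(n - 2):
        j = min((v for v in range(2, n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[j] = True
        total += best[j]
        degree[j] += 1
        degree[parent[j]] += 1
        for v in range(2, n):
            if not in_tree[v] and adj(j, v) < best[v]:
                best[v], parent[v] = adj(j, v), j
    for j in sorted(range(1, n), key=lambda v: adj(0, v))[:2]:
        total += adj(0, j)
        degree[0] += 1
        degree[j] += 1
    return total, degree

def held_karp_bound(cost, iterations=100, step=1.0):
    """Lower bound on the optimal tour length via subgradient steps on pi."""
    n = len(cost)
    pi = [0.0] * n
    best_bound = -math.inf
    for _ in range(iterations):
        weight, degree = one_tree(cost, pi)
        best_bound = max(best_bound, weight - 2 * sum(pi))
        if all(d == 2 for d in degree):
            break                          # the 1-tree is a tour: bound is tight
        for i in range(n):
            pi[i] += step * (degree[i] - 2)  # penalise over-used vertices
        step *= 0.95
    return best_bound

def euclid(points):
    return [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points] for p in points]

square_bound = held_karp_bound(euclid([(0, 0), (10, 0), (10, 10), (0, 10)]))
```

On the four corners of a square the very first 1-tree is already a tour, so the loop stops immediately with the tour length as the bound.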
(A slower alternative is to write an LP solver and introduce subtour constraints as they are violated by previous optima. This means writing an LP solver and a min-cut procedure, which is also more code, but it might extend better to more exotic TSP constraints.)
Branch and bound
By "partial solution", I mean a partial assignment of variables to 0 or 1, where an edge assigned 1 is definitely in the tour, and an edge assigned 0 is definitely out. Evaluating Held–Karp with these side constraints gives a lower bound on the optimum tour that respects the decisions already made (an extension).
Branch and bound maintains a set of partial solutions, at least one of which extends to an optimal solution. The pseudocode for one variant, depth-first search with best-first backtracking is as follows.
let h be an empty minheap of partial solutions, ordered by Held–Karp value
let bestsolsofar = null
let cursol be the partial solution with no variables assigned
loop
    while cursol is not a complete solution and cursol's H–K value is at least as good as the value of bestsolsofar
        choose a branching variable v
        let sol0 be cursol union {v -> 0}
        let sol1 be cursol union {v -> 1}
        evaluate sol0 and sol1
        let cursol be the better of the two; put the other in h
    end while
    if cursol is better than bestsolsofar then
        let bestsolsofar = cursol
        delete all heap nodes worse than cursol
    end if
    if h is empty then stop; we've found the optimal solution
    pop the minimum element of h and store it in cursol
end loop
The idea of branch and bound is that there's a search tree of partial solutions. The point of solving Held–Karp is that the value of the LP is at most the length OPT of the optimal tour but also conjectured to be at least 3/4 OPT (in practice, usually closer to OPT).
The one detail in the pseudocode I've left out is how to choose the branching variable. The goal is usually to make the "hard" decisions first, so fixing a variable whose value is already near 0 or 1 is probably not wise. One option is to choose the closest to 0.5, but there are many, many others.
EDIT
Java implementation. 198 nonblank, noncomment lines. I forgot that 1-trees don't work with assigning variables to 1, so I branch by finding a vertex whose 1-tree has degree >2 and delete each edge in turn. This program accepts TSPLIB instances in EUC_2D format, e.g., eil51.tsp and eil76.tsp and eil101.tsp and lin105.tsp from http://www2.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/tsp/.
// simple exact TSP solver based on branch-and-bound/Held--Karp
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class TSP {
// number of cities
private int n;
// city locations
private double[] x;
private double[] y;
// cost matrix
private double[][] cost;
// matrix of adjusted costs
private double[][] costWithPi;
Node bestNode = new Node();
public static void main(String[] args) throws IOException {
// read the input in TSPLIB format
// assume TYPE: TSP, EDGE_WEIGHT_TYPE: EUC_2D
// no error checking
TSP tsp = new TSP();
tsp.readInput(new InputStreamReader(System.in));
tsp.solve();
}
public void readInput(Reader r) throws IOException {
BufferedReader in = new BufferedReader(r);
Pattern specification = Pattern.compile("\\s*([A-Z_]+)\\s*(:\\s*([0-9]+))?\\s*");
Pattern data = Pattern.compile("\\s*([0-9]+)\\s+([-+.0-9Ee]+)\\s+([-+.0-9Ee]+)\\s*");
String line;
while ((line = in.readLine()) != null) {
Matcher m = specification.matcher(line);
if (!m.matches()) continue;
String keyword = m.group(1);
if (keyword.equals("DIMENSION")) {
n = Integer.parseInt(m.group(3));
cost = new double[n][n];
} else if (keyword.equals("NODE_COORD_SECTION")) {
x = new double[n];
y = new double[n];
for (int k = 0; k < n; k++) {
line = in.readLine();
m = data.matcher(line);
m.matches();
int i = Integer.parseInt(m.group(1)) - 1;
x[i] = Double.parseDouble(m.group(2));
y[i] = Double.parseDouble(m.group(3));
}
// TSPLIB distances are rounded to the nearest integer to avoid the sum of square roots problem
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
double dx = x[i] - x[j];
double dy = y[i] - y[j];
cost[i][j] = Math.rint(Math.sqrt(dx * dx + dy * dy));
}
}
}
}
}
public void solve() {
bestNode.lowerBound = Double.MAX_VALUE;
Node currentNode = new Node();
currentNode.excluded = new boolean[n][n];
costWithPi = new double[n][n];
computeHeldKarp(currentNode);
PriorityQueue<Node> pq = new PriorityQueue<Node>(11, new NodeComparator());
do {
do {
boolean isTour = true;
int i = -1;
for (int j = 0; j < n; j++) {
if (currentNode.degree[j] > 2 && (i < 0 || currentNode.degree[j] < currentNode.degree[i])) i = j;
}
if (i < 0) {
if (currentNode.lowerBound < bestNode.lowerBound) {
bestNode = currentNode;
System.err.printf("%.0f", bestNode.lowerBound);
}
break;
}
System.err.printf(".");
PriorityQueue<Node> children = new PriorityQueue<Node>(11, new NodeComparator());
children.add(exclude(currentNode, i, currentNode.parent[i]));
for (int j = 0; j < n; j++) {
if (currentNode.parent[j] == i) children.add(exclude(currentNode, i, j));
}
currentNode = children.poll();
pq.addAll(children);
} while (currentNode.lowerBound < bestNode.lowerBound);
System.err.printf("%n");
currentNode = pq.poll();
} while (currentNode != null && currentNode.lowerBound < bestNode.lowerBound);
// output suitable for gnuplot
// set style data vector
System.out.printf("# %.0f%n", bestNode.lowerBound);
int j = 0;
do {
int i = bestNode.parent[j];
System.out.printf("%f\t%f\t%f\t%f%n", x[j], y[j], x[i] - x[j], y[i] - y[j]);
j = i;
} while (j != 0);
}
private Node exclude(Node node, int i, int j) {
Node child = new Node();
child.excluded = node.excluded.clone();
child.excluded[i] = node.excluded[i].clone();
child.excluded[j] = node.excluded[j].clone();
child.excluded[i][j] = true;
child.excluded[j][i] = true;
computeHeldKarp(child);
return child;
}
private void computeHeldKarp(Node node) {
node.pi = new double[n];
node.lowerBound = Double.MIN_VALUE;
node.degree = new int[n];
node.parent = new int[n];
double lambda = 0.1;
while (lambda > 1e-06) {
double previousLowerBound = node.lowerBound;
computeOneTree(node);
if (!(node.lowerBound < bestNode.lowerBound)) return;
if (!(node.lowerBound < previousLowerBound)) lambda *= 0.9;
int denom = 0;
for (int i = 1; i < n; i++) {
int d = node.degree[i] - 2;
denom += d * d;
}
if (denom == 0) return;
double t = lambda * node.lowerBound / denom;
for (int i = 1; i < n; i++) node.pi[i] += t * (node.degree[i] - 2);
}
}
private void computeOneTree(Node node) {
// compute adjusted costs
node.lowerBound = 0.0;
Arrays.fill(node.degree, 0);
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) costWithPi[i][j] = node.excluded[i][j] ? Double.MAX_VALUE : cost[i][j] + node.pi[i] + node.pi[j];
}
int firstNeighbor;
int secondNeighbor;
// find the two cheapest edges from 0
if (costWithPi[0][2] < costWithPi[0][1]) {
firstNeighbor = 2;
secondNeighbor = 1;
} else {
firstNeighbor = 1;
secondNeighbor = 2;
}
for (int j = 3; j < n; j++) {
if (costWithPi[0][j] < costWithPi[0][secondNeighbor]) {
if (costWithPi[0][j] < costWithPi[0][firstNeighbor]) {
secondNeighbor = firstNeighbor;
firstNeighbor = j;
} else {
secondNeighbor = j;
}
}
}
addEdge(node, 0, firstNeighbor);
Arrays.fill(node.parent, firstNeighbor);
node.parent[firstNeighbor] = 0;
// compute the minimum spanning tree on nodes 1..n-1
double[] minCost = costWithPi[firstNeighbor].clone();
for (int k = 2; k < n; k++) {
int i;
for (i = 1; i < n; i++) {
if (node.degree[i] == 0) break;
}
for (int j = i + 1; j < n; j++) {
if (node.degree[j] == 0 && minCost[j] < minCost[i]) i = j;
}
addEdge(node, node.parent[i], i);
for (int j = 1; j < n; j++) {
if (node.degree[j] == 0 && costWithPi[i][j] < minCost[j]) {
minCost[j] = costWithPi[i][j];
node.parent[j] = i;
}
}
}
addEdge(node, 0, secondNeighbor);
node.parent[0] = secondNeighbor;
node.lowerBound = Math.rint(node.lowerBound);
}
private void addEdge(Node node, int i, int j) {
double q = node.lowerBound;
node.lowerBound += costWithPi[i][j];
node.degree[i]++;
node.degree[j]++;
}
}
class Node {
public boolean[][] excluded;
// Held--Karp solution
public double[] pi;
public double lowerBound;
public int[] degree;
public int[] parent;
}
class NodeComparator implements Comparator<Node> {
public int compare(Node a, Node b) {
return Double.compare(a.lowerBound, b.lowerBound);
}
}

If your graph satisfies the triangle inequality and you want a guarantee of being within a factor of 3/2 of the optimum, I suggest the Christofides algorithm. I've written an implementation in PHP at phpclasses.org.

As of 2013, It is possible to solve for 100 cities using only the exact formulation in Cplex. Add degree equations for each vertex, but include subtour-avoiding constraints only as they appear. Most of them are not necessary. Cplex has an example on this.
You should be able to solve for 100 cities. You will have to iterate every time a new subtour is found. I ran an example here and in a couple of minutes and 100 iterations later I got my results.

I took the Held-Karp algorithm from the Concorde library, and 25 cities are solved in 0.15 seconds. This performance is perfectly good for me! You can extract the Held-Karp code (written in ANSI C) from the Concorde library: http://www.math.uwaterloo.ca/tsp/concorde/downloads/downloads.htm. If the download has the extension gz, it should be tgz; you might need to rename it. Then you should make small adjustments to port it to VC++. First take the heldkarp h and c files (rename the latter to cpp) and about 5 other files, make the adjustments, and it should work by calling CCheldkarp_small(...) with edgelen: euclid_ceiling_edgelen.

TSP is an NP-hard problem. (As far as we know) there is no algorithm for NP-hard problems which runs in polynomial time, so you ask for something that doesn't exist.
It's either fast enough to finish in a reasonable time and then it's not exact, or exact but won't finish in your lifetime for 100 cities.

To give a dumb answer: me too. Everyone is interested in such an algorithm, but as others have already stated, it does not (yet?) exist. In particular, your combination of exact, 200 nodes, a few seconds of runtime and just 200 lines of code is impossible. You already know that it is NP-hard, and if you have the slightest feel for asymptotic behaviour you should know that there is no way of achieving this (unless you prove that P=NP, and even then I would say it's not possible). Even the exact commercial solvers need far more than a few seconds for such instances, and as you can imagine they have far more than 200 lines of code (even if you consider just their kernels).
EDIT: The wiki algorithms are the "usual suspects" of the field: linear programming and branch-and-bound. Their solutions for the instances with thousands of nodes took years to solve (they ran on very many CPUs in parallel so that they could finish faster). Some even use problem-specific knowledge for the bounding step of branch-and-bound, so they are not general approaches.
Branch and bound just enumerates all possible paths (e.g. with backtracking) and, once it has a solution, uses it to stop a started recursion as soon as it can prove that the result will not be better than the solution already found (e.g. if you have visited just 2 of your cities and the path is already longer than a known 200-city tour, you can discard all tours that start with that 2-city combination). Here you can invest a great deal of problem-specific knowledge in the function that tells you that the path is not going to be better than the solution already found. The better it is, the fewer paths you have to look at, and the faster your algorithm is.
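To make the pruning idea concrete, here is a minimal sketch of backtracking with the crudest possible bound: the length of the partial path itself. It assumes non-negative distances (otherwise the cut would not be valid), and the class name and matrix are only illustrative. Real solvers replace the `length >= best` test with a much tighter lower bound.

```java
final class TinyBranchAndBound {
    private final double[][] dist;
    private final boolean[] visited;
    private double best = Double.POSITIVE_INFINITY;

    TinyBranchAndBound(double[][] dist) {
        this.dist = dist;
        this.visited = new boolean[dist.length];
    }

    // Returns the length of the shortest tour that starts and ends at city 0.
    double solve() {
        visited[0] = true;
        extend(0, 1, 0.0);
        return best;
    }

    private void extend(int city, int count, double length) {
        // Bound: with non-negative distances, a partial path that is already
        // as long as the best known tour cannot lead to anything better.
        if (length >= best) return;
        if (count == dist.length) {
            // All cities visited: close the tour back to city 0.
            best = Math.min(best, length + dist[city][0]);
            return;
        }
        // Branch: try every unvisited city as the next stop.
        for (int next = 1; next < dist.length; next++) {
            if (visited[next]) continue;
            visited[next] = true;
            extend(next, count + 1, length + dist[city][next]);
            visited[next] = false;
        }
    }

    public static void main(String[] args) {
        double[][] d = {
            {0, 1, 2, 1},
            {1, 0, 1, 2},
            {2, 1, 0, 1},
            {1, 2, 1, 0}
        };
        System.out.println(new TinyBranchAndBound(d).solve()); // 4.0
    }
}
```

This still explores an exponential number of paths in the worst case; the only thing that changes with a better bounding function is how early the recursion gets cut.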
Linear programming is an optimization method to solve problems with linear inequalities. It works in polynomial time (simplex only in practice, but that doesn't matter here), but the solution is real-valued. When you add the constraint that the solution must be integer, it gets NP-complete. For small instances it is possible, e.g. one method is to solve it, then look at which variable of the solution violates the integrality constraint and add additional inequalities to change it (this is called cutting-plane; the name comes from the fact that the inequalities define a (higher-dimensional) plane, the solution space is a polytope, and by adding inequalities you cut something off the polytope with a plane). The topic is very complex, and even a general simple simplex is hard to understand when you don't want to dive deep into the math. There are several good books about it; one of the better ones is Chvatal's Linear Programming, but there are several more.

I have a theory, but I've never had the time to pursue it:
The TSP is a bounding problem (single shape where all points lie on the perimeter) where the optimal solution is that solution that has the shortest perimeter.
There are plenty of simple ways to get all the points that lie on a minimum bounding perimeter (imagine a large elastic band stretched around a bunch of nails in a large board.)
My theory is that if you start pushing in on the elastic band so that the length of band increases by the same amount between adjacent points on the perimeter, and each segment remains in the shape of an elliptical arc, the stretched elastic will cross points on the optimal path before crossing points on non-optimal paths. See this page on mathopenref.com on drawing ellipses, particularly steps 5 and 6. Points on the bounding perimeter can be viewed as focal points of the ellipse (F1, F2) in the images below.
What I don't know is if the "bubble stretching" process needs to be reset after each new point is added, or if the existing "bubbles" continue to grow and each new point on the perimeter causes only the localized "bubble" to turn into two line segments. I'll leave that for you to figure out.

Shuffle list, ensuring that no item remains in same position

I want to shuffle a list of unique items, but not do an entirely random shuffle. I need to be sure that no element in the shuffled list is at the same position as in the original list. Thus, if the original list is (A, B, C, D, E), this result would be OK: (C, D, B, E, A), but this one would not: (C, E, A, D, B) because "D" is still the fourth item. The list will have at most seven items. Extreme efficiency is not a consideration. I think this modification to Fisher/Yates does the trick, but I can't prove it mathematically:
function shuffle(data) {
for (var i = 0; i < data.length - 1; i++) {
var j = i + 1 + Math.floor(Math.random() * (data.length - i - 1));
var temp = data[j];
data[j] = data[i];
data[i] = temp;
}
}

You are looking for a derangement of your entries.
First of all, your algorithm works in the sense that it outputs a random derangement, i.e. a permutation with no fixed point. However, it has an enormous flaw (which you might not mind, but is worth keeping in mind): some derangements cannot be obtained with your algorithm. In other words, it gives probability zero to some possible derangements, so the resulting distribution is definitely not uniformly random.
One possible solution, as suggested in the comments, would be to use a rejection algorithm:
pick a permutation uniformly at random
if it has no fixed points, return it
otherwise retry
Asymptotically, the probability of obtaining a derangement is close to 1/e = 0.3679 (as seen in the Wikipedia article), which means that to obtain a derangement you will need to generate an average of e = 2.718 permutations, which is quite costly.
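For a list of at most seven items the rejection approach is easy to write down. The following is a minimal sketch, not a tuned implementation: it works on the indices 0..n-1 (apply the result to your own list), and the class and method names are made up for illustration.

```java
import java.util.Random;

final class RejectionDerangement {
    // Fisher-Yates shuffle of 0..n-1, repeated until no index maps to itself.
    static int[] randomDerangement(int n, Random rng) {
        if (n < 2) throw new IllegalArgumentException("n must be >= 2 but was " + n);
        int[] perm = new int[n];
        while (true) {
            for (int i = 0; i < n; i++) perm[i] = i;
            for (int i = n - 1; i > 0; i--) {
                int j = rng.nextInt(i + 1); // 0 <= j <= i, so j == i is allowed
                int tmp = perm[i];
                perm[i] = perm[j];
                perm[j] = tmp;
            }
            // Accept only permutations without fixed points, otherwise retry.
            if (!hasFixedPoint(perm)) return perm;
        }
    }

    static boolean hasFixedPoint(int[] perm) {
        for (int i = 0; i < perm.length; i++) {
            if (perm[i] == i) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        int[] d = randomDerangement(5, new Random());
        System.out.println(java.util.Arrays.toString(d)); // some derangement of 0..4
    }
}
```

Because every permutation is equally likely before the test and the test only discards, every derangement is equally likely afterwards.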
A better way to do that would be to reject at each step of the algorithm. In pseudocode, something like this (assuming the original array contains i at position i, ie a[i]==i):
for (i = 1 to n-1) {
do {
j = rand(i, n) // random integer from i to n inclusive
} while a[j] == i // rejection part: retry if the swap would put i back at position i
swap a[i] a[j]
}
The main difference from your algorithm is that we allow j to be equal to i, but only if it does not produce a fixed point. It is slightly longer to execute (due to the rejection part), and demands that you be able to check if an entry is at its original place or not, but it has the advantage that it can produce every possible derangement (uniformly, for that matter).
I am guessing non-rejection algorithms should exist, but I would believe them to be less straight-forward.
Edit:
My algorithm is actually bad: you still have a chance of ending with the last point unshuffled, and the distribution is not uniform at all, as the marginal distributions of a simulation show.
An algorithm that produces uniformly distributed derangements can be found here, with some context on the problem, thorough explanations and analysis.
Second Edit:
Actually, your algorithm is known as Sattolo's algorithm, and is known to produce all cycles of full length n (cyclic permutations) with equal probability. So any derangement which is not a single cycle but a product of several disjoint cycles cannot be obtained with the algorithm. For example, with four elements, the permutation that exchanges 1 and 2, and 3 and 4 is a derangement but not a cycle.
If you don't mind obtaining only cycles, then Sattolo's algorithm is the way to go; it's actually much faster than any uniform derangement algorithm, since no rejection is needed.
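A quick way to see the "cycles only" behaviour is to run the question's loop many times and measure the cycle through index 0. The sketch below ports that loop to Java (names are illustrative) and checks that the cycle always has the full length, whereas the derangement that swaps two pairs does not.

```java
import java.util.Random;

final class SattoloDemo {
    // Same loop as the question's shuffle: j is always strictly greater than i.
    static void sattolo(int[] data, Random rng) {
        for (int i = 0; i < data.length - 1; i++) {
            int j = i + 1 + rng.nextInt(data.length - i - 1);
            int tmp = data[j];
            data[j] = data[i];
            data[i] = tmp;
        }
    }

    // Length of the cycle containing index 0 in the permutation perm.
    static int cycleLengthFromZero(int[] perm) {
        int length = 1;
        for (int k = perm[0]; k != 0; k = perm[k]) length++;
        return length;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int trial = 0; trial < 1000; trial++) {
            int[] p = {0, 1, 2, 3};
            sattolo(p, rng);
            if (cycleLengthFromZero(p) != 4) {
                throw new IllegalStateException("not a single 4-cycle");
            }
        }
        // The derangement that swaps 0 with 1 and 2 with 3 consists of two
        // 2-cycles, so the loop above can never produce it.
        System.out.println(cycleLengthFromZero(new int[]{1, 0, 3, 2})); // 2
    }
}
```

With four elements there are 9 derangements but only 6 of them are 4-cycles, so a third of the valid results are unreachable.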

As @FelixCQ has mentioned, the shuffles you are looking for are called derangements. Constructing uniformly randomly distributed derangements is not a trivial problem, but some results are known in the literature. The most obvious way to construct derangements is by the rejection method: you generate uniformly randomly distributed permutations using an algorithm like Fisher-Yates and then reject permutations with fixed points. The average running time of that procedure is e*n + o(n), where e is Euler's number 2.71828... That would probably work in your case.
The other major approach for generating derangements is to use a recursive algorithm. However, unlike Fisher-Yates, we have two branches to the algorithm: the last item in the list can be swapped with another item (i.e., part of a two-cycle), or can be part of a larger cycle. So at each step, the recursive algorithm has to branch in order to generate all possible derangements. Furthermore, the decision of whether to take one branch or the other has to be made with the correct probabilities.
Let D(n) be the number of derangements of n items. At each stage, the number of branches taking the last item to two-cycles is (n-1)D(n-2), and the number of branches taking the last item to larger cycles is (n-1)D(n-1). This gives us a recursive way of calculating the number of derangements, namely D(n)=(n-1)(D(n-2)+D(n-1)), and gives us the probability of branching to a two-cycle at any stage, namely (n-1)D(n-2)/D(n).
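As a small worked example of the recurrence, the sketch below tabulates D(n) for small n and evaluates the two-cycle probability (n-1)D(n-2)/D(n), the same quantity that appears in the comments of the Java implementation further down. It uses plain longs, which is enough for the small n shown here; the class and method names are illustrative.

```java
import java.util.Arrays;

final class DerangementCounts {
    // D(0) = 1, D(1) = 0, D(n) = (n-1) * (D(n-1) + D(n-2)).
    static long[] counts(int maxN) {
        long[] d = new long[Math.max(maxN, 1) + 1];
        d[0] = 1;
        d[1] = 0;
        for (int n = 2; n <= maxN; n++) {
            d[n] = (n - 1) * (d[n - 1] + d[n - 2]);
        }
        return d;
    }

    // Probability that the last item ends up in a two-cycle: (n-1) * D(n-2) / D(n).
    static double twoCycleProbability(int n) {
        if (n < 2) throw new IllegalArgumentException("n must be >= 2 but was " + n);
        long[] d = counts(n);
        return (double) ((n - 1) * d[n - 2]) / d[n];
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(counts(7)));
        // [1, 0, 1, 2, 9, 44, 265, 1854]
        System.out.println(twoCycleProbability(4)); // 3 * D(2) / D(4) = 3/9
    }
}
```

For n = 4 this gives 3 of the 9 derangements with the last item in a two-cycle; the remaining 6 are exactly the 4-cycles.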
Now we can construct derangements by deciding to which type of cycle the last element belongs, swapping the last element to one of the n-1 other positions, and repeating. It can be complicated to keep track of all the branching, however, so in 2008 some researchers developed a streamlined algorithm using those ideas. You can see a walkthrough at http://www.cs.upc.edu/~conrado/research/talks/analco08.pdf . The running time of the algorithm is proportional to 2n + O(log^2 n), a 36% improvement in speed over the rejection method.
I have implemented their algorithm in Java. Using longs works for n up to 22 or so. Using BigIntegers extends the algorithm to n=170 or so. Using BigIntegers and BigDecimals extends the algorithm to n=40000 or so (the limit depends on memory usage in the rest of the program).
package io.github.edoolittle.combinatorics;
import java.math.BigInteger;
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.Random;
import java.util.HashMap;
import java.util.TreeMap;
public final class Derangements {
// cache calculated values to speed up recursive algorithm
private static HashMap<Integer,BigInteger> numberOfDerangementsMap
= new HashMap<Integer,BigInteger>();
private static int greatestNCached = -1;
// load numberOfDerangementsMap with initial values D(0)=1 and D(1)=0
static {
numberOfDerangementsMap.put(0,BigInteger.valueOf(1));
numberOfDerangementsMap.put(1,BigInteger.valueOf(0));
greatestNCached = 1;
}
private static Random rand = new Random();
// private default constructor so class isn't accidentally instantiated
private Derangements() { }
public static BigInteger numberOfDerangements(int n)
throws IllegalArgumentException {
if (numberOfDerangementsMap.containsKey(n)) {
return numberOfDerangementsMap.get(n);
} else if (n>=2) {
// pre-load the cache to avoid stack overflow (occurs near n=5000)
for (int i=greatestNCached+1; i<n; i++) numberOfDerangements(i);
greatestNCached = n-1;
// recursion for derangements: D(n) = (n-1)*(D(n-1) + D(n-2))
BigInteger Dn_1 = numberOfDerangements(n-1);
BigInteger Dn_2 = numberOfDerangements(n-2);
BigInteger Dn = (Dn_1.add(Dn_2)).multiply(BigInteger.valueOf(n-1));
numberOfDerangementsMap.put(n,Dn);
greatestNCached = n;
return Dn;
} else {
throw new IllegalArgumentException("argument must be >= 0 but was " + n);
}
}
public static int[] randomDerangement(int n)
throws IllegalArgumentException {
if (n<2)
throw new IllegalArgumentException("argument must be >= 2 but was " + n);
int[] result = new int[n];
boolean[] mark = new boolean[n];
for (int i=0; i<n; i++) {
result[i] = i;
mark[i] = false;
}
int unmarked = n;
for (int i=n-1; i>=0; i--) {
if (unmarked<2) break; // can't move anything else
if (mark[i]) continue; // can't move item at i if marked
// use the rejection method to generate random unmarked index j < i;
// this could be replaced by more straightforward technique
int j;
while (mark[j=rand.nextInt(i)]);
// swap two elements of the array
int temp = result[i];
result[i] = result[j];
result[j] = temp;
// mark position j as end of cycle with probability (u-1)D(u-2)/D(u)
double probability
= (new BigDecimal(numberOfDerangements(unmarked-2))).
multiply(new BigDecimal(unmarked-1)).
divide(new BigDecimal(numberOfDerangements(unmarked)),
MathContext.DECIMAL64).doubleValue();
if (rand.nextDouble() < probability) {
mark[j] = true;
unmarked--;
}
// position i now becomes out of play so we could mark it
//mark[i] = true;
// but we don't need to because loop won't touch it from now on
// however we do have to decrement unmarked
unmarked--;
}
return result;
}
// unit tests
public static void main(String[] args) {
// test derangement numbers D(i)
for (int i=0; i<100; i++) {
System.out.println("D(" + i + ") = " + numberOfDerangements(i));
}
System.out.println();
// test quantity (u-1)D_(u-2)/D_u for overflow, inaccuracy
for (int u=2; u<100; u++) {
double d = numberOfDerangements(u-2).doubleValue() * (u-1) /
numberOfDerangements(u).doubleValue();
System.out.println((u-1) + " * D(" + (u-2) + ") / D(" + u + ") = " + d);
}
System.out.println();
// test derangements for correctness, uniform distribution
int size = 5;
long reps = 10000000;
TreeMap<String,Integer> countMap = new TreeMap<String,Integer>();
System.out.println("Derangement\tCount");
System.out.println("-----------\t-----");
for (long rep = 0; rep < reps; rep++) {
int[] d = randomDerangement(size);
String s = "";
String sep = "";
if (size > 10) sep = " ";
for (int i=0; i<d.length; i++) {
s += d[i] + sep;
}
if (countMap.containsKey(s)) {
countMap.put(s,countMap.get(s)+1);
} else {
countMap.put(s,1);
}
}
for (String key : countMap.keySet()) {
System.out.println(key + "\t\t" + countMap.get(key));
}
System.out.println();
// large random derangement
int size1 = 1000;
System.out.println("Random derangement of " + size1 + " elements:");
int[] d1 = randomDerangement(size1);
for (int i=0; i<d1.length; i++) {
System.out.print(d1[i] + " ");
}
System.out.println();
System.out.println();
System.out.println("We start to run into memory issues around u=40000:");
{
// increase this number from 40000 to around 50000 to trigger
// out of memory-type exceptions
int u = 40003;
BigDecimal d = (new BigDecimal(numberOfDerangements(u-2))).
multiply(new BigDecimal(u-1)).
divide(new BigDecimal(numberOfDerangements(u)),MathContext.DECIMAL64);
System.out.println((u-1) + " * D(" + (u-2) + ") / D(" + u + ") = " + d);
}
}
}

In C++:
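#include <cstdlib>  // rand
#include <utility>  // std::swap
#include <vector>
// swaps each position only with a strictly later one, so no item stays in place
template <class T> void shuffle(std::vector<T>&arr)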
{
int size = arr.size();
for (auto i = 1; i < size; i++)
{
int n = rand() % (size - i) + i;
std::swap(arr[i-1], arr[n]);
}
}
