Sort items with minimal renumber - algorithm

I need to quickly save a re-ordered sequence back to my items' integer sortOrder columns.
The simple renumber-by-one approach can be slow: if the last item is moved to first, all N rows are modified. A multi-row update statement would let the database do the work, but I'd like to explore smarter ways, like making sortOrder floating point, except I don't have that option :(
The solution I imagine would take a renumbered list like this: (100,200,1700,300,400...1600,1800,...) and produce (100,200,250,300,400...1600,1800,...) (by modifying one row in this example). It seems simple at first, but I'm having a hard time expressing it in code...
Can someone help me with this logic? There could be sequences of adjacent items that need to be shifted for a new one to fit - I was hoping someone might have this already written? It has to be faster than what I have now, but still readable and simple to understand/maintain.
OK, after the answers, I'm posting back the resulting code I came up with (comments welcome):
/**
* Renumber list with minimal changes
*
* Given a list of SortOrderNumbers in the 'new' sequence they are to be saved in, determine the
* minimal set of changes (described by Change(i, newSortOrderNumber)) that can be saved that
* results in a properly ordered sequence.
*
* A simple answer would always be List(change<1,1>, change<2,2>, ...change<n,n>) which is of length N.
* This method returns a set of changes no larger than N (often much smaller for simple moves).
*
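* @author Jim Pinkham
* @param s the sortOrder values in their new sequence; modified in place as items are renumbered
* @return the set of changes (at most one per index) to save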
*/
private Set<Change> renumber(int[] s) {
Set<Change> ret = new HashSet<Change>();
// pass1 goes from start forwards looking for decrease in numbering
for (int i=1; i<s.length; i++) {
// if predecessor of item is larger and it's large enough to renumber from start of list
if (s[i-1]>s[i] && s[i]>i) {
int beforeStart=0;
int beforeIndex=-1;
// go back towards start looking for anchor
for (int j=i-2; j>=0; --j) {
int diff = s[i]-s[j];
if (diff>(i-j)) {
beforeIndex=j;
beforeStart=s[beforeIndex];
break;
}
}
int diff = s[i]-beforeStart;
int stepsToDiff=i-beforeIndex;
int delta = diff/stepsToDiff;
// renumber from start of list or anchor thru decrease
int fixCnt=0;
for (int j=beforeIndex+1; j<i; ++j) {
s[j] = beforeStart + (delta*++fixCnt);
Change c = new Change(j, s[j]);
ret.remove(c); // equality is by index, so drop any earlier change for this row
ret.add(c);
}
}
}
// pass1 could leave some decreases in sequence
// pass2 goes from end back to start
for (int i=s.length-1; i>0; i--) {
// if predecessor of item is larger
if (s[i-1] > s[i]) {
int afterIndex=s.length;
int delta=DEFAULT_RENUMBER_GAP; // class constant defined elsewhere (not shown)
// go forward towards end looking for anchor
for (int j=i; j<s.length; ++j) {
int diff = s[j]-s[i-1];
if (diff>(j-(i-1))) {
afterIndex=j;
int afterEnd=s[afterIndex];
int stepsToDiff=afterIndex-(i-1);
int gap = afterEnd-s[i-1];
delta = gap/stepsToDiff;
break;
}
}
// renumber from decrease thru anchor or end of list
int fixCnt=0;
for (int j=i; j<afterIndex; ++j) {
s[j] = s[i-1] + (delta*++fixCnt);
Change c = new Change(j, s[j]);
ret.remove(c); // equality is by index, so drop any earlier change for this row
ret.add(c);
}
}
}
return ret;
}
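class Change {
final int i;
final int sortOrder;
Change(int i, int sortOrder) {
this.i=i; this.sortOrder=sortOrder;
}
// equality and hash are by row index only, so the set keeps at most one change per row;
// equals must take Object to override (not overload) Object.equals, or HashSet never calls it
@Override
public boolean equals(Object o) {
return o instanceof Change && ((Change) o).i == i;
}
@Override
public int hashCode() {
return Integer.hashCode(i);
}
}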

I'd like to explore smarter ways, like making sortOrder floating point except I don't have that option
If you find it easier to think of it in terms of floating point, why not imagine the number as fixed point.
For example, for the purposes of your algorithm, interpret 1000000 as 100.0000. You'll need to choose the position of the point so that there are as many decimal (or binary) places as you can fit, given (the maximum number of items in your array + 2) versus the integer size. So let's say the maximum number of entries is 998: you'd need 3 digits before the point, and the rest would be available for 'gaps'.
A move operation then can be as simple as setting its new sortnumber to half the sum of the sortnumber of the items either side, i.e. slotting the moved item between its new neighbors. Use 0 and size(array)+1 as the end cases. Again I'm assuming that your UI can record the moves done by the user - regardless I think it should be fairly straightforward to work them out, and a standard sort algorithm could probably be used, just redefine 'swap'.
So for example moving last to first in this array (with imaginary decimal point):
1.0000
2.0000
3.0000
4.0000
5.0000
becomes
1.0000
2.0000
3.0000
4.0000
0.5000 = (0.0000 + 1.0000)/2
giving a sort order of
0.5000
1.0000
2.0000
3.0000
4.0000
Which changes just one record, the last one in the array
Moving last to second would do this:
1.0000
2.0000
3.0000
4.0000
5.0000
Becomes
1.0000
2.0000
3.0000
4.0000
1.5000 = (1.0000+2.0000)/2
resulting in a sort order of
1.0000
1.5000
2.0000
3.0000
4.0000
Again, just one record changed.
You will still need to cater for the case where you run out of room 'between' two numbers, which you eventually will. I think this is true regardless of algorithm. This will require 'swap' to renumber more entries to make more room. Again, regardless of algorithm, I don't think you can rule out the case where everything has to be renumbered; it will just be very unlikely. I also suspect that extensive renumbers become more likely over time, again no matter what you do, because the available space will fragment. However, by choosing the position of the point to give as much room as possible, it should be optimal, i.e. you postpone that as long as possible.
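A minimal Python sketch of the move step, assuming the sort keys are plain integers scaled by a hypothetical SCALE of 10,000 (four places after the imaginary point, as in the example) and that the caller already knows the source and destination positions. The names move_item and SCALE are illustrative. It slots the moved item at the integer midpoint of its new neighbours, using 0 and (size + 1) * SCALE as the end cases, and falls back to respacing every key when no integer is left between the neighbours.

```python
SCALE = 10_000  # hypothetical: 4 decimal places after the imaginary point


def move_item(keys, src, dst, scale=SCALE):
    """Move the item at index src of the ascending key list to index dst.

    Returns (new_keys, renumbered). Normally only the moved item's key
    changes; if no integer is left between its new neighbours, every key
    is respaced evenly and renumbered is True (the expensive case).
    """
    rest = keys[:src] + keys[src + 1:]  # keys without the moved item
    before = rest[dst - 1] if dst > 0 else 0  # 0 is the low end case
    # (size + 1) * scale is the high end case
    after = rest[dst] if dst < len(rest) else (len(keys) + 1) * scale
    new_key = (before + after) // 2  # half the sum of the neighbours
    if new_key == before:  # gap exhausted: nothing strictly between them
        return [(i + 1) * scale for i in range(len(keys))], True
    return rest[:dst] + [new_key] + rest[dst:], False
```

With the five-item example, moving the last item to the front returns [5000, 10000, 20000, 30000, 40000]: only the moved row's key changes, matching 0.5000 above.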
To avoid having to do a more extensive renumber at an inconvenient time, it would probably be advisable to regularly do some kind of batch renumber during quiet periods - basically stretching the gaps again to make room for further user driven sorts. Again, I think you probably need this no matter what method you use.
This is just a sketch and I think it is probably equivalent to any other way of doing it, though perhaps a more intuitive/maintainable way of thinking about it and a way of maximising the room for expansion. Also if you're really worried about poor performance of degenerate cases - and from your description it sounds like you should be - I'd suggest to run whatever algorithm you go with in a test harness with a lot of random data (no database) over a long period, to see how many renumbers it really performs in practice and especially to see if it degrades with use over a long period. I suspect any algorithm for this will.

Following your example you could do something like this:
Walk your numbers array. If the successor of an item x is smaller than x itself, walk the array backwards until you find the item y with the minimum difference between y and x+1. Count the steps you walked backwards, take the minimum distance, walk forwards from y and set the items to y+((x+1)-y)/count.

An additional level of indirection may help, e.g. implement a relocatable handle to each row in place of a row index. So instead of dealing with row[x], deal with row[handle[x]]
Edit: OK, so this is not possible in your situation. Can you clarify how much reordering you expect?
I gather from the phrasing of the question that you expect only M of N items to move, where M is significantly less than N. So you want to avoid N updates - you'd rather have something like M updates.
If M is less than N/2 then it should be faster to define the reordering in terms of swap operations. You don't say what your UI is, but the user is probably effectively doing swap operations there anyhow. So by recording those, or using a standard sort algorithm to get from the original state to the desired state, you should be able to generate the set of M swap operations needed to reorder the elements. That should only require M*2 row updates - i.e. if only two items trade places you need update only 2 rows.
There may be some degenerate cases where this is actually slower than just rewriting everything, though that seems unlikely if, as implied by the question, it is just the user reordering stuff.
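A sketch of deriving the swaps, assuming both the old and the new order are available as lists of distinct row ids (swaps_to_reorder and apply_swaps are hypothetical helper names). It walks the target order and, whenever a position holds the wrong item, swaps the wanted item into place, which is essentially selection sort applied to the permutation. Each returned pair (i, j) means "exchange the sortOrder values of the rows at positions i and j", i.e. two row updates, and it never returns more swaps than there are misplaced positions.

```python
def swaps_to_reorder(old, new):
    """Return (i, j) position swaps that turn sequence old into sequence new.

    Assumes old and new contain the same distinct items (e.g. row ids).
    """
    current = list(old)
    where = {item: i for i, item in enumerate(current)}  # item -> position
    swaps = []
    for i, wanted in enumerate(new):
        if current[i] != wanted:
            j = where[wanted]  # always > i: earlier positions are final
            swaps.append((i, j))
            displaced = current[i]
            current[i], current[j] = wanted, displaced
            where[wanted], where[displaced] = i, j
    return swaps


def apply_swaps(items, swaps):
    """Replay the swaps on a copy of items (to check the result)."""
    items = list(items)
    for i, j in swaps:
        items[i], items[j] = items[j], items[i]
    return items
```

Swapping two neighbours yields a single swap, but moving the last of five items to the front yields four swaps (eight row writes), more than the five writes of a plain renumber, which is the kind of degenerate case mentioned above.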

Related

Memory-constrained coin changing for numbers up to one billion

I came across this problem in a training session. We are given N different values (N ≤ 100); call this array A[N]. We know that A contains 1 and that A[i] ≤ 10^9. We are also given a number S, where S ≤ 10^9.
Now we have to solve the classic coin problem with these values: find the minimum number of elements that sum to exactly S. Every element of A can be used an unlimited number of times.
Time limit: 1 sec
Memory limit: 256 MB
Example:
S = 1000, N = 10
A[] = {1,12,123,4,5,678,7,8,9,10}. The result is 10.
1000 = 678 + 123 + 123 + 12 + 12 + 12 + 12 + 12 + 12 + 4
What I have tried
I tried to solve this with classic dynamic programming coin problem technique but it uses too much memory and it gives memory limit exceeded.
I can't figure out what we should keep track of for these values. Thanks in advance.
Here are a couple of test cases that cannot be solved with the classic DP coin-problem approach.
S = 1000000000 N = 100
1 373241370 973754081 826685384 491500595 765099032 823328348 462385937
251930295 819055757 641895809 106173894 898709067 513260292 548326059
741996520 959257789 328409680 411542100 329874568 352458265 609729300
389721366 313699758 383922849 104342783 224127933 99215674 37629322
230018005 33875545 767937253 763298440 781853694 420819727 794366283
178777428 881069368 595934934 321543015 27436140 280556657 851680043
318369090 364177373 431592761 487380596 428235724 134037293 372264778
267891476 218390453 550035096 220099490 71718497 860530411 175542466
548997466 884701071 774620807 118472853 432325205 795739616 266609698
242622150 433332316 150791955 691702017 803277687 323953978 521256141
174108096 412366100 813501388 642963957 415051728 740653706 68239387
982329783 619220557 861659596 303476058 85512863 72420422 645130771
228736228 367259743 400311288 105258339 628254036 495010223 40223395
110232856 856929227 25543992 957121494 359385967 533951841 449476607
134830774
OUTPUT FOR THIS TEST CASE: 5
S = 999865497 N = 7
1 267062069 637323855 219276511 404376890 528753603 199747292
OUTPUT FOR THIS TEST CASE: 1129042
S = 1000000000 N = 40
1 12 123 4 5 678 7 8 9 10 400 25 23 1000 67 98 33 46 79 896 11 112 1223 412
532 6781 17 18 19 170 1400 925 723 11000 607 983 313 486 739 896
OUTPUT FOR THIS TEST CASE: 90910
(NOTE: Updated and edited for clarity. Complexity Analysis added at the end.)
OK, here is my solution, including my fixes to the performance issues found by @PeterdeRivaz. I have tested this against all of the test cases provided in the question and the comments, and it finishes each in under a second (well, 1.5s in one case), using primarily only the memory for the partial results cache (I'd guess about 16MB).
Rather than using the traditional DP solution (which is both too slow and requires too much memory), I use a Depth-First, Greedy-First combinatorial search with pruning using current best results. I was surprised (very) that this works as well as it does, but I still suspect that you could construct test sets that would take a worst-case exponential amount of time.
First there is a master function that is the only thing that calling code needs to call. It handles all of the setup and initialization and calls everything else. (all code is C#)
// Find the min# of coins for a specified sum
int CountChange(int targetSum, int[] coins)
{
// init the cache for (partial) memoization
PrevResultCache = new PartialResult[1048576];
// make sure the coins are sorted lowest to highest
Array.Sort(coins);
int curBest = targetSum;
int result = CountChange_r(targetSum, coins, coins.GetLength(0)-1, 0, ref curBest);
return result;
}
Because of the problem test cases raised by @PeterdeRivaz, I have also added a partial results cache to handle the case where there are large numbers in N[] that are close together.
Here is the code for the cache:
// implement a very simple cache for previous results of remainder counts
struct PartialResult
{
public int PartialSum;
public int CoinVal;
public int RemainingCount;
}
PartialResult[] PrevResultCache;
// checks the partial count cache for already calculated results
int PrevAddlCount(int currSum, int currCoinVal)
{
int cacheAddr = currSum & 1048575; // AND with (2^20-1) to keep only the lowest 20 bits
PartialResult prev = PrevResultCache[cacheAddr];
// use it, as long as it's actually the same partial sum
// and the coin value is at least as large as the current coin
if ((prev.PartialSum == currSum) && (prev.CoinVal >= currCoinVal))
{
return prev.RemainingCount;
}
// otherwise flag as empty
return 0;
}
// add or overwrite a new value to the cache
void AddPartialCount(int currSum, int currCoinVal, int remainingCount)
{
int cacheAddr = currSum & 1048575; // AND with (2^20-1) to keep only the lowest 20 bits
PartialResult prev = PrevResultCache[cacheAddr];
// only add if the Sum is different or the result is better
if ((prev.PartialSum != currSum)
|| (prev.CoinVal <= currCoinVal)
|| (prev.RemainingCount == 0)
|| (prev.RemainingCount >= remainingCount)
)
{
prev.PartialSum = currSum;
prev.CoinVal = currCoinVal;
prev.RemainingCount = remainingCount;
PrevResultCache[cacheAddr] = prev;
}
}
And here is the code for the recursive function that does the actual counting:
/*
* Find the minimum number of coins required totaling to a specific sum
* using a list of coin denominations passed.
*
* Memory Requirements: O(N) where N is the number of coin denominations
* (primarily for the stack)
*
* CPU requirements: O(Sqrt(S)*N) where S is the target Sum
* (Average, estimated. This is very hard to figure out.)
*/
int CountChange_r(int targetSum, int[] coins, int coinIdx, int curCount, ref int curBest)
{
int coinVal = coins[coinIdx];
int newCount = 0;
// check to see if we are at the end of the search tree (coinIdx=0, coinVal=1)
// or we have reached the targetSum
if ((coinVal == 1) || (targetSum == 0))
{
// just use math get the final total for this path/combination
newCount = curCount + targetSum;
// update, if we have a new curBest
if (newCount < curBest) curBest = newCount;
return newCount;
}
// prune this whole branch, if it cannot possibly improve the curBest
int bestPossible = curCount + (targetSum / coinVal);
if (bestPossible >= curBest)
return bestPossible; //NOTE: this is a false answer, but it shouldn't matter
// because we should never use it.
// check the cache to see if a remainder-count for this partial sum
// already exists (and used coins at least as large as ours)
int prevRemCount = PrevAddlCount(targetSum, coinVal);
if (prevRemCount > 0)
{
// it exists, so use it
newCount = prevRemCount + targetSum;
// update, if we have a new curBest
if (newCount < curBest) curBest = newCount;
return newCount;
}
// always try the largest remaining coin first, starting with the
// maximum possible number of that coin (greedy-first searching)
newCount = curCount + targetSum;
for (int cnt = targetSum / coinVal; cnt >= 0; cnt--)
{
int tmpCount = CountChange_r(targetSum - (cnt * coinVal), coins, coinIdx - 1, curCount + cnt, ref curBest);
if (tmpCount < newCount) newCount = tmpCount;
}
// Add our new partial result to the cache
AddPartialCount(targetSum, coinVal, newCount - curCount);
return newCount;
}
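For experimenting outside C#, here is a rough Python transliteration of the greedy-first search with branch pruning, deliberately leaving out the partial-results cache so that only the pruning bound is shown. The function name min_coins_dfs is mine, not the author's. The bound count + remaining // coin is a valid lower bound because no coin further down the sorted list is larger than coin. Like the original, it can take exponential time on adversarial inputs.

```python
def min_coins_dfs(target, coins):
    """Minimum number of coins summing to target; coins must include 1."""
    coins = sorted(set(coins))
    assert coins[0] == 1, "the problem guarantees a coin of value 1"
    best = target  # using only 1s is always possible

    def search(remaining, idx, count):
        nonlocal best
        coin = coins[idx]
        if coin == 1 or remaining == 0:
            # leaf: whatever is left is paid with 1s
            best = min(best, count + remaining)
            return
        # prune: even using only this (largest remaining) coin cannot beat best
        if count + remaining // coin >= best:
            return
        # greedy first: try the most copies of the largest coin before fewer
        for cnt in range(remaining // coin, -1, -1):
            search(remaining - cnt * coin, idx - 1, count + cnt)

    search(target, len(coins) - 1, 0)
    return best
```

On the small example from the question this returns 10, and on S = 6 with coins {1, 3, 4} it returns 2 (3 + 3), where a purely greedy choice would give 3.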
Analysis:
Memory: Memory usage is pretty easy to determine for this algorithm. Basically there's only the partial results cache and the stack. The cache is fixed at approximately 1 million entries times the size of each entry (3*4 bytes), so about 12MB. The stack is limited to O(N), so together, memory is clearly not a problem.
CPU: The run-time complexity of this algorithm starts out hard to determine and then gets harder, so please excuse me because there's a lot of hand-waving here. I tried to search for an analysis of just the brute-force problem (combinatorial search of sums of N*kn base values summing to S) but not much turned up. What little there was tended to say it was O(N^S), which is clearly too high. I think that a fairer estimate is O(N^(S/N)), or possibly O(N^(S/AVG(N))), or even O(N^(S/Gmean(N))), where Gmean(N) is the geometric mean of the elements of N[]. This solution starts out with the brute-force combinatorial search and then improves it with two significant optimizations.
The first is the pruning of branches based on estimates of the best possible result for each branch versus the best result already found. If the best-case estimators were perfectly accurate and the work for branches was perfectly distributed, this would mean that if we find a result that is better than 90% of the other possible cases, pruning would effectively eliminate 90% of the work from that point on. To make a long story short, the amount of work remaining after pruning should shrink harmonically as the search progresses. Assuming that some kind of summing/integration should be applied to get a work total, this appears to me to work out to a logarithm of the original work. So let's call it O(Log(N^(S/N))), or O(N*Log(S/N)), which is pretty darn good. (Though O(N*Log(S/Gmean(N))) is probably more accurate.)
However, there are two obvious holes with this. First, it is true that the best-case estimators are not perfectly accurate and thus they will not prune as effectively as assumed above, but, this is somewhat counter-balanced by the Greedy-First ordering of the branches which gives the best chances for finding better solutions early in the search which increase the effectiveness of pruning.
The second problem is that the best-case estimator works better when the different values of N are far apart. Specifically, if |(S/n2 - S/n1)| > 1 for any 2 values in N, then it becomes almost perfectly effective. For values in N less than SQRT(S), even two adjacent values (k, k+1) are far enough apart that this rule applies. However, for increasing values above SQRT(S), a window opens up so that any number of N-values within that window will not be able to effectively prune each other. The size of this window is approximately K/SQRT(S). So if S=10^9, when K is around 10^6 this window will be almost 30 numbers wide. This means that N[] could contain 1 plus every number from 1000001 to 1000029 and the pruning optimization would provide almost no benefit.
To address this, I added the partial results cache, which allows memoization of the most recent partial sums up to the target S. This takes advantage of the fact that when the N-values are close together, they will tend to have an extremely high number of duplicates in their sums. As best as I can figure, this effectiveness is approximately N times the J-th root of the problem size, where J = S/K and K is some measure of the average size of the N-values (Gmean(N) is probably the best estimate). If we apply this to the brute-force combinatorial search, assuming that pruning is ineffective, we get O((N^(S/Gmean(N)))^(1/Gmean(N))), which I think is also O(N^(S/(Gmean(N)^2))).
So, at this point take your pick. I know this is really sketchy, and even if it is correct, it is still very sensitive to the distribution of the N-values, so lots of variance.
[I've replaced the previous idea about bit operations because it seems to be too time consuming]
A bit crazy idea and incomplete but may work.
Let's start with introducing f(n,s) which returns number of combinations in which s can be composed from n coins.
Now, how is f(n+1,s) related to f(n,)?
One of possible ways to calculate it is:
f(n+1,s)=sum[coin:coins]f(n,s-coin)
For example, if we have coins 1 and 3,
f(0,)=[1,0,0,0,0,0,0,0] - with zero coins we can have only zero sum
f(1,)=[0,1,0,1,0,0,0,0] - what we can have with one coin
f(2,)=[0,0,1,0,2,0,1,0] - what we can have with two coins
We can rewrite it a bit differently:
f(n+1,s)=sum[i=0..max]f(n,s-i)*a(i)
a(i)=1 if we have coin i and 0 otherwise
What we have here is convolution: f(n+1,)=conv(f(n,),a)
https://en.wikipedia.org/wiki/Convolution
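A small Python sketch of the recurrence f(n+1,) = conv(f(n,), a), using direct convolution rather than an FFT and skipping the zero entries of a, so each step costs O(size * number of coins). The function names are hypothetical. It reproduces the f(2,) row of the coins {1, 3} example, and min_coins simply repeats the step until S becomes reachable. It still keeps an array of length S + 1, so it illustrates the idea but does not by itself meet the memory limit.

```python
def step(f, coins):
    """One convolution step: f(n+1, s) = sum over coins c of f(n, s - c).

    Equivalent to convolving f with the indicator array a (a[c] = 1 for
    each coin c), truncated to len(f); zero entries of a are skipped.
    """
    size = len(f)
    g = [0] * size
    for s, ways in enumerate(f):
        if ways:
            for c in coins:
                if s + c < size:
                    g[s + c] += ways
    return g


def ways_with_n_coins(coins, n, size):
    """Return f(n,) for sums 0..size-1."""
    f = [1] + [0] * (size - 1)  # f(0,): only the empty sum 0
    for _ in range(n):
        f = step(f, coins)
    return f


def min_coins(target, coins):
    """Smallest n with f(n, target) > 0; needs coin 1 to terminate."""
    assert 1 in coins
    f = [1] + [0] * target
    n = 0
    while f[target] == 0:
        f = step(f, coins)
        n += 1
    return n
```

An FFT-based version would replace the body of step (or compute the n-th power of F(a) directly), which is the part the answer proposes to speed up.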
Computing it directly from the definition takes O(n^2) time for arrays of length n,
but we can use the Fourier transform to reduce this to O(n*log n).
https://en.wikipedia.org/wiki/Convolution#Convolution_theorem
So now we have a more-or-less cheap way to find out which sums are possible with n coins without going incrementally: just calculate the (pointwise) n-th power of F(a), the Fourier transform of a, and apply the inverse Fourier transform.
This allows us to make a kind of binary search which can help handling cases when the answer is big.
As I said the idea is incomplete - for now I have no idea how to combine bit representation with Fourier transforms (to satisfy memory constraint) and whether we will fit into 1 second on any "regular" CPU...

Get N samples given iterator

We are given an iterator it over the data points, the number of data points n, and the maximum number of samples we want to use for some calculations (maxSamples).
Imagine a function calculateStatistics(Iterator it, int n, int maxSamples). This function should use the iterator to retrieve the data and do some (heavy) calculations on the data element retrieved.
if n <= maxSamples we will of course use each element we get from the iterator
if n > maxSamples we will have to choose which elements to look at and which to skip
I've been spending quite some time on this. The problem is of course how to choose when to skip an element and when to keep it. My approaches so far:
I don't want to take the first maxSamples coming from the iterator, because the values might not be evenly distributed.
Another idea was to use a random number generator to create maxSamples distinct random numbers between 0 and n and take the elements at these positions. But if, e.g., n = 101 and maxSamples = 100, it gets more and more difficult to find a new distinct number not yet in the list, losing a lot of time just in random number generation.
My last idea was to do the contrary: to generate n - maxSamples random numbers and exclude the data elements at these positions elements. But this also doesn't seem to be a very good solution.
Do you have a good idea for this problem? Are there maybe standard known algorithms for this?
To provide some answer: a good way to collect a random subset when the collection is larger than the number of elements needed is the following (in C++-style code).
EDIT: you may need to iterate over and create the "someElements" vector first. If your elements are large they can be "pointers" to these elements to save space.
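#include <cstddef>
#include <cstdlib>
#include <vector>
// Draw numElementsToGrab distinct elements uniformly at random (without replacement).
// someElements is taken by value, so the caller's vector is left untouched.
template <typename T>
std::vector<T> randomCollectionFromVector(std::vector<T> someElements, std::size_t numElementsToGrab) {
std::vector<T> resultVector;
while (numElementsToGrab-- > 0 && !someElements.empty()) {
std::size_t randPosition = std::rand() % someElements.size();
resultVector.push_back(someElements[randPosition]);
someElements.erase(someElements.begin() + randPosition);
}
return resultVector;
}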
If you don't care about changing your vector of elements, you could also remove random elements from someElements, as you mentioned. The algorithm would look very similar, and again, this is conceptually the same idea, you just pass someElements by reference, and manipulate it.
Something worth noting is that the quality of pseudo random distributions, in terms of how random they are, grows as the size of the distribution you use increases. So you may tend to get better results if you pick whichever method uses more random numbers. Example: if you have 100 values and need 99, you should probably pick 99 values, as this uses 99 pseudo random numbers instead of just 1. Conversely, if you have 1000 values and need 99, you should probably prefer the version where you remove 901 values, because you use more numbers from the pseudo random distribution. If what you want is a solid random distribution, this is a very simple optimization that will greatly increase the quality of the "fake randomness" you see. Alternatively, if performance matters more than distribution, you would take the other approach, or even just grab the first 99 values.
interval = n/(n-maxSamples) //integer (Euclidean) division
offset = random(0..(n-1)) //a random number between 0 and n-1
totalSkip = 0
indexSample = 0;
FOR it IN samples DO
indexSample++ // goes from 1 to n
IF totalSkip < (n-maxSamples) AND (indexSample+offset) % interval == 0 THEN
//do nothing with this sample
totalSkip++
ELSE
//work with this sample
ENDIF
ENDFOR
ASSERT(totalSkip == n-maxSamples) //to be sure
interval represents the distance between two samples to skip.
offset is not mandatory, but it adds a little variety.
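A runnable Python version of the interval pseudocode above (interval_skip is a hypothetical name; items is any iterable of exactly n elements). Over n consecutive values of index + offset there are at least n // interval >= n - maxSamples multiples of interval, which is why the final assertion holds.

```python
import random


def interval_skip(items, n, max_samples, rng=random):
    """Keep max_samples of the n items by skipping every interval-th one."""
    if n <= max_samples:
        return list(items)
    to_skip = n - max_samples
    interval = n // to_skip  # integer division, always >= 1
    offset = rng.randrange(n)  # random shift in 0..n-1
    kept, skipped = [], 0
    for index, item in enumerate(items, start=1):  # index goes from 1 to n
        if skipped < to_skip and (index + offset) % interval == 0:
            skipped += 1  # do nothing with this sample
        else:
            kept.append(item)  # work with this sample
    assert skipped == to_skip
    return kept
```

One consequence worth knowing: when maxSamples < n/2, the integer division makes interval equal to 1, so all n - maxSamples skips happen at the start and only the last maxSamples items are kept.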
Based on the discussion, and greater understanding of your problem, I suggest the following. You can take advantage of a property of prime numbers that I think will net you a very good solution, that will appear to grab pseudo random numbers. It is illustrated in the following code.
#include <iostream>
using namespace std;
int main() {
const int SOME_LARGE_PRIME = 577; //This prime should be larger than the size of your data set.
const int NUM_ELEMENTS = 100;
int lastValue = 0;
for(int i = 0; i < NUM_ELEMENTS; i++) {
lastValue += SOME_LARGE_PRIME;
cout << lastValue % NUM_ELEMENTS << endl;
}
}
Using the logic presented here, you can create a table of all values from 0 to NUM_ELEMENTS - 1. Because a prime larger than your data set size cannot divide it, the stride has no common factor with the size, so you will not get any duplicates until you wrap all the way around. If you then take the first NUM_SAMPLES of these, sort them, and iterate through your data structure, you get a pseudo random selection of elements (not very good randomness, but more random than a pre-determined interval), with only a table of NUM_SAMPLES indices as extra space and one pass over your data. Better yet, you can change the layout of the distribution by picking a different prime each time; again, it must be larger than your data set, otherwise it can break, as in the following example.
PRIME = 3, data set size = 99. Won't work.
Of course, ultimately this is very similar to the pre-determined interval, but it inserts a level of randomness that you do not get by simply grabbing every "size/num_samples"th element.
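The "sort the first NUM_SAMPLES positions, then make one pass" idea can be sketched in Python as follows (stride_sample is a hypothetical name). Instead of requiring a prime larger than the data set, it checks the property that actually matters, gcd(prime, n) == 1, which any prime larger than n satisfies. The positions follow the C++ loop above: the k-th value is (k * prime) % n for k = 1, 2, ...

```python
import math


def stride_sample(iterator, n, max_samples, prime):
    """Pick max_samples of n items at positions (k * prime) % n, k = 1..max_samples."""
    if n <= max_samples:
        return list(iterator)
    # distinct positions are guaranteed only if the stride is coprime to n
    assert math.gcd(prime, n) == 1, "use e.g. a prime larger than n"
    chosen = sorted((k * prime) % n for k in range(1, max_samples + 1))
    result, pos = [], 0
    for index, item in enumerate(iterator):  # single pass over the data
        if pos < len(chosen) and index == chosen[pos]:
            result.append(item)
            pos += 1
    return result
```

With n = 100, 10 samples and the prime 577, the chosen positions are 8, 16, 31, 39, 54, 62, 70, 77, 85, 93.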
This is called reservoir sampling.
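A minimal Python sketch of reservoir sampling (the classic "Algorithm R"), which needs only one pass and does not even need n in advance. reservoir_sample is a hypothetical name; rng can be any object with a randrange method, such as random.Random.

```python
import random


def reservoir_sample(iterable, k, rng=random):
    """Return a uniform random sample of k items from a stream (all items if fewer)."""
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            reservoir.append(item)  # fill the reservoir first
        else:
            j = rng.randrange(i + 1)  # uniform in 0..i
            if j < k:
                reservoir[j] = item  # keep item i with probability k/(i+1)
    return reservoir
```

Each item ends up in the sample with probability k/n, and the cost is one random number per item after the first k, with no retries for duplicates.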

Sorting algorithm for list of integers

I have a list of about 200 integers whose values are between 1 and 5.
I want to get into learning about sorting algorithms and knowing where to apply each because at the moment I use bubble-sort for everything which I've been told is a terrible way to do things.
What would be the fastest sorting algorithm for this integer sorting?
EDIT: It turns out that because I know the numbers are 1 to 5 then I can use a bucket sort (?) algorithm which if I'm not mistaken - and I definitely could be - means that for each integer of value 1, I put it in the 1 group, value 2 I put it in the 2 group etc, then concatenate the groups at the end. This seems like a simple and efficient way to do it.
However, since this is (currently) a learning exercise for me, I am going to remove the 1 - 5 limitation and try to implement bubble sort and merge sort, then compare the two to see which is faster.
Thanks for your help!
... which I've been told is a terrible way to do things.
First off, don't accept as gospel anything you hear from random bods on the internet (even me).
Bubble sort is fine under certain conditions, such as when the data is already mostly sorted, or the item count is relatively small (such as 200) (a), or you have no sort functionality built into the language and you're on a tight deadline where lack of performance will annoy the customer but lack of functionality will get you fired :-)
This bias against bubble sort is similar to the "only one exit point from a function" and "no goto" rules. You should understand the reasoning behind them so that you know when the rules can be ignored safely.
Anyway, on to the question proper. An efficient way for your specific case is to just count the items then output them, something like:
dim count[1..5] = {0, 0, 0, 0, 0};
for each item in list:
count[item] = count[item] + 1
for val in 1..5:
for quant in 1..count[val]:
output val
That's an O(n) time and O(1) space solution and you won't find a more efficient big-O for a generalised sort routine - it's only possible in this case because of the extra information you have about the data (limited to the values 1 through 5).
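The counting pseudocode above as a small Python sketch, assuming every value lies in a known range lo..hi (1..5 here; the function name and parameters are illustrative).

```python
def counting_sort(items, lo=1, hi=5):
    """Sort integers known to lie in lo..hi by counting occurrences."""
    counts = [0] * (hi - lo + 1)
    for x in items:
        counts[x - lo] += 1  # one pass to count each value
    out = []
    for offset, c in enumerate(counts):
        out.extend([lo + offset] * c)  # emit each value as many times as seen
    return out
```

The counts array has a fixed size of five, so the extra space does not grow with the list length.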
If you wanted to examine all the different sort algorithms, the Wikipedia Sorting Algorithm page is a useful starting point, including the major algorithms and their properties.
(a) As an aside, the following code (using worst case data for bubble sort), when run under CygWin on a not-very-powerful IBM T60 (2GHz dual core) laptop, completes in, on average, 0.157 seconds (5 samples: 0.150, 0.125, 0.192, 0.199, 0.115).
I wouldn't use it for sorting a million items (everyone knows bubble sort scales poorly) but 200 should be fine in most cases:
#include <stdio.h>
#define COUNT 200
int main (void) {
int i, swapped, tmp, item[COUNT];
// Set up worst case (reverse order) data.
for (i = 0; i < COUNT; i++)
item[i] = COUNT - i;
// Slightly optimised bubble sort.
swapped = 1;
while (swapped) {
swapped = 0;
for (i = 1; i < COUNT; i++) {
if (item[i-1] > item[i]) {
tmp = item[i-1];
item[i-1] = item[i];
item[i] = tmp;
swapped = 1;
}
}
}
// for (i = 0; i < COUNT; i++)
// printf ("%d ", item[i]);
// putchar ('\n');
return 0;
}
You may not need sorting here, since you only have 5 possible values.
You could use 5 containers (or buckets) and as you scan your list of integers you place the values in the right bucket.
At the end, join the buckets together, in order.
Merge sort is O(n log n); I think it's way better than QuickSort.
You can find some C# code here.

Algorithm problem- with the picture attached

I am attaching a picture showing the diagram for which I need to check the good/bad blocks. I know the size of each block and the number of rows and columns. I also know whether each row has an even or odd number of blocks.
I need to make clusters of 2 blocks and check whether the resulting block (the combination of 2) is good or bad. If both blocks are good, the result is a good block; otherwise it is bad.
I need to know the algorithm of it.
If the row has odd numbers of blocks, I am ignoring the middle block and considering the last blocks.
The diagram is in the shape of circle but the blocks on the circumference are ignored. So, I have to consider only the middle block as shown in the picture.
I need to iterate over each row, make a group of 2, find the result. But if the row has odd number of blocks, ignore the middle one, and make a group of last two blocks at the corner.
The shape inside the circle as shown in picture, is the real figure.
I guess, I have given enough information this time.
NOTE: In this example I am making groups of two, but I need to make groups of 2, 3 or 4 blocks in the row, as a generic case. If any block in a group is bad, the whole group is bad, whether it's a group of 2, 3, or 4. I need to write the code in Visual Basic. The sizes and numbers of blocks per row shown in the picture are not the real data; it is just an example.
I have a solution of sorts that checks each block and its surrounding blocks, which is not right. But can it be done this way:
Here's a solution:
If you are grouping in twos, then one bad block means the blocks on either side of it are also in bad groups, leading to 3 bad ones.
1) Set up an NxN array of struct {bool inCircle, badBlock, badGroup;}, where inCircle is true if the block is in the circle, badBlock is true if the block is a bad one, and badGroup is initially false.
int length=2;
for (int i=0; i<N; i++)
for (int j=0; j<N; j++)
if (array[i,j].badBlock) {
// mark in-circle blocks within `length` of a bad block, along its column and its row
for (int x=-length; x<=length; x++)
if (i+x>=0 && i+x<N && array[i+x,j].inCircle) array[i+x,j].badGroup=true;
for (int y=-length; y<=length; y++)
if (j+y>=0 && j+y<N && array[i,j+y].inCircle) array[i,j+y].badGroup=true;
}
I also know the X and Y coordinates of each block.
Simple recursion will do; pseudocode:
GroupSize = 2;
// checks row[start .. end-1]; true if every group in that range is good
bool Calc(row, start, end)
{
if (end - start <= GroupSize - 1) return true; // leftover middle blocks are ignored
if (end - start < GroupSize*2) // room for one group but not two: check only the first GroupSize blocks
{
bool result = true;
for (i = start; i < start + GroupSize; i++)
{
result = result && row[i];
}
return result;
}
else
{
// check the first and last groups, then recurse into the middle
return Calc(row, start, start + GroupSize) && Calc(row, end - GroupSize, end) && Calc(row, start + GroupSize, end - GroupSize);
}
}
Something like that.
The idea is to recursively calculate both sides of the row and then send the middle for some more calculating.
Recursion might be the simplest way (or not, for everyone), but any recursion can be turned into a loop.
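A minimal C++ sketch of the loop version of the recursive idea, assuming a row is a std::vector<bool> (true = good block) and that you want one result per group instead of a single combined bool. The names GroupResult and group_row are hypothetical. It follows the same peeling order as Calc: one group from each end, then the middle; if between one and two groups' worth of blocks remain, one group is taken from the left and the rest is ignored.

```cpp
#include <cstddef>
#include <vector>

struct GroupResult {
    std::size_t start;  // index of the first block in the group
    bool good;          // true only if every block in the group is good
};

// Iterative version of Calc: peel a group off each end of [start, end) until
// fewer than two groups fit, then take at most one more group from the left.
std::vector<GroupResult> group_row(const std::vector<bool>& row, std::size_t groupSize) {
    std::vector<GroupResult> out;
    if (groupSize == 0) return out;  // assumption: a group has at least one block
    auto check = [&](std::size_t from) {
        bool good = true;
        for (std::size_t k = from; k < from + groupSize; ++k) good = good && row[k];
        out.push_back({from, good});
    };
    std::size_t start = 0, end = row.size();  // end is exclusive
    while (end - start >= 2 * groupSize) {
        check(start);              // group at the left end
        check(end - groupSize);    // group at the right end
        start += groupSize;
        end -= groupSize;
    }
    if (end - start >= groupSize) check(start);  // one more group fits in the middle
    return out;                                  // leftover middle blocks are ignored
}
```

For a row of 5 blocks and groupSize 2 this checks blocks 0-1 and 3-4 and ignores block 2, which matches the "ignore the middle one" rule from the question.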

Algorithms for testing a poker hand for a straight draw (4 to a straight)?

I'm in the throes of writing a poker evaluation library for fun and am looking to add the ability to test for draws (open ended, gutshot) for a given set of cards.
Just wondering what the "state of the art" is for this. I'm trying to keep my memory footprint reasonable, so the idea of using a lookup table doesn't sit well, but it could be a necessary evil.
My current plan is along the lines of:
subtract the lowest rank from the rank of all cards in the set.
check whether a certain sequence, e.g. 0,1,2,3 or 1,2,3,4 (for OESDs), is a subset of the modified collection.
I'm hoping to do better complexity-wise, as 7-card or 9-card sets will grind things to a halt with my approach.
Any input and/or better ideas would be appreciated.
The fastest approach is probably to assign a bit mask to each card rank (e.g. deuce=1, three=2, four=4, five=8, six=16, seven=32, eight=64, nine=128, ten=256, jack=512, queen=1024, king=2048, ace=4096) and OR together the mask values of all the cards in the hand. Then use an 8192-element lookup table to indicate whether the hand is a straight, an open-ender, a gutshot, or nothing of significance (one could also include the various types of backdoor straight draw without affecting execution time).
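A minimal C++ sketch of the 8192-entry table, assuming ranks are numbered 0 (deuce) to 12 (ace) so that bit r carries the mask value listed above. How the table is filled is this sketch's own choice, not something the answer specifies: it counts how many missing ranks would complete a straight, so a double gutshot is reported as OPEN_ENDED and a one-ended draw such as A-2-3-4 or J-Q-K-A as GUTSHOT. The names DrawKind, rank_mask, has_straight, classify and build_draw_table are hypothetical.

```cpp
#include <vector>

enum DrawKind : unsigned char { NOTHING = 0, GUTSHOT = 1, OPEN_ENDED = 2, STRAIGHT = 3 };

// OR together one bit per card: bit r set means the hand holds rank r.
unsigned rank_mask(const std::vector<int>& ranks) {
    unsigned m = 0;
    for (int r : ranks) m |= 1u << r;
    return m;
}

bool has_straight(unsigned mask) {
    // Shift up one place and copy the ace into bit 0, so A-2-3-4-5 is also a window.
    unsigned m = (mask << 1) | ((mask >> 12) & 1u);
    for (int lo = 0; lo <= 9; ++lo)
        if (((m >> lo) & 0x1Fu) == 0x1Fu) return true;
    return false;
}

// Only used while filling the table: count the missing ranks that complete a straight.
DrawKind classify(unsigned mask) {
    if (has_straight(mask)) return STRAIGHT;
    int outs = 0;
    for (int r = 0; r < 13; ++r)
        if (!(mask & (1u << r)) && has_straight(mask | (1u << r))) ++outs;
    return outs == 0 ? NOTHING : outs == 1 ? GUTSHOT : OPEN_ENDED;
}

// One byte per possible set of ranks, built once at startup.
std::vector<unsigned char> build_draw_table() {
    std::vector<unsigned char> table(8192);
    for (unsigned m = 0; m < 8192; ++m) table[m] = classify(m);
    return table;
}
```

At run time the per-hand cost is the OR loop in rank_mask plus a single index, table[rank_mask(hand)]; the slow classification only runs while building the table.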
Incidentally, using different bitmask values, one can quickly detect other useful hands like two-of-a-kind, three-of-a-kind, etc. If 64-bit integer math is available, use the cube of the bit masks above (so deuce=1, three=8, etc., up to ace=2^36) and add together the values of the cards; each rank then occupies its own octal digit, which holds the number of cards of that rank. If the sum, and'ed with 04444444444444 (octal), is non-zero, the hand is a four-of-a-kind. Otherwise, if adding 01111111111111 to the sum and and'ing with 04444444444444 yields non-zero, the hand is a three-of-a-kind or full house. Otherwise, if the sum and'ed with 02222222222222 is non-zero, the hand is either a pair or two pair. To see if a hand contains two or more pairs, 'and' the hand value with 02222222222222 and save that value. Subtract 1, and 'and' the result with the saved value. If non-zero, the hand contains at least two pairs (so if it contains a three-of-a-kind, it's a full house; otherwise it's two pair).
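The octal trick above, sketched in C++ under the assumption that ranks are numbered 0 (deuce) to 12 (ace), so rank r contributes 8^r = 2^(3r). The enum Pairing and the function classify_pairs are hypothetical names; the three masks are the ones given in the answer.

```cpp
#include <cstdint>
#include <vector>

enum class Pairing { Nothing, Pair, TwoPair, Trips, FullHouse, Quads };

Pairing classify_pairs(const std::vector<int>& ranks) {
    std::uint64_t v = 0;
    for (int r : ranks) v += std::uint64_t{1} << (3 * r);  // add 8^r: octal digit r counts rank r
    const std::uint64_t fours = 04444444444444;  // top bit of each of the 13 octal digits
    const std::uint64_t ones  = 01111111111111;
    const std::uint64_t twos  = 02222222222222;
    if (v & fours) return Pairing::Quads;                  // some digit reached 4
    bool trips = ((v + ones) & fours) != 0;                // a digit of 3 becomes 4, no carries
    std::uint64_t pairBits = v & twos;                     // digits holding 2 or 3 have this bit
    bool twoOrMore = (pairBits & (pairBits - 1)) != 0;     // at least two such digits
    if (trips) return twoOrMore ? Pairing::FullHouse : Pairing::Trips;
    if (pairBits) return twoOrMore ? Pairing::TwoPair : Pairing::Pair;
    return Pairing::Nothing;
}
```

Because a rank can appear at most four times, no digit ever exceeds 4, so adding 01111111111111 never carries into the neighbouring digit once quads have been ruled out.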
As a parting note, the computation done to check for a straight will also let you determine quickly how many different ranks of card are in the hand. If there are N cards and N different ranks, the hand cannot contain any pairs or better (but might contain a straight or flush, of course). If there are N-1 different ranks, the hand contains precisely one pair. Only if there are fewer different ranks must one use more sophisticated logic (if there are N-2, the hand could be two-pair or three-of-a-kind; if N-3 or fewer, the hand could be a "three-pair" (scores as two-pair), full house, or four-of-a-kind).
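A short C++ sketch of that shortcut, assuming the same 13-bit rank mask (bit r set for each rank held, 0 = deuce ... 12 = ace). Here the distinct ranks are counted with std::bitset::count rather than a table; RankHint, distinct_ranks and rank_hint are hypothetical names, and the categories are exactly the ones listed in the answer.

```cpp
#include <bitset>

enum class RankHint { NoPair, OnePair, TwoPairOrTrips, ThreePairFullHouseOrQuads };

int distinct_ranks(unsigned rankMask) {
    return static_cast<int>(std::bitset<13>(rankMask).count());
}

// nCards cards with d distinct ranks: nCards - d cards repeat a rank already seen.
RankHint rank_hint(int nCards, unsigned rankMask) {
    int repeats = nCards - distinct_ranks(rankMask);
    if (repeats == 0) return RankHint::NoPair;
    if (repeats == 1) return RankHint::OnePair;
    if (repeats == 2) return RankHint::TwoPairOrTrips;  // needs more logic to tell apart
    return RankHint::ThreePairFullHouseOrQuads;         // needs more logic to tell apart
}
```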
One more thing: if you can't manage an 8192-element lookup table, you could use a 512-element lookup table. Compute the bitmask as above, then do lookups on array[bitmask & 511] and array[bitmask >> 4] and OR the results. Any straight or draw with the ace played high will register on one or the other lookup; a wheel (A-2-3-4-5) straight or draw will not, because the ace bit lies outside the low nine-bit window and the deuce through five bits lie outside the high one, so it needs a separate check. Note that this won't directly give you the number of different ranks (since cards six through ten get counted in both lookups), but one more lookup into the same array (using array[bitmask >> 9]) counts just the jacks through aces, and adding that to the array[bitmask & 511] count gives the total.
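A C++ sketch of the 512-entry variant. The entry layout (a flags byte plus a rank count) is an assumption of this sketch, not something the answer prescribes: flag bit 0 means five consecutive ranks, flag bit 1 means four of some five consecutive ranks. Entry, Table512, build_table512, straight_flags and distinct_ranks_split are hypothetical names, and, as noted above, the ace only plays high here.

```cpp
#include <array>
#include <bitset>

struct Entry {
    unsigned char flags;  // bit 0: five consecutive ranks; bit 1: four of some five consecutive ranks
    unsigned char ranks;  // number of ranks present in this 9-bit window
};

using Table512 = std::array<Entry, 512>;

Table512 build_table512() {
    Table512 t{};
    for (unsigned w = 0; w < 512; ++w) {
        unsigned char flags = 0;
        for (int lo = 0; lo + 5 <= 9; ++lo) {  // the five 5-rank windows inside 9 bits
            auto inWindow = std::bitset<5>((w >> lo) & 0x1Fu).count();
            if (inWindow == 5) flags |= 1;
            if (inWindow >= 4) flags |= 2;
        }
        t[w] = Entry{flags, static_cast<unsigned char>(std::bitset<9>(w).count())};
    }
    return t;
}

// bitmask: bit r set for each rank held (0 = deuce ... 12 = ace).
unsigned straight_flags(const Table512& t, unsigned bitmask) {
    return t[bitmask & 511u].flags | t[bitmask >> 4].flags;  // deuce..ten, then six..ace
}

int distinct_ranks_split(const Table512& t, unsigned bitmask) {
    return t[bitmask & 511u].ranks + t[bitmask >> 9].ranks;  // deuce..ten plus jack..ace
}
```

For example, 5-6-7-8 (mask 0x78) yields flags 2 and four distinct ranks, while A-2-3-4 (mask 0x1007) yields flags 0, illustrating the missed wheel draw.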
I know you said you want to keep the memory footprint as small as possible, but there is one quite memory-efficient lookup table optimization which I've seen used in some poker hand evaluators, and I have used it myself. If you're doing heavy poker simulations and need the best possible performance, you might want to consider this. Admittedly, in this case the difference isn't that big, because testing for a straight draw isn't a very expensive operation, but the same principle can be used for pretty much every type of hand evaluation in poker programming.
The idea is to create a kind of hash function that has the following properties:
1) calculates a unique value for each different set of card ranks
2) is symmetric in the sense that it doesn't depend on the order of the cards
The purpose of this is to reduce the number of elements needed in the lookup table.
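A neat way of doing this is to assign a prime number to each rank (2->2, 3->3, 4->5, 5->7, 6->11, 7->13, 8->17, 9->19, T->23, J->29, Q->31, K->37, A->41) and then calculate the product of the primes. Because prime factorization is unique and multiplication is commutative, the product is unique for each set of ranks and independent of card order. For example, if the cards are 39TJQQ, then the hash is 3*19*23*29*31*31 = 36536259.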
To create the lookup table you go through all the possible combinations of ranks, and use some simple algorithm to determine whether they form a straight draw. For each combination you also calculate the hash value and then store the results in a map where Key is the hash and Value is the result of the straight draw check. If the maximum number of cards is small (4 or less) then even a linear array might be feasible.
To use the lookup table you first calculate the hash for the particular set of cards and then read the corresponding value from the map.
Here's an example in C++. I don't guarantee that it works correctly, and it could probably be optimized a lot by using a sorted array and binary search instead of a hash map; hash maps are rather slow for this purpose.
#include <cstring>
#include <iostream>
#include <unordered_map>
#include <vector>
using namespace std;
const int MAXCARDS = 9;
// std::unordered_map replaces the non-standard stdext::hash_map
unordered_map<long long, bool> lookup;
//"Hash function" that is unique for each set of card ranks, and also
//symmetric so that the order of cards doesn't matter.
//Named rank_hash rather than hash to avoid an ambiguity with std::hash.
long long rank_hash(const vector<int>& cards)
{
    static const int primes[52] = {
        2,3,5,7,11,13,17,19,23,29,31,37,41,
        2,3,5,7,11,13,17,19,23,29,31,37,41,
        2,3,5,7,11,13,17,19,23,29,31,37,41,
        2,3,5,7,11,13,17,19,23,29,31,37,41
    };
    long long res = 1;
    for (vector<int>::const_iterator i = cards.begin(); i != cards.end(); ++i)
        res *= primes[*i];
    return res;
}
//Tests whether there is a straight draw (assuming there is no
//straight). Only used for filling the lookup table.
bool is_draw_slow(const vector<int>& cards)
{
    int ranks[14];
    memset(ranks, 0, 14 * sizeof(int));
    for (vector<int>::const_iterator i = cards.begin(); i != cards.end(); ++i)
        ranks[*i % 13 + 1] = 1;   // index 1..13 = deuce..ace
    ranks[0] = ranks[13];         // ace counts also as 1
    int count = ranks[0] + ranks[1] + ranks[2] + ranks[3];
    for (int i = 0; i <= 9; i++) {   // slide a 5-rank window over the 10 possible positions
        count += ranks[i + 4];
        if (count == 4)
            return true;
        count -= ranks[i];
    }
    return false;
}
//Enumerates every non-decreasing sequence of ranks 0..12, i.e. every multiset of ranks.
void create_lookup_helper(vector<int>& cards, size_t idx)
{
    for (; cards[idx] < 13; cards[idx]++) {
        if (idx == cards.size() - 1)
            lookup[rank_hash(cards)] = is_draw_slow(cards);
        else {
            cards[idx + 1] = cards[idx];
            create_lookup_helper(cards, idx + 1);
        }
    }
}
void create_lookup()
{
    for (int i = 1; i <= MAXCARDS; i++) {
        vector<int> cards(i);
        create_lookup_helper(cards, 0);
    }
}
//Test for a draw using the lookup table
bool is_draw(const vector<int>& cards)
{
    return lookup[rank_hash(cards)];
}
int main(int argc, char* argv[])
{
    create_lookup();
    cout << lookup.size() << endl; //497419
    int cards1[] = {1,2,3,4};
    int cards2[] = {0,1,2,7,12};
    int cards3[] = {3,16,29,42,4,17,30,43};
    cout << boolalpha;   // print true/false instead of 1/0
    cout << is_draw(vector<int>(cards1, cards1 + 4)) << endl; //true
    cout << is_draw(vector<int>(cards2, cards2 + 5)) << endl; //true
    cout << is_draw(vector<int>(cards3, cards3 + 8)) << endl; //false
}
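The answer mentions that a sorted array with binary search would likely beat a hash map here. A minimal sketch of that lookup side, assuming the table holds the same (hash, is_draw_slow result) pairs that create_lookup would otherwise store in the map; Table, sort_table and lookup_sorted are hypothetical names.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// (prime-product hash, draw flag) pairs.
using Table = std::vector<std::pair<long long, bool>>;

void sort_table(Table& t) {
    std::sort(t.begin(), t.end());  // orders by hash; hashes are unique
}

bool lookup_sorted(const Table& t, long long key) {
    auto it = std::lower_bound(t.begin(), t.end(), key,
        [](const std::pair<long long, bool>& e, long long k) { return e.first < k; });
    return it != t.end() && it->first == key && it->second;  // missing keys read as "no draw"
}
```

A contiguous sorted vector avoids per-node allocations and keeps the 497419 entries cache-friendly, at the cost of O(log n) comparisons per lookup.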
This may be a naive solution, but I am pretty sure it would work, although I am not sure about the performance.
Assuming again that the cards are represented by the numbers 1 - 13, then if your 4 cards have a numeric range of 3 or 4 (from highest to lowest card rank) and contain no duplicates then you have a possible straight draw.
A range of 3 implies an open-ended draw, e.g. 2,3,4,5 has a range of 3 and contains no duplicates.
A range of 4 implies a gutshot (as you called it), e.g. 5,6,8,9 has a range of 4 and contains no duplicates.
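A minimal C++ sketch of the range test for exactly four cards, assuming ranks 1 to 13 with the ace as 1 only; four_card_range is a hypothetical name. Because the ace is only low here, draws such as J-Q-K-A are not detected, and a range of 3 such as A-2-3-4 can only be completed at one end.

```cpp
#include <algorithm>
#include <array>

// Returns 3 (open-ended per the answer), 4 (gutshot per the answer) or 0 (no draw).
int four_card_range(std::array<int, 4> ranks) {
    std::sort(ranks.begin(), ranks.end());
    if (std::adjacent_find(ranks.begin(), ranks.end()) != ranks.end())
        return 0;                          // a duplicated rank cannot be four to a straight
    int range = ranks[3] - ranks[0];       // highest minus lowest rank
    return (range == 3 || range == 4) ? range : 0;
}
```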
Update: following Christian Mann's comment, it can be done like this:
Let's say A is represented as 1, J as 11, Q as 12, and so on.
Loop through 1 to 13 as i:
if my cards already contain rank i, skip this case and move on to the next rank;
for this rank i, count how many consecutive cards there are to its left;
do the same, but looking to the right;
if count_left_consecutive + count_right_consecutive >= 4 (exactly 4 for a four-card hand), then rank i completes a straight, so this is a draw.
You will need to define the functions that count the consecutive cards to the left and to the right, and also handle the case where, when counting consecutive cards to the right, the A follows the K.
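A C++ sketch of the left/right counting loop, assuming ranks 1 to 13 with A = 1 as above, and using an extra position 14 as an alias for the ace when it plays high after the K. completes_at and count_completing_ranks are hypothetical names. Returning the number of completing ranks (rather than a bool) is this sketch's addition: 1 suggests a gutshot or one-ended draw, 2 or more an open-ended draw or double gutshot.

```cpp
#include <vector>

// has[1..13] = A, 2, ..., K held; has[14] = ace again, playing high.
static bool completes_at(const bool has[15], int p) {
    int left = 0;
    for (int k = p - 1; k >= 1 && has[k]; --k) ++left;    // consecutive ranks below p
    int right = 0;
    for (int k = p + 1; k <= 14 && has[k]; ++k) ++right;  // consecutive ranks above p
    return left + right >= 4;                             // p would make five in a row
}

int count_completing_ranks(const std::vector<int>& ranks) {
    bool has[15] = {false};
    for (int r : ranks) has[r] = true;  // assumes every rank is in 1..13
    has[14] = has[1];                   // the A also follows the K
    int completing = 0;
    for (int i = 1; i <= 13; ++i) {
        if (has[i]) continue;           // already held: skip to the next rank
        // a missing ace can complete either A-2-3-4-5 (position 1) or T-J-Q-K-A (position 14)
        if (completes_at(has, i) || (i == 1 && completes_at(has, 14))) ++completing;
    }
    return completing;
}
```

For example, 2-3-4-5 returns 2 (A or 6), 5-6-8-9 returns 1 (7), and Q-K-A-2 returns 0 because the sketch does not let a straight wrap around the ace.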
