What is the fastest possible way to sort an array of 7 integers?

What is the fastest possible way to sort an array of 7 integers? - algorithm

This is a part of a program that analyzes the odds of poker, specifically Texas Hold'em. I have a program I'm happy with, but it needs some small optimizations to be perfect.
I use this type (among others, of course):
type
T7Cards = array[0..6] of integer;
There are two things about this array that may be important when deciding how to sort it:
Every item is a value from 0 to 51. No other values are possible.
There are no duplicates. Never.
With this information, what is the absolutely fastest way to sort this array? I use Delphi, so pascal code would be the best, but I can read C and pseudo, albeit a bit more slowly :-)
At the moment I use quicksort, but the funny thing is that this is almost no faster than bubblesort! Possible because of the small number of items. The sorting counts for almost 50% of the total running time of the method.
EDIT:
Mason Wheeler asked why it's necessary to optimize. One reason is that the method will be called 2118760 times.
Basic poker information: All players are dealt two cards (the pocket) and then five cards are dealt to the table (the 3 first are called the flop, the next is the turn and the last is the river. Each player picks the five best cards to make up their hand)
If I have two cards in the pocket, P1 and P2, I will use the following loops to generate all possible combinations:
for C1 := 0 to 51-4 do
if (C1<>P1) and (C1<>P2) then
for C2 := C1+1 to 51-3 do
if (C2<>P1) and (C2<>P2) then
for C3 := C2+1 to 51-2 do
if (C3<>P1) and (C3<>P2) then
for C4 := C3+1 to 51-1 do
if (C4<>P1) and (C4<>P2) then
for C5 := C4+1 to 51 do
if (C5<>P1) and (C5<>P2) then
begin
//This code will be executed 2 118 760 times
inc(ComboCounter[GetComboFromCards([P1,P2,C1,C2,C3,C4,C5])]);
end;
As I write this I notice one thing more: The last five elements of the array will always be sorted, so it's just a question of putting the first two elements in the right position in the array. That should simplify matters a bit.
So, the new question is: What is the fastest possible way to sort an array of 7 integers when the last 5 elements are already sorted. I believe this could be solved with a couple (?) of if's and swaps :-)

For a very small set, insertion sort can usually beat quicksort because it has very low overhead.
WRT your edit, if you're already mostly in sort order (last 5 elements are already sorted), insertion sort is definitely the way to go. In an almost-sorted set of data, it'll beat quicksort every time, even for large sets. (Especially for large sets! This is insertion sort's best-case scenario and quicksort's worst case.)

Don't know how you are implementing this, but what you could do is have an array of 52 instead of 7, and just insert the card in its slot directly when you get it since there can never be duplicates, that way you never have to sort the array. This might be faster depending on how its used.

I don't know that much about Texas Hold'em: Does it matter what suit P1 and P2 are, or does it only matter if they are of the same suit or not? If only suit(P1)==suit(P2) matters, then you could separate the two cases, you have only 13x12/2 different possibilities for P1/P2, and you can easily precalculate a table for the two cases.
Otherwise, I would suggest something like this:
(* C1 < C2 < P1 *)
for C1:=0 to P1-2 do
for C2:=C1+1 to P1-1 do
Cards[0] = C1;
Cards[1] = C2;
Cards[2] = P1;
(* generate C3...C7 *)
(* C1 < P1 < C2 *)
for C1:=0 to P1-1 do
for C2:=P1+1 to 51 do
Cards[0] = C1;
Cards[1] = P1;
Cards[2] = C2;
(* generate C3...C7 *)
(* P1 < C1 < C2 *)
for C1:=P1+1 to 51 do
for C2:=C1+1 to 51 do
Cards[0] = P1;
Cards[1] = C1;
Cards[2] = C2;
(* generate C3...C7 *)
(this is just a demonstration for one card P1, you would have to expand that for P2, but I think that's straightforward. Although it'll be a lot of typing...)
That way, the sorting doesn't take any time at all. The generated permutations are already ordered.

There are only 5040 permutations of 7 elements. You can programmaticaly generate a program that finds the one represented by your input in a minimal number of comparisons. It will be a big tree of if-then-else instructions, each comparing a fixed pair of nodes, for example if (a[3]<=a[6]).
The tricky part is deciding which 2 elements to compare in a particular internal node. For this, you have to take into account the results of comparisons in the ancestor nodes from root to the particular node (for example a[0]<=a[1], not a[2]<=a[7], a[2]<=a[5]) and the set of possible permutations that satisfy the comparisons. Compare the pair of elements that splits the set into as equal parts as possible (minimize the size of the larger part).
Once you have the permutation, it is trivial to sort it in a minimal set of swaps.

Since the last 5 items are already sorted, the code can be written just to reposition the first 2 items. Since you're using Pascal, I've written and tested a sorting algorithm that can execute 2,118,760 times in about 62 milliseconds.
procedure SortT7Cards(var Cards: T7Cards);
const
CardsLength = Length(Cards);
var
I, J, V: Integer;
V1, V2: Integer;
begin
// Last 5 items will always be sorted, so we want to place the first two into
// the right location.
V1 := Cards[0];
V2 := Cards[1];
if V2 < V1 then
begin
I := V1;
V1 := V2;
V2 := I;
end;
J := 0;
I := 2;
while I < CardsLength do
begin
V := Cards[I];
if V1 < V then
begin
Cards[J] := V1;
Inc(J);
Break;
end;
Cards[J] := V;
Inc(J);
Inc(I);
end;
while I < CardsLength do
begin
V := Cards[I];
if V2 < V then
begin
Cards[J] := V2;
Break;
end;
Cards[J] := V;
Inc(J);
Inc(I);
end;
if J = (CardsLength - 2) then
begin
Cards[J] := V1;
Cards[J + 1] := V2;
end
else if J = (CardsLength - 1) then
begin
Cards[J] := V2;
end;
end;

Use min-sort. Search for minimal and maximal element at once and place them into resultant array. Repeat three times. (EDIT: No, I won't try to measure the speed theoretically :_))
var
cards,result: array[0..6] of integer;
i,min,max: integer;
begin
n=0;
while (n<3) do begin
min:=-1;
max:=52;
for i from 0 to 6 do begin
if cards[i]<min then min:=cards[i]
else if cards[i]>max then max:=cards[i]
end
result[n]:=min;
result[6-n]:=max;
inc(n);
end
for i from 0 to 6 do
if (cards[i]<52) and (cards[i]>=0) then begin
result[3] := cards[i];
break;
end
{ Result is sorted here! }
end

This is the fastest method: since the 5-card list is already sorted, sort the two-card list (a compare & swap), and then merge the two lists, which is O(k * (5+2). In this case (k) will normally be 5: the loop test(1), the compare(2), the copy(3), the input-list increment(4) and the output list increment(5). That's 35 + 2.5. Throw in loop initialization and you get 41.5 statements, total.
You could also unroll the loops which would save you maybe 8 statements or execution, but make the whole routine about 4-5 times longer which may mess with your instruction cache hit ratio.
Given P(0 to 2), C(0 to 5) and copying to H(0 to 6)
with C() already sorted (ascending):
If P(0) > P(1) Then
// Swap:
T = P(0)
P(0) = P(1)
P(1) = T
// 1stmt + (3stmt * 50%) = 2.5stmt
End
P(2), C(5) = 53 \\ Note these are end-of-list flags
k = 0 \\ P() index
J = 0 \\ H() index
i = 0 \\ C() index
// 4 stmt
Do While (j) < 7
If P(k) < C(I) then
H(j) = P(k)
k = k+1
Else
H(j) = C(i)
j = j+1
End if
j = j+1
// 5stmt * 7loops = 35stmt
Loop
And note that this is faster than the other algorithm that would be "fastest" if you had to truly sort all 7 cards: use a bit-mask(52) to map & bit-set all 7 cards into that range of all possible 52 cards (the bit-mask), and then scan the bit-mask in order looking for the 7 bits that are set. That takes 60-120 statements at best (but is still faster than any other sorting approach).

For seven numbers, the most efficient algorithm that exists with regards to the number of comparisons is Ford-Johnson's. In fact, wikipedia references a paper, easily found on google, that claims Ford-Johnson's the best for up to 47 numbers. Unfortunately, references to Ford-Johnson's aren't all that easy to found, and the algorithm uses some complex data structures.
It appears on The Art Of Computer Programming, Volume 3, by Donald Knuth, if you have access to that book.
There's a paper which describes FJ and a more memory efficient version here.
At any rate, because of the memory overhead of that algorithm, I doubt it would be worth your while for integers, as the cost of comparing two integers is rather cheap compared to the cost of allocating memory and manipulating pointers.
Now, you mentioned that 5 cards are already sorted, and you just need to insert two. You can do this with insertion sort most efficiently like this:
Order the two cards so that P1 > P2
Insert P1 going from the high end to the low end
(list) Insert P2 going from after P1 to the low end
(array) Insert P2 going from the low end to the high end
How you do that will depend on the data structure. With an array you'll be swapping each element, so place P1 at 1st, P2 and 7th (ordered high to low), and then swap P1 up, and then P2 down. With a list, you just need to fix the pointers as appropriate.
However once more, because of the particularity of your code, it really is best if you follow nikie suggestion and just generate the for loops apropriately for every variation in which P1 and P2 can appear in the list.
For example, sort P1 and P2 so that P1 < P2. Let's make Po1 and Po2 the position from 0 to 6, of P1 and P2 on the list. Then do this:
Loop Po1 from 0 to 5
Loop Po2 from Po1 + 1 to 6
If (Po2 == 1) C1start := P2 + 1; C1end := 51 - 4
If (Po1 == 0 && Po2 == 2) C1start := P1+1; C1end := P2 - 1
If (Po1 == 0 && Po2 > 2) C1start := P1+1; C1end := 51 - 5
If (Po1 > 0) C1start := 0; C1end := 51 - 6
for C1 := C1start to C1end
// Repeat logic to compute C2start and C2end
// C2 can begin at C1+1, P1+1 or P2+1
// C2 can finish at P1-1, P2-1, 51 - 3, 51 - 4 or 51 -5
etc
You then call a function passing Po1, Po2, P1, P2, C1, C2, C3, C4, C5, and have this function return all possible permutations based on Po1 and Po2 (that's 36 combinations).
Personally, I think that's the fastest you can get. You completely avoid having to order anything, because the data will be pre-ordered. You incur in some comparisons anyway to compute the starts and ends, but their cost is minimized as most of them will be on the outermost loops, so they won't be repeated much. And they can even be more optimized at the cost of more code duplication.

For 7 elements, there are only few options. You can easily write a generator that produces method to sort all possible combinations of 7 elements. Something like this method for 3 elements:
if a[1] < a[2] {
if a[2] < a[3] {
// nothing to do, a[1] < a[2] < a[3]
} else {
if a[1] < a[3] {
// correct order should be a[1], a[3], a[2]
swap a[2], a[3]
} else {
// correct order should be a[3], a[1], a[2]
swap a[2], a[3]
swap a[1], a[3]
}
}
} else {
// here we know that a[1] >= a[2]
...
}
Of course method for 7 elements will be bigger, but it's not that hard to generate.

The code below is close to optimal. It could be made better by composing a list to be traversed while making the tree, but I'm out of time right now. Cheers!
object Sort7 {
def left(i: Int) = i * 4
def right(i: Int) = i * 4 + 1
def up(i: Int) = i * 4 + 2
def value(i: Int) = i * 4 + 3
val a = new Array[Int](7 * 4)
def reset = {
0 until 7 foreach {
i => {
a(left(i)) = -1
a(right(i)) = -1
a(up(i)) = -1
a(value(i)) = scala.util.Random.nextInt(52)
}
}
}
def sortN(i : Int) {
var index = 0
def getNext = if (a(value(i)) < a(value(index))) left(index) else right(index)
var next = getNext
while(a(next) != -1) {
index = a(next)
next = getNext
}
a(next) = i
a(up(i)) = index
}
def sort = 1 until 7 foreach (sortN(_))
def print {
traverse(0)
def traverse(i: Int): Unit = {
if (i != -1) {
traverse(a(left(i)))
println(a(value(i)))
traverse(a(right(i)))
}
}
}
}

In pseudo code:
int64 temp = 0;
int index, bit_position;
for index := 0 to 6 do
temp |= 1 << cards[index];
for index := 0 to 6 do
begin
bit_position = find_first_set(temp);
temp &= ~(1 << bit_position);
cards[index] = bit_position;
end;
It's an application of bucket sort, which should generally be faster than any of the comparison sorts that were suggested.
Note: The second part could also be implemented by iterating over bits in linear time, but in practice it may not be faster:
index = 0;
for bit_position := 0 to 51 do
begin
if (temp & (1 << bit_position)) > 0 then
begin
cards[index] = bit_position;
index++;
end;
end;

Assuming that you need an array of cards at the end of it.
Map the original cards to bits in a 64 bit integer ( or any integer with >= 52 bits ).
If during the initial mapping the array is sorted, don't change it.
Partition the integer into nibbles - each will correspond to values 0x0 to 0xf.
Use the nibbles as indices to corresponding sorted sub-arrays. You'll need 13 sets of 16 sub-arrays ( or just 16 sub-arrays and use a second indirection, or do the bit ops rather than looking the answer up; what is faster will vary by platform ).
Concatenate the non-empty sub-arrays into the final array.
You could use larger than nibbles if you want; bytes would give 7 sets of 256 arrays and make it more likely that the non-empty arrays require concatenating.
This assumes that branches are expensive and cached array accesses cheap.
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>
// for general case of 7 from 52, rather than assuming last 5 sorted
uint32_t card_masks[16][5] = {
{ 0, 0, 0, 0, 0 },
{ 1, 0, 0, 0, 0 },
{ 2, 0, 0, 0, 0 },
{ 1, 2, 0, 0, 0 },
{ 3, 0, 0, 0, 0 },
{ 1, 3, 0, 0, 0 },
{ 2, 3, 0, 0, 0 },
{ 1, 2, 3, 0, 0 },
{ 4, 0, 0, 0, 0 },
{ 1, 4, 0, 0, 0 },
{ 2, 4, 0, 0, 0 },
{ 1, 2, 4, 0, 0 },
{ 3, 4, 0, 0, 0 },
{ 1, 3, 4, 0, 0 },
{ 2, 3, 4, 0, 0 },
{ 1, 2, 3, 4, 0 },
};
void sort7 ( uint32_t* cards) {
uint64_t bitset = ( ( 1LL << cards[ 0 ] ) | ( 1LL << cards[ 1LL ] ) | ( 1LL << cards[ 2 ] ) | ( 1LL << cards[ 3 ] ) | ( 1LL << cards[ 4 ] ) | ( 1LL << cards[ 5 ] ) | ( 1LL << cards[ 6 ] ) ) >> 1;
uint32_t* p = cards;
uint32_t base = 0;
do {
uint32_t* card_mask = card_masks[ bitset & 0xf ];
// you might remove this test somehow, as well as unrolling the outer loop
// having separate arrays for each nibble would save 7 additions and the increment of base
while ( *card_mask )
*(p++) = base + *(card_mask++);
bitset >>= 4;
base += 4;
} while ( bitset );
}
void print_cards ( uint32_t* cards ) {
printf ( "[ %d %d %d %d %d %d %d ]\n", cards[0], cards[1], cards[2], cards[3], cards[4], cards[5], cards[6] );
}
int main ( void ) {
uint32_t cards[7] = { 3, 9, 23, 17, 2, 42, 52 };
print_cards ( cards );
sort7 ( cards );
print_cards ( cards );
return 0;
}

Use a sorting network, like in this C++ code:
template<class T>
inline void sort7(T data) {
#define SORT2(x,y) {if(data##x>data##y)std::swap(data##x,data##y);}
//DD = Define Data, create a local copy of the data to aid the optimizer.
#define DD1(a) register auto data##a=*(data+a);
#define DD2(a,b) register auto data##a=*(data+a);register auto data##b=*(data+b);
//CB = Copy Back
#define CB1(a) *(data+a)=data##a;
#define CB2(a,b) *(data+a)=data##a;*(data+b)=data##b;
DD2(1,2) SORT2(1,2)
DD2(3,4) SORT2(3,4)
DD2(5,6) SORT2(5,6)
DD1(0) SORT2(0,2)
SORT2(3,5)
SORT2(4,6)
SORT2(0,1)
SORT2(4,5)
SORT2(2,6) CB1(6)
SORT2(0,4)
SORT2(1,5)
SORT2(0,3) CB1(0)
SORT2(2,5) CB1(5)
SORT2(1,3) CB1(1)
SORT2(2,4) CB1(4)
SORT2(2,3) CB2(2,3)
#undef CB1
#undef CB2
#undef DD1
#undef DD2
#undef SORT2
}
Use the function above if you want to pass it an iterator or a pointer and use the function below if you want to pass it the seven arguments one by one. BTW, using templates allows compilers to generate really optimized code so don't get ride of the template<> unless you want C code (or some other language's code).
template<class T>
inline void sort7(T& e0, T& e1, T& e2, T& e3, T& e4, T& e5, T& e6) {
#define SORT2(x,y) {if(data##x>data##y)std::swap(data##x,data##y);}
#define DD1(a) register auto data##a=e##a;
#define DD2(a,b) register auto data##a=e##a;register auto data##b=e##b;
#define CB1(a) e##a=data##a;
#define CB2(a,b) e##a=data##a;e##b=data##b;
DD2(1,2) SORT2(1,2)
DD2(3,4) SORT2(3,4)
DD2(5,6) SORT2(5,6)
DD1(0) SORT2(0,2)
SORT2(3,5)
SORT2(4,6)
SORT2(0,1)
SORT2(4,5)
SORT2(2,6) CB1(6)
SORT2(0,4)
SORT2(1,5)
SORT2(0,3) CB1(0)
SORT2(2,5) CB1(5)
SORT2(1,3) CB1(1)
SORT2(2,4) CB1(4)
SORT2(2,3) CB2(2,3)
#undef CB1
#undef CB2
#undef DD1
#undef DD2
#undef SORT2
}

Take a look at this:
http://en.wikipedia.org/wiki/Sorting_algorithm
You would need to pick one that will have a stable worst case cost...
Another option could be to keep the array sorted the whole time, so an addition of a card would keep the array sorted automatically, that way you could skip to sorting...

What JRL is referring to is a bucket sort. Since you have a finite discrete set of possible values, you can declare 52 buckets and just drop each element in a bucket in O(1) time. Hence bucket sort is O(n). Without the guarantee of a finite number of different elements, the fastest theoretical sort is O(n log n) which things like merge sort an quick sort are. It's just a balance of best and worst case scenarios then.
But long answer short, use bucket sort.

If you like the above mentioned suggestion to keep a 52 element array which always keeps your array sorted, then may be you could keep another list of 7 elements which would reference the 7 valid elements in the 52 element array. This way we can even avoid parsing the 52 element array.
I guess for this to be really efficient, we would need to have a linked list type of structure which be supports operations: InsertAtPosition() and DeleteAtPosition() and be efficient at that.

There are a lot of loops in the answers. Given his speed requirement and the tiny size of the data set I would not do ANY loops.
I have not tried it but I suspect the best answer is a fully unrolled bubble sort. It would also probably gain a fair amount of advantage from being done in assembly.
I wonder if this is the right approach, though. How are you going to analyze a 7 card hand?? I think you're going to end up converting it to some other representation for analysis anyway. Would not a 4x13 array be a more useful representation? (And it would render the sorting issue moot, anyway.)

Considering that last 5 elements are always sorted:
for i := 0 to 1 do begin
j := i;
x := array[j];
while (j+1 <= 6) and (array[j+1] < x) do begin
array[j] := array[j+1];
inc(j);
end;
array[j] := X;
end;

bubble sort is your friend. Other sorts have too many overhead codes and not suitable for small number of elements
Cheers

Here is your basic O(n) sort. I'm not sure how it compares to the others. It uses unrolled loops.
char card[7]; // the original table of 7 numbers in range 0..51
char table[52]; // workspace
// clear the workspace
memset(table, 0, sizeof(table));
// set the 7 bits corresponding to the 7 cards
table[card[0]] = 1;
table[card[1]] = 1;
...
table[card[6]] = 1;
// read the cards back out
int j = 0;
if (table[0]) card[j++] = 0;
if (table[1]) card[j++] = 1;
...
if (table[51]) card[j++] = 51;

If you are looking for a very low overhead, optimal sort, you should create a sorting network. You can generate the code for a 7 integer network using the Bose-Nelson algorithm.
This would guarentee a fixed number of compares and an equal number of swaps in the worst case.
The generated code is ugly, but it is optimal.

Your data is in a sorted array and I'll assume you swap the new two if needed so also sorted, so
a. if you want to keep it in place then use a form of insertion sort;
b. if you want to have it the result in another array do a merging by copying.
With the small numbers, binary chop is overkill, and ternary chop is appropriate anyway:
One new card will mostly like split into two and three, viz. 2+3 or 3+2,
two cards into singles and pairs, e.g. 2+1+2.
So the most time-space efficient approach to placing the smaller new card is to compare with a[1] (viz. skip a[0]) and then search left or right to find the card it should displace, then swap and move right (shifting rather than bubbling), comparing with the larger new card till you find where it goes. After this you'll be shifting forward by twos (two cards have been inserted).
The variables holding the new cards (and swaps) should be registers.
The look up approach would be faster but use more memory.

Related

Reconstructing input to encoder from output

I would like to understand how to solve the Codility ArrayRecovery challenge, but I don't even know what branch of knowledge to consult. Is it combinatorics, optimization, computer science, set theory, or something else?
Edit:
The branch of knowledge to consult is constraint programming, particularly constraint propagation. You also need some combinatorics to know that if you take k numbers at a time from the range [1..n], with the restriction that no number can be bigger than the one before it, that works out to be
(n+k-1)!/k!(n-1)! possible combinations
which is the same as the number of combinations with replacements of n things taken k at a time, which has the mathematical notation . You can read about why it works out like that here.
Peter Norvig provides an excellent example of how to solve this kind of problem with his Sudoku solver.
You can read the full description of the ArrayRecovery problem via the link above. The short story is that there is an encoder that takes a sequence of integers in the range 1 up to some given limit (say 100 for our purposes) and for each element of the input sequence outputs the most recently seen integer that is smaller than the current input, or 0 if none exists.
input 1, 2, 3, 4 => output 0, 1, 2, 3
input 2, 4, 3 => output 0, 2, 2
The full task is, given the output (and the range of allowable input), figure out how many possible inputs could have generated it. But before I even get to that calculation, I'm not confident about how to even approach formulating the equation. That is what I am asking for help with. (Of course a full solution would be welcome, too, if it is explained.)
I just look at some possible outputs and wonder. Here are some sample encoder outputs and the inputs I can come up with, with * meaning any valid input and something like > 4 meaning any valid input greater than 4. If needed, inputs are referred to as A1, A2, A3, ... (1-based indexing)
Edit #2
Part of the problem I was having with this challenge is that I did not manually generate the exactly correct sets of possible inputs for an output. I believe the set below is correct now. Look at this answer's edit history if you want to see my earlier mistakes.
output #1: 0, 0, 0, 4
possible inputs: [>= 4, A1 >= * >= 4, 4, > 4]
output #2: 0, 0, 0, 2, 3, 4 # A5 ↴ See more in discussion below
possible inputs: [>= 2, A1 >= * >=2, 2, 3, 4, > 4]
output #3: 0, 0, 0, 4, 3, 1
possible inputs: none # [4, 3, 1, 1 >= * > 4, 4, > 1] but there is no number 1 >= * > 4
The second input sequence is very tightly constrained compared to the first just by adding 2 more outputs. The third sequence is so constrained as to be impossible.
But the set of constraints on A5 in example #2 is a bit harder to articulate. Of course A5 > O5, that is the basic constraint on all the inputs. But any output > A4 and after O5 has to appear in the input after A4, so A5 has to be an element of the set of numbers that comes after A5 that is also > A4. Since there is only 1 such number (A6 == 4), A5 has to be it, but it gets more complicated if there is a longer string of numbers that follow. (Editor's note: actually it doesn't.)
As the output set gets longer, I worry these constraints just get more complicated and harder to get right. I cannot think of any data structures for efficiently representing these in a way that leads to efficiently calculating the number of possible combinations. I also don't quite see how to algorithmically add constraint sets together.
Here are the constraints I see so far for any given An
An > On
An <= min(Set of other possible numbers from O1 to n-1 > On). How to define the set of possible numbers greater than On?
Numbers greater than On that came after the most recent occurrence of On in the input
An >= max(Set of other possible numbers from O1 to n-1 < On). How to define the set of possible numbers less than On?
Actually this set is empty because On is, by definition, the largest possible number from the previous input sequence. (Which it not to say it is strictly the largest number from the previous input sequence.)
Any number smaller than On that came before the last occurrence of it in the input would be ineligible because of the "nearest" rule. No numbers smaller that On could have occurred after the most recent occurrence because of the "nearest" rule and because of the transitive property: if Ai < On and Aj < Ai then Aj < On
Then there is the set theory:
An must be an element of the set of unaccounted-for elements of the set of On+1 to Om, where m is the smallest m > n such that Om < On. Any output after such Om and larger than Om (which An is) would have to appear as or after Am.
An element is unaccounted-for if it is seen in the output but does not appear in the input in a position that is consistent with the rest of the output. Obviously I need a better definition than this in order to code and algorithm to calculate it.
It seems like perhaps some kind of set theory and/or combinatorics or maybe linear algebra would help with figuring out the number of possible sequences that would account for all of the unaccounted-for outputs and fit the other constraints. (Editor's note: actually, things never get that complicated.)

The code below passes all of Codility's tests. The OP added a main function to use it on the command line.
The constraints are not as complex as the OP thinks. In particular, there is never a situation where you need to add a restriction that an input be an element of some set of specific integers seen elsewhere in the output. Every input position has a well-defined minimum and maximum.
The only complication to that rule is that sometimes the maximum is "the value of the previous input" and that input itself has a range. But even then, all the values like that are consecutive and have the same range, so the number of possibilities can be calculated with basic combinatorics, and those inputs as a group are independent of the other inputs (which only serve to set the range), so the possibilities of that group can be combined with the possibilities of other input positions by simple multiplication.
Algorithm overview
The algorithm makes a single pass through the output array updating the possible numbers of input arrays after every span, which is what I am calling repetitions of numbers in the output. (You might say maximal subsequences of the output where every element is identical.) For example, for output 0,1,1,2 we have three spans: 0, 1,1 and 2. When a new span begins, the number of possibilities for the previous span is calculated.
This decision was based on a few observations:
For spans longer than 1 in length, the minimum value of the input
allowed in the first position is whatever the value is of the input
in the second position. Calculating the number of possibilities of a
span is straightforward combinatorics, but the standard formula
requires knowing the range of the numbers and the length of the span.
Every time the value of the
output changes (and a new span beings), that strongly constrains the value of the previous span:
When the output goes up, the only possible reason is that the previous input was the value of the new, higher output and the input corresponding to the position of the new, higher output, was even higher.
When an output goes down, new constraints are established, but those are a bit harder to articulate. The algorithm stores stairs (see below) in order to quantify the constraints imposed when the output goes down
The aim here was to confine the range of possible values for every span. Once we do that accurately, calculating the number of combinations is straightforward.
Because the encoder backtracks looking to output a number that relates to the input in 2 ways, both smaller and closer, we know we can throw out numbers that are larger and farther away. After a small number appears in the output, no larger number from before that position can have any influence on what follows.
So to confine these ranges of input when the output sequence decreased, we need to store stairs - a list of increasingly larger possible values for the position in the original array. E.g for 0,2,5,7,2,4 stairs build up like this: 0, 0,2, 0,2,5, 0,2,5,7, 0,2, 0,2,4.
Using these bounds we can tell for sure that the number in the position of the second 2 (next to last position in the example) must be in (2,5], because 5 is the next stair. If the input were greater than 5, a 5 would have been output in that space instead of a 2. Observe, that if the last number in the encoded array was not 4, but 6, we would exit early returning 0, because we know that the previous number couldn't be bigger than 5.
The complexity is O(n*lg(min(n,m))).
Functions
CombinationsWithReplacement - counts number of combinations with replacements of size k from n numbers. E.g. for (3, 2) it counts 3,3, 3,2, 3,1, 2,2, 2,1, 1,1, so returns 6 It is the same as choose(n - 1 + k, n - 1).
nextBigger - finds next bigger element in a range. E.g. for 4 in sub-array 1,2,3,4,5 it returns 5, and in sub-array 1,3 it returns its parameter Max.
countSpan (lambda) - counts how many different combinations a span we have just passed can have. Consider span 2,2 for 0,2,5,7,2,2,7.
When curr gets to the final position, curr is 7 and prev is the final 2 of the 2,2 span.
It computes maximum and minimum possible values of the prev span. At this point stairs consist of 2,5,7 then maximum possible value is 5 (nextBigger after 2 in the stair 2,5,7). A value of greater than 5 in this span would have output a 5, not a 2.
It computes a minimum value for the span (which is the minimum value for every element in the span), which is prev at this point, (remember curr at this moment equals to 7 and prev to 2). We know for sure that in place of the final 2 output, the original input has to have 7, so the minimum is 7. (This is a consequence of the "output goes up" rule. If we had 7,7,2 and curr would be 2 then the minimum for the previous span (the 7,7) would be 8 which is prev + 1.
It adjusts the number of combinations. For a span of length L with a range of n possibilities (1+max-min), there are possibilities, with k being either L or L-1 depending on what follows the span.
For a span followed by a larger number, like 2,2,7, k = L - 1 because the last position of the 2,2 span has to be 7 (the value of the first number after the span).
For a span followed by a smaller number, like 7,7,2, k = L because
the last element of 7,7 has no special constraints.
Finally, it calls CombinationsWithReplacement to find out the number of branches (or possibilities), computes new res partial results value (remainder values in the modulo arithmetic we are doing), and returns new res value and max for further handling.
solution - iterates over the given Encoder Output array. In the main loop, while in a span it counts the span length, and at span boundaries it updates res by calling countSpan and possibly updates the stairs.
If the current span consists of a bigger number than the previous one, then:
Check validity of the next number. E.g 0,2,5,2,7 is invalid input, becuase there is can't be 7 in the next-to-last position, only 3, or 4, or 5.
It updates the stairs. When we have seen only 0,2, the stairs are 0,2, but after the next 5, the stairs become 0,2,5.
If the current span consists of a smaller number then the previous one, then:
It updates stairs. When we have seen only 0,2,5, our stairs are 0,2,5, but after we have seen 0,2,5,2 the stairs become 0,2.
After the main loop it accounts for the last span by calling countSpan with -1 which triggers the "output goes down" branch of calculations.
normalizeMod, extendedEuclidInternal, extendedEuclid, invMod - these auxiliary functions help to deal with modulo arithmetic.
For stairs I use storage for the encoded array, as the number of stairs never exceeds current position.
#include <algorithm>
#include <cassert>
#include <vector>
#include <tuple>
const int Modulus = 1'000'000'007;
int CombinationsWithReplacement(int n, int k);
template <class It>
auto nextBigger(It begin, It end, int value, int Max) {
auto maxIt = std::upper_bound(begin, end, value);
auto max = Max;
if (maxIt != end) {
max = *maxIt;
}
return max;
}
auto solution(std::vector<int> &B, const int Max) {
auto res = 1;
const auto size = (int)B.size();
auto spanLength = 1;
auto prev = 0;
// Stairs is the list of numbers which could be smaller than number in the next position
const auto stairsBegin = B.begin();
// This includes first entry (zero) into stairs
// We need to include 0 because we can meet another zero later in encoded array
// and we need to be able to find in stairs
auto stairsEnd = stairsBegin + 1;
auto countSpan = [&](int curr) {
const auto max = nextBigger(stairsBegin, stairsEnd, prev, Max);
// At the moment when we switch from the current span to the next span
// prev is the number from previous span and curr from current.
// E.g. 1,1,7, when we move to the third position cur = 7 and prev = 1.
// Observe that, in this case minimum value possible in place of any of 1's can be at least 2=1+1=prev+1.
// But if we consider 7, then we have even more stringent condition for numbers in place of 1, it is 7
const auto min = std::max(prev + 1, curr);
const bool countLast = prev > curr;
const auto branchesCount = CombinationsWithReplacement(max - min + 1, spanLength - (countLast ? 0 : 1));
return std::make_pair(res * (long long)branchesCount % Modulus, max);
};
for (int i = 1; i < size; ++i) {
const auto curr = B[i];
if (curr == prev) {
++spanLength;
}
else {
int max;
std::tie(res, max) = countSpan(curr);
if (prev < curr) {
if (curr > max) {
// 0,1,5,1,7 - invalid because number in the fourth position lies in [2,5]
// and so in the fifth encoded position we can't something bigger than 5
return 0;
}
// It is time to possibly shrink stairs.
// E.g if we had stairs 0,2,4,9,17 and current value is 5,
// then we no more interested in 9 and 17, and we change stairs to 0,2,4,5.
// That's because any number bigger than 9 or 17 also bigger than 5.
const auto s = std::lower_bound(stairsBegin, stairsEnd, curr);
stairsEnd = s;
*stairsEnd++ = curr;
}
else {
assert(curr < prev);
auto it = std::lower_bound(stairsBegin, stairsEnd, curr);
if (it == stairsEnd || *it != curr) {
// 0,5,1 is invalid sequence because original sequence lloks like this 5,>5,>1
// and there is no 1 in any of the two first positions, so
// it can't appear in the third position of the encoded array
return 0;
}
}
spanLength = 1;
}
prev = curr;
}
res = countSpan(-1).first;
return res;
}
template <class T> T normalizeMod(T a, T m) {
if (a < 0) return a + m;
return a;
}
template <class T> std::pair<T, std::pair<T, T>> extendedEuclidInternal(T a, T b) {
T old_x = 1;
T old_y = 0;
T x = 0;
T y = 1;
while (true) {
T q = a / b;
T t = a - b * q;
if (t == 0) {
break;
}
a = b;
b = t;
t = x; x = old_x - x * q; old_x = t;
t = y; y = old_y - y * q; old_y = t;
}
return std::make_pair(b, std::make_pair(x, y));
}
// Returns gcd and Bezout's coefficients
template <class T> std::pair<T, std::pair<T, T>> extendedEuclid(T a, T b) {
if (a > b) {
if (b == 0) return std::make_pair(a, std::make_pair(1, 0));
return extendedEuclidInternal(a, b);
}
else {
if (a == 0) return std::make_pair(b, std::make_pair(0, 1));
auto p = extendedEuclidInternal(b, a);
std::swap(p.second.first, p.second.second);
return p;
}
}
template <class T> T invMod(T a, T m) {
auto p = extendedEuclid(a, m);
assert(p.first == 1);
return normalizeMod(p.second.first, m);
}
int CombinationsWithReplacement(int n, int k) {
int res = 1;
for (long long i = n; i < n + k; ++i) {
res = res * i % Modulus;
}
int denom = 1;
for (long long i = k; i > 0; --i) {
denom = denom * i % Modulus;
}
res = res * (long long)invMod(denom, Modulus) % Modulus;
return res;
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Only the above is needed for the Codility challenge. Below is to run on the command line.
//
// Compile with: gcc -std=gnu++14 -lc++ -lstdc++ array_recovery.cpp
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include <string.h>
// Usage: 0 1 2,3, 4 M
// Last arg is M, the max value for an input.
// Remaining args are B (the output of the encoder) separated by commas and/or spaces
// Parentheses and brackets are ignored, so you can use the same input form as Codility's tests: ([1,2,3], M)
int main(int argc, char* argv[]) {
int Max;
std::vector<int> B;
const char* delim = " ,[]()";
if (argc < 2 ) {
printf("Usage: %s M 0 1 2,3, 4... \n", argv[0]);
return 1;
}
for (int i = 1; i < argc; i++) {
char* parse;
parse = strtok(argv[i], delim);
while (parse != NULL)
{
B.push_back(atoi(parse));
parse = strtok (NULL, delim);
}
}
Max = B.back();
B.pop_back();
printf("%d\n", solution(B, Max));
return 0;
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Only the above is needed for the Codility challenge. Below is to run on the command line.
//
// Compile with: gcc -std=gnu++14 -lc++ -lstdc++ array_recovery.cpp
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
#include <string.h>
// Usage: M 0 1 2,3, 4
// first arg is M, the max value for an input.
// remaining args are B (the output of the encoder) separated by commas and/or spaces
int main(int argc, char* argv[]) {
int Max;
std::vector<int> B;
const char* delim = " ,";
if (argc < 3 ) {
printf("Usage: %s M 0 1 2,3, 4... \n", argv[0]);
return 1;
}
Max = atoi(argv[1]);
for (int i = 2; i < argc; i++) {
char* parse;
parse = strtok(argv[i], delim);
while (parse != NULL)
{
B.push_back(atoi(parse));
parse = strtok (NULL, delim);
}
}
printf("%d\n", solution(B, Max));
return 0;
}
Let's see an example:
Max = 5
Array is
0 1 3 0 1 1 3
1
1 2..5
1 3 4..5
1 3 4..5 1
1 3 4..5 1 2..5
1 3 4..5 1 2..5 >=..2 (sorry, for a cumbersome way of writing)
1 3 4..5 1 3..5 >=..3 4..5
Now count:
1 1 2 1 3 2 which amounts to 12 total.

Here's an idea. One known method to construct the output is to use a stack. We pop it while the element is greater or equal, then output the smaller element if it exists, then push the greater element onto the stack. Now what if we attempted to do this backwards from the output?
First we'll demonstrate the stack method using the c∅dility example.
[2, 5, 3, 7, 9, 6]
2: output 0, stack [2]
5: output 2, stack [2,5]
3: pop 5, output, 2, stack [2,3]
7: output 3, stack [2,3,7]
... etc.
Final output: [0, 2, 2, 3, 7, 3]
Now let's try reconstruction! We'll use stack both as the imaginary stack and as the reconstituted input:
(Input: [2, 5, 3, 7, 9, 6])
Output: [0, 2, 2, 3, 7, 3]
* Something >3 that reached 3 in the stack
stack = [3, 3 < *]
* Something >7 that reached 7 in the stack
but both of those would've popped before 3
stack = [3, 7, 7 < x, 3 < * <= x]
* Something >3, 7 qualifies
stack = [3, 7, 7 < x, 3 < * <= x]
* Something >2, 3 qualifies
stack = [2, 3, 7, 7 < x, 3 < * <= x]
* Something >2 and >=3 since 3 reached 2
stack = [2, 2 < *, 3, 7, 7 < x, 3 < * <= x]
Let's attempt your examples:
Example 1:
[0, 0, 0, 2, 3, 4]
* Something >4
stack = [4, 4 < *]
* Something >3, 4 qualifies
stack = [3, 4, 4 < *]
* Something >2, 3 qualifies
stack = [2, 3, 4, 4 < *]
* The rest is non-increasing with lowerbound 2
stack = [y >= x, x >= 2, 2, 3, 4, >4]
Example 2:
[0, 0, 0, 4]
* Something >4
stack [4, 4 < *]
* Non-increasing
stack = [z >= y, y >= 4, 4, 4 < *]
Calculating the number of combinations is achieved by multiplying together the possibilities for all the sections. A section is either a bounded single cell; or a bound, non-increasing subarray of one or more cells. To calculate the latter we use the multi-choose binomial, (n + k - 1) choose (k - 1). Consider that we can express the differences between the cells of a bound, non-increasing sequence of 3 cells as:
(ub - cell_3) + (cell_3 - cell_2) + (cell_2 - cell_1) + (cell_1 - lb) = ub - lb
Then the number of ways to distribute ub - lb into (x + 1) cells is
(n + k - 1) choose (k - 1)
or
(ub - lb + x) choose x
For example, the number of non-increasing sequences between
(3,4) in two cells is (4 - 3 + 2) choose 2 = 3: [3,3] [4,3] [4,4]
And the number of non-increasing sequences between
(3,4) in three cells is (4 - 3 + 3) choose 3 = 4: [3,3,3] [4,3,3] [4,4,3] [4,4,4]
(Explanation attributed to Brian M. Scott.)
Rough JavaScript sketch (the code is unreliable; it's only meant to illustrate the encoding. The encoder lists [lower_bound, upper_bound], or a non-increasing sequence as [non_inc, length, lower_bound, upper_bound]):
function f(A, M){
console.log(JSON.stringify(A), M);
let i = A.length - 1;
let last = A[i];
let s = [[last,last]];
if (A[i-1] == last){
let d = 1;
s.splice(1,0,['non_inc',d++,last,M]);
while (i > 0 && A[i-1] == last){
s.splice(1,0,['non_inc',d++,last,M]);
i--
}
} else {
s.push([last+1,M]);
i--;
}
if (i == 0)
s.splice(0,1);
for (; i>0; i--){
let x = A[i];
if (x < s[0][0])
s = [[x,x]].concat(s);
if (x > s[0][0]){
let [l, _l] = s[0];
let [lb, ub] = s[1];
s[0] = [x+1, M];
s[1] = [lb, x];
s = [[l,_l], [x,x]].concat(s);
}
if (x == s[0][0]){
let [l,_l] = s[0];
let [lb, ub] = s[1];
let d = 1;
s.splice(0,1);
while (i > 0 && A[i-1] == x){
s =
[['non_inc', d++, lb, M]].concat(s);
i--;
}
if (i > 0)
s = [[l,_l]].concat(s);
}
}
// dirty fix
if (s[0][0] == 0)
s.splice(0,1);
return s;
}
var a = [2, 5, 3, 7, 9, 6]
var b = [0, 2, 2, 3, 7, 3]
console.log(JSON.stringify(a));
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,2,3,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,2]
console.log(JSON.stringify(f(b,4)));
b = [0,3,5,6]
console.log(JSON.stringify(f(b,10)));
b = [0,0,3,0]
console.log(JSON.stringify(f(b,10)));

Sum of numbers with approximation and no repetition

For an app I'm working on, I need to process an array of numbers and return a new array such that the sum of the elements are as close as possible to a target sum. This is similar to the coin-counting problem, with two differences:
Each element of the new array has to come from the input array (i.e. no repetition/duplication)
The algorithm should stop when it finds an array whose sum falls within X of the target number (e.g., given [10, 12, 15, 23, 26], a target of 35, and a sigma of 5, a result of [10, 12, 15] (sum 37) is OK but a result of [15, 26] (sum 41) is not.
I was considering the following algorithm (in pseudocode) but I doubt that this is the best way to do it.
function (array, goal, sigma)
var A = []
for each element E in array
if (E + (sum of rest of A) < goal +/- sigma)
A.push(E)
return A
For what it's worth, the language I'm using is Javascript. Any advice is much appreciated!

This is not intended as the best answer possible, just maybe something that will work well enough. All remarks/input is welcome.
Also, this is taking into mind the answers from the comments, that the input is length of songs (usually 100 - 600), the length of the input array is between 5 to 50 and the goal is anywhere between 100 to 7200.
The idea:
Start with finding the average value of the input, and then work out a guess on the number of input values you're going to need. Lets say that comes out x.
Order your input.
Take the first x-1 values and substitute the smallest one with the any other to get to your goal (somewhere in the range). If none exist, find a number so you're still lower than the goal.
Repeat step #3 using backtracking or something like that. Maybe limit the number of trials you're gonna spend there.
x++ and go back to step #3.

I would use some kind of divide and conquer and a recursive implementation. Here is a prototype in Smalltalk
SequenceableCollection>>subsetOfSum: s plusOrMinus: d
"check if a singleton matches"
self do: [:v | (v between: s - d and: s + d) ifTrue: [^{v}]].
"nope, engage recursion with a smaller collection"
self keysAndValuesDo: [:i :v |
| sub |
sub := (self copyWithoutIndex: i) subsetOfSum: s-v plusOrMinus: d.
sub isNil ifFalse: [^sub copyWith: v]].
"none found"
^nil
Using like this:
#(10 12 15 23 26) subsetOfSum: 62 plusOrMinus: 3.
gives:
#(23 15 12 10)

With limited input this problem is good candidate for dynamic programming with time complexity O((Sum + Sigma) * ArrayLength)
Delphi code:
function FindCombination(const A: array of Integer; Sum, Sigma: Integer): string;
var
Sums: array of Integer;
Value, idx: Integer;
begin
Result := '';
SetLength(Sums, Sum + Sigma + 1); //zero-initialized array
Sums[0] := 1; //just non-zero
for Value in A do begin
idx := Sum + Sigma;
while idx >= Value do begin
if Sums[idx - Value] <> 0 then begin //(idx-Value) sum can be formed from array]
Sums[idx] := Value; //value is included in this sum
if idx >= Sum - Sigma then begin //bingo!
while idx > 0 do begin //unwind and extract all values for this sum
Result := Result + IntToStr(Sums[idx]) + ' ';
idx := idx - Sums[idx];
end;
Exit;
end;
end;
Dec(idx); //idx--
end;
end;
end;

Here's one commented algorithm in JavaScript:
var arr = [9, 12, 20, 23, 26];
var target = 35;
var sigma = 5;
var n = arr.length;
// sort the numbers in ascending order
arr.sort(function(a,b){return a-b;});
// initialize the recursion
var stack = [[0,0,[]]];
while (stack[0] !== undefined){
var params = stack.pop();
var i = params[0]; // index
var s = params[1]; // sum so far
var r = params[2]; // accumulating list of numbers
// if the sum is within range, output sum
if (s >= target - sigma && s <= target + sigma){
console.log(r);
break;
// since the numbers are sorted, if the current
// number makes the sum too large, abandon this thread
} else if (s + arr[i] > target + sigma){
continue;
}
// there are still enough numbers left to skip this one
if (i < n - 1){
stack.push([i + 1,s,r]);
}
// there are still enough numbers left to add this one
if (i < n){
_r = r.slice();
_r.push(arr[i]);
stack.push([i + 1,s + arr[i],_r]);
}
}
/* [9,23] */

Efficient way to generate a seemingly random permutation from a very large set without repeating?

I have a very large set (billions or more, it's expected to grow exponentially to some level), and I want to generate seemingly random elements from it without repeating. I know I can pick a random number and repeat and record the elements I have generated, but that takes more and more memory as numbers are generated, and wouldn't be practical after couple millions elements out.
I mean, I could say 1, 2, 3 up to billions and each would be constant time without remembering all the previous, or I can say 1,3,5,7,9 and on then 2,4,6,8,10, but is there a more sophisticated way to do that and eventually get a seemingly random permutation of that set?
Update
1, The set does not change size in the generation process. I meant when the user's input increases linearly, the size of the set increases exponentially.
2, In short, the set is like the set of every integer from 1 to 10 billions or more.
3, In long, it goes up to 10 billion because each element carries the information of many independent choices, for example. Imagine an RPG character that have 10 attributes, each can go from 1 to 100 (for my problem different choices can have different ranges), thus there's 10^20 possible characters, number "10873456879326587345" would correspond to a character that have "11, 88, 35...", and I would like an algorithm to generate them one by one without repeating, but makes it looks random.

Thanks for the interesting question. You can create a "pseudorandom"* (cyclic) permutation with a few bytes using modular exponentiation. Say we have n elements. Search for a prime p that's bigger than n+1. Then find a primitive root g modulo p. Basically by definition of primitive root, the action x --> (g * x) % p is a cyclic permutation of {1, ..., p-1}. And so x --> ((g * (x+1))%p) - 1 is a cyclic permutation of {0, ..., p-2}. We can get a cyclic permutation of {0, ..., n-1} by repeating the previous permutation if it gives a value bigger (or equal) n.
I implemented this idea as a Go package. https://github.com/bwesterb/powercycle
package main
import (
"fmt"
"github.com/bwesterb/powercycle"
)
func main() {
var x uint64
cycle := powercycle.New(10)
for i := 0; i < 10; i++ {
fmt.Println(x)
x = cycle.Apply(x)
}
}
This outputs something like
0
6
4
1
2
9
3
5
8
7
but that might vary off course depending on the generator chosen.
It's fast, but not super-fast: on my five year old i7 it takes less than 210ns to compute one application of a cycle on 1000000000000000 elements. More details:
BenchmarkNew10-8 1000000 1328 ns/op
BenchmarkNew1000-8 500000 2566 ns/op
BenchmarkNew1000000-8 50000 25893 ns/op
BenchmarkNew1000000000-8 200000 7589 ns/op
BenchmarkNew1000000000000-8 2000 648785 ns/op
BenchmarkApply10-8 10000000 170 ns/op
BenchmarkApply1000-8 10000000 173 ns/op
BenchmarkApply1000000-8 10000000 172 ns/op
BenchmarkApply1000000000-8 10000000 169 ns/op
BenchmarkApply1000000000000-8 10000000 201 ns/op
BenchmarkApply1000000000000000-8 10000000 204 ns/op
Why did I say "pseudorandom"? Well, we are always creating a very specific kind of cycle: namely one that uses modular exponentiation. It looks pretty pseudorandom though.

I would use a random number and swap it with an element at the beginning of the set.
Here's some pseudo code
set = [1, 2, 3, 4, 5, 6]
picked = 0
Function PickNext(set, picked)
If picked > Len(set) - 1 Then
Return Nothing
End If
// random number between picked (inclusive) and length (exclusive)
r = RandomInt(picked, Len(set))
// swap the picked element to the beginning of the set
result = set[r]
set[r] = set[picked]
set[picked] = result
// update picked
picked++
// return your next random element
Return temp
End Function
Every time you pick an element there is one swap and the only extra memory being used is the picked variable. The swap can happen if the elements are in a database or in memory.
EDIT Here's a jsfiddle of a working implementation http://jsfiddle.net/sun8rw4d/
JavaScript
var set = [];
set.picked = 0;
function pickNext(set) {
if(set.picked > set.length - 1) { return null; }
var r = set.picked + Math.floor(Math.random() * (set.length - set.picked));
var result = set[r];
set[r] = set[set.picked];
set[set.picked] = result;
set.picked++;
return result;
}
// testing
for(var i=0; i<100; i++) {
set.push(i);
}
while(pickNext(set) !== null) { }
document.body.innerHTML += set.toString();
EDIT 2 Finally, a random binary walk of the set. This can be accomplished with O(Log2(N)) stack space (memory) which for 10billion is only 33. There's no shuffling or swapping involved. Using trinary instead of binary might yield even better pseudo random results.
// on the fly set generator
var count = 0;
var maxValue = 64;
function nextElement() {
// restart the generation
if(count == maxValue) {
count = 0;
}
return count++;
}
// code to pseudo randomly select elements
var current = 0;
var stack = [0, maxValue - 1];
function randomBinaryWalk() {
if(stack.length == 0) { return null; }
var high = stack.pop();
var low = stack.pop();
var mid = ((high + low) / 2) | 0;
// pseudo randomly choose the next path
if(Math.random() > 0.5) {
if(low <= mid - 1) {
stack.push(low);
stack.push(mid - 1);
}
if(mid + 1 <= high) {
stack.push(mid + 1);
stack.push(high);
}
} else {
if(mid + 1 <= high) {
stack.push(mid + 1);
stack.push(high);
}
if(low <= mid - 1) {
stack.push(low);
stack.push(mid - 1);
}
}
// how many elements to skip
var toMid = (current < mid ? mid - current : (maxValue - current) + mid);
// skip elements
for(var i = 0; i < toMid - 1; i++) {
nextElement();
}
current = mid;
// get result
return nextElement();
}
// test
var result;
var list = [];
do {
result = randomBinaryWalk();
list.push(result);
} while(result !== null);
document.body.innerHTML += '<br/>' + list.toString();
Here's the results from a couple of runs with a small set of 64 elements. JSFiddle http://jsfiddle.net/yooLjtgu/
30,46,38,34,36,35,37,32,33,31,42,40,41,39,44,45,43,54,50,52,53,51,48,47,49,58,60,59,61,62,56,57,55,14,22,18,20,19,21,16,15,17,26,28,29,27,24,25,23,6,2,4,5,3,0,1,63,10,8,7,9,12,11,13
30,14,22,18,16,15,17,20,19,21,26,28,29,27,24,23,25,6,10,8,7,9,12,13,11,2,0,63,1,4,5,3,46,38,42,44,45,43,40,41,39,34,36,35,37,32,31,33,54,58,56,55,57,60,59,61,62,50,48,49,47,52,51,53
As I mentioned in my comment, unless you have an efficient way to skip to a specific point in your "on the fly" generation of the set this will not be very efficient.

if it is enumerable then use a pseudo-random integer generator adjusted to the period 0 .. 2^n - 1 where the upper bound is just greater than the size of your set and generate pseudo-random integers discarding those more than the size of your set. Use those integers to index items from your set.

Pre- compute yourself a series of indices (e.g. in a file), which has the properties you need and then randomly choose a start index for your enumeration and use the series in a round-robin manner.
The length of your pre-computed series should be > the maximum size of the set.
If you combine this (depending on your programming language etc.) with file mappings, your final nextIndex(INOUT state) function is (nearly) as simple as return mappedIndices[state++ % PERIOD];, if you have a fixed size of each entry (e.g. 8 bytes -> uint64_t).
Of course, the returned value could be > your current set size. Simply draw indices until you get one which is <= your sets current size.
Update (In response to question-update):
There is another option to achieve your goal if it is about creating 10Billion unique characters in your RPG: Generate a GUID and write yourself a function which computes your number from the GUID. man uuid if you are are on a unix system. Else google it. Some parts of the uuid are not random but contain meta-info, some parts are either systematic (such as your network cards MAC address) or random, depending on generator algorithm. But they are very very most likely unique. So, whenever you need a new unique number, generate a uuid and transform it to your number by means of some algorithm which basically maps the uuid bytes to your number in a non-trivial way (e.g. use hash functions).

Generating permutations lazily

I'm looking for an algorithm to generate permutations of a set in such a way that I could make a lazy list of them in Clojure. i.e. I'd like to iterate over a list of permutations where each permutation is not calculated until I request it, and all of the permutations don't have to be stored in memory at once.
Alternatively I'm looking for an algorithm where given a certain set, it will return the "next" permutation of that set, in such a way that repeatedly calling the function on its own output will cycle through all permutations of the original set, in some order (what the order is doesn't matter).
Is there such an algorithm? Most of the permutation-generating algorithms I've seen tend to generate them all at once (usually recursively), which doesn't scale to very large sets. An implementation in Clojure (or another functional language) would be helpful but I can figure it out from pseudocode.

Yes, there is a "next permutation" algorithm, and it's quite simple too. The C++ standard template library (STL) even has a function called next_permutation.
The algorithm actually finds the next permutation -- the lexicographically next one. The idea is this: suppose you are given a sequence, say "32541". What is the next permutation?
If you think about it, you'll see that it is "34125". And your thoughts were probably something this: In "32541",
there is no way to keep the "32" fixed and find a later permutation in the "541" part, because that permutation is already the last one for 5,4, and 1 -- it is sorted in decreasing order.
So you'll have to change the "2" to something bigger -- in fact, to the smallest number bigger than it in the "541" part, namely 4.
Now, once you've decided that the permutation will start as "34", the rest of the numbers should be in increasing order, so the answer is "34125".
The algorithm is to implement precisely that line of reasoning:
Find the longest "tail" that is ordered in decreasing order. (The "541" part.)
Change the number just before the tail (the "2") to the smallest number bigger than it in the tail (the 4).
Sort the tail in increasing order.
You can do (1.) efficiently by starting at the end and going backwards as long as the previous element is not smaller than the current element. You can do (2.) by just swapping the "4" with the '2", so you'll have "34521". Once you do this, you can avoid using a sorting algorithm for (3.), because the tail was, and is still (think about this), sorted in decreasing order, so it only needs to be reversed.
The C++ code does precisely this (look at the source in /usr/include/c++/4.0.0/bits/stl_algo.h on your system, or see this article); it should be simple to translate it to your language: [Read "BidirectionalIterator" as "pointer", if you're unfamiliar with C++ iterators. The code returns false if there is no next permutation, i.e. we are already in decreasing order.]
template <class BidirectionalIterator>
bool next_permutation(BidirectionalIterator first,
BidirectionalIterator last) {
if (first == last) return false;
BidirectionalIterator i = first;
++i;
if (i == last) return false;
i = last;
--i;
for(;;) {
BidirectionalIterator ii = i--;
if (*i <*ii) {
BidirectionalIterator j = last;
while (!(*i <*--j));
iter_swap(i, j);
reverse(ii, last);
return true;
}
if (i == first) {
reverse(first, last);
return false;
}
}
}
It might seem that it can take O(n) time per permutation, but if you think about it more carefully, you can prove that it takes O(n!) time for all permutations in total, so only O(1) -- constant time -- per permutation.
The good thing is that the algorithm works even when you have a sequence with repeated elements: with, say, "232254421", it would find the tail as "54421", swap the "2" and "4" (so "232454221"), reverse the rest, giving "232412245", which is the next permutation.

Assuming that we're talking about lexicographic order over the values being permuted, there are two general approaches that you can use:
transform one permutation of the elements to the next permutation (as ShreevatsaR posted), or
directly compute the nth permutation, while counting n from 0 upward.
For those (like me ;-) who don't speak c++ as natives, approach 1 can be implemented from the following pseudo-code, assuming zero-based indexing of an array with index zero on the "left" (substituting some other structure, such as a list, is "left as an exercise" ;-):
1. scan the array from right-to-left (indices descending from N-1 to 0)
1.1. if the current element is less than its right-hand neighbor,
call the current element the pivot,
and stop scanning
1.2. if the left end is reached without finding a pivot,
reverse the array and return
(the permutation was the lexicographically last, so its time to start over)
2. scan the array from right-to-left again,
to find the rightmost element larger than the pivot
(call that one the successor)
3. swap the pivot and the successor
4. reverse the portion of the array to the right of where the pivot was found
5. return
Here's an example starting with a current permutation of CADB:
1. scanning from the right finds A as the pivot in position 1
2. scanning again finds B as the successor in position 3
3. swapping pivot and successor gives CBDA
4. reversing everything following position 1 (i.e. positions 2..3) gives CBAD
5. CBAD is the next permutation after CADB
For the second approach (direct computation of the nth permutation), remember that there are N! permutations of N elements. Therefore, if you are permuting N elements, the first (N-1)! permutations must begin with the smallest element, the next (N-1)! permutations must begin with the second smallest, and so on. This leads to the following recursive approach (again in pseudo-code, numbering the permutations and positions from 0):
To find permutation x of array A, where A has N elements:
0. if A has one element, return it
1. set p to ( x / (N-1)! ) mod N
2. the desired permutation will be A[p] followed by
permutation ( x mod (N-1)! )
of the elements remaining in A after position p is removed
So, for example, the 13th permutation of ABCD is found as follows:
perm 13 of ABCD: {p = (13 / 3!) mod 4 = (13 / 6) mod 4 = 2; ABCD[2] = C}
C followed by perm 1 of ABD {because 13 mod 3! = 13 mod 6 = 1}
perm 1 of ABD: {p = (1 / 2!) mod 3 = (1 / 2) mod 2 = 0; ABD[0] = A}
A followed by perm 1 of BD {because 1 mod 2! = 1 mod 2 = 1}
perm 1 of BD: {p = (1 / 1!) mod 2 = (1 / 1) mod 2 = 1; BD[1] = D}
D followed by perm 0 of B {because 1 mod 1! = 1 mod 1 = 0}
B (because there's only one element)
DB
ADB
CADB
Incidentally, the "removal" of elements can be represented by a parallel array of booleans which indicates which elements are still available, so it is not necessary to create a new array on each recursive call.
So, to iterate across the permutations of ABCD, just count from 0 to 23 (4!-1) and directly compute the corresponding permutation.

You should check the Permutations article on wikipeda. Also, there is the concept of Factoradic numbers.
Anyway, the mathematical problem is quite hard.
In C# you can use an iterator, and stop the permutation algorithm using yield. The problem with this is that you cannot go back and forth, or use an index.

More examples of permutation algorithms to generate them.
Source: http://www.ddj.com/architect/201200326
Uses the Fike's Algorithm, that is the one of fastest known.
Uses the Algo to the Lexographic order.
Uses the nonlexographic, but runs faster than item 2.
1.
PROGRAM TestFikePerm;
CONST marksize = 5;
VAR
marks : ARRAY [1..marksize] OF INTEGER;
ii : INTEGER;
permcount : INTEGER;
PROCEDURE WriteArray;
VAR i : INTEGER;
BEGIN
FOR i := 1 TO marksize
DO Write ;
WriteLn;
permcount := permcount + 1;
END;
PROCEDURE FikePerm ;
{Outputs permutations in nonlexicographic order. This is Fike.s algorithm}
{ with tuning by J.S. Rohl. The array marks[1..marksizn] is global. The }
{ procedure WriteArray is global and displays the results. This must be}
{ evoked with FikePerm(2) in the calling procedure.}
VAR
dn, dk, temp : INTEGER;
BEGIN
IF
THEN BEGIN { swap the pair }
WriteArray;
temp :=marks[marksize];
FOR dn := DOWNTO 1
DO BEGIN
marks[marksize] := marks[dn];
marks [dn] := temp;
WriteArray;
marks[dn] := marks[marksize]
END;
marks[marksize] := temp;
END {of bottom level sequence }
ELSE BEGIN
FikePerm;
temp := marks[k];
FOR dk := DOWNTO 1
DO BEGIN
marks[k] := marks[dk];
marks[dk][ := temp;
FikePerm;
marks[dk] := marks[k];
END; { of loop on dk }
marks[k] := temp;l
END { of sequence for other levels }
END; { of FikePerm procedure }
BEGIN { Main }
FOR ii := 1 TO marksize
DO marks[ii] := ii;
permcount := 0;
WriteLn ;
WrieLn;
FikePerm ; { It always starts with 2 }
WriteLn ;
ReadLn;
END.
2.
PROGRAM TestLexPerms;
CONST marksize = 5;
VAR
marks : ARRAY [1..marksize] OF INTEGER;
ii : INTEGER;
permcount : INTEGER;
PROCEDURE WriteArray;
VAR i : INTEGER;
BEGIN
FOR i := 1 TO marksize
DO Write ;
permcount := permcount + 1;
WriteLn;
END;
PROCEDURE LexPerm ;
{ Outputs permutations in lexicographic order. The array marks is global }
{ and has n or fewer marks. The procedure WriteArray () is global and }
{ displays the results. }
VAR
work : INTEGER:
mp, hlen, i : INTEGER;
BEGIN
IF
THEN BEGIN { Swap the pair }
work := marks[1];
marks[1] := marks[2];
marks[2] := work;
WriteArray ;
END
ELSE BEGIN
FOR mp := DOWNTO 1
DO BEGIN
LexPerm<>;
hlen := DIV 2;
FOR i := 1 TO hlen
DO BEGIN { Another swap }
work := marks[i];
marks[i] := marks[n - i];
marks[n - i] := work
END;
work := marks[n]; { More swapping }
marks[n[ := marks[mp];
marks[mp] := work;
WriteArray;
END;
LexPerm<>
END;
END;
BEGIN { Main }
FOR ii := 1 TO marksize
DO marks[ii] := ii;
permcount := 1; { The starting position is permutation }
WriteLn < Starting position: >;
WriteLn
LexPerm ;
WriteLn < PermCount is , permcount>;
ReadLn;
END.
3.
PROGRAM TestAllPerms;
CONST marksize = 5;
VAR
marks : ARRAY [1..marksize] of INTEGER;
ii : INTEGER;
permcount : INTEGER;
PROCEDURE WriteArray;
VAR i : INTEGER;
BEGIN
FOR i := 1 TO marksize
DO Write ;
WriteLn;
permcount := permcount + 1;
END;
PROCEDURE AllPerm (n : INTEGER);
{ Outputs permutations in nonlexicographic order. The array marks is }
{ global and has n or few marks. The procedure WriteArray is global and }
{ displays the results. }
VAR
work : INTEGER;
mp, swaptemp : INTEGER;
BEGIN
IF
THEN BEGIN { Swap the pair }
work := marks[1];
marks[1] := marks[2];
marks[2] := work;
WriteArray;
END
ELSE BEGIN
FOR mp := DOWNTO 1
DO BEGIN
ALLPerm<< n - 1>>;
IF >
THEN swaptemp := 1
ELSE swaptemp := mp;
work := marks[n];
marks[n] := marks[swaptemp};
marks[swaptemp} := work;
WriteArray;
AllPerm< n-1 >;
END;
END;
BEGIN { Main }
FOR ii := 1 TO marksize
DO marks[ii] := ii
permcount :=1;
WriteLn < Starting position; >;
WriteLn;
Allperm < marksize>;
WriteLn < Perm count is , permcount>;
ReadLn;
END.

the permutations function in clojure.contrib.lazy_seqs already claims to do just this.

It looks necromantic in 2022 but I'm sharing it anyway
Here an implementation of C++ next_permutation in Java can be found. The idea of using it in Clojure might be something like
(println (lazy-seq (iterator-seq (NextPermutationIterator. (list 'a 'b 'c)))))
disclaimer: I'm the author and maintainer of the project

Counting, reversed bit pattern

I am trying to find an algorithm to count from 0 to 2n-1 but their bit pattern reversed. I care about only n LSB of a word. As you may have guessed I failed.
For n=3:
000 -> 0
100 -> 4
010 -> 2
110 -> 6
001 -> 1
101 -> 5
011 -> 3
111 -> 7
You get the idea.
Answers in pseudo-code is great. Code fragments in any language are welcome, answers without bit operations are preferred.
Please don't just post a fragment without even a short explanation or a pointer to a source.
Edit: I forgot to add, I already have a naive implementation which just bit-reverses a count variable. In a sense, this method is not really counting.

This is, I think easiest with bit operations, even though you said this wasn't preferred
Assuming 32 bit ints, here's a nifty chunk of code that can reverse all of the bits without doing it in 32 steps:
unsigned int i;
i = (i & 0x55555555) << 1 | (i & 0xaaaaaaaa) >> 1;
i = (i & 0x33333333) << 2 | (i & 0xcccccccc) >> 2;
i = (i & 0x0f0f0f0f) << 4 | (i & 0xf0f0f0f0) >> 4;
i = (i & 0x00ff00ff) << 8 | (i & 0xff00ff00) >> 8;
i = (i & 0x0000ffff) << 16 | (i & 0xffff0000) >> 16;
i >>= (32 - n);
Essentially this does an interleaved shuffle of all of the bits. Each time around half of the bits in the value are swapped with the other half.
The last line is necessary to realign the bits so that bin "n" is the most significant bit.
Shorter versions of this are possible if "n" is <= 16, or <= 8

At each step, find the leftmost 0 digit of your value. Set it, and clear all digits to the left of it. If you don't find a 0 digit, then you've overflowed: return 0, or stop, or crash, or whatever you want.
This is what happens on a normal binary increment (by which I mean it's the effect, not how it's implemented in hardware), but we're doing it on the left instead of the right.
Whether you do this in bit ops, strings, or whatever, is up to you. If you do it in bitops, then a clz (or call to an equivalent hibit-style function) on ~value might be the most efficient way: __builtin_clz where available. But that's an implementation detail.

This solution was originally in binary and converted to conventional math as the requester specified.
It would make more sense as binary, at least the multiply by 2 and divide by 2 should be << 1 and >> 1 for speed, the additions and subtractions probably don't matter one way or the other.
If you pass in mask instead of nBits, and use bitshifting instead of multiplying or dividing, and change the tail recursion to a loop, this will probably be the most performant solution you'll find since every other call it will be nothing but a single add, it would only be as slow as Alnitak's solution once every 4, maybe even 8 calls.
int incrementBizarre(int initial, int nBits)
// in the 3 bit example, this should create 100
mask=2^(nBits-1)
// This should only return true if the first (least significant) bit is not set
// if initial is 011 and mask is 100
// 3 4, bit is not set
if(initial < mask)
// If it was not, just set it and bail.
return initial+ mask // 011 (3) + 100 (4) = 111 (7)
else
// it was set, are we at the most significant bit yet?
// mask 100 (4) / 2 = 010 (2), 001/2 = 0 indicating overflow
if(mask / 2) > 0
// No, we were't, so unset it (initial-mask) and increment the next bit
return incrementBizarre(initial - mask, mask/2)
else
// Whoops we were at the most significant bit. Error condition
throw new OverflowedMyBitsException()
Wow, that turned out kinda cool. I didn't figure in the recursion until the last second there.
It feels wrong--like there are some operations that should not work, but they do because of the nature of what you are doing (like it feels like you should get into trouble when you are operating on a bit and some bits to the left are non-zero, but it turns out you can't ever be operating on a bit unless all the bits to the left are zero--which is a very strange condition, but true.
Example of flow to get from 110 to 001 (backwards 3 to backwards 4):
mask 100 (4), initial 110 (6); initial < mask=false; initial-mask = 010 (2), now try on the next bit
mask 010 (2), initial 010 (2); initial < mask=false; initial-mask = 000 (0), now inc the next bit
mask 001 (1), initial 000 (0); initial < mask=true; initial + mask = 001--correct answer

Here's a solution from my answer to a different question that computes the next bit-reversed index without looping. It relies heavily on bit operations, though.
The key idea is that incrementing a number simply flips a sequence of least-significant bits, for example from nnnn0111 to nnnn1000. So in order to compute the next bit-reversed index, you have to flip a sequence of most-significant bits. If your target platform has a CTZ ("count trailing zeros") instruction, this can be done efficiently.
Example in C using GCC's __builtin_ctz:
void iter_reversed(unsigned bits) {
unsigned n = 1 << bits;
for (unsigned i = 0, j = 0; i < n; i++) {
printf("%x\n", j);
// Compute a mask of LSBs.
unsigned mask = i ^ (i + 1);
// Length of the mask.
unsigned len = __builtin_ctz(~mask);
// Align the mask to MSB of n.
mask <<= bits - len;
// XOR with mask.
j ^= mask;
}
}
Without a CTZ instruction, you can also use integer division:
void iter_reversed(unsigned bits) {
unsigned n = 1 << bits;
for (unsigned i = 0, j = 0; i < n; i++) {
printf("%x\n", j);
// Find least significant zero bit.
unsigned bit = ~i & (i + 1);
// Using division to bit-reverse a single bit.
unsigned rev = (n / 2) / bit;
// XOR with mask.
j ^= (n - 1) & ~(rev - 1);
}
}

void reverse(int nMaxVal, int nBits)
{
int thisVal, bit, out;
// Calculate for each value from 0 to nMaxVal.
for (thisVal=0; thisVal<=nMaxVal; ++thisVal)
{
out = 0;
// Shift each bit from thisVal into out, in reverse order.
for (bit=0; bit<nBits; ++bit)
out = (out<<1) + ((thisVal>>bit) & 1)
}
printf("%d -> %d\n", thisVal, out);
}

Maybe increment from 0 to N (the "usual" way") and do ReverseBitOrder() for each iteration. You can find several implementations here (I like the LUT one the best).
Should be really quick.

Here's an answer in Perl. You don't say what comes after the all ones pattern, so I just return zero. I took out the bitwise operations so that it should be easy to translate into another language.
sub reverse_increment {
my($n, $bits) = #_;
my $carry = 2**$bits;
while($carry > 1) {
$carry /= 2;
if($carry > $n) {
return $carry + $n;
} else {
$n -= $carry;
}
}
return 0;
}

Here's a solution which doesn't actually try to do any addition, but exploits the on/off pattern of the seqence (most sig bit alternates every time, next most sig bit alternates every other time, etc), adjust n as desired:
#define FLIP(x, i) do { (x) ^= (1 << (i)); } while(0)
int main() {
int n = 3;
int max = (1 << n);
int x = 0;
for(int i = 1; i <= max; ++i) {
std::cout << x << std::endl;
/* if n == 3, this next part is functionally equivalent to this:
*
* if((i % 1) == 0) FLIP(x, n - 1);
* if((i % 2) == 0) FLIP(x, n - 2);
* if((i % 4) == 0) FLIP(x, n - 3);
*/
for(int j = 0; j < n; ++j) {
if((i % (1 << j)) == 0) FLIP(x, n - (j + 1));
}
}
}

How about adding 1 to the most significant bit, then carrying to the next (less significant) bit, if necessary. You could speed this up by operating on bytes:
Precompute a lookup table for counting in bit-reverse from 0 to 256 (00000000 -> 10000000, 10000000 -> 01000000, ..., 11111111 -> 00000000).
Set all bytes in your multi-byte number to zero.
Increment the most significant byte using the lookup table. If the byte is 0, increment the next byte using the lookup table. If the byte is 0, increment the next byte...
Go to step 3.

With n as your power of 2 and x the variable you want to step:
(defun inv-step (x n) ; the following is a function declaration
"returns a bit-inverse step of x, bounded by 2^n" ; documentation
(do ((i (expt 2 (- n 1)) ; loop, init of i
(/ i 2)) ; stepping of i
(s x)) ; init of s as x
((not (integerp i)) ; breaking condition
s) ; returned value if all bits are 1 (is 0 then)
(if (< s i) ; the loop's body: if s < i
(return-from inv-step (+ s i)) ; -> add i to s and return the result
(decf s i)))) ; else: reduce s by i
I commented it thoroughly as you may not be familiar with this syntax.
edit: here is the tail recursive version. It seems to be a little faster, provided that you have a compiler with tail call optimization.
(defun inv-step (x n)
(let ((i (expt 2 (- n 1))))
(cond ((= n 1)
(if (zerop x) 1 0)) ; this is really (logxor x 1)
((< x i)
(+ x i))
(t
(inv-step (- x i) (- n 1))))))

When you reverse 0 to 2^n-1 but their bit pattern reversed, you pretty much cover the entire 0-2^n-1 sequence
Sum = 2^n * (2^n+1)/2
O(1) operation. No need to do bit reversals

Edit: Of course original poster's question was about to do increment by (reversed) one, which makes things more simple than adding two random values. So nwellnhof's answer contains the algorithm already.
Summing two bit-reversal values
Here is one solution in php:
function RevSum ($a,$b) {
// loop until our adder, $b, is zero
while ($b) {
// get carry (aka overflow) bit for every bit-location by AND-operation
// 0 + 0 --> 00 no overflow, carry is "0"
// 0 + 1 --> 01 no overflow, carry is "0"
// 1 + 0 --> 01 no overflow, carry is "0"
// 1 + 1 --> 10 overflow! carry is "1"
$c = $a & $b;
// do 1-bit addition for every bit location at once by XOR-operation
// 0 + 0 --> 00 result = 0
// 0 + 1 --> 01 result = 1
// 1 + 0 --> 01 result = 1
// 1 + 1 --> 10 result = 0 (ignored that "1", already taken care above)
$a ^= $b;
// now: shift carry bits to the next bit-locations to be added to $a in
// next iteration.
// PHP_INT_MAX here is used to ensure that the most-significant bit of the
// $b will be cleared after shifting. see link in the side note below.
$b = ($c >> 1) & PHP_INT_MAX;
}
return $a;
}
Side note: See this question about shifting negative values.
And as for test; start from zero and increment value by 8-bit reversed one (10000000):
$value = 0;
$add = 0x80; // 10000000 <-- "one" as bit reversed
for ($count = 20; $count--;) { // loop 20 times
printf("%08b\n", $value); // show value as 8-bit binary
$value = RevSum($value, $add); // do addition
}
... will output:
00000000
10000000
01000000
11000000
00100000
10100000
01100000
11100000
00010000
10010000
01010000
11010000
00110000
10110000
01110000
11110000
00001000
10001000
01001000
11001000

Let assume number 1110101 and our task is to find next one.
1) Find zero on highest position and mark position as index.
11101010 (4th position, so index = 4)
2) Set to zero all bits on position higher than index.
00001010
3) Change founded zero from step 1) to '1'
00011010
That's it. This is by far the fastest algorithm since most of cpu's has instructions to achieve this very efficiently. Here is a C++ implementation which increment 64bit number in reversed patern.
#include <intrin.h>
unsigned __int64 reversed_increment(unsigned __int64 number)
{
unsigned long index, result;
_BitScanReverse64(&index, ~number); // returns index of the highest '1' on bit-reverse number (trick to find the highest '0')
result = _bzhi_u64(number, index); // set to '0' all bits at number higher than index position
result |= (unsigned __int64) 1 << index; // changes to '1' bit on index position
return result;
}
Its not hit your requirements to have "no bits" operations, however i fear there is now way how to achieve something similar without them.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio