number which appears more than n/3 times in an array - algorithm

I have read this problem
Find the most common entry in an array
and the answer from Jon Skeet is just mind-blowing. :)
Now I am trying to solve a related problem: find an element that occurs more than n/3 times in an array.
I am pretty sure that we cannot apply the same method, because there can be two such elements that each occur more than n/3 times, and that throws off the single counter. Is there any way to tweak Jon Skeet's answer to work for this?
Or is there any other solution that runs in linear time?

Jan Dvorak's answer is probably best:
Start with two empty candidate slots and two counters set to 0.
For each item:
  if it is equal to either candidate, increment the corresponding count;
  else if there is an empty slot (i.e. a slot with count 0), put it in that slot and set the count to 1;
  else reduce both counters by 1.
At the end, make a second pass over the array to check whether the candidates really do have the required count. This isn't allowed by the question you link to but I don't see how to avoid it for this modified version. If there is a value that occurs more than n/3 times then it will be in a slot, but you don't know which one it is.
If this modified version of the question guaranteed that there were two values with more than n/3 elements (in general, k-1 values with more than n/k) then we wouldn't need the second pass. But when the original question has k=2 and 1 guaranteed majority there's no way to know whether we "should" generalize it as guaranteeing 1 such element or guaranteeing k-1. The stronger the guarantee, the easier the problem.
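The generalization mentioned above (k-1 slots for "more than n/k times") could be sketched in Python roughly as follows. This is only an illustration: the name `frequent_elements` and the use of a dict for the slots are my own assumptions, and a slot counts as empty when its value is absent from the dict. The second pass is kept, since without a guarantee the surviving candidates still have to be verified.

```python
def frequent_elements(nums, k=3):
    """Return every value that occurs more than len(nums)/k times."""
    counts = {}  # candidate -> counter; never holds more than k-1 entries
    for x in nums:
        if x in counts:
            counts[x] += 1
        elif len(counts) < k - 1:
            counts[x] = 1  # a free slot exists, so x becomes a candidate
        else:
            # No free slot: reduce every counter by 1 and free the slots that reach 0.
            for c in list(counts):
                counts[c] -= 1
                if counts[c] == 0:
                    del counts[c]
    # Second pass: keep only candidates that really exceed n/k occurrences.
    n = len(nums)
    return [c for c in counts if k * sum(1 for x in nums if x == c) > n]
```

With k=3 this is the two-slot procedure described above; the comparison `k * count > n` avoids any rounding of n/k.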

Using Boyer-Moore Majority Vote Algorithm, we get:
vector<int> majorityElement(vector<int>& nums) {
int cnt1=0, cnt2=0;
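int a=0, b=0; // initialized so the first comparisons are well defined; a slot is in use only while its counter is nonzero
for(int n: nums){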
if (n == a) cnt1++;
else if (n == b) cnt2++;
else if (cnt1 == 0){
cnt1++;
a = n;
}
else if (cnt2 == 0){
cnt2++;
b = n;
}
else{
cnt1--;
cnt2--;
}
}
cnt1=cnt2=0;
for(int n: nums){
if (n==a) cnt1++;
else if (n==b) cnt2++;
}
vector<int> result;
if (cnt1 > nums.size()/3) result.push_back(a);
if (cnt2 > nums.size()/3) result.push_back(b);
return result;
}
Updated with the correction from @Vipul Jain.

You can use a selection algorithm to find the elements at positions n/3 and 2n/3 of the sorted order. Any value that occurs more than n/3 times must occupy at least one of those two positions.
n1=Selection(array[],n/3);
n2=Selection(array[],2*n/3);
count1=0;
count2=0;
for(i=0;i<n;i++)
{
if(array[i]==n1)
count1++;
if(array[i]==n2)
count2++;
}
if(count1>n/3)
print(n1);
else if(count2>n/3)
print(n2);
else
print("not found!");
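A runnable sketch of this idea in Python might look like the following. The helper `select` is a hypothetical stand-in for the `Selection` routine: a plain randomized quickselect with expected (not worst-case) linear time. The function name `more_than_third` is also my own. The reasoning is that a value occurring more than n/3 times forms a run longer than n/3 in sorted order, so the run has to cover the 0-based position n//3 or 2n//3.

```python
import random

def select(items, i):
    """Return the i-th smallest element (0-based) using randomized quickselect."""
    items = list(items)
    while True:
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        pivots = [x for x in items if x == pivot]
        if i < len(lows):
            items = lows
        elif i < len(lows) + len(pivots):
            return pivot
        else:
            i -= len(lows) + len(pivots)
            items = [x for x in items if x > pivot]

def more_than_third(nums):
    """Return, in sorted order, all values occurring more than len(nums)/3 times."""
    n = len(nums)
    if n == 0:
        return []
    # The only possible candidates are the order statistics at n/3 and 2n/3.
    candidates = {select(nums, n // 3), select(nums, (2 * n) // 3)}
    return sorted(c for c in candidates if 3 * nums.count(c) > n)
```

Unlike the pseudocode, this version reports both values when two of them qualify.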

At line number five, the if statement should have one more check:
if(n!=b && (cnt1 == 0 || n == a))

I use the following Python solution to discuss the correctness of the algorithm:
class Solution:
    """
    @param: nums: a list of integers
    @return: The majority number that occurs more than 1/3
    """
    def majorityNumber(self, nums):
        if nums is None:
            return None
        if len(nums) == 0:
            return None
        num1 = None
        num2 = None
        count1 = 0
        count2 = 0
        # Loop 1
        for i, val in enumerate(nums):
            if count1 == 0:
                num1 = val
                count1 = 1
            elif val == num1:
                count1 += 1
            elif count2 == 0:
                num2 = val
                count2 = 1
            elif val == num2:
                count2 += 1
            else:
                count1 -= 1
                count2 -= 1
        count1 = 0
        count2 = 0
        for val in nums:
            if val == num1:
                count1 += 1
            elif val == num2:
                count2 += 1
        if count1 > count2:
            return num1
        return num2
First, we need to prove claim A:
Claim A: Consider a list C of length n which contains a majority number m, i.e. a number which occurs more than floor(n/3) times. After 3 different numbers are removed from C, we have C'. Then m is the majority number of C'.
Proof: Use R to denote m's occurrence count in C. We have R > floor(n/3). R > floor(n/3) => R - 1 > floor(n/3) - 1 => R - 1 > floor((n-3)/3). Use R' to denote m's occurrence count in C'. And use n' to denote the length of C'. Since 3 different numbers are removed, we have R' >= R - 1. And n'=n-3 is obvious. We can have R' > floor(n'/3) from R - 1 > floor((n-3)/3). So m is the majority number of C'.
Now let's prove the correctness of the loop 1. Define L as count1 * [num1] + count2 * [num2] + nums[i:]. Use m to denote the majority number.
Invariant
The majority number m is in L.
Initialization
At the start of the first iteration, L is nums[0:]. So the invariant is trivially true.
Maintenance
if count1 == 0 branch: Before the iteration, L is count2 * [num2] + nums[i:]. After the iteration, L is 1 * [nums[i]] + count2 * [num2] + nums[i+1:]. In other words, L is not changed. So the invariant is maintained.
elif val == num1 branch: Before the iteration, L is count1 * [nums[i]] + count2 * [num2] + nums[i:]. After the iteration, L is (count1+1) * [nums[i]] + count2 * [num2] + nums[i+1:]. In other words, L is not changed. So the invariant is maintained.
elif count2 == 0 branch: Similar to condition 1.
elif val == num2 branch: Similar to condition 2.
else branch: nums[i], num1 and num2 are all different from each other in this case. After the iteration, L is (count1-1) * [num1] + (count2-1) * [num2] + nums[i+1:]. In other words, three different numbers are removed from count1 * [num1] + count2 * [num2] + nums[i:]. From claim A, we know m is still the majority number of L. So the invariant is maintained.
Termination
When the loop terminates, nums[n:] is empty. L is count1 * [num1] + count2 * [num2].
So when the loop terminates, the majority number is either num1 or num2.

Suppose there are n elements in the array and, in the worst case, only one element occurs more than n/3 times. The probability that a randomly chosen element is not that one is then at most (2n/3)/n, that is 2/3. So if we randomly choose N elements from the array of size n, the probability that at least one of them is the frequent element is at least 1-(2/3)^N. If we equate this to, say, a 99.99 percent probability of success, we get N=23 for any value of n.
Therefore just choose 23 numbers randomly from the list and count their occurrences. If some count is greater than n/3, return that number; if no solution is found after checking 23 random numbers, return -1.
The algorithm is essentially O(n), as the value 23 doesn't depend on n (the size of the list), so in the worst case we just traverse the array 23 times.
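The figure 23 can be checked with a few lines of Python. This is just a sanity check of the arithmetic; the function name `trials_needed` and its parameters are made up for the illustration, and the per-trial miss probability of 2/3 is the worst case assumed above.

```python
def trials_needed(success_prob, miss_prob_per_trial=2 / 3):
    """Smallest N such that 1 - miss_prob_per_trial**N >= success_prob."""
    n = 1
    while 1 - miss_prob_per_trial ** n < success_prob:
        n += 1
    return n

print(trials_needed(0.9999))  # 22 trials are not quite enough, 23 are
```

Note that the guarantee is probabilistic: when a qualifying element exists, the sampling approach still misses it with probability up to about 0.009 percent.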
Accepted Code on interviewbit(C++):
int n=A.size();
int ans,flag=0;
for(int i=0;i<23;i++)
{
int index=rand()%n;
int elem=A[index];
int count=0;
for(int j=0;j<n;j++) // j, so that the outer loop index i is not shadowed
{
if(A[j]==elem)
count++;
}
if(count>n/3)
{
flag=1;
ans=elem;
}
if(flag==1)
break;
}
if(flag==1)
return ans;
else return -1;
}


Reading the bits of a natural number from LSB to MSB without built-ins in O(n)

Taking a natural number a as input, it is easy to read the bits of its binary form from MSB to LSB in O(n) time, n being its binary length, using only a for loop and elementary sums and subtractions. A left shift can be achieved by a+a and subtracting 1000000...
def powerOfTwo(n):
    a = 1
    for j in range(0,n):
        a=(a+a)
    return a
def bitLenFast(n):
    len=0
    if (n==0):
        len=1
    else:
        y=1
        while (y<=n):
            y=(y+y)
            len=(len+1)
    return len
def readAsBinary(x):
    len=bitLenFast(x) # Length of input x in bits
    y=powerOfTwo((len-1)) # Reference word 1000000...
    hBit=powerOfTwo(len) # Deletes highest bit in left shift
    for i in range(0, len):
        if (x>=y):
            bit=1
            x=((x+x)-hBit)
        else:
            bit=0
            x=(x+x)
        print(bit)
Is there an algorithm to parse a bit by bit from LSB to MSB in O(n) time, using only a while or a for loop and elementary operations (i.e. no bitwise built-in functions or operators)?
Apply your algorithm to find the bits in MSB to LSB order to the number. Keep an accumulator A initialized to 0 and a place value variable B initialized to 1. At each iteration, add B to A if the bit is set and then double B by adding it to itself. You also need to keep track of the number of consecutive 0 bits. Initialize a counter C to zero beforehand and at each iteration increment it if the bit is 0 or set to zero otherwise.
At the end you will have the number with the bits reversed in A. You can then output C leading zeros and then apply the algorithm to A to output the bits of the original number in LSB to MSB order.
This is an implementation of samgak's answer in JS, using 2 calls to (an adapted version of) OP's code. Since OP's code is O(n), and all added code is O(1), the result is also O(n).
Therefore, the answer to OP's question is yes.
NOTE: updated to add leading zeroes as per samgak's updated answer.
function read_low_to_high(num, out) {
const acc = {
n: 0, // integer with bits in reverse order
p: 1, // current power-of-two
z: 0, // last run of zeroes, to prepend to result once finished
push: (bit) => { // this is O(1)
if (bit) {
acc.n = acc.n + acc.p;
acc.z = 0;
} else {
acc.z = acc.z + 1;
}
acc.p = acc.p + acc.p;
}
};
// with n as log2(num) ...
read_high_to_low(num, acc); // O(n) - bits in reverse order
for (let i=0; i<acc.z; i++) { // O(n) - prepend zeroes
out.push(0);
}
read_high_to_low(acc.n, out); // O(n) - bits in expected order
}
function read_high_to_low(num, out) {
let po2 = 1; // max power-of-two <= num
let binlength = 1;
while (po2 + po2 <= num) {
po2 = po2 + po2;
binlength ++;
}
const hi = po2 + po2; // min power-of-two > num
for (let i=0; i<binlength; i++) {
if (num>=po2) {
out.push(1);
num = num + num - hi;
} else {
out.push(0);
num = num + num;
}
}
}
function test(i) {
const a = i.toString(2)
.split('').map(i => i-'0');
const ra = a.slice().reverse();
const b = [];
read_high_to_low(i, b);
const rb = [];
read_low_to_high(i, rb);
console.log(i,
"high-to-low",
JSON.stringify(a),
JSON.stringify(b),
"low-to-high",
JSON.stringify(ra),
JSON.stringify(rb)
);
}
for (let i=0; i<16; i++) test(i);
Perhaps you want something like this:
value = 666
while value:
    next = value // 2 # integer division
    bit = value - next * 2
    print(bit, end = " ")
    value = next
>>> 0 1 0 1 1 0 0 1 0 1
There is a way to read digits from least significant to most significant and determine the numerical value, but a valid claim about run time depends on assumptions such as indexed access to the digits being constant time.
For digits in numerical value:
value ← 0, weight ← 1
foreach digit
  while 0 < digit
    value ← value + weight
    digit ← digit - 1
  weight ← weight + weight
 

Find Time Complexity of Function where the recursive call is in a for loop

Here is my function:
function a(n)
    print 'a'
    if n == 0:
        return
    for (int i = 0; i<=n-1; i++):
        a(i)
    return
So basically I understand that for each call, we're also calling all the function numbers leading to n recursively and then for each function we do the same thing again. My main problem, however, is that the for loop leaps up to a variable number every time, so it's like doing recursion inside of recursion.
Does it terminate in the first place?
For n == 0, it just returns. But for n == 1, it'll call itself for n == 0, n == 1 and n == 2. Thus calling a(1) will cause another call of a(1)...
IOW this is an endless loop and will show an infinite complexity.
Now after the change of the algorithm it will terminate. So let me investigate it anew.
For n == 1 it'll only call itself with n == 0.
For n == 2 it'll call itself for n == 0, n == 1 and another n == 0 due to the n == 1; that makes 3 calls.
For n == 3 it'll call itself 3 times + 3 times + 1 time, makes 7 times.
For n == 4 it'll call itself 4 times + 7 times + 3 times + 1 time, makes 15 times.
This looks very much like O(2^n - 1) = O(2^n).
(It's easy to prove by induction; The number of calls will be 2^n - 1, which is obviously true for all examples above. Given it is true for some n, it'll easily follow that it's true for n + 1)
Since the proof by induction isn't obvious for the OP, here it is:
First of all, since apart from the loop nothing really happens within the function, it'll only add a constant number of operations per iteration, which means it'll be sufficient to count the calls to itself.
By the above, it is proved for n = 1.
Now assume it has been proved for all values up to some n. We now show that it also holds for n + 1.
By the induction hypothesis the number of calls for a(n + 1) = n + 1 + \sum_{i=0}^n (2^i - 1) (sorry for the notation; it would have worked on mathexchange. It states "the sum for i going from 0 up to n of (2^i - 1)").
Now n + 1 + \sum_{i=0}^n (2^i - 1) = \sum_{i=0}^n (2^i) = 2^{n + 1} - 1 which had to be shown.
This proves that the complexity is O(2^n).
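The same result can also be read off a recurrence. In the sketch below, T(n) is my own notation for the total number of invocations of a triggered by a(n), counting the initial call itself, so it is one more than the count of recursive calls used in the induction above.

```latex
\begin{aligned}
T(0) &= 1, \qquad T(n) = 1 + \sum_{i=0}^{n-1} T(i) \quad (n \ge 1), \\
T(1) &= 1 + T(0) = 2 = 2\,T(0), \\
T(n) - T(n-1) &= \Bigl(1 + \sum_{i=0}^{n-1} T(i)\Bigr) - \Bigl(1 + \sum_{i=0}^{n-2} T(i)\Bigr) = T(n-1) \quad (n \ge 2), \\
T(n) &= 2\,T(n-1) = 2^{n}\,T(0) = 2^{n}.
\end{aligned}
```

Subtracting the initial call gives 2^n - 1 recursive calls, which agrees with the sum computed in the proof.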
Analysis by @Ronald is absolutely right.
Here is a different version of the program to get a count of how many times the recursion happens for different values of n:
public class FindingRec
{
static int count;
static void rr(int n)
{
count++;
// System.out.print(n + ", ");
if (n == 0)
return;
for (int i = 0; i < n; i++)
{
rr(i);
}
}
public static void main(String[] args)
{
for (int n = 0; n < 10; n++)
{
count = 0;
rr(n);
System.out.println("For n = " + n + ", Count: " + count);
}
}
}
And here is the output:
For n = 0, Count: 1
For n = 1, Count: 2
For n = 2, Count: 4
For n = 3, Count: 8
For n = 4, Count: 16
For n = 5, Count: 32
For n = 6, Count: 64
For n = 7, Count: 128
For n = 8, Count: 256
For n = 9, Count: 512
So the call count is exactly 2^n (including the initial call), which confirms the O(2^n) bound.

Generate any number using Incrementation and mult by 2

I'm looking for an algorithm, working in a loop, that generates any natural number n using only incrementation and multiplication by 2. The trivial way is known (increment n times), but I'm looking for something a little bit faster. Honestly, I don't even know where to start.
Basically, what you want to do is shift in the bits of the number from the right, starting with the MSB.
For example, if your number is 70, then the binary of it is 0b1000110. So, you want to "shift in" the bits 1, 0, 0, 0, 1, 1, 0.
To shift in a zero, you simply double the number. To shift in a one, you double the number, then increment it.
if (bit_to_be_shifted_in != 0)
x = (x * 2) + 1;
else
x = x * 2;
So, if you're given an array of bits from MSB to LSB (i.e. from left to right), then the C code looks like this:
x = 0;
for (i = 0; i < number_of_bits; i++)
{
if (bits[i] != 0)
x = x * 2 + 1;
else
x = x * 2;
}
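One way of doing this is to go backwards. If it's an odd number, subtract one. If it's even, divide by 2. Reversing the recorded steps (decrement becomes increment, halving becomes doubling) then builds n from 0.
while(n > 0) {
if (n & 1) n &= ~1; /* odd: subtract one */
else n >>= 1; /* even: halve */
}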
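To make the backwards idea concrete, here is a small Python sketch that records the steps while reducing n to 0 and then replays them in reverse using only increment and doubling. The names `build_ops` and `replay` and the "inc"/"dbl" labels are assumptions made for this example.

```python
def build_ops(n):
    """Return a list of 'inc' / 'dbl' steps that turns 0 into n."""
    ops = []
    while n > 0:
        if n % 2 == 1:
            ops.append("inc")  # undoing an increment: subtract one
            n -= 1
        else:
            ops.append("dbl")  # undoing a doubling: halve
            n //= 2
    ops.reverse()  # steps were recorded backwards
    return ops

def replay(ops):
    """Apply the steps to 0 using only increment and multiplication by 2."""
    x = 0
    for op in ops:
        x = x + 1 if op == "inc" else x * 2
    return x

print(build_ops(70))
# ['inc', 'dbl', 'dbl', 'dbl', 'dbl', 'inc', 'dbl', 'inc', 'dbl']
```

The number of steps is the count of 1 bits plus the bit length minus one, so it grows with log n instead of n.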

Calculating total combinations

I don't know how to go about this programming problem.
Given two integers n and m, how many numbers exist such that each number contains all digits from 0 to n-1, the difference between any two adjacent digits is exactly 1, and the number of digits is at most m?
What is the best way to solve this problem? Is there a direct mathematical formula?
Edit: The number cannot start with 0.
Example:
for n = 3 and m = 6 there are 18 such numbers (210, 2101, 21012, 210121 ... etc)
Update (some people have encountered an ambiguity):
All digits from 0 to n-1 must be present.
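To pin down the statement, a brute-force counter in Python could look like this. It is only a reference sketch for small inputs (exponential time), and the name `count_bruteforce` is made up here. It enumerates every digit string that starts with a nonzero digit, moves by exactly 1 between adjacent digits, stays within 0..n-1, and counts those of length at most m that contain all n digits.

```python
def count_bruteforce(n, m):
    """Count valid numbers with at most m digits by exhaustive enumeration."""
    def extend(last, seen, length):
        # The current prefix counts as a number if it already contains every digit.
        total = 1 if len(seen) == n else 0
        if length < m:
            for nxt in (last - 1, last + 1):
                if 0 <= nxt < n:
                    total += extend(nxt, seen | {nxt}, length + 1)
        return total

    # The first digit may not be 0.
    return sum(extend(d, frozenset([d]), 1) for d in range(1, n))

print(count_bruteforce(3, 6))  # 18, matching the example
```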
This Python code computes the answer in O(nm) by keeping track of the numbers ending with a particular digit.
Different arrays (A,B,C,D) are used to track numbers that have hit the maximum or minimum of the range.
n=3
m=6
A=[1]*n # Number of ways of being at digit i and never being to min or max
B=[0]*n # number of ways with minimum being observed
C=[0]*n # number of ways with maximum being observed
D=[0]*n # number of ways with both being observed
A[0]=0 # Cannot start with 0
A[n-1]=0 # Have seen max so this 1 moves from A to C
C[n-1]=1 # Have seen max if start with highest digit
t=0
for k in range(m-1):
    A2=[0]*n
    B2=[0]*n
    C2=[0]*n
    D2=[0]*n
    for i in range(1,n-1):
        A2[i]=A[i+1]+A[i-1]
        B2[i]=B[i+1]+B[i-1]
        C2[i]=C[i+1]+C[i-1]
        D2[i]=D[i+1]+D[i-1]
    B2[0]=A[1]+B[1]
    C2[n-1]=A[n-2]+C[n-2]
    D2[0]=C[1]+D[1]
    D2[n-1]=B[n-2]+D[n-2]
    A=A2
    B=B2
    C=C2
    D=D2
    x=sum(d for d in D2)
    t+=x
print(t)
After doing some more research, I think there may actually be a mathematical approach after all, although the math is advanced for me. Douglas S. Stones pointed me in the direction of Joseph Myers' (2008) article, BMO 2008–2009 Round 1 Problem 1—Generalisation, which derives formulas for calculating the number of zig-zag paths across a rectangular board.
As I understand it, in Anirudh's example, our board would have 6 rows of length 3 (I believe this would mean n=3 and r=6 in the article's terms). We can visualize our board so:
0 1 2 example zig-zag path: 0
0 1 2 1
0 1 2 0
0 1 2 1
0 1 2 2
0 1 2 1
Since Myers' formula m(n,r) would generate the number for all the zig-zag paths, that is, the number of all 6-digit numbers where all adjacent digits are consecutive and digits are chosen from (0,1,2), we would still need to determine and subtract those that begin with zero and those that do not include all digits.
If I understand correctly, we may do this in the following way for our example, although generalizing the concept to arbitrary m and n may prove more complicated:
Let m(3,6) equal the number of 6-digit numbers where all adjacent digits
are consecutive and digits are chosen from (0,1,2). According to Myers,
m(3,r) is given by formula and also equals OEIS sequence A029744 at
index r+2, so we have
m(3,6) = 16
How many of these numbers start with zero? Myers describes c(n,r) as the
number of zig-zag paths whose colour is that of the square in the top
right corner of the board. In our case, c(3,6) would include the total
for starting-digit 0 as well as starting-digit 2. He gives c(3,2r) as 2^r,
so we have
c(3,6) = 8. For starting-digit 0 only, we divide by two to get 4.
Now we need to obtain only those numbers that include all the digits in
the range, but how? We can do this be subtracting m(n-1,r) from m(n,r).
In our case, we have all the m(2,6) that would include only 0's and 1's,
and all the m(2,6) that would include 1's and 2's. Myers gives
m(2,anything) as 2, so we have
2*m(2,6) = 2*2 = 4
But we must remember that one of the zero-starting numbers is included
in our total for 2*m(2,6), namely 010101. So all together we have
m(3,6) - c(3,6)/2 - 4 + 1
= 16 - 4 - 4 + 1
= 9
To complete our example, we must follow a similar process for m(3,5),
m(3,4) and m(3,3). Since it's late here, I might follow up tomorrow...
One approach could be to program it recursively, calling the function to add as well as subtract from the last digit.
Haskell code:
import Data.List (sort,nub)
f n m = concatMap (combs n) [n..m]
combs n m = concatMap (\x -> combs' 1 [x]) [1..n - 1] where
  combs' count result
    | count == m = if test then [concatMap show result] else []
    | otherwise = combs' (count + 1) (result ++ [r + 1])
                  ++ combs' (count + 1) (result ++ [r - 1])
    where r = last result
          test = (nub . sort $ result) == [0..n - 1]
Output:
*Main> f 3 6
["210","1210","1012","2101","12101","10121","21210","21012"
,"21010","121210","121012","121010","101212","101210","101012"
,"212101","210121","210101"]
In response to Anirudh Rayabharam's comment, I hope the following code will be more 'pseudocode' like. When the total number of digits reaches m, the function g outputs 1 if the solution has hashed all [0..n-1], and 0 if not. The function f accumulates the results for g for starting digits [1..n-1] and total number of digits [n..m].
Haskell code:
import qualified Data.Set as S
g :: Int -> Int -> Int -> Int -> (S.Set Int, Int) -> Int
g n m digitCount lastDigit (hash,hashCount)
  | digitCount == m = if test then 1 else 0
  | otherwise =
      if lastDigit == 0
        then g n m d' (lastDigit + 1) (hash'',hashCount')
        else if lastDigit == n - 1
          then g n m d' (lastDigit - 1) (hash'',hashCount')
          else g n m d' (lastDigit + 1) (hash'',hashCount')
               + g n m d' (lastDigit - 1) (hash'',hashCount')
  where test = hashCount' == n
        d' = digitCount + 1
        hash'' = if test then S.empty else hash'
        (hash',hashCount')
          | hashCount == n = (S.empty,hashCount)
          | S.member lastDigit hash = (hash,hashCount)
          | otherwise = (S.insert lastDigit hash,hashCount + 1)
f n m = foldr forEachNumDigits 0 [n..m] where
  forEachNumDigits numDigits accumulator =
    accumulator + foldr forEachStartingDigit 0 [1..n - 1] where
      forEachStartingDigit startingDigit accumulator' =
        accumulator' + g n numDigits 1 startingDigit (S.empty,0)
Output:
*Main> f 3 6
18
(0.01 secs, 571980 bytes)
*Main> f 4 20
62784
(1.23 secs, 97795656 bytes)
*Main> f 4 25
762465
(11.73 secs, 1068373268 bytes)
model your problem as 2 superimposed lattices in 2 dimensions, specifically as pairs (i,j) interconnected with oriented edges ((i0,j0),(i1,j1)) where i1 = i0 + 1, |j1 - j0| = 1, modified as follows:
dropping all pairs (i,j) with j > 9 and its incident edges
dropping all pairs (i,j) with i > m-1 and its incident edges
dropping edge ((0,0), (1,1))
this construction results in a structure like in this diagram:
the requested numbers map to paths in the lattice starting at one of the green elements ((0,j), j=1..min(n-1,9)) that contain at least one pink and one red element ((i,0), i=1..m-1, (i,n-1), i=0..m-1 ). to see this, identify the i-th digit j of a given number with point (i,j). including pink and red elements ('extremal digits') guarantees that all available digits are represented in the number.
Analysis
for convenience, let q1, q2 denote the position-1.
let q1 be the position of a number's first digit being either 0 or min(n-1,9).
let q2 be the position of a number's first 0 if the digit at position q1 is min(n-1,9), and vice versa.
case 1: first extremal digit is 0
the number of valid prefixes containing no 0 can be expressed as sum_{k=1..min(n-1,9)} paths_to_0(k,1,q1), the function paths_to_0 being recursively defined as
paths_to_0(0,q1-1,q1) = 0;
paths_to_0(1,q1-1,q1) = 1;
paths_to_0(digit,i,q1) = 0; if q1-i < digit;
paths_to_0(x,_,_) = 0; if x >= min(n-1,9)
// x=min(n-1,9) mustn't occur before position q2,
// x > min(n-1,9) not at all
paths_to_0(x,_,_) = 0; if x <= 0;
// x=0 mustn't occur before position q1,
// x < 0 not at all
and else paths_to_0(digit,i,q1) =
paths_to_0(digit+1,i+1,q1) + paths_to_0(digit-1,i+1,q1);
similarly we have
paths_to_max(min(n-1,9),q2-1,q2) = 0;
paths_to_max(min(n-2,8),q2-1,q2) = 1;
paths_to_max(digit,i,q2) = 0 if q2-i < n-1;
paths_to_max(x,_,_) = 0; if x >= min(n-1,9)
// x=min(n-1,9) mustn't occur before
// position q2,
// x > min(n-1,9) not at all
paths_to_max(x,_,_) = 0; if x < 0;
and else paths_to_max(digit,q1,q2) =
paths_to_max(digit+1,q1+1,q2) + paths_to_max(digit-1,q1+1,q2);
and finally
paths_suffix(digit,length-1,length) = 2; if digit > 0 and digit < min(n-1,9)
paths_suffix(digit,length-1,length) = 1; if digit = 0 or digit = min(n-1,9)
paths_suffix(digit,k,length) = 0; if length > m-1
or length < q2
or k > length
paths_suffix(digit,k,0) = 1; // the empty path
and else paths_suffix(digit,k,length) =
paths_suffix(digit+1,k+1,length) + paths_suffix(digit-1,k+1,length);
... for a grand total of
number_count_case_1(n, m) =
sum_{first=1..min(n-1,9), q1=1..m-1-(n-1), q2=q1..m-1, l_suffix=0..m-1-q2} (
paths_to_0(first,1,q1)
+ paths_to_max(0,q1,q2)
+ paths_suffix(min(n-1,9),q2,l_suffix+q2)
)
case 2: first extremal digit is min(n-1,9)
case 2.1: initial digit is not min(n-1,9)
this is symmetrical to case 1 with all digits d replaced by min(n,10) - d. as the lattice structure is symmetrical, this means number_count_case_2_1 = number_count_case_1.
case 2.2: initial digit is min(n-1,9)
note that q1 is 1 and the second digit must be min(n-2,8).
thus
number_count_case_2_2 (n, m) =
sum_{q2=1..m-2, l_suffix=0..m-2-q2} (
paths_to_max(1,1,q2)
+ paths_suffix(min(n-1,9),q2,l_suffix+q2)
)
so the grand grand total will be
number_count ( n, m ) = 2 * number_count_case_1 (n, m) + number_count_case_2_2 (n, m);
Code
i don't know whether a closed expression for number_count exists, but the following perl code will compute it (the code is but a proof of concept as it does not use memoization techniques to avoid recomputing results already obtained):
use strict;
use warnings;
my ($n, $m) = ( 5, 7 ); # for example
$n = ($n > 10) ? 10 : $n; # cutoff
sub min { my ($x, $y) = @_; return $x < $y ? $x : $y; }
sub paths_to_0 ($$$) {
my (
$d
, $at
, $until
) = @_;
#
if (($d == 0) && ($at == $until - 1)) { return 0; }
if (($d == 1) && ($at == $until - 1)) { return 1; }
if ($until - $at < $d) { return 0; }
if (($d <= 0) || ($d >= $n)) { return 0; }
return paths_to_0($d+1, $at+1, $until) + paths_to_0($d-1, $at+1, $until);
} # paths_to_0
sub paths_to_max ($$$) {
my (
$d
, $at
, $until
) = @_;
#
if (($d == $n-1) && ($at == $until - 1)) { return 0; }
if (($d == $n-2) && ($at == $until - 1)) { return 1; }
if ($until - $at < $n-1) { return 0; }
if (($d < 0) || ($d >= $n-1)) { return 0; }
return paths_to_max($d+1, $at+1, $until) + paths_to_max($d-1, $at+1, $until);
} # paths_to_max
sub paths_suffix ($$$) {
my (
$d
, $at
, $until
) = @_;
#
if (($d < $n-1) && ($d > 0) && ($at == $until - 1)) { return 2; }
if ((($d == $n-1) || ($d == 0)) && ($at == $until - 1)) { return 1; }
if (($until > $m-1) || ($at > $until)) { return 0; }
if ($until == 0) { return 1; }
return paths_suffix($d+1, $at+1, $until) + paths_suffix($d-1, $at+1, $until);
} # paths_suffix
#
# main
#
# number_count =
#   sum_{first=1..min(n-1,9), q1=1..m-1-(n-1), q2=q1..m-1, l_suffix=0..m-1-q2} (
#     paths_to_0(first,1,q1)
#     + paths_to_max(0,q1,q2)
#     + paths_suffix(min(n-1,9),q2,l_suffix+q2)
#   )
my ($number_count, $number_count_2_2) = (0, 0);
my ($first, $q1, $q2, $l_suffix);
for ($first = 1; $first <= $n-1; $first++) {
for ($q1 = 1; $q1 <= $m-1 - ($n-1); $q1++) {
for ($q2 = $q1; $q2 <= $m-1; $q2++) {
for ($l_suffix = 0; $l_suffix <= $m-1 - $q2; $l_suffix++) {
$number_count =
$number_count
+ paths_to_0($first,1,$q1)
+ paths_to_max(0,$q1,$q2)
+ paths_suffix($n-1,$q2,$l_suffix+$q2)
;
}
}
}
}
#
# case 2.2
#
for ($q2 = 1; $q2 <= $m-2; $q2++) {
for ($l_suffix = 0; $l_suffix <= $m-2 - $q2; $l_suffix++) {
$number_count_2_2 =
$number_count_2_2
+ paths_to_max(1,1,$q2)
+ paths_suffix($n-1,$q2,$l_suffix+$q2)
;
}
}
$number_count = 2 * $number_count + $number_count_2_2;

How to calculate the index (lexicographical order) when the combination is given

I know that there is an algorithm that permits, given a combination of number (no repetitions, no order), calculates the index of the lexicographic order.
It would be very useful for my application to speedup things...
For example:
combination(10, 5)
1 - 1 2 3 4 5
2 - 1 2 3 4 6
3 - 1 2 3 4 7
....
251 - 5 7 8 9 10
252 - 6 7 8 9 10
I need that the algorithm returns the index of the given combination.
e.g.: index( 2, 5, 7, 8, 10 ) --> index
EDIT: actually I'm using a java application that generates all combinations C(53, 5) and inserts them into a TreeMap.
My idea is to create an array that contains all combinations (and related data) that I can index with this algorithm.
Everything is to speedup combination searching.
However, I tried some (not all) of your solutions, and the algorithms that you proposed are slower than a get() from TreeMap.
If it helps: my needs are for a combination of 5 from 53 starting from 0 to 52.
Thank you again to all :-)
Here is a snippet that will do the work.
#include <iostream>
int main()
{
const int n = 10;
const int k = 5;
int combination[k] = {2, 5, 7, 8, 10};
int index = 0;
int j = 0;
for (int i = 0; i != k; ++i)
{
for (++j; j != combination[i]; ++j)
{
index += c(n - j, k - i - 1);
}
}
std::cout << index + 1 << std::endl;
return 0;
}
It assumes you have a function
int c(int n, int k);
that will return the number of combinations of choosing k elements out of n elements.
The loop calculates the number of combinations preceding the given combination.
By adding one at the end we get the actual index.
For the given combination there are
c(9, 4) = 126 combinations containing 1 and hence preceding it in lexicographic order.
Of the combinations containing 2 as the smallest number there are
c(7, 3) = 35 combinations having 3 as the second smallest number
c(6, 3) = 20 combinations having 4 as the second smallest number
All of these are preceding the given combination.
Of the combinations containing 2 and 5 as the two smallest numbers there are
c(4, 2) = 6 combinations having 6 as the third smallest number.
All of these are preceding the given combination.
Etc.
If you put a print statement in the inner loop you will get the numbers
126, 35, 20, 6, 1.
Hope that explains the code.
Convert your number selections to a factorial base number. This number will be the index you want. Technically this calculates the lexicographical index of all permutations, but if you only give it combinations, the indexes will still be well ordered, just with some large gaps for all the permutations that come in between each combination.
Edit: pseudocode removed, it was incorrect, but the method above should work. Too tired to come up with correct pseudocode at the moment.
Edit 2: Here's an example. Say we were choosing a combination of 5 elements from a set of 10 elements, like in your example above. If the combination was 2 3 4 6 8, you would get the related factorial base number like so:
Take the unselected elements and count how many you have to pass by to get to the one you are selecting.
1 2 3 4 5 6 7 8 9 10
2 -> 1
1 3 4 5 6 7 8 9 10
3 -> 1
1 4 5 6 7 8 9 10
4 -> 1
1 5 6 7 8 9 10
6 -> 2
1 5 7 8 9 10
8 -> 3
So the index in factorial base is 1112300000
In decimal base, it's
1*9! + 1*8! + 1*7! + 2*6! + 3*5! = 410040
This is Algorithm 2.7 kSubsetLexRank on page 44 of Combinatorial Algorithms by Kreher and Stinson.
r = 0
t[0] = 0
for i from 1 to k
    if t[i - 1] + 1 <= t[i] - 1
        for j from t[i - 1] + 1 to t[i] - 1
            r = r + choose(n - j, k - i)
return r
The array t holds your values in positions 1 to k, for example [5 7 8 9 10]; t[0] = 0 is a sentinel. The function choose(n, k) calculates the number "n choose k". The result value r is the zero-based rank, 250 for the example, so the 1-based index used in the question is r + 1 = 251. Other inputs are n and k, for the example they would be 10 and 5.
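A Python sketch of this kind of ranking, checked against brute-force enumeration, might look as follows. The function name `subset_lex_rank` is an assumption of mine, `math.comb` needs Python 3.8 or later, and the rank returned is zero-based: for each position it adds choose(n - j, k - i) for every value j skipped strictly between the previous element and the current one.

```python
from itertools import combinations
from math import comb

def subset_lex_rank(t, n):
    """Zero-based lexicographic rank of the ascending k-subset t of {1, ..., n}."""
    k = len(t)
    r = 0
    prev = 0  # plays the role of the sentinel t[0] = 0
    for i, cur in enumerate(t, start=1):
        # Every skipped value j accounts for all subsets that continue with j instead.
        for j in range(prev + 1, cur):
            r += comb(n - j, k - i)
        prev = cur
    return r

# Cross-check against the order in which itertools generates combinations.
for expected, c in enumerate(combinations(range(1, 11), 5)):
    assert subset_lex_rank(c, 10) == expected

print(subset_lex_rank([2, 5, 7, 8, 10], 10) + 1)  # 1-based index: 189
```

The inner loops run at most n times in total, so with a precomputed table of binomial coefficients the rank costs O(n) additions.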
Zero-based version:
# v: array of length k consisting of numbers between 0 and n-1 (ascending)
# combi[a, b] is assumed to be a precomputed table holding "a choose b"
def index_of_combination(n,k,v):
    idx = 0
    for p in range(k-1):
        if p == 0: arrg = range(1,v[p]+1)
        else: arrg = range(v[p-1]+2, v[p]+1)
        for a in arrg:
            idx += combi[n-a, k-1-p]
    idx += v[k-1] - v[k-2] - 1
    return idx
Null Set has the right approach. The index corresponds to the factorial-base number of the sequence. You build a factorial-base number just like any other base number, except that the base decreases for each digit.
Now, the value of each digit in the factorial-base number is the number of elements less than it that have not yet been used. So, for combination(10, 5):
(1 2 3 4 5) == 0*9!/5! + 0*8!/5! + 0*7!/5! + 0*6!/5! + 0*5!/5!
== 0*3024 + 0*336 + 0*42 + 0*6 + 0*1
== 0
(10 9 8 7 6) == 9*3024 + 8*336 + 7*42 + 6*6 + 5*1
== 30239
It should be pretty easy to calculate the index incrementally.
If you have a set of positive integers 0<=x_1 < x_2< ... < x_k , then you could use something called the squashed order:
I = sum(j=1..k) Choose(x_j,j)
The beauty of the squashed order is that it works independent of the largest value in the parent set.
The squashed order is not the order you are looking for, but it is related.
To use the squashed order to get the lexicographic order in the set of k-subsets of {1,...,n}, take
1 <= x_1 < ... < x_k <= n
and compute
0 <= n-x_k < n-x_(k-1) < ... < n-x_1
Then compute the squashed order index of (n-x_k,...,n-x_1).
Then subtract the squashed order index from Choose(n,k) to get your result, which is the lexicographic index.
If you have relatively small values of n and k, you can cache all the values Choose(a,b) with a
See Anderson, Combinatorics on Finite Sets, pp 112-119
I also needed this for a project of mine, and the fastest solution I found was (Python):
import math
def nCr(n,r):
    f = math.factorial
    return f(n) // f(r) // f(n-r)  # integer division keeps the result exact
def index(comb,n,k):
    r=nCr(n,k)
    for i in range(k):
        if n-comb[i]<k-i:continue
        r=r-nCr(n-comb[i],k-i)
    return r
My input "comb" contained elements in increasing order. You can test the code with, for example:
import itertools
k=3
t=[1,2,3,4,5]
for x in itertools.combinations(t, k):
    print(x,index(x,len(t),k))
It is not hard to prove that if comb=(a1,a2,a3...,ak) (in increasing order) then:
index=[nCk-(n-a1+1)Ck] + [(n-a1)C(k-1)-(n-a2+1)C(k-1)] + ... =
nCk -(n-a1)Ck -(n-a2)C(k-1) - .... -(n-ak)C1
There's another way to do all this. You could generate all possible combinations and write them into a binary file where each comb is represented by its index starting from zero. Then, when you need to find an index, and the combination is given, you apply a binary search on the file. Here's the function. It's written in VB.NET 2010 for my lotto program; it works with the Israeli lottery system, so there's a bonus (7th) number; just ignore it.
Public Function Comb2Index( _
        ByVal gAr() As Byte) As UInt32
    Dim mxPntr As UInt32 = WHL.AMT.WHL_SYS_00 '(16.273.488)
    Dim mdPntr As UInt32 = mxPntr \ 2
    Dim eqCntr As Byte
    Dim rdAr() As Byte
    modBinary.OpenFile(WHL.WHL_SYS_00, _
        FileMode.Open, FileAccess.Read)
    Do
        modBinary.ReadBlock(mdPntr, rdAr)
RP:     If eqCntr = 7 Then GoTo EX
        If gAr(eqCntr) = rdAr(eqCntr) Then
            eqCntr += 1
            GoTo RP
        ElseIf gAr(eqCntr) < rdAr(eqCntr) Then
            If eqCntr > 0 Then eqCntr = 0
            mxPntr = mdPntr
            mdPntr \= 2
        ElseIf gAr(eqCntr) > rdAr(eqCntr) Then
            If eqCntr > 0 Then eqCntr = 0
            mdPntr += (mxPntr - mdPntr) \ 2
        End If
    Loop Until eqCntr = 7
EX: modBinary.CloseFile()
    Return mdPntr
End Function
P.S. It takes 5 to 10 mins to generate 16 million combs on a Core 2 Duo. To find the index using binary search on file takes 397 milliseconds on a SATA drive.
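For small n and k the same idea can be tried without a file at all. The sketch below is not a port of the VB.NET function: it keeps the sorted combinations in an in-memory list as a stand-in for the binary file and uses the standard `bisect` module for the search. `build_table` and `comb_to_index` are hypothetical names.

```python
from bisect import bisect_left
from itertools import combinations


def build_table(n, k):
    """All k-combinations of 1..n; itertools emits them in lexicographic order."""
    return list(combinations(range(1, n + 1), k))


def comb_to_index(table, combo):
    """0-based index of combo in the sorted table, found by binary search."""
    target = tuple(combo)
    i = bisect_left(table, target)
    if i == len(table) or table[i] != target:
        raise ValueError("combination not in table")
    return i


table = build_table(5, 3)
print(comb_to_index(table, (2, 4, 5)))  # prints 8
```

The table costs Choose(n,k) entries of memory, so this only makes sense when that count is manageable; each lookup then takes about log2(Choose(n,k)) comparisons.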
Assuming the maximum setSize is not too large, you can simply generate a lookup table, where the inputs are encoded this way:
int index(a,b,c,...)
{
    int key = 0;
    key |= 1<<a;
    key |= 1<<b;
    key |= 1<<c;
    //repeat for all arguments
    return Lookup[key];
}
To generate the lookup table, look at this "banker's order" algorithm. Generate all the combinations, and also store the base index for each nItems. (For the example on p6, this would be [0,1,5,11,15]). Note that by storing the answers in the opposite order from the example (LSBs set first) you will only need one table, sized for the largest possible set.
Populate the lookup table by walking through the combinations doing Lookup[combination[i]]=i-baseIdx[nItems]
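A rough Python sketch of the bitmask lookup table, for a fixed small universe {0, ..., n-1}. It assumes `itertools.combinations` order within each subset size rather than the banker's-order generator mentioned above, so the indices are lexicographic per size; because the counter restarts for every size, no base index has to be subtracted. `build_lookup` is a made-up name.

```python
from itertools import combinations


def build_lookup(n):
    """Map each subset bitmask of {0..n-1} to its index among subsets of the same size."""
    lookup = [0] * (1 << n)
    for size in range(n + 1):
        for i, combo in enumerate(combinations(range(n), size)):
            key = 0
            for element in combo:
                key |= 1 << element
            lookup[key] = i  # the index restarts at 0 for every subset size
    return lookup


def index(lookup, *elements):
    """Encode the arguments as a bitmask and read the stored index."""
    key = 0
    for element in elements:
        key |= 1 << element
    return lookup[key]


lookup = build_lookup(5)
print(index(lookup, 1, 3, 4))  # prints 8
```

The table has 2^n entries, which is why this only works when the maximum set size is small. The argument order does not matter, since the key is a bitwise OR.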
EDIT: Never mind. This is completely wrong.
Let your combination be (a_1, a_2, ..., a_(k-1), a_k) where a_1 < a_2 < ... < a_k. Let choose(a,b) = a!/(b!*(a-b)!) if a >= b and 0 otherwise. Then, the index you are looking for is
choose(a_k - 1, k) + choose(a_(k-1) - 1, k-1) + choose(a_(k-2) - 1, k-2) + ... + choose(a_2 - 1, 2) + choose(a_1 - 1, 1) + 1
The first term counts the number of k-element combinations such that the largest element is less than a_k. The second term counts the number of (k-1)-element combinations such that the largest element is less than a_(k-1). And so on.
Notice that the size of the universe of elements to be chosen from (10 in your example) does not play a role in the computation of the index. Can you see why?
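One way to see what this formula actually orders by is to sort a small set of combinations by it. The sketch below (hypothetical names `formula_index` and `order_by_formula`, Python 3.8+ for `math.comb`) suggests that the sum compares the largest element first, which is usually called colexicographic order. That would explain why the universe size drops out, and also why the result differs from the plain lexicographic index.

```python
from itertools import combinations
from math import comb


def formula_index(combo):
    """choose(a_k - 1, k) + ... + choose(a_1 - 1, 1) + 1 for increasing a_1 < ... < a_k."""
    return sum(comb(a - 1, j) for j, a in enumerate(combo, start=1)) + 1


def order_by_formula(n, k):
    """The k-subsets of {1..n}, sorted by the formula's index."""
    return sorted(combinations(range(1, n + 1), k), key=formula_index)


print(order_by_formula(4, 2))
# prints [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4), (3, 4)]
```

In lexicographic order (1, 4) would come before (2, 3); here it comes after, because 4 > 3 is compared first.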
Sample solution:
class Program
{
    static void Main(string[] args)
    {
        // The input
        var n = 5;
        var t = new[] { 2, 4, 5 };
        // Helping transformations
        ComputeDistances(t);
        CorrectDistances(t);
        // The algorithm
        var r = CalculateRank(t, n);
        Console.WriteLine("n = 5");
        Console.WriteLine("t = {2, 4, 5}");
        Console.WriteLine("r = {0}", r);
        Console.ReadKey();
    }
    static void ComputeDistances(int[] t)
    {
        var k = t.Length;
        while (--k >= 0)
            t[k] -= (k + 1);
    }
    static void CorrectDistances(int[] t)
    {
        var k = t.Length;
        while (--k > 0)
            t[k] -= t[k - 1];
    }
    static int CalculateRank(int[] t, int n)
    {
        int k = t.Length - 1, r = 0;
        for (var i = 0; i < t.Length; i++)
        {
            if (t[i] == 0)
            {
                n--;
                k--;
                continue;
            }
            for (var j = 0; j < t[i]; j++)
            {
                n--;
                r += CalculateBinomialCoefficient(n, k);
            }
            n--;
            k--;
        }
        return r;
    }
    static int CalculateBinomialCoefficient(int n, int k)
    {
        int i, l = 1, m, x, y;
        if (n - k < k)
        {
            x = k;
            y = n - k;
        }
        else
        {
            x = n - k;
            y = k;
        }
        for (i = x + 1; i <= n; i++)
            l *= i;
        m = CalculateFactorial(y);
        return l/m;
    }
    static int CalculateFactorial(int n)
    {
        int i, w = 1;
        for (i = 1; i <= n; i++)
            w *= i;
        return w;
    }
}
The idea behind the scenes is to associate a k-subset with the operation of drawing k elements from a set of size n. It is a combination, so the overall count of possible items will be (n k), i.e. n choose k. That is a clue that we could seek the solution in Pascal's triangle. After a while of comparing manually written examples with the appropriate numbers from Pascal's triangle, we will find the pattern and hence the algorithm.
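The same counting idea can be written more compactly. This is a sketch, not a line-by-line port of the C# above: for each position it adds up the combinations that start with a smaller value at that position. The name `lex_rank` is made up, the rank is 0-based, and `math.comb` needs Python 3.8+.

```python
from itertools import combinations
from math import comb


def lex_rank(subset, n):
    """0-based lexicographic rank of an increasing k-subset of {1, ..., n}."""
    k = len(subset)
    rank = 0
    previous = 0
    for i, value in enumerate(subset):
        # Each smaller candidate at position i leaves k - i - 1 elements
        # to be drawn from the n - candidate values above it.
        for candidate in range(previous + 1, value):
            rank += comb(n - candidate, k - i - 1)
        previous = value
    return rank


print(lex_rank((2, 4, 5), 5))  # prints 8
```

For {2, 4, 5} with n = 5 this adds Choose(4,2) = 6 (subsets starting with 1) and Choose(2,1) = 2 (subsets starting with 2, 3), giving 8.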
I used user515430's answer and converted it to Python 3. This version also supports non-contiguous values, so you could pass in [1,3,5,7,9] as your pool instead of range(1,11).
from itertools import combinations
from scipy.special import comb
from pandas import Index
debugcombinations = False
class IndexedCombination:
    def __init__(self, _setsize, _poolvalues):
        self.setsize = _setsize
        self.poolvals = Index(_poolvalues)
        self.poolsize = len(self.poolvals)
        self.totalcombinations = 1
        fast_k = min(self.setsize, self.poolsize - self.setsize)
        for i in range(1, fast_k + 1):
            self.totalcombinations = self.totalcombinations * (self.poolsize - fast_k + i) // i
        #fill the nCr cache
        self.choose_cache = {}
        n = self.poolsize
        k = self.setsize
        for i in range(k + 1):
            for j in range(n + 1):
                if n - j >= k - i:
                    self.choose_cache[n - j,k - i] = comb(n - j,k - i, exact=True)
        if debugcombinations:
            print('testnth = ' + str(self.testnth()))
    def get_nth_combination(self,index):
        n = self.poolsize
        r = self.setsize
        c = self.totalcombinations
        #if index < 0 or index >= c:
        #    raise IndexError
        result = []
        while r:
            c, n, r = c*r//n, n-1, r-1
            while index >= c:
                index -= c
                c, n = c*(n-r)//n, n-1
            result.append(self.poolvals[-1 - n])
        return tuple(result)
    def get_n_from_combination(self,someset):
        n = self.poolsize
        k = self.setsize
        index = 0
        j = 0
        for i in range(k):
            setidx = self.poolvals.get_loc(someset[i])
            for j in range(j + 1, setidx + 1):
                index += self.choose_cache[n - j, k - i - 1]
            j += 1
        return index
    #just used to test whether nth_combination from the internet actually works
    def testnth(self):
        n = 0
        _setsize = self.setsize
        mainset = self.poolvals
        for someset in combinations(mainset, _setsize):
            nthset = self.get_nth_combination(n)
            n2 = self.get_n_from_combination(nthset)
            if debugcombinations:
                print(str(n) + ': ' + str(someset) + ' vs ' + str(n2) + ': ' + str(nthset))
            if n != n2:
                return False
            for x in range(_setsize):
                if someset[x] != nthset[x]:
                    return False
            n += 1
        return True
setcombination = IndexedCombination(5, list(range(1,10+1)))
print( str(setcombination.get_n_from_combination([2,5,7,8,10])))
returns 188
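If pulling in scipy and pandas is not desirable, the reverse direction (index to combination) can be sketched with the standard library alone. This is an alternative illustration, not the class above: `unrank` is a hypothetical name, the pool is assumed to be sorted with distinct values, and `math.comb` needs Python 3.8+.

```python
from itertools import combinations
from math import comb


def unrank(index, k, pool):
    """Combination of size k at the given 0-based lexicographic index in a sorted pool."""
    n = len(pool)
    if not 0 <= index < comb(n, k):
        raise IndexError("index out of range")
    result = []
    candidate = 0
    for remaining in range(k, 0, -1):
        while True:
            # Number of combinations whose next element is pool[candidate].
            block = comb(n - 1 - candidate, remaining - 1)
            if index < block:
                break
            index -= block
            candidate += 1
        result.append(pool[candidate])
        candidate += 1
    return tuple(result)


print(unrank(188, 5, list(range(1, 11))))  # prints (2, 5, 7, 8, 10)

# Non-contiguous pools work the same way, since only positions are counted.
odd = [1, 3, 5, 7, 9]
assert [unrank(i, 2, odd) for i in range(10)] == list(combinations(odd, 2))
```

Index 188 maps back to (2, 5, 7, 8, 10), the combination used in the example above.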

Resources