How to iterate over subsets of a bitwise mask? - algorithm

I have a bitwise mask represented as an integer. The mask and its subsets are limited to 32-bit integers.
I am interested in examining all subsets of the set bits of a given mask/integer, but I don't know of a good way to quickly find these subsets.
The solution that I've been using is
for (int j = 1; j <= mask; ++j)
{
    if ((j & mask) == j)
    {
        // j is a valid subset of mask
    }
}
But this requires looping from j = 1 to mask, and I think there should be a faster solution than this.
Is there a faster solution than this?
My followup question is if I want to constrain the subset to be of a fixed size (i.e., a fixed number of set bits), is there a simple way to do that as well?

Iterate over all subsets of state in C++:
for (int subset = state; subset > 0; subset = (subset - 1) & state) {}
This trick is commonly used in bitmask + DP problems. Iterating over all subsets of every state takes O(3^n) time in total, which is a great improvement over the O(4^n) of the code in the question.
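For context, here is a minimal C++ sketch (my own illustration) of the full O(3^n) pattern: enumerating every subset of every mask over n bits. Note that the inner loop deliberately skips the empty subset; handle it separately if your DP needs it.

#include <cstdio>

int main() {
    const int n = 4;                      // number of bits; adjust as needed
    for (int state = 0; state < (1 << n); ++state) {
        // visit the non-empty subsets of state in decreasing order
        for (int subset = state; subset > 0; subset = (subset - 1) & state) {
            std::printf("state=%d subset=%d\n", state, subset);
        }
        // the empty subset of state would be handled here if required
    }
    return 0;
}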

For a bitmask of length n, the number of subsets of its set bits can be as large as 2^n - 1 (excluding the empty subset). Hence, if you want to examine every subset, you have to traverse all of them, so the best possible complexity is O(mask); the code cannot be improved asymptotically.
P.S.: If you only want to count the total number of subsets, that can be solved with much better time complexity, e.g. using dynamic programming.

Given x with a subset of the bits in mask, the next subset in order is ( (x|~mask) +1 ) & mask. This will wrap around to zero if x==mask.
I don't have a super fast way for subsets with a fixed number of bits.
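As a rough illustration (my own sketch, not part of the answer above), the increment formula drives a loop like this; using unsigned arithmetic keeps the wraparound well defined, and the subsets are produced in increasing numeric order starting from the empty subset:

unsigned mask = /* the mask whose subsets you want */;
unsigned x = 0;
do {
    // x is a subset of mask here (the empty subset comes first)
    x = ((x | ~mask) + 1) & mask;
} while (x != 0);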

Related

Finding all prime numbers from 1 to N using GCD (An alternate approach to sieve-of-eratosthenes)

To find all prime numbers from 1 to N.
I know we usually approach this problem using Sieve of Eratosthenes, I had an alternate approach in mind using gcd that I wanted your views on.
My approach:
Maintain a variable holding the product of all primes processed so far. If gcd(that variable, i) == 1, then i is co-prime to every smaller prime, so i must be prime.
For ex: gcd(210,11) == 1, so 11 is prime.
{210=2*3*5*7}
Pseudocode:
Init num_list = {numbers 2 to N}   [since 0 and 1 aren't prime]
curr_gcd = 2, gcd_val = 1
For i = 3; i <= N; i++
    gcd_val = __gcd(curr_gcd, i)
    if gcd_val == 1        // prime
        curr_gcd = curr_gcd * i
    else                   // composite, so remove from list
        num_list.remove(i)
Alternatively, we can also have a list and push the prime numbers into that list.
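For reference, here is a minimal C++ sketch of the idea above (my own illustration, collecting the primes into a list as just mentioned; names are arbitrary). Note that the running product overflows even 64-bit integers very quickly, which is the practical weakness of the approach (see the update below).

#include <cstdint>
#include <numeric>   // std::gcd (C++17)
#include <vector>

// Sketch of the gcd-based approach; prime_product overflows for even
// moderate N, so this is only illustrative.
std::vector<int> primes_by_gcd(int n) {
    std::vector<int> primes;
    if (n >= 2) primes.push_back(2);
    std::uint64_t prime_product = 2;                 // product of primes seen so far
    for (int i = 3; i <= n; ++i) {
        if (std::gcd(prime_product, static_cast<std::uint64_t>(i)) == 1) {
            primes.push_back(i);                     // co-prime to all smaller primes
            prime_product *= i;                      // overflows quickly!
        }
    }
    return primes;
}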
SC = O(N)
TC = O(N log(N)) [TC to calculate gcd using euclid's method => O(log(max(a,b)))]
Does this seem right, or am I calculating the TC incorrectly here? Please post your views on this.
TIA!
Looks like the time complexity of my approach is closer to O(log^2(n)) as pointed out by many in the comments.
Also, the curr_gcd var would become quite large as N is increased and would definitely overflow int and long size limits.
Thanks to everyone who responded!
Your method may be theoretically right, but evidently it is not a good one in practice.
Its efficiency is worse than the Sieve of Eratosthenes, and the numbers it has to handle (the running product) grow far too large. It may look elegant, but it is hard to use.
In my view, "find all prime numbers from 1 to N" is already a well-known problem, which means its solutions have been well considered.
At first, we might use brute force to deal with it, like this:
int primes[N], cnt;   // store all prime numbers
bool st[N];           // st[i]: whether i has been rejected (marked composite)
void get_primes(int n) {
    for (int i = 2; i <= n; i++) {
        if (st[i]) continue;
        primes[cnt++] = i;
        for (int j = i + i; j <= n; j += i) {
            st[j] = true;
        }
    }
}
This is an O(n^2)-time algorithm, too slow to endure.
Going further, we have the Sieve of Eratosthenes, which uses O(n log log n) time.
But there is a better algorithm, called the "linear sieve", which uses only O(n) time, just as its name suggests. I implement it in C like this:
int primes[N], cnt;
bool st[N];
void get_primes(int n) {
    for (int i = 2; i <= n; i++) {
        if (!st[i]) primes[cnt++] = i;
        for (int j = 0; primes[j] * i <= n; j++) {
            st[primes[j] * i] = true;
            if (i % primes[j] == 0) break;
        }
    }
}
I use this O(n) algorithm to solve the kind of algorithm problems that appear at major IT companies and on many kinds of online judges (OJs).
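For completeness, here is a tiny self-contained usage sketch (my own addition, compiled as C++; N = 100 is an arbitrary limit):

#include <cstdio>

const int N = 100;
int primes[N], cnt;
bool st[N];

void get_primes(int n) {
    for (int i = 2; i <= n; i++) {
        if (!st[i]) primes[cnt++] = i;
        for (int j = 0; primes[j] * i <= n; j++) {
            st[primes[j] * i] = true;
            if (i % primes[j] == 0) break;
        }
    }
}

int main() {
    get_primes(N - 1);                               // primes below 100
    for (int k = 0; k < cnt; k++) std::printf("%d ", primes[k]);
    std::printf("\n");
    return 0;
}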

Finding every possible combination of a list of numbers divided in two sets

I have a list of numbers eg.
120
233
197
400
276
356
121
For the purposes of my program, these numbers have to be arranged into two sets. Based on what numbers are in each set, the program calculates the efficiency of that set. It then combines the efficiency quotients of the two sets. The two sets and their combined efficiency quotient are then saved in an array.
The goal is to find the combination of sets where both the sets have the highest efficiency.
My problem: at the moment, I can't seem to wrap my head around the algorithm needed to check every possible set combination. As far as I can tell, it seems to need some form of recursion.
If you need more information let me know! Thanks in advance!
To iterate over all possible ways in which you can form two sets, you should iterate over all possible subsets of the initial list. To do that you can use bitmask of size n, where n is the number of elements in the initial list.
To generate all possible bitmasks of size n, you can use a simple loop (C++ code that should be easy to port to other languages):
for (int mask = 0; mask < (1 << (n - 1)); ++mask) {
    for (int i = 0; i < n; ++i) {
        if (mask & (1 << i)) {
            // element i is in the first subset
        } else {
            // element i is in the second subset
        }
    }
    // compute the efficiency quotient for these two subsets
}
The overall complexity is 2^n * (the cost of one efficiency-quotient calculation), and that is as good as you can do unless you have additional information about the efficiency quotient.
NOTE: here I loop only up to 1 << (n - 1) to avoid considering subsets A and B and then B and A. If the efficiency quotient cares about the order, you will need to change that to 1 << n.
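To make this concrete, here is a minimal self-contained C++ sketch (my own illustration, not part of the answer) that tracks the best split; evaluate_pair is a hypothetical stand-in for the question's efficiency computation:

#include <cstdio>
#include <vector>

// Hypothetical placeholder: combined efficiency of the two subsets.
double evaluate_pair(const std::vector<int>& first, const std::vector<int>& second) {
    return static_cast<double>(first.size()) * second.size();   // dummy metric
}

int main() {
    std::vector<int> nums = {120, 233, 197, 400, 276, 356, 121};
    int n = static_cast<int>(nums.size());
    double best = -1;
    int best_mask = 0;
    for (int mask = 0; mask < (1 << (n - 1)); ++mask) {
        std::vector<int> first, second;
        for (int i = 0; i < n; ++i) {
            if (mask & (1 << i)) first.push_back(nums[i]);
            else second.push_back(nums[i]);
        }
        double q = evaluate_pair(first, second);
        if (q > best) { best = q; best_mask = mask; }
    }
    std::printf("best mask: %d, quotient: %f\n", best_mask, best);
    return 0;
}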

Algorithm on interview

Recently I was asked the following interview question:
You have two sets of numbers of the same length N, for example A = [3, 5, 9] and B = [7, 5, 1]. Next, for each position i in range 0..N-1, you can pick either number A[i] or B[i], so at the end you will have another array C of length N which consists of elements from A and B. If the sum of all elements in C is less than or equal to K, then such an array is good. Please write an algorithm to figure out the total number of good arrays by given arrays A, B and number K.
The only solution I've come up with is a dynamic programming approach, where we have a matrix of size NxK and M[i][j] represents how many combinations we could have at position i if the current sum is equal to j. But it looks like they expected me to come up with a formula. Could you please help me with that? At least, what direction should I look in? I will appreciate any help. Thanks.
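For reference, a minimal C++ sketch of the DP mentioned above (my own illustration; it assumes all values are nonnegative so that partial sums exceeding K can be dropped, and the names are arbitrary):

#include <algorithm>
#include <cstdint>
#include <vector>

// Counts arrays C (C[i] = A[i] or B[i]) whose total sum is <= K.
// Assumes A[i], B[i] >= 0; dp[j] = number of ways to reach sum exactly j.
std::uint64_t countGoodArrays(const std::vector<int>& A,
                              const std::vector<int>& B, int K) {
    std::vector<std::uint64_t> dp(K + 1, 0), next(K + 1, 0);
    dp[0] = 1;
    for (int i = 0; i < (int)A.size(); ++i) {
        std::fill(next.begin(), next.end(), 0);
        for (int j = 0; j <= K; ++j) {
            if (dp[j] == 0) continue;
            if (j + A[i] <= K) next[j + A[i]] += dp[j];   // pick A[i]
            if (j + B[i] <= K) next[j + B[i]] += dp[j];   // pick B[i]
        }
        dp.swap(next);
    }
    std::uint64_t total = 0;
    for (std::uint64_t ways : dp) total += ways;          // any final sum <= K is good
    return total;
}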
After some consideration, I believe this is an NP-complete problem. Consider:
A = [0, 0, 0, ..., 0]
B = [b1, b2, b3, ..., bn]
Note that every construction of the third set C = ( A[i] or B[i] for i = 0..n ) is just the union of some subset of A and some subset of B. In this case, since every subset of A sums to 0, the sum of C is the same as the sum of some subset of B.
Now your question "How many ways can we construct C with a sum less than K?" can be restated as "How many subsets of B sum to less than K?". Solving this problem for K = 1 and K = 0 yields the solution to the subset sum problem for B (the difference between the two solutions is the number of subsets that sum to 0).
By similar argument, even in the general case where A contains nonzero elements, we can construct an array S = [b1-a1, b2-a2, b3-a3, ..., bn-an], and the question becomes "How many subsets of S sum to less than K - sum(A)?"
Since the subset sum problem is NP-complete, this problem must be also. So with that in mind, I would venture that the dynamic programming solution you proposed is the best you can do, and certainly no magic formula exists.
" Please write an algorithm to figure out the total number of good
arrays by given arrays A, B and number K."
Is it not the goal?
int A[];            // input array A (size N)
int B[];            // input array B (size N)
int N;
int K;
int Solutions = 0;

void FindSolutions(int Depth, int theSumSoFar) {
    if (theSumSoFar > K) return;
    if (Depth >= N) {
        Solutions++;
        return;
    }
    FindSolutions(Depth + 1, theSumSoFar + A[Depth]);
    FindSolutions(Depth + 1, theSumSoFar + B[Depth]);
}
Invoke FindSolutions with both arguments set to zero. On return, Solutions will be equal to the number of good arrays.
This is how I would try to solve the problem (sorry if it's stupid).
Think of the arrays
A = [3, 5, 9, 8, 2]
B = [7, 5, 1, 8, 2]
With elements indexed 0..N-1, the number of choices is 2^N.
C1 = 0, C2 = 0
for all i where A[i] == B[i]
{
    C1++
    C2 += A[i]    // that value is forced no matter which array you pick from
}
Here A[i] == B[i] at positions 1, 3 and 4, so C1 = 3 and C2 = 5 + 8 + 2 = 15. Then create two new arrays from the remaining positions:
A1 = [3, 9]
B1 = [7, 1]
The number of all choices is now reduced to 2^(N-C1), and you calculate the good numbers using K as K = K - C2.
Unfortunately, no matter what method you use, you still have to calculate a sum 2^(N-C1) times.
So there are 2^N choices, since at each point you either pick from A or from B. In the specific example you give, where N happens to be 3, there are 8. For discussion, you can characterise each set of decisions as a bit pattern.
So as a brute-force approach would try every single bit pattern.
But what should be obvious is that if the first few bits produce a number too large then every subsequent possible group of tail bits will also produce a number that is too large. So probably a better way to model it is a tree where you don't bother walking down the limbs that have already grown beyond your limit.
You can also compute the maximum totals that can be reached from each bit to the end of the table. If at any point your running total plus the maximum that you can obtain from here on down is less than K then every subtree from where you are is acceptable without any need for traversal. The case, as discussed in the comments, where every single combination is acceptable is a special case of this observation.
As pointed out by Serge below, a related observation is to use minimums and apply the converse logic to cancel whole subtrees without traversal.
A potential further optimisation rests behind the observation that, as long as we shuffle each in the same way, changing the order of A and B has no effect because addition is commutative. You can therefore make an effort to ensure either that the maximums grow as quickly as possible or the minimums grow as slowly as possible, to try to get the earliest possible exit from traversal. In practice you'd probably want to apply a heuristic comparing the absolute maximum and minimum (both of which you've computed anyway) to K.
That being the case, a recursive implementation is easiest, e.g. (in C)
/* assume A, B, N and the MAXV/MINV arrays below are known globals */
unsigned int numberOfGoodArraysFromBit(
    unsigned int bit,
    unsigned int runningTotal,
    unsigned int limit)
{
    // have we ended up in an unacceptable subtree?
    if (runningTotal > limit) return 0;

    // have we reached a leaf node without at any
    // point finding this subtree to be unacceptable?
    if (bit >= N) return 1;

    // maybe every subtree is acceptable?
    if (runningTotal + MAXV[bit] <= limit)
    {
        return 1 << (N - bit);
    }

    // maybe no subtrees are acceptable?
    if (runningTotal + MINV[bit] > limit)
    {
        return 0;
    }

    // if we can't prima facie judge the subtrees,
    // we'll need specifically to evaluate them
    return
        numberOfGoodArraysFromBit(bit + 1, runningTotal + A[bit], limit) +
        numberOfGoodArraysFromBit(bit + 1, runningTotal + B[bit], limit);
}

// work out the minimum and maximum values at each position
for (int i = 0; i < N; i++)
{
    MAXV[i] = MAX(A[i], B[i]);
    MINV[i] = MIN(A[i], B[i]);
}

// hence work out the cumulative totals from right to left
for (int i = N - 2; i >= 0; i--)
{
    MAXV[i] += MAXV[i + 1];
    MINV[i] += MINV[i + 1];
}

// to kick it off
printf("Total valid combinations is %u\n", numberOfGoodArraysFromBit(0, 0, K));
I'm just thinking extemporaneously; it's likely better solutions exist.

Get X unique numbers from a set

What is the most elegant way to grab unique random numbers I ponder?
At the moment, when I need a unique random number, I check whether it is unique by using a loop to see if I've used that random number before.
So it looks like:
int n = getRandomNumber % [Array Size];
for each (previously used n in list)
    check if I've used n before; if I have... try again.
There are many ways to solve this linear O(n/2) problem; I just wonder if there is an elegant way to solve it. I'm trying to think back to MATH115 Discrete Mathematics and remember whether the old lecturer covered anything to do with this seemingly trivial problem.
I can't think at the moment, so maybe once I have some caffeine my brain will suss it with the heightened IQ induced from the Coffee.
If you want k random integers drawn without replacement (to get unique numbers) from the set {1, ..., n}, what you want is the first k elements in a random permutation of [n]. The most elegant way to generate such a random permutation is by using the Knuth shuffle. See here: http://en.wikipedia.org/wiki/Knuth_shuffle
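A minimal C++ sketch of that idea (my own illustration, assuming k <= n): shuffle 1..n with std::shuffle, which performs a Fisher-Yates/Knuth shuffle under the hood, and keep the first k elements. Strictly speaking, a partial shuffle of only the first k positions would suffice.

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Returns k unique integers drawn uniformly from {1, ..., n}; assumes k <= n.
std::vector<int> sample_without_replacement(int n, int k) {
    std::vector<int> values(n);
    std::iota(values.begin(), values.end(), 1);       // 1, 2, ..., n
    std::mt19937 rng{std::random_device{}()};
    std::shuffle(values.begin(), values.end(), rng);  // Knuth shuffle
    values.resize(k);                                  // keep the first k
    return values;
}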
"grab unique random numbers I ponder?"
1. Make an array of N unique elements (integers in the range 0..N-1, for example), and store N as arraySize and initialArraySize (arraySize = N; initialArraySize = N).
2. When a random number is requested:
2.1 If arraySize is zero, then arraySize = initialArraySize.
2.2 Generate index = getRandomNumber() % arraySize.
2.3 result = array[index]. Do not return result yet.
2.4 Swap array[index] with array[arraySize-1]. Swap means "exchange": c = array[index]; array[index] = array[arraySize-1]; array[arraySize-1] = c.
2.5 Decrease arraySize by 1.
2.6 Return result.
You'll get a list of random numbers that won't repeat until you run out of unique values. O(1) complexity.
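A minimal C++ sketch of those steps (my own illustration; the class and method names are arbitrary, and the step numbers in the comments refer to the list above):

#include <cstdlib>
#include <utility>
#include <vector>

// Draws values 0..N-1 in random order; once all have been drawn,
// the pool refills and the cycle starts again.
class UniqueRandom {
public:
    explicit UniqueRandom(int n) : array_(n), initialArraySize_(n), arraySize_(n) {
        for (int i = 0; i < n; ++i) array_[i] = i;
    }
    int next() {
        if (arraySize_ == 0) arraySize_ = initialArraySize_;  // 2.1: refill the pool
        int index = std::rand() % arraySize_;                 // 2.2: pick an index
        int result = array_[index];                           // 2.3: remember the value
        std::swap(array_[index], array_[arraySize_ - 1]);     // 2.4: move it to the end
        --arraySize_;                                         // 2.5: shrink the pool
        return result;                                        // 2.6
    }
private:
    std::vector<int> array_;
    int initialArraySize_;
    int arraySize_;
};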
An n-bit maximal-period Linear Feedback Shift Register (LFSR) will cycle through all of its (2^n - 1) nonzero internal states before an internal state is repeated. An LFSR has maximal period if and only if the polynomial formed from its tap sequence plus 1 is a primitive polynomial mod 2.
Thus, an n-bit Maximal Period LFSR will provide you with a sequence of (2^n - 1) unique random numbers, each one of them is n-bit long.
A LFSR is very elegant.
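For illustration only (my own sketch, not from the answer): a 16-bit Galois LFSR with the commonly cited maximal-period tap mask 0xB400 (taps 16, 14, 13, 11) steps through all 65535 nonzero states before repeating:

#include <cstdint>
#include <cstdio>

int main() {
    std::uint16_t lfsr = 0xACE1u;             // any nonzero seed
    const std::uint16_t start = lfsr;
    unsigned period = 0;
    do {
        unsigned lsb = lfsr & 1u;             // output bit
        lfsr >>= 1;
        if (lsb) lfsr ^= 0xB400u;             // apply the tap mask
        ++period;
    } while (lfsr != start);
    std::printf("period = %u\n", period);     // 65535 for a maximal-period LFSR
    return 0;
}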
Since you're imposing uniqueness, a pseudorandom generator should be sufficient, configured so that it does not repeat for as long a sequence as you need. E.g., an LCG: if the seed is a uint32 and initially 0, then use (1664525 * seed) + 1013904223 as the next seed and take the low word as your unrepeated 16-bit result.
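A small C++ sketch of that suggestion (my own illustration; the constants are the ones quoted above, and the non-repetition claim for the low word follows from the answer):

#include <cstdint>

// LCG with the constants above; taking the low 16 bits of the state
// yields a sequence that does not repeat until 65536 values have been drawn.
struct Lcg16 {
    std::uint32_t seed = 0;
    std::uint16_t next() {
        seed = 1664525u * seed + 1013904223u;
        return static_cast<std::uint16_t>(seed);   // low word
    }
};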

Find the most common entry in an array

You are given a 32-bit unsigned integer array with length up to 2^32, with the property that more than half of the entries in the array are equal to N, for some 32-bit unsigned integer N. Find N looking at each number in the array only once and using at most 2 kB of memory.
Your solution must be deterministic, and guaranteed to find N.
Keep one counter for each of the 32 bits, and for each integer in the array increment the counters of the bits that are set in it.
At the end, some of the bits will have a count higher than half the length of the array - those bits determine N. Of course, the count will be higher than the number of times N occurred, but that doesn't matter. The important thing is that any bit which isn't part of N cannot occur more than half the times (because N has over half the entries) and any bit which is part of N must occur more than half the times (because it will occur every time N occurs, and any extras).
(No code at the moment - about to lose net access. Hopefully the above is clear enough though.)
Boyer and Moore's "Linear Time Majority Vote Algorithm" - go down the array maintaining your current guess at the answer.
You can do this with only two variables.
public uint MostCommon(UInt32[] numberList)
{
    uint suspect = 0;
    int suspicionStrength = -1;
    foreach (uint number in numberList)
    {
        if (number == suspect)
        {
            suspicionStrength++;
        }
        else
        {
            suspicionStrength--;
        }
        if (suspicionStrength <= 0)
        {
            suspect = number;
            suspicionStrength = 1;   // reset the strength for the new suspect
        }
    }
    return suspect;
}
Make the first number the suspect number, and continue looping through the list. If the number matches, increase the suspicion strength by one; if it doesn't match, lower the suspicion strength by one. If the suspicion strength hits 0 the current number becomes the suspect number. This will not work to find the most common number, only a number that is more than 50% of the group. Resist the urge to add a check if suspicionStrength is greater than half the list length - it will always result in more total comparisons.
P.S. I have not tested this code - use it at your own peril.
Pseudo code (notepad C++ :-)) for Jon's algorithm:
int arrBits[32] = {0};
int lNumbers = sizeof(arrNumbers) / sizeof(arrNumbers[0]);
for (int i = 0; i < lNumbers; i++)
    for (int bi = 0; bi < 32; bi++)
        arrBits[bi] += (arrNumbers[i] & (1 << bi)) ? 1 : 0;
unsigned int N = 0;
for (int bc = 0; bc < 32; bc++)
    if (arrBits[bc] > lNumbers / 2)
        N = N | (1u << bc);
Notice that if the sequence a_0, a_1, ..., a_(n-1) contains a leader, then after removing a pair of elements of different values, the remaining sequence still has the same leader. Indeed, if we remove two different elements, then only one of them could have been the leader. The leader in the new sequence occurs more than n/2 - 1 = (n - 2)/2 times, and consequently it is still the leader of the new sequence of n - 2 elements.
Here is a Python implementation, with O(n) time complexity:
def goldenLeader(A):
    n = len(A)
    size = 0
    for k in range(n):
        if (size == 0):
            size += 1
            value = A[k]
        else:
            if (value != A[k]):
                size -= 1
            else:
                size += 1
    candidate = -1
    if (size > 0):
        candidate = value
    leader = -1
    count = 0
    for k in range(n):
        if (A[k] == candidate):
            count += 1
    if (count > n // 2):
        leader = candidate
    return leader
This is a standard problem in streaming algorithms (where you have a huge (potentially infinite) stream of data) and you have to calculate some statistics from this stream, passing through this stream once.
Clearly you can approach it with hashing or sorting, but with potentially infinite stream you clearly run out of memory. So you have to do something smart here.
The majority element is the element that occurs more than half the size of the array. This means that the majority element occurs more often than all other elements combined; equivalently, if you count the number of times the majority element appears and subtract the count of all other elements, you get a positive number.
So if you count the occurrences of some element, subtract the occurrences of all other elements, and get 0, then your original element can't be a majority element. This is the basis for a correct algorithm:
Have two variables, a counter and a possible element. Iterate over the stream: if the counter is 0, overwrite the possible element and initialize the counter to 1; if the number is the same as the possible element, increase the counter; otherwise decrease it. Python code:
def majority_element(arr):
    counter, possible_element = 0, None
    for i in arr:
        if counter == 0:
            possible_element, counter = i, 1
        elif i == possible_element:
            counter += 1
        else:
            counter -= 1
    return possible_element
It is easy to see that the algorithm is O(n) with a very small constant (around 3). It also looks like the space complexity is O(1), because we only initialize three variables. The catch is that one of them is a counter which can potentially grow up to n (when the array consists of the same number repeated), and storing the number n needs O(log n) space. So from a theoretical point of view it is O(n) time and O(log n) space; in practice, a 128-bit integer can hold counts up to 2^128, and an array with that many elements is unimaginably huge.
Also note that the algorithm works only if there is a majority element. If such element does not exist it will still return some number, which will surely be wrong. (it is easy to modify the algorithm to tell whether the majority element exists)
Historical note: this algorithm was invented somewhere around 1982 by Boyer and Moore and is called the Boyer–Moore majority vote algorithm.
I have recollections of this algorithm, which might or might not follow the 2 kB rule. It might need to be rewritten with stacks and the like to avoid breaking the memory limit due to function calls, but this might be unneeded since it only ever makes a logarithmic number of such calls. Anyhow, I have vague recollections from college of a recursive solution to this which involved divide and conquer, the secret being that when you divide the group in half, at least one of the halves still has more than half of its values equal to the max. The basic rule when dividing is that you return two candidate top values, one of which is the top value and one of which is some other value (that may or may not be in 2nd place). I forget the algorithm itself.
Proof of correctness for buti-oxa / Jason Hernandez's answer, assuming Jason's answer is the same as buti-oxa's answer and both work the way the algorithm described should work:
We define adjusted suspicion strength as being equal to suspicion strength if top value is selected or -suspicion strength if top value is not selected. Every time you pick the right number, the current adjusted suspicion strength increases by 1. Each time you pick a wrong number, it either drops by 1 or increases by 1, depending on if the wrong number is currently selected. So, the minimum possible ending adjusted suspicion strength is equal to number-of[top values] - number-of[other values]
