How does "&" work for numeric comparison? - ruby

def power_of_two?(n)
n & (n-1) == 0
end
This method checks if a given number n is a power of two.
How does this work? I don't understand the usage of &

& is called Bitwise AND operator.
The AND operator walks through the binary representation of two supplied integers bit by bit. If the bits at the same position in both integers are 1 the resulting integer will have that bit set to 1. If not, the bit will be set to 0:
(a = 18).to_s(2) #=> "10010"
(b = 20).to_s(2) #=> "10100"
(a & b).to_s(2) #=> "10000"
if the number is a power of two already, then one less will result in a binary number that only has the lower-order bits set. Using & there will do nothing.
Example with 8: 0100 & (0100 - 1) --> (0100 & 0011) --> 0000
To understand it follow "How does this bitwise operation check for a power of 2?".
Example through IRB:
>> 4.to_s(2)
=> "100"
>> 3.to_s(2)
=> "11"
>> 4 & 3
=> 0
>>
This is why you can say 4 is power of 2 number.

The "&" is a bit-wise "AND" (see http://calleerlandsson.com/2014/02/06/rubys-bitwise-operators/) operator. It compares two numbers, as explained in the following example:
Suppose that n=4 (which is a power of two). This means that n-1=3. In binary (which I'm writing with ones and zeros in quotes like "1101011101" so we can see the bits) we have n="100" and n-1="011".
The bit-wise AND of these two numbers is 0="000" (in the following, each column only contains a single 1, never two 1s)
100 <-- this is n, n=4
011 <-- this is n-1, n-1=3
---
000 <-- this is n & (n-1)
As another example, now lets say that n=14 (not a power of two) and so n-1=13. In that case n="1110" and n-1="1101", and we have n & (n-1) = 12
1110 <-- this is n, n=14
1101 <-- this is n-1, n-1=13
----
1100 <-- this is n & (n-1)
In the above example, the first two columns of n and n-1 both contain a 1, thus the AND of those columns is one.
Okay, lets consider one final example where n is again a power of two (this should make it abundently clear if it is not already why "poweroftwo?" is written as it is. Suppose n=16 (which is a power of two).
Suppose that n=16 (which is a power of two). This means that n-1=15 so we have n="10000" and n-1="01111".
The bit-wise AND of these two numbers is 0="00000" (in the following, each column only contains a single 1, never two 1s)
10000 <-- this is n, n=16
01111 <-- this is n-1, n-1=15
---
00000 <-- this is n & (n-1)
Caveat: In the special case that n=0, the function "power_of_two?" will return True even though n=0 is not a power of two. This is because 0 is represented as a bit string of all zeros, and anything ANDed with zero is zero.
So, in general, the function "power_of_two?" will return True if and only if n is a power of two or n is zero. The above examples only illustrate this fact, they do not prove it... However, it is the case.

We wish to prove that
n & (n-1) == 0
if and only if n is a power of 2.
We may assume that n is an integer greater than 1. (In fact, I will use this assumption to obtain a contraction.)
If n is a power of 2, its binary representation has 1 at bit-offset
p = log2(n)
and 0s at all lower-order bit positions j, j < p. Moreover, since (n-1)+1 = n, n-1 must have 1's at all bit offsets j, 0 <= j < p. Therefore,
n & (n-1) == 0
It remains to prove that if n is not a power of 2 and
n & m == 0
then m != n-1. I assume that m = n-1 and will obtain a contraction, thereby completing the proof.
n's most significant bit is of course 1. Since n is not a power of 2, n has at least one other bit equal to 1. Among those 1-bits, consider the one at the most significant bit position j.
Since n & (n-1) == 0, n-1 must have a 0 at position j of its binary representation. When we add 1 to n-1, to make it equal n, it must have a 1 at offset j, meaning that n-1 must have 1's in all bit positions < j. Moreover, (n-1)+1 has zeroes in all bit positions < j after 1 is added. But since n = (n-1)+1, that can only be true if j == 0, since n & (n-1) == 0. Hence, for this to be true, n's most-significant and least-significant bits most both equal 1 and all other bits must equal zero. However, since n = (n-1)+1, that would imply n-1==0 and hence that n == 1, the needed contradiction.
(Whew! There's got to be an easier proof!)

The procedure for decreasing one from a binary number is, starting from the least significant bit:
if the bit is 0 - turn it into 1 and continue to the next significant bit
if the bit is 1 - turn it into 0 and stop.
This means that if there is more than one 1 digit in a number not all digits will be toggled (since you stopped before you got the most significant bit).
Let us say the first 1 in our number n is at position i. If we shift right the number n we'll get the part of the number which did not change when we decreased one, let's call that m. If we shift the number n-1 we should get the same number m, exactly because it is the part that did not change when we decreased one:
n >> i == m
(n - 1) >> i == m
Shifting right two numbers by the same amount will also shift right by the same amount the result of &ing them:
(n >> i) & ((n - 1) >> i) == 0 >> i
But 0 >> i is 0, no matter the i, so:
(n >> i) & ((n - 1) >> i) == 0
Let's put m where we know it is:
m & m == 0
But we also know that:
m & m == m # for any m
So m == 0!
Therefore n & (n - 1) == 0 if and only if there is at most one 1 bit in the number n.
The only numbers which have at most one 1 bit are all the (non-negative) powers of 2 (a leading 1 and a non-negative number of zeroes after it), and the number 0.
QED

In the case of a power of two, it takes the binary form of a single bit of value 1 followed by zeros. Any such value when decremented will take the form of a run of 1's, thus when using the bitwise-and, since it's necessarily less than the former, it will mask it out. E.g.
0b1000 & (0b1000 - 1) = 0b1000 & 0b111 = 0
So, anything (num - 1) might become, the key here is touching the highest bit of num, by decreasing it, we clear it out.
On the other hand, if a number is not a power of two, the result must be non-zero.
The reason behind is that the operation can always be carried without touching the highest bit, because there will always be a non-zero bit in the way, and so at least the highest bit makes it's way to the mask and will show up in the result.

Related

How to get the intuition behind the solution?

I was solving the below problem from USACO training. I found this really fast solution for which, I am finding it unable to absorb fully.
Problem: Consider an ordered set S of strings of N (1 <= N <= 31) bits. Bits, of course, are either 0 or 1.
This set of strings is interesting because it is ordered and contains all possible strings of length N that have L (1 <= L <= N) or fewer bits that are `1'.
Your task is to read a number I (1 <= I <= sizeof(S)) from the input and print the Ith element of the ordered set for N bits with no more than L bits that are `1'.
sample input: 5 3 19
output: 10110
The two solutions I could think of:
Firstly the brute force solution which goes through all possible combinations of bits, selects and stores the strings whose count of '1's are less than equal to 'L' and returning the Ith string.
Secondly, we can find all the permutations of '1's from 5 positions with range of count(0 to L), sort the strings in increasing order and returning the Ith string.
The best Solution:
The OP who posted the solution has used combination instead of permutation. According to him, the total number of string possible is 5C0 + 5C1 + 5C2 + 5C3.
So at every position i of the string, we decide whether to include the ith bit in our output or not, based on the total number of ways we have to build the rest of the string. Below is a dry run of the entire approach for the above input.
N = 5, L = 3, I = 19
00000
at i = 0, for the rem string, we have 4C0 + 4C1 + 4C2 + 4C3 = 15
It says that, there are 15 other numbers possible with the last 4 positions. as 15 is less than 19, our first bit has to be set.
N = 5, L = 2, I = 4
10000
at i = 1, we have 3C0 + 3C1 + 3C2 (as we have used 1 from L) = 7
as 7 is greater than 4, we cannot set this bit.
N = 5, L = 2, I = 4
10000
at i = 2 we have 2C0 + 2C2 = 2
as 2 <= I(4), we take this bit in our output.
N = 5, L = 1, I = 2
10100
at i = 3, we have 1C0 + 1C1 = 2
as 2 <= I(2) we can take this bit in our output.
as L == 0, we stop and 10110 is our answer. I was amazed to find this solution. However, I am finding it difficult to get the intuition behind this solution.
How does this solution sort-of zero in directly to the Ith number in the set?
Why does the order of the bits not matter in the combinations of set bits?
Suppose we have precomputed the number of strings of length n with k or fewer bits set. Call that S(n, k).
Now suppose we want the i'th string (in lexicographic order) of length N with L or fewer bits set.
All the strings with the most significant bit zero come before those with the most significant bit 1. There's S(N-1, L) strings with the most significant bit zero, and S(N-1, L-1) strings with the most significant bit 1. So if we want the i'th string, if i<=S(N-1, L), then it must have the top bit zero and the remainder must be the i'th string of length N-1 with at most L bits set, and otherwise it must have the top bit one, and the remainder must be the (i-S(N-1, L))'th string of length N-1 with at most L-1 bits set.
All that remains to code is to precompute S(n, k), and to handle the base cases.
You can figure out a combinatorial solution to S(n, k) as your friend did, but it's more practical to use a recurrence relation: S(n, k) = S(n-1, k) + S(n-1, k-1), and S(0, k) = S(n, 0) = 1.
Here's code that does all that, and as an example prints out all 8-bit numbers with 3 or fewer bits set, in lexicographic order. If i is out of range, then it raises an IndexError exception, although in your question you assume i is always in range, so perhaps that's not necessary.
S = [[1] * 32 for _ in range(32)]
for n in range(1, 32):
for k in range(1, 32):
S[n][k] = S[n-1][k] + S[n-1][k-1]
def ith_string(n, k, i):
if n == 0:
if i != 1:
raise IndexError
return ''
elif i <= S[n-1][k]:
return "0" + ith_string(n-1, k, i)
elif k == 0:
raise IndexError
else:
return "1" + ith_string(n-1, k-1, i - S[n-1][k])
print([ith_string(8, 3, i) for i in range(1, 94)])

count of even numbers having more exponent of 2

Suppose I have given a number n. I want to find out all then even numbers which are less than n, and also have a greater exponent of 2 in its prime factorization than that of the exponent of 2 in the prime factorization of n.
if n=18 answer is 4 i.e, 4,8,12,16.
Using a for loop from i=2 to less than n and checking for every i will show time limit exceeded in the code.
My approach is to count no of times i will continue to divide by 2. But constraints of n=10^18. So, i think its a O (1) operation . Can anyone help me to find any formula or algorithm to find the answer as fast as possible?
First assume n is an odd number. Obviously every even number less than n also has a greater exponent of 2 in its factorization, so the answer will be equal to (n−1) / 2.
Now suppose n is equal to 2 times some odd number p. There are (p−1) / 2 even numbers that are smaller than p, so it follows that there are also (p−1) / 2 numbers smaller than n that are divisible by at least 22.
In general, given any number n that is equal to 2k times some odd number q, there will be (q−1) / 2 numbers that are smaller than n and have a larger exponent of 2 (> 2k) in their factorization.
So a function like this should work:
def count_smaller_numbers_with_greater_power_of_2_as_a_factor(n):
assert n > 0
while n % 2 == 0:
n >>= 1
return (n-1) // 2
Example 1 (n = 18)
Since n is even, keep dividing it by 2 until you get an odd number. This only takes one step (because n / 2 = 9)
Count the number of even numbers that are less than 9. This is equal to (9−1) / 2 = 4
Example 2 (n = 1018)
In this case, n = 218 × 518. So if we keep halving n until we get an odd number, the result will be 518.
The number of even numbers that are less than 518 is equal to (518−1) / 2 = 1907348632812
Your division is limited by constant number 64 (for 10^18~2^64), and O(64)=O(1) in complexity theory.
Number of two's in value factorization is equal to the number of trailing zero bits in binary representation of this value, so you can use bit operations (like & 1 and right shift shr, >>) to accelerate code a bit or apply some bit tricks
First, suppose n = 2^k * something. Find out k:
long k = 0;
while(n % 2 == 0) { n >>= 1; k++; }
n <<= k;
Now that you know who is k, multiply 2^k by 2 to get the first power of 2 greater than 2^k:
long next_power = 1 << (k + 1); // same as 2^(k + 1)
And lastly, check if n is odd. If it isn't, print all the multiples of next_power:
if(k == 0){ //equivalent to testing n % 2 == 0
for(long i = next_power; i < n; i += next_power) cout<<i<<endl;
}
EXAMPLE: n = 18
k will be 1, because 18 = 2^1 * 9 and the while will finish there.
next_power will be 4 (= 1 << (k + 1) = 2 ^ (k + 1)).
for(long i = next_power; i < n; i += next_power) cout<<i<<endl; will print 4, 8, 12 and 16.
This is very easy to do with a gcd trick i found:
You can find the count by //4. So 10^18 has
In [298]: pow(10,18)//4
Out[298]: 250000000000000000
You can find the count of 18 by //4 which is 4
Fan any numbers that meet your criteria. You can check by using my
algorithm here, and taking the len of the array and conpare with the
number div//4 to see that that is the answer your looking for: an exact
match. You'll notice that it's every four numbers that don't have an
exponent of 2. So the count of numbers can be found with //4.
import math
def lars_last_modulus_powers_of_two(hm):
return math.gcd(hm, 1<<hm.bit_length())
def findevennumberswithexponentgreaterthan2lessthannum(hm):
if hm %2 != 0:
return "only for use of even numbers"
vv = []
for x in range(hm,1,-2):
if lars_last_modulus_powers_of_two(x) != 2:
vv.append(x)
return vv
Result:
In [3132]: findevennumberswithexponentgreaterthan2lessthannum(18)
Out[3132]: [16, 12, 8, 4]
This is the fastest way to do it as you skip the mod down the path to get the answer. Instantly get the number you need with lars_last_modulus_powers_of_two(num) which is one operation per number.
Here is some example to show the answer is right:
In [302]: len(findevennumberswithexponentgreaterthan2lessthannum(100))
Out[302]: 25
In [303]: 100//4
Out[303]: 25
In [304]: len(findevennumberswithexponentgreaterthan2lessthannum(1000))
Out[304]: 250
In [305]: 1000//4
Out[305]: 250
In [306]: len(findevennumberswithexponentgreaterthan2lessthannum(23424))
Out[306]: 5856
In [307]: 23424//4
Out[307]: 5856

Number of different binary sequences of length n generated using exactly k flip operations

Consider a binary sequence b of length N. Initially, all the bits are set to 0. We define a flip operation with 2 arguments, flip(L,R), such that:
All bits with indices between L and R are "flipped", meaning a bit with value 1 becomes a bit with value 0 and vice-versa. More exactly, for all i in range [L,R]: b[i] = !b[i].
Nothing happens to bits outside the specified range.
You are asked to determine the number of possible different sequences that can be obtained using exactly K flip operations modulo an arbitrary given number, let's call it MOD.
More specifically, each test contains on the first line a number T, the number of queries to be given. Then there are T queries, each one being of the form N, K, MOD with the meaning from above.
1 ≤ N, K ≤ 300 000
T ≤ 250
2 ≤ MOD ≤ 1 000 000 007
Sum of all N-s in a test is ≤ 600 000
time limit: 2 seconds
memory limit: 65536 kbytes
Example :
Input :
1
2 1 1000
Output :
3
Explanation :
There is a single query. The initial sequence is 00. We can do the following operations :
flip(1,1) ⇒ 10
flip(2,2) ⇒ 01
flip(1,2) ⇒ 11
So there are 3 possible sequences that can be generated using exactly 1 flip.
Some quick observations that I've made, although I'm not sure they are totally correct :
If K is big enough, that is if we have a big enough number of flips at our disposal, we should be able to obtain 2n sequences.
If K=1, then the result we're looking for is N(N+1)/2. It's also C(n,1)+C(n,2), where C is the binomial coefficient.
Currently trying a brute force approach to see if I can spot a rule of some kind. I think this is a sum of some binomial coefficients, but I'm not sure.
I've also come across a somewhat simpler variant of this problem, where the flip operation only flips a single specified bit. In that case, the result is
C(n,k)+C(n,k-2)+C(n,k-4)+...+C(n,(1 or 0)). Of course, there's the special case where k > n, but it's not a huge difference. Anyway, it's pretty easy to understand why that happens.I guess it's worth noting.
Here are a few ideas:
We may assume that no flip operation occurs twice (otherwise, we can assume that it did not happen). It does affect the number of operations, but I'll talk about it later.
We may assume that no two segments intersect. Indeed, if L1 < L2 < R1 < R2, we can just do the (L1, L2 - 1) and (R1 + 1, R2) flips instead. The case when one segment is inside the other is handled similarly.
We may also assume that no two segments touch each other. Otherwise, we can glue them together and reduce the number of operations.
These observations give the following formula for the number of different sequences one can obtain by flipping exactly k segments without "redundant" flips: C(n + 1, 2 * k) (we choose 2 * k ends of segments. They are always different. The left end is exclusive).
If we had perform no more than K flips, the answer would be sum for k = 0...K of C(n + 1, 2 * k)
Intuitively, it seems that its possible to transform the sequence of no more than K flips into a sequence of exactly K flips (for instance, we can flip the same segment two more times and add 2 operations. We can also split a segment of more than two elements into two segments and add one operation).
By running the brute force search (I know that it's not a real proof, but looks correct combined with the observations mentioned above) that the answer this sum minus 1 if n or k is equal to 1 and exactly the sum otherwise.
That is, the result is C(n + 1, 0) + C(n + 1, 2) + ... + C(n + 1, 2 * K) - d, where d = 1 if n = 1 or k = 1 and 0 otherwise.
Here is code I used to look for patterns running a brute force search and to verify that the formula is correct for small n and k:
reachable = set()
was = set()
def other(c):
"""
returns '1' if c == '0' and '0' otherwise
"""
return '0' if c == '1' else '1'
def flipped(s, l, r):
"""
Flips the [l, r] segment of the string s and returns the result
"""
res = s[:l]
for i in range(l, r + 1):
res += other(s[i])
res += s[r + 1:]
return res
def go(xs, k):
"""
Exhaustive search. was is used to speed up the search to avoid checking the
same string with the same number of remaining operations twice.
"""
p = (xs, k)
if p in was:
return
was.add(p)
if k == 0:
reachable.add(xs)
return
for l in range(len(xs)):
for r in range(l, len(xs)):
go(flipped(xs, l, r), k - 1)
def calc_naive(n, k):
"""
Counts the number of reachable sequences by running an exhaustive search
"""
xs = '0' * n
global reachable
global was
was = set()
reachable = set()
go(xs, k)
return len(reachable)
def fact(n):
return 1 if n == 0 else n * fact(n - 1)
def cnk(n, k):
if k > n:
return 0
return fact(n) // fact(k) // fact(n - k)
def solve(n, k):
"""
Uses the formula shown above to compute the answer
"""
res = 0
for i in range(k + 1):
res += cnk(n + 1, 2 * i)
if k == 1 or n == 1:
res -= 1
return res
if __name__ == '__main__':
# Checks that the formula gives the right answer for small values of n and k
for n in range(1, 11):
for k in range(1, 11):
assert calc_naive(n, k) == solve(n, k)
This solution is much better than the exhaustive search. For instance, it can run in O(N * K) time per test case if we compute the coefficients using Pascal's triangle. Unfortunately, it is not fast enough. I know how to solve it more efficiently for prime MOD (using Lucas' theorem), but O do not have a solution in general case.
Multiplicative modular inverses can't solve this problem immediately as k! or (n - k)! may not have an inverse modulo MOD.
Note: I assumed that C(n, m) is defined for all non-negative n and m and is equal to 0 if n < m.
I think I know how to solve it for an arbitrary MOD now.
Let's factorize the MOD into prime factors p1^a1 * p2^a2 * ... * pn^an. Now can solve this problem for each prime factor independently and combine the result using the Chinese remainder theorem.
Let's fix a prime p. Let's assume that p^a|MOD (that is, we need to get the result modulo p^a). We can precompute all p-free parts of the factorial and the maximum power of p that divides the factorial for all 0 <= n <= N in linear time using something like this:
powers = [0] * (N + 1)
p_free = [i for i in range(N + 1)]
p_free[0] = 1
for cur_p in powers of p <= N:
i = cur_p
while i < N:
powers[i] += 1
p_free[i] /= p
i += cur_p
Now the p-free part of the factorial is the product of p_free[i] for all i <= n and the power of p that divides n! is the prefix sum of the powers.
Now we can divide two factorials: the p-free part is coprime with p^a so it always has an inverse. The powers of p are just subtracted.
We're almost there. One more observation: we can precompute the inverses of p-free parts in linear time. Let's compute the inverse for the p-free part of N! using Euclid's algorithm. Now we can iterate over all i from N to 0. The inverse of the p-free part of i! is the inverse for i + 1 times p_free[i] (it's easy to prove it if we rewrite the inverse of the p-free part as a product using the fact that elements coprime with p^a form an abelian group under multiplication).
This algorithm runs in O(N * number_of_prime_factors + the time to solve the system using the Chinese remainder theorem + sqrt(MOD)) time per test case. Now it looks good enough.
You're on a good path with binomial-coefficients already. There are several factors to consider:
Think of your number as a binary-string of length n. Now we can create another array counting the number of times a bit will be flipped:
[0, 1, 0, 0, 1] number
[a, b, c, d, e] number of flips.
But even numbers of flips all lead to the same result and so do all odd numbers of flips. So basically the relevant part of the distribution can be represented %2
Logical next question: How many different combinations of even and odd values are available. We'll take care of the ordering later on, for now just assume the flipping-array is ordered descending for simplicity. We start of with k as the only flipping-number in the array. Now we want to add a flip. Since the whole flipping-array is used %2, we need to remove two from the value of k to achieve this and insert them into the array separately. E.g.:
[5, 0, 0, 0] mod 2 [1, 0, 0, 0]
[3, 1, 1, 0] [1, 1, 1, 0]
[4, 1, 0, 0] [0, 1, 0, 0]
As the last example shows (remember we're operating modulo 2 in the final result), moving a single 1 doesn't change the number of flips in the final outcome. Thus we always have to flip an even number bits in the flipping-array. If k is even, so will the number of flipped bits be and same applies vice versa, no matter what the value of n is.
So now the question is of course how many different ways of filling the array are available? For simplicity we'll start with mod 2 right away.
Obviously we start with 1 flipped bit, if k is odd, otherwise with 1. And we always add 2 flipped bits. We can continue with this until we either have flipped all n bits (or at least as many as we can flip)
v = (k % 2 == n % 2) ? n : n - 1
or we can't spread k further over the array.
v = k
Putting this together:
noOfAvailableFlips:
if k < n:
return k
else:
return (k % 2 == n % 2) ? n : n - 1
So far so well, there are always v / 2 flipping-arrays (mod 2) that differ by the number of flipped bits. Now we come to the next part permuting these arrays. This is just a simple permutation-function (permutation with repetition to be precise):
flipArrayNo(flippedbits):
return factorial(n) / (factorial(flippedbits) * factorial(n - flippedbits)
Putting it all together:
solutionsByFlipping(n, k):
res = 0
for i in [k % 2, noOfAvailableFlips(), step=2]:
res += flipArrayNo(i)
return res
This also shows that for sufficiently large numbers we can't obtain 2^n sequences for the simply reason that we can not arrange operations as we please. The number of flips that actually affect the outcome will always be either even or odd depending upon k. There's no way around this. The best result one can get is 2^(n-1) sequences.
For completeness, here's a dynamic program. It can deal easily with arbitrary modulo since it is based on sums, but unfortunately I haven't found a way to speed it beyond O(n * k).
Let a[n][k] be the number of binary strings of length n with k non-adjacent blocks of contiguous 1s that end in 1. Let b[n][k] be the number of binary strings of length n with k non-adjacent blocks of contiguous 1s that end in 0.
Then:
# we can append 1 to any arrangement of k non-adjacent blocks of contiguous 1's
# that ends in 1, or to any arrangement of (k-1) non-adjacent blocks of contiguous
# 1's that ends in 0:
a[n][k] = a[n - 1][k] + b[n - 1][k - 1]
# we can append 0 to any arrangement of k non-adjacent blocks of contiguous 1's
# that ends in either 0 or 1:
b[n][k] = b[n - 1][k] + a[n - 1][k]
# complete answer would be sum (a[n][i] + b[n][i]) for i = 0 to k
I wonder if the following observations might be useful: (1) a[n][k] and b[n][k] are zero when n < 2*k - 1, and (2) on the flip side, for values of k greater than ⌊(n + 1) / 2⌋ the overall answer seems to be identical.
Python code (full matrices are defined for simplicity, but I think only one row of each would actually be needed, space-wise, for a bottom-up method):
a = [[0] * 11 for i in range(0,11)]
b = [([1] + [0] * 10) for i in range(0,11)]
def f(n,k):
return fa(n,k) + fb(n,k)
def fa(n,k):
global a
if a[n][k] or n == 0 or k == 0:
return a[n][k]
elif n == 2*k - 1:
a[n][k] = 1
return 1
else:
a[n][k] = fb(n-1,k-1) + fa(n-1,k)
return a[n][k]
def fb(n,k):
global b
if b[n][k] or n == 0 or n == 2*k - 1:
return b[n][k]
else:
b[n][k] = fb(n-1,k) + fa(n-1,k)
return b[n][k]
def g(n,k):
return sum([f(n,i) for i in range(0,k+1)])
# example
print(g(10,10))
for i in range(0,11):
print(a[i])
print()
for i in range(0,11):
print(b[i])

OR of pairwise XOR

Can anyone suggest me a O(n) algorithm for OR the result of pairwise XOR operations on N numbers for example let N=3 and numbers be 5,7 and 9
5 ^ 7 = 2, 7 ^ 9 = 14, 9 ^ 5 = 12
2|14|12 = 14
^ for XOR operation and | for OR operation
If A[i] differs from A[j] in k-th bit then:
either A[i] differs from A[1] in k-th bit
or A[j] differs from A[1] in k-th bit
So, N operations are enough:
// A[i] = array of source numbers, i=1..N
Result = 0
for i = 2 to N
Result = Result OR (A[i] XOR A[1])
print(Result)
The bits are independent, so you can compute the result for each bit separately.
So given a sequence of bits, we want to now the OR of all pairswise XORs. The OR is 1 iif at least one of the bitwise XORs is 1. This is the case iif you have at least one 0 and at least one 1 in your input sequence. It is very easy to check whether this is the case.
Total time: O(wN) where w is the maximum bit length of any of your input numbers.
Assuming that w is smaller than the word size, we can also solve it in O(N): Just solve all bits in parallel:
have_one = 0
have_zero = 0
for x in input:
have_one |= x
have_zero |= ~x
return have_one & have_zero
have_one has a bit set exactly at those positions where we have a 1 somewhere in the input. have_zero has a bit set exactly at those positions where we have a 0 somewhere in the input. The result is 1 for those positions where both is the case.

Levenstein distance from particular group of numbers

My input are three numbers - a number s and the beginning b and end e of a range with 0 <= s,b,e <= 10^1000. The task is to find the minimal Levenstein distance between s and all numbers in range [b, e]. It is not necessary to find the number minimizing the distance, the minimal distance is sufficient.
Obviously I have to read the numbers as string, because standard C++ type will not handle such large numbers. Calculating the Levenstein distance for every number in the possibly huge range is not feasible.
Any ideas?
[EDIT 10/8/2013: Some cases considered in the DP algorithm actually don't need to be considered after all, though considering them does not lead to incorrectness :)]
In the following I describe an algorithm that takes O(N^2) time, where N is the largest number of digits in any of b, e, or s. Since all these numbers are limited to 1000 digits, this means at most a few million basic operations, which will take milliseconds on any modern CPU.
Suppose s has n digits. In the following, "between" means "inclusive"; I will say "strictly between" if I mean "excluding its endpoints". Indices are 1-based. x[i] means the ith digit of x, so e.g. x[1] is its first digit.
Splitting up the problem
The first thing to do is to break up the problem into a series of subproblems in which each b and e have the same number of digits. Suppose e has k >= 0 more digits than s: break up the problem into k+1 subproblems. E.g. if b = 5 and e = 14032, create the following subproblems:
b = 5, e = 9
b = 10, e = 99
b = 100, e = 999
b = 1000, e = 9999
b = 10000, e = 14032
We can solve each of these subproblems, and take the minimum solution.
The easy cases: the middle
The easy cases are the ones in the middle. Whenever e has k >= 1 more digits than b, there will be k-1 subproblems (e.g. 3 above) in which b is a power of 10 and e is the next power of 10, minus 1. Suppose b is 10^m. Notice that choosing any digit between 1 and 9, followed by any m digits between 0 and 9, produces a number x that is in the range b <= x <= e. Furthermore there are no numbers in this range that cannot be produced this way. The minimum Levenshtein distance between s (or in fact any given length-n digit string that doesn't start with a 0) and any number x in the range 10^m <= x <= 10^(m+1)-1 is necessarily abs(m+1-n), since if m+1 >= n it's possible to simply choose the first n digits of x to be the same as those in s, and delete the remainder, and if m+1 < n then choose the first m+1 to be the same as those in s and insert the remainder.
In fact we can deal with all these subproblems in a single constant-time operation: if the smallest "easy" subproblem has b = 10^m and the largest "easy" subproblem has b = 10^u, then the minimum Levenshtein distance between s and any number in any of these ranges is m-n if n < m, n-u if n > u, and 0 otherwise.
The hard cases: the end(s)
The hard cases are when b and e are not restricted to have the form b = 10^m and e = 10^(m+1)-1 respectively. Any master problem can generate at most two subproblems like this: either two "ends" (resulting from a master problem in which b and e have different numbers of digits, such as the example at the top) or a single subproblem (i.e. the master problem itself, which didn't need to be subdivided at all because b and e already have the same number of digits). Note that due to the previous splitting of the problem, we can assume that the subproblem's b and e have the same number of digits, which we will call m.
Super-Levenshtein!
What we will do is design a variation of the Levenshtein DP matrix that calculates the minimum Levenshtein distance between a given digit string (s) and any number x in the range b <= x <= e. Despite this added "power", the algorithm will still run in O(n^2) time :)
First, observe that if b and e have the same number of digits and b != e, then it must be the case that they consist of some number q >= 0 of identical digits at the left, followed by a digit that is larger in e than in b. Now consider the following procedure for generating a random digit string x:
Set x to the first q digits of b.
Append a randomly-chosen digit d between b[i] and e[i] to x.
If d == b[i], we "hug" the lower bound:
For i from q+1 to m:
If b[i] == 9 then append b[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that e[i] will be larger then b[i], and there is no digit larger than 9!]
Otherwise, flip a coin:
Heads: Append b[i].
Tails: Append a randomly-chosen digit d > b[i], then goto 6.
Stop.
Else if d == e[i], we "hug" the upper bound:
For i from q+1 to m:
If e[i] == 0 then append e[i]. [EDIT 10/8/2013: Actually this can't happen, because we chose q so that b[i] will be smaller then e[i], and there is no digit smaller than 0!]
Otherwise, flip a coin:
Heads: Append e[i].
Tails: Append a randomly-chosen digit d < e[i], then goto 6.
Stop.
Otherwise (if d is strictly between b[i] and e[i]), drop through to step 6.
Keep appending randomly-chosen digits to x until it has m digits.
The basic idea is that after including all the digits that you must include, you can either "hug" the lower bound's digits for as long as you want, or "hug" the upper bound's digits for as long as you want, and as soon as you decide to stop "hugging", you can thereafter choose any digits you want. For suitable random choices, this procedure will generate all and only the numbers x such that b <= x <= e.
In the "usual" Levenshtein distance computation between two strings s and x, of lengths n and m respectively, we have a rectangular grid from (0, 0) to (n, m), and at each grid point (i, j) we record the Levenshtein distance between the prefix s[1..i] and the prefix x[1..j]. The score at (i, j) is calculated from the scores at (i-1, j), (i, j-1) and (i-1, j-1) using bottom-up dynamic programming. To adapt this to treat x as one of a set of possible strings (specifically, a digit string corresponding to a number between b and e) instead of a particular given string, what we need to do is record not one but two scores for each grid point: one for the case where we assume that the digit at position j was chosen to hug the lower bound, and one where we assume it was chosen to hug the upper bound. The 3rd possibility (step 5 above) doesn't actually require space in the DP matrix because we can work out the minimal Levenshtein distance for the entire rest of the input string immediately, very similar to the way we work it out for the "easy" subproblems in the first section.
Super-Levenshtein DP recursion
Call the overall minimal score at grid point (i, j) v(i, j). Let diff(a, b) = 1 if characters a and b are different, and 0 otherwise. Let inrange(a, b..c) be 1 if the character a is in the range b..c, and 0 otherwise. The calculations are:
# The best Lev distance overall between s[1..i] and x[1..j]
v(i, j) = min(hb(i, j), he(i, j))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the lower bound
hb(i, j) = min(hb(i-1, j)+1, hb(i, j-1)+1, hb(i-1, j-1)+diff(s[i], b[j]))
# The best Lev distance between s[1..i] and x[1..j] obtainable by
# continuing to hug the upper bound
he(i, j) = min(he(i-1, j)+1, he(i, j-1)+1, he(i-1, j-1)+diff(s[i], e[j]))
At the point in time when v(i, j) is being calculated, we will also calculate the Levenshtein distance resulting from choosing to "stop hugging", i.e. by choosing a digit that is strictly in between b[j] and e[j] (if j == q) or (if j != q) is either above b[j] or below e[j], and thereafter freely choosing digits to make the suffix of x match the suffix of s as closely as possible:
# The best Lev distance possible between the ENTIRE STRINGS s and x, given that
# we choose to stop hugging at the jth digit of x, and have optimally aligned
# the first i digits of s to these j digits
sh(i, j) = if j >= q then shc(i, j)+abs(n-i-m+j)
else infinity
shc(i, j) = if j == q then
min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..(e[j]-1)))
else
min(hb(i, j-1)+1, hb(i-1, j-1)+inrange(s[i], (b[j]+1)..9),
he(i, j-1)+1, he(i-1, j-1)+inrange(s[i], (0..(e[j]-1)))
The formula for shc(i, j) doesn't need to consider "downward" moves, since such moves don't involve any digit choice for x.
The overall minimal Levenshtein distance is the minimum of v(n, m) and sh(i, j), for all 0 <= i <= n and 0 <= j <= m.
Complexity
Take N to be the largest number of digits in any of s, b or e. The original problem can be split in linear time into at most 1 set of easy problems that collectively takes O(1) time to solve and 2 hard subproblems that each take O(N^2) time to solve using the super-Levenshtein algorithm, so overall the problem can be solved in O(N^2) time, i.e. time proportional to the square of the number of digits.
A first idea to speed up the computation (works if |e-b| is not too large):
Question: how much can the Levestein distance change when we compare s with n and then with n+1?
Answer: not too much!
Let's see the dynamic-programming tables for s = 12007 and two consecutive n
n = 12296
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 3
and
n = 12297
0 1 2 3 4 5
1 0 1 2 3 4
2 1 0 1 2 3
3 2 1 1 2 3
4 3 2 2 2 3
5 4 3 3 3 2
As you can see, only the last column changes, since n and n+1 have the same digits, except for the last one.
If you have the dynamic-programming table for the edit-distance of s = 12001 and n = 12296, you already have the table for n = 12297, you just need to update the last column!
Obviously if n = 12299 then n+1 = 12300 and you need to update the last 3 columns of the previous table.. but this happens just once every 100 iteration.
In general, you have to
update the last column on every iterations (so, length(s) cells)
update the second-to-last too, once every 10 iterations
update the third-to-last, too, once every 100 iterations
so let L = length(s) and D = e-b. First you compute the edit-distance between s and b. Then you can find the minimum Levenstein distance over [b,e] looping over every integer in the interval. There are D of them, so the execution time is about:
Now since
we have an algorithm wich is

Resources