Is this number a power of two? - algorithm

I have a number (in base 10) represented as a string with up to 10^6 digits. I want to check whether this number is a power of two. One approach I can think of is a binary search on exponents, using FFT-based multiplication and fast exponentiation, but that is quite long and complex to code. Let n denote the length of the input (i.e., the number of decimal digits in the input). What's the most efficient algorithm for solving this problem, as a function of n?

There are either three or four powers of 2 of any given decimal length, and it is easy to guess what they are, since the number of digits is a good approximation of the base 10 logarithm, and you can compute the base 2 logarithm by just multiplying by an appropriate constant (log2(10)). So a binary search would be inefficient and unnecessary.
Once you have a trial exponent, which will be on the order of three million, you can use the squaring exponentiation algorithm with about 22 bignum decimal multiplications. (And up to 21 doublings, but those are relatively easy.)
Depending on how often you do this check, you might want to invest in fast bignum code. But if it is infrequent, simple multiplication should be ok.
If you don't expect the numbers to be powers of 2, you could first do a quick computation mod 10^9 to see if the last 9 digits match. That will eliminate all but a tiny percentage of random numbers. Or, for an even faster but slightly weaker filter, using 64-bit arithmetic check that the last 20 digits are divisible by 2^20 and not by 10.
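A minimal sketch of this approach in Python (which has native bignums); the candidate-exponent window and the mod-10^9 filter follow the description above, and the function name is mine:

import math

def is_power_of_two_decimal(s: str) -> bool:
    # Powers of two (other than 1) end in 2, 4, 6 or 8.
    if s[-1] not in "2468" and s != "1":
        return False
    # The digit count pins the exponent down to a few candidates:
    # k is approximately len(s) * log2(10) ~ 3.32 * len(s).
    k_mid = round(len(s) * math.log2(10))
    tail = int(s[-9:])                       # last 9 digits
    for k in range(max(0, k_mid - 4), k_mid + 5):
        if tail != pow(2, k, 10**9):         # cheap filter, no bignums yet
            continue
        # Exact bignum comparison (Python 3.11+ may require raising the
        # str->int conversion limit via sys.set_int_max_str_digits for
        # 10^6-digit inputs).
        if int(s) == 1 << k:
            return True
    return False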

Here is an easy probabilistic solution.
Say your number is n, and we want to find k such that n = 2^k. Obviously, k = log2(n) = log10(n) * log2(10). We can estimate log10(n) ~ len(n) and find k' = len(n) * log2(10) with a small error (say, |k - k'| <= 5; I didn't check, but this should be enough). You'll probably need this part in any solution that comes to mind; it was mentioned in other answers as well.
Now let's check that n = 2^k for a known candidate k. Select a random prime number P from 2 to k^2, and compare n mod P with 2^k mod P. If the remainders are not equal, then k is definitely not a match. But what if they are equal? I claim that the false positive rate is bounded by 2 log(k)/k.
Why is that so? Because if n = 2^k (mod P), then P divides D = n - 2^k. The number D has about k bits (because n and 2^k have similar magnitudes due to the first part) and thus cannot have more than k distinct prime divisors. There are around k^2 / log(k^2) primes less than k^2, so the probability that you've picked a prime divisor of D at random is less than k / (k^2 / log(k^2)) = 2 log(k) / k.
In practice, primes up to 10^9 (or even up to log(n)) should suffice, but you have to do a somewhat deeper analysis to prove the probability bound.
This solution does not require any long arithmetic at all; all calculations can be done in 64-bit integers.
P.S. To select a random prime from 1 to T you may use the following logic: select a random number from 1 to T and increment it by one until it is prime. In this case the distribution over primes is not uniform and the former analysis is not completely correct, but it can be adapted to this kind of randomness as well.
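A sketch of the whole probabilistic check in Python; the trial-division primality test, the candidate window of +-5, and all names are my simplifications (in real use you would want a faster primality test):

import math
import random

def is_prime(p: int) -> bool:
    # trial division; slow but simple enough for a sketch
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True

def mod_of_decimal(s: str, p: int) -> int:
    # (decimal string s) mod p, digit by digit -- no long arithmetic
    r = 0
    for ch in s:
        r = (r * 10 + ord(ch) - ord("0")) % p
    return r

def probably_power_of_two(s: str, trials: int = 20) -> bool:
    k_mid = round(len(s) * math.log2(10))
    for k in range(max(0, k_mid - 5), k_mid + 6):
        for _ in range(trials):
            # random prime via the P.S. trick: pick a number, walk up to a prime
            p = random.randint(2, max(4, k * k))
            while not is_prime(p):
                p += 1
            if mod_of_decimal(s, p) != pow(2, k, p):
                break    # this k is definitely not a match
        else:
            return True  # all trials agreed: probably n = 2^k
    return False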

I am not sure if it is easy to apply, but I would do it in the following way:
1) View the number in binary. If the number is a power of two, it would look like:
1000000....
with only one 1 and the rest 0s. Checking this would be easy. Now the question is how the number is stored. For example, it could have leading zeroes that make the search for the 1 harder:
...000010000....
If there are only a small number of leading zeroes, just search from left to right. If the number of zeroes is unknown, we will have to...
2) Binary search for the 1:
2a) Cut the number in the middle.
2b) If both halves are nonzero, or both are zero (hopefully you can check whether a part is zero in reasonable time), stop and return false. (false = not a power of 2)
Otherwise continue with the non-zero part.
Stop when the non-zero part equals 1, and return true.
Estimation: if the number has n decimal digits, it has about n * log2(10) ~ 3.32n binary digits. Each step halves the part still under consideration, and a zero test on a part takes time proportional to its length, so the total work is about 3.32n + 1.66n + 0.83n + ... = O(n).
Assumptions:
1) You can access the binary view of the number.
2) You can compare a number to zero in reasonable time.
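If the value is already available as a machine integer or bignum, the "single 1 bit" test collapses to a classic bit trick; a small sketch (this trick is standard and not part of the answer above):

def is_power_of_two(x: int) -> bool:
    # x & (x - 1) clears the lowest set bit; a power of two has
    # exactly one set bit, so the result must be zero.
    return x > 0 and (x & (x - 1)) == 0

For the original question, the catch is assumption 1): converting a 10^6-digit decimal string into binary is itself the expensive step.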

Related

What is the efficiency of dividing N positive integers by a given power of 2?

For example:
Let's say N = 128.
We want to divide each positive integer up to and including N, by say, 8.
So we would perform integer division for:
1/8
2/8
3/8
...
127/8
128/8
In looking this up, I see that bit shift operations are the way to go and that any good compiler will automatically do it that way in the first place. But nonetheless, I can't seem to find any big O function for this type of algorithm.
To sum up: given a positive integer N, and a number Y which is a power of 2, what is the efficiency of an algorithm which divides each of the numbers 1,2,3,...,N by Y?
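For illustration, a small sketch of the shift-based division the question describes; since each shift on a machine-word integer is O(1), dividing all N numbers costs O(N) overall (that analysis is mine, not quoted from an answer, and the names are hypothetical):

def divide_all_by_power_of_two(N: int, Y: int) -> list:
    # Y must be a power of two, i.e. Y = 2^k
    assert Y > 0 and (Y & (Y - 1)) == 0, "Y must be a power of two"
    k = Y.bit_length() - 1
    return [x >> k for x in range(1, N + 1)]   # one O(1) shift per element

# Example: divide 1..128 by 8
quotients = divide_all_by_power_of_two(128, 8)
assert quotients[7] == 1 and quotients[127] == 16   # 8//8 and 128//8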

Checking if there're equal bits in binary string

We have 2 binary strings, X and Y, on 2 different computers. Both are of length n. The computers can communicate by sending bits to each other.
We have to build a randomized algorithm to check whether there is an index i such that Xi = Yi. We can send only O(log n) bits between the computers.
If there is no such index, the algorithm must always return "false". If there is such an index, the algorithm must return "true" with probability at least 0.5.
Any suggestions?
If the bits are independent, sending any log(n) of them will give you only the corresponding chance to 'hit' the equal bits. You will not be able to improve on this without any additional information.
To elaborate a bit on Ivaylo's answer:
imagine the two strings are
A = 110110....0....00010
B = 001001....0....11101
both are of some large length n, and Ak = Bk for a single k, somewhere in the middle.
You basically want functions that transform A, or B, such that f(A) and g(B) are O(log n)-bit numbers. E.g., sum is such a function.
Say you sum the bits of A, i.e. f = sum. Also let g = sum . xor.
So if A was 110110 0 00010 (12 bits) and B was 001001 0 11101 (12 bits), then f(A) = 5 (binary 101, 3 bits) and g(B) = 6 (binary 110, 3 bits). You can compare them, and since they are different, you can say "Aha! Then the strings must share a bit! (there must be i s.t. Ai = Bi)" and you will be right. However, while unequal counts are sufficient evidence of a shared bit, they are not necessary: there could be i s.t. Ai = Bi while still f(A) = g(B).
Let's look closer at the functions to see why. f(A) really counts how many ones there are in A; g(B) counts how many zeroes there are in B. Assuming "if the counts are equal then no bit is shared" is the same as saying "any number that has as many zeros as another number has ones is the bitwise complement of that number." Which is false: 100 and 110 fulfill the count condition, but 100 XOR 110 is 010 rather than 111, so they do share bits.
Now you can say: "Well, we just need to pick better f and g." However, the reason sum didn't work is fundamental and you cannot get away from it: f and g are hash functions, or in maths language, non-injective (many-to-one) functions. The domain has O(n) bits, i.e. O(2^n) elements, while the codomain (target set) has O(log n) bits, i.e. O(2^(log n)) = O(n) elements, and O(2^n) > O(n).
Many-to-one functions cannot be inverted (which is what you would actually need). Any time you invert f(A) or g(B) you get a one-to-many mapping. If f(A) is 2 and A has 3 bits, then A could be any of {110, 101, 011}. The size of the inverse image of f(A) would be, on average, O(2^n / n). With no further information, the chance of guessing the value of A is O(n / 2^n) < 0.5 in the general case.
And you have no further information, because if you did, you could incorporate it in f and g, but that would increase the size of their codomain.
I suggest reading up on information theory for further understanding of information loss, entropy, etc.
For stanm (I wrote it as an answer because a comment was too long):
It's a correct solution. The full algorithm is:
k = number of 1's in X.
Send k to computer 2.
l = number of 0's in Y.
If k=l computer 2 will answer "no", else "yes" (or 0 and 1).
If there is no index i such that Xi = Yi, then Y is the bitwise complement of X, so the number of 0's in Y equals the number of 1's in X, and the algorithm always answers "no" (or 0).
If such an index exists, the probability that computer 2 gives a wrong answer is the probability that it gets l = k.
The number of binary strings of length n that contain k 0's is (n choose k).
The number of all binary strings of length n is 2^n.
So the probability that computer 2 fails even though it has to return "yes" is (n choose k)/2^n. You can prove that this number is always less than (or equal to) 1/2.
So finally we can conclude that:
If such an index does not exist, computer 2 answers "no". If it exists, the probability that computer 2 fails is less than (or equal to) 1/2, and therefore it answers "yes" with probability at least 1/2.
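A tiny Python rendering of this protocol (the function name and the test strings are mine):

def answer_of_computer_2(X: str, Y: str) -> bool:
    k = X.count("1")    # computed on computer 1 and sent over: O(log n) bits
    l = Y.count("0")    # computed locally on computer 2
    return k != l       # "yes" iff the counts differ

# No matching index means Y is the complement of X, so k == l -> "no":
assert answer_of_computer_2("1010", "0101") is False
# Here X and Y agree at the third index, and k != l -> "yes":
assert answer_of_computer_2("1010", "0111") is True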

program that checks if any even number greater than 4 is a sum of two prime numbers

I have the following problem:
Given that every even number greater than 4 can be written as a sum of 2 prime
numbers, I have to write an algorithm which checks this. The algorithm should take less time than O(n^2).
For example, take the set of even numbers from 6 to n. If we have the number 6, the answer is 6 = 3 + 3, and for 22 it is 22 = 17 + 5, and so on.
My first idea:
# S is the list of n numbers
# is_prime(x) is an assumed primality test
for i in range(n):
    # skip odd numbers
    if S[i] % 2 != 0:
        continue
    result = False
    for j in range(2, S[i] - 1):
        # prime test can be done in O(log^2(n))
        if is_prime(j) and is_prime(S[i] - j):
            result = True
            break
    if not result:
        break   # found an even number that is not a sum of two primes
Since I use 2 nested for-loops, the total running time of this algorithm should be
O(n*n) * O(log^2(n)) = O(n^2 * log^2(n)), which is not less than O(n^2).
Does anybody have an idea to reduce the running time to get the required time of less than O(n^2)?
If the set contains big numbers, I've got nothing.
If max(S) < n^2 / log(n), then:
You should preprocess which numbers from the interval [1, max(S)] are primes.
For the preprocessing you can use the sieve of Eratosthenes, as sketched below.
Then you are able to check if a number is prime in O(1), and the complexity of your solution becomes O(N^2).
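A standard sieve of Eratosthenes for that preprocessing step, sketched in Python (names are mine):

def sieve(limit: int) -> list:
    # is_prime[x] is True iff x is prime, for 0 <= x <= limit
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # cross off multiples of p, starting at p*p
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return is_prime

is_prime = sieve(max(S))   # afterwards, each primality check is an O(1) lookup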
This is Goldbach's conjecture. Primality testing is known to be in P (polynomial time), but the break-even is ridiculously high - in practice, you will not be able to do this in anywhere near O(n^2).
If we assume you only need to deal with relatively small numbers, and can precompute the primes up to a certain limit, you still need to find candidate pairs. The prime counting function gives approximately n / ln(n) primes less than n. Subtracting each candidate prime (p) from (n) gives an odd number (q). If you can look up the primality of (q) in O(1) - i.e., a lookup table for all odd numbers less than the limit - you can achieve O(n^2) or better.
You only need to test divisors up to the square root of N; that is sufficient to determine whether the number is prime (sketched below).
This will reduce your running time.
also take a look at the following question - Program to find prime numbers
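For reference, the square-root trial division in Python; the point is that any composite n has a divisor no larger than sqrt(n):

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:   # test divisors only up to sqrt(n)
        if n % d == 0:
            return False
        d += 1
    return True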

Exponentiation algorithm analysis

The following text is provided about exponentiation:
The obvious algorithm to compute X to the power N uses N-1 multiplications. A recursive algorithm can do better. N <= 1 is the base case of the recursion. Otherwise, if N is even, we have x^N = x^(N/2) * x^(N/2), and if N is odd, x^N = x^((N-1)/2) * x^((N-1)/2) * x.
Specifically, a 200-digit number is raised to a large power (usually
another 200-digit number), with only the low 200 or so digits retained
after each multiplication. Since the calculations require dealing with
200-digit numbers, efficiency is obviously important. The
straightforward algorithm for exponentiation would require about 10 to
power of 200 multiplications, whereas the recursive algorithm presented
requires only about 1,200.
My question regarding the above text:
1. How did the author arrive at 10 to the power of 200 multiplications for the simple algorithm, and at only about 1,200 for the recursive algorithm?
Thanks!
Because the complexity of the first algorithm is linear in the exponent N, while that of the second is logarithmic.
A 200-digit exponent is about 10^200, and log2(10^200) is about 664; with up to one extra multiplication per 1 bit of the exponent, that gives roughly 1,200.
The exponent has 200 digits, thus it is about 10^200. If you use naive exponentiation, you will have to do that many multiplications.
On the other hand, if you use recursive exponentiation, the number of multiplications depends on the exponent's number of bits. Since the exponent is almost 10^200, it has log2(10^200) = 200 * log2(10) bits, roughly 600 (more precisely about 664). The 2 in the 1,200 stems from the fact that for a 1 bit you have to do two multiplications: a squaring and a multiplication by x.
Here are the 2 possible algorithms :
def simple_exp(a, N):
    # gives a^N with N multiplications
    if N == 0:      # base case
        return 1
    return a * simple_exp(a, N - 1)
so it's N operations; for a^(10^200) that's 10^200
def optimized_exp(a, N):
    if N == 0:
        return 1
    half = optimized_exp(a, N // 2)   # compute the half power once and reuse it
    if N % 2 == 0:
        return half * half            # 1 operation
    else:
        return a * half * half        # 2 operations
Here, for a^(10^200), you have between log2(N) and 2 * log2(N) operations (since 2^(log2(N)) = N),
and log2(10^200) = 200 * log2(10) ~ 664.3856189774724
and 2 * log2(10^200) ~ 1328.771237954945,
so the number of operations lies between about 664 and 1328.

Greatest GCD between some numbers

We've got some nonnegative numbers. We want to find the pair with the maximum gcd; actually, the maximum itself is more important than the pair!
For example if we have:
2 4 5 15
gcd(2,4)=2
gcd(2,5)=1
gcd(2,15)=1
gcd(4,5)=1
gcd(4,15)=1
gcd(5,15)=5
The answer is 5.
You can use the Euclidean Algorithm to find the GCD of two numbers.
int gcd(int a, int b)
{
    while (b != 0)
    {
        int m = a % b;   /* remainder */
        a = b;
        b = m;
    }
    return a;
}
If you want an alternative to the obvious algorithm, then assuming your numbers are in a bounded range, and you have plenty of memory, you can beat O(N^2) time, N being the number of values:
Create an array of a small integer type, indexes 1 to the max input. O(1)
For each value, increment the count at every index that is a factor of that value (make sure you don't wraparound). O(N).
Starting at the end of the array, scan back until you find a value >= 2. O(1)
That tells you the max gcd, but doesn't tell you which pair produced it. For your example input, the computed array looks like this:
index: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
count: 4 2 1 1 2 0 0 0 0  0  0  0  0  0  1
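A sketch of this counting approach in Python; enumerating divisors by trial division is my choice here, since the answer leaves the factorisation strategy open:

def max_gcd_by_counting(values: list) -> int:
    limit = max(values)
    count = [0] * (limit + 1)
    for v in values:
        d = 1
        while d * d <= v:              # enumerate divisors of v in O(sqrt(v))
            if v % d == 0:
                count[d] += 1
                if d != v // d:
                    count[v // d] += 1
            d += 1
    for g in range(limit, 1, -1):      # scan back for the largest index >= 2
        if count[g] >= 2:
            return g
    return 1

assert max_gcd_by_counting([2, 4, 5, 15]) == 5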
I don't know whether this is actually any faster for the inputs you have to handle. The constant factors involved are large: the bound on your values and the time to factorise a value within that bound.
You don't have to factorise each value - you could use memoisation and/or a pregenerated list of primes. Which gives me the idea that if you are memoising the factorisation, you don't need the array:
Create an empty set of int, and a best-so-far value 1.
For each input integer:
if it's less than or equal to best-so-far, continue.
check whether it's in the set. If so, best-so-far = max(best-so-far, this-value), continue. If not:
add it to the set
repeat for all of its factors (larger than best-so-far).
Add/lookup in a set could be O(log N), although it depends on what data structure you use. Each value has O(f(k)) factors, where k is the max value, and I can't remember what the function f is...
The reason that you're finished with a value as soon as you encounter it in the set is that you've found a number which is a common factor of two input values. If you keep factorising, you'll only find smaller such numbers, which are not interesting.
I'm not quite sure what the best way is to repeat for the larger factors. I think in practice you might have to strike a balance: you don't want to do them quite in decreasing order because it's awkward to generate ordered factors, but you also don't want to actually find all the factors.
Even in the realms of O(N^2), you might be able to beat the use of the Euclidean algorithm:
Fully factorise each number, storing it as a sequence of exponents of primes (so for example 2 is {1}, 4 is {2}, 5 is {0, 0, 1}, 15 is {0, 1, 1}). Then you can calculate gcd(a,b) by taking the min value at each index and multiplying them back out. No idea whether this is faster than Euclid on average, but it might be. Obviously it uses a load more memory.
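A sketch of this prime-exponent idea in Python, using a prime -> exponent map instead of a positional list (that representation choice is mine):

from collections import Counter

def factorise(n: int) -> Counter:
    # trial-division factorisation into a prime -> exponent map
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors

def gcd_from_factors(fa: Counter, fb: Counter) -> int:
    # take the min exponent for each shared prime and multiply back out
    g = 1
    for p in fa.keys() & fb.keys():
        g *= p ** min(fa[p], fb[p])
    return g

assert gcd_from_factors(factorise(4), factorise(2)) == 2
assert gcd_from_factors(factorise(15), factorise(5)) == 5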
The optimisations I can think of is
1) start with the two biggest numbers since they are likely to have most prime factors and thus likely to have the most shared prime factors (and thus the highest GCD).
2) When calculating the GCDs of other pairs you can stop your Euclidean algorithm loop if you get below your current greatest GCD.
Off the top of my head I can't think of a way that you can work out the greatest GCD of a pair without trying to work out each pair individually (and optimise a bit as above).
Disclaimer: I've never looked at this problem before and the above is off the top of my head. There may be better ways and I may be wrong. I'm happy to discuss my thoughts in more length if anybody wants. :)
There is no O(n log n) solution to this problem in general. In fact, the worst case is O(n^2) in the number of items in the list. Consider the following set of numbers:
2^20, 3^13, 5^9, 7^2 * 11^4, 7^4 * 11^3
Only the GCD of the last two is greater than 1, but the only way to know that from looking at the GCDs is to try out every pair and notice that one of them is greater than 1.
So you're stuck with the boring brute-force try-every-pair approach, perhaps with a couple of clever optimizations to avoid doing needless work when you've already found a large GCD (while making sure that you don't miss anything).
With some constraints, e.g. the numbers in the array being within a given range, say 1 to 1e7, it is doable in O(N log N) / O(MAX * log MAX), where MAX is the maximum possible value in A.
Inspired by the sieve algorithm; I came across it in a HackerRank challenge -- there it is done for two arrays. Check their editorial.
find min(A) and max(A) - O(N)
create a binary mask marking which elements of A appear in the given range, for O(1) lookup; O(N) to build, O(MAX_RANGE) storage.
for every number a in the range (min(A), max(A)):
for aa = a; aa <= max(A); aa += a:
if aa is in A, increment a counter for a; once the counter reaches 2 (i.e., two numbers in A are divisible by a), compare a to the current max_gcd;
store the top two candidates for each GCD candidate.
you could also skip candidates a that are less than the current max_gcd; a sketch follows below.
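A minimal sketch of that sieve-style scan in Python; walking candidates from the largest down means the first hit is the answer (names are mine):

def max_gcd_sieve(A: list) -> int:
    limit = max(A)
    present = [0] * (limit + 1)     # multiplicity mask for O(1) membership
    for v in A:
        present[v] += 1
    for g in range(limit, 0, -1):   # candidate gcd, largest first
        hits = 0
        for multiple in range(g, limit + 1, g):
            hits += present[multiple]
            if hits >= 2:           # two numbers in A divisible by g
                return g
    return 0

assert max_gcd_sieve([2, 4, 5, 15]) == 5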
Previous answer:
Still O(N^2) -- sort the array; that should eliminate some of the unnecessary comparisons:
from math import gcd

# assuming you want pairs of distinct elements
n = len(a)
max_gcd = 1
a.sort()                      # sort ascending, in place
for ii in range(n - 1, -1, -1):
    if a[ii] <= max_gcd:
        break                 # gcd of a pair cannot exceed its smaller element
    for jj in range(ii - 1, -1, -1):
        if a[jj] <= max_gcd:
            break
        current_gcd = gcd(a[ii], a[jj])
        if current_gcd > max_gcd:
            max_gcd = current_gcd
This should save some unnecessary computation.
There is a solution that takes O(n) gcd computations:
Let our numbers be a_i. First, calculate m=a_0*a_1*a_2*.... For each number a_i, calculate gcd(m/a_i, a_i). The number you are looking for is the maximum of these values.
I haven't proved that this is always true, but in your example, it works:
m=2*4*5*15=600,
max(gcd(m/2,2), gcd(m/4,4), gcd(m/5,5), gcd(m/15,15))=max(2, 2, 5, 5)=5
NOTE: This is not correct. If the number a_i has a factor p_j repeated twice, and two other numbers also contain this factor p_j, then you get the incorrect result p_j^2 instead of p_j. For example, for the set 3, 5, 15, 25, you get 25 as the answer instead of 5.
However, you can still use this to quickly filter out numbers. For example, in the above case, once you determine the 25, you can first do the exhaustive search for a_3=25 with gcd(a_3, a_i) to find the real maximum, 5, then filter out gcd(m/a_i, a_i), i!=3 which are less than or equal to 5 (in the example above, this filters out all others).
Added for clarification and justification:
To see why this should work, note that gcd(a_i, a_j) divides gcd(m/a_i, a_i) for all j!=i.
Let's call gcd(m/a_i, a_i) g_i, and max(gcd(a_i, a_j), j=1..n, j!=i) r_i. What I say above is that g_i = x_i * r_i, where x_i is an integer. Obviously r_i <= g_i, so with n gcd operations we get an upper bound on r_i for all i.
The above claim is not very obvious. Let's examine it a bit deeper to see why it is true: the gcd of a_i and a_j is the product of all prime factors that appear in both a_i and a_j (by definition). Now, multiply a_j with another number, b. The gcd of a_i and b*a_j is either equal to gcd(a_i, a_j), or is a multiple of it, because b*a_j contains all prime factors of a_j, and some more prime factors contributed by b, which may also be included in the factorization of a_i. In fact, gcd(a_i, b*a_j)=gcd(a_i/gcd(a_i, a_j), b)*gcd(a_i, a_j), I think. But I can't see a way to make use of this. :)
Anyhow, in our construction, m/a_i is simply a shortcut to calculate the product of all a_j, j=1..n, j!=i. As a result, gcd(m/a_i, a_i) contains every gcd(a_i, a_j) as a factor. So, obviously, the maximum of these individual gcd results divides g_i.
Now, the largest g_i is of particular interest to us: it is either the maximum gcd itself (if x_i is 1), or a good candidate for being it. To check, we do another n-1 gcd operations and calculate r_i explicitly. Then we drop all g_j less than or equal to r_i as candidates. If no other candidate is left, we are done. If not, we pick the next largest g_k and calculate r_k. If r_k <= r_i, we drop g_k and repeat with another g_k'. If r_k > r_i, we filter out the remaining g_j <= r_k, and repeat.
I think it is possible to construct a number set that will make this algorithm run in O(n^2) (if we fail to filter out anything), but on random number sets, I think it will quickly get rid of large chunks of candidates.
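A sketch of the whole candidate-filtering procedure in Python; note that m itself is a huge bignum, so unlike the probabilistic answer in the first thread this does rely on long arithmetic (math.prod requires Python 3.8+, and the names are mine):

from math import gcd, prod

def max_pairwise_gcd(a: list) -> int:
    n = len(a)
    m = prod(a)                              # product of all numbers (big!)
    g = [gcd(m // x, x) for x in a]          # n gcds: upper bounds g_i >= r_i
    best = 1
    for i in sorted(range(n), key=lambda i: -g[i]):
        if g[i] <= best:
            break                            # no remaining candidate can win
        # compute r_i exactly with n-1 more gcds
        best = max(best, max(gcd(a[i], a[j]) for j in range(n) if j != i))
    return best

assert max_pairwise_gcd([2, 4, 5, 15]) == 5
assert max_pairwise_gcd([3, 5, 15, 25]) == 5   # the tricky case from the NOTE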
pseudocode
from math import gcd

def get_gcd_max(array):
    # brute force over every pair (pointer_a, pointer_b), pointer_a < pointer_b
    array_ub = len(array) - 1      # index of the last element
    if array_ub < 1:
        raise ValueError("need at least two numbers")
    gcd_max = 0
    for pointer_a in range(array_ub):
        for pointer_b in range(pointer_a + 1, array_ub + 1):
            gcd_max = max(gcd_max, gcd(array[pointer_a], array[pointer_b]))
    return gcd_max
