Convert a very large number from decimal string to binary representation? [closed] - algorithm

Closed 10 years ago.
I have a very big number, on the order of a thousand decimal digits, and I have to convert this to its binary representation. The numbers are stored as strings.
Since few languages have a basic data type that can handle numbers this big, I see no easy way to turn it into an integral value that I could then convert.
Could someone please help me out here? What would be a viable approach for doing this?

If this is a genuine problem, there are plenty of BigNum libraries out there to assist, such as the MPIR library.
If it's something where you can't use a third-party library, it's still relatively easy. You don't actually need a complex BigNum library for this, you only need one operation: divide by two.
Here's how you do it. Start with an empty stack of binary digits. Then loop until the number is "0" (yes, that's still a string). If the last digit of the number is odd, push 1 on to the stack, otherwise push 0. Then divide the number by two and restart the loop.
Once the loop is finished (number is "0"), pop the digits off the stack one at a time and print them. There you go.
Oh, yeah, the divide-by-two, that is a rather important piece of the puzzle :-)
Let's start with "12345". Here's the process you follow, in pseudo-code.
    Set next_additive to 0.
    For every digit in number (starting at the left):
        Set additive to next_additive.
        If the digit is odd, set next_additive to 5, else set it to 0.
        Divide the digit by two (truncating) then add additive.
    Remove leading zero if necessary (if it starts with 0 but is not just 0).
This can be done by processing the actual string one character at a time.
Starting with 1 (from 12345), additive is 0, the digit is odd, so next_additive is 5. Divide 1 by 2 and add the additive of 0, and you get 0: 02345.
Next digit 2, additive is 5, the digit is even, so next_additive is 0. Divide 2 by 2 and add the additive of 5, and you get 6: 06345.
Next digit 3, additive is 0, the digit is odd, so next_additive is 5. Divide 3 by 2 and add the additive of 0, and you get 1: 06145.
Next digit 4, additive is 5, the digit is even, so next_additive is 0. Divide 4 by 2 and add the additive of 5, and you get 7: 06175.
Next digit 5, additive is 0, the digit is odd, so next_additive is 5. Divide 5 by 2 and add the additive of 0, and you get 2: 06172.
Strip off leading zeros: 6172. Ignore the next additive since you're truncating the result.
And there you have it: 12345 / 2 = 6172.
By way of example, here's one way to implement this algorithm in Python. First, the support routine for checking whether a string-number is odd. (Keep in mind this isn't meant to be Pythonic code; it's just to show how it could be done. There are almost certainly better ways to do this in Python, but they won't necessarily map well to another language.)
    def oddsToOne(s):
        if s.endswith('1'): return 1
        if s.endswith('3'): return 1
        if s.endswith('5'): return 1
        if s.endswith('7'): return 1
        if s.endswith('9'): return 1
        return 0
Then another support routine for dividing a string-number by two:
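    def divByTwo(s):
        new_s = ''
        add = 0  # carry from the previous digit (0 or 5)
        for ch in s:
            # Halve this digit (truncating) and add the carry.
            new_dgt = (ord(ch) - ord('0')) // 2 + add
            new_s = '%s%d' % (new_s, new_dgt)
            # An odd digit passes 5 down to the next digit.
            add = oddsToOne(ch) * 5
        # Drop a single leading zero unless the result is just '0'.
        if new_s != '0' and new_s.startswith('0'):
            new_s = new_s[1:]
        return new_s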
And, finally, some actual code to make a binary string from the decimal string:
    num = '12345'
    if num == '0':
        stack = '0'
    else:
        stack = ''
        while num != '0':
            stack = '%d%s'%(oddsToOne(num), stack)
            num = divByTwo (num)
    print(stack)
Note that if you wanted to actually use this to populate real bits (rather than make a string of bits), it's a simple matter to change what happens in the if and else clauses.
As stated, it's probably not the most efficient or beautiful Python code you could come up with but it's simply meant to show the process, not be some well-engineered production-ready piece of code. The output is (with some added stuff below to show what's going on):
    12345
    11000000111001
    ||      |||  |
    ||      |||  +-     1
    ||      ||+----     8
    ||      |+-----    16
    ||      +------    32
    |+-------------  4096
    +--------------  8192
                    =====
                    12345
Because this works on the string representation of the numbers, there is no arbitrary numeric limit such as the size of a 64-bit integer. Some example values are (slightly reformatted into 32-digit chunks for readability):
    123456781234567812345678
    => 11010001001001001101100000011011
       01110110001110110010110110101111
       0111101001110

    99999999999999999999999999999999
    99999999999999999999999999999999
    99999999999999999999999999999999
    9999
    => 10010010010011010110100100101100
       10100110000110111110011101011000
       01011001001111000010011000100110
       01110000010111111001110001010110
       01110010000001000111000100001000
       11010011111001010101010110010010
       00011000010001010100000101110100
       01111000011111111111111111111111
       11111111111111111111111111111111
       11111111111111111111111111111111
       1111111111111
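The remark above about populating real bits rather than building a string of bits could be sketched roughly as follows. This is only an illustration under some assumptions: the input is a non-negative decimal string without leading zeros, the bits are packed least-significant-bit first into a little-endian byte array, and the names `div_by_two` and `decimal_to_bytes` are made up for this sketch (they are not part of the answer's code).

```python
def div_by_two(s):
    """Halve a decimal string, truncating, using the carry-5 trick."""
    out = []
    carry = 0
    for ch in s:
        d = int(ch)
        out.append(str(d // 2 + carry))
        carry = 5 if d % 2 else 0  # an odd digit passes 5 to the next digit
    return "".join(out).lstrip("0") or "0"


def decimal_to_bytes(s):
    """Pack the binary form of decimal string s into little-endian bytes."""
    packed = bytearray()
    bit_index = 0
    while s != "0":
        if bit_index % 8 == 0:
            packed.append(0)  # start a new byte every 8 bits
        if int(s[-1]) % 2:  # last decimal digit odd means the low bit is 1
            packed[-1] |= 1 << (bit_index % 8)
        s = div_by_two(s)
        bit_index += 1
    return bytes(packed) or b"\x00"


# Cross-check against Python's built-in arbitrary-precision integers.
print(int.from_bytes(decimal_to_bytes("12345"), "little"))  # 12345
```

The cross-check relies on Python's own big integers, which is only there to confirm the result; the conversion itself never turns the whole string into a number.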

Related

Check if number is multiple of 5 in most efficient way

Info
Hi everyone
I was searching for an efficient way to check whether a number is a multiple of 5, so I searched on Google and found this solution on geeksforgeeks.org.
There were 3 solutions to my problem:
The first solution was to subtract 5 until reaching zero.
The second solution was to convert the number to a string and check whether the last character is 5 or 0.
The third solution was to do some interesting operations at the bitwise level.
I'm interested in the third solution, as I can fully understand the first and the second.
Here's the code from geeksforgeeks.
    bool isMultipleof5(int n)
    {
        // If n is a multiple of 5 then we
        // make sure that last digit of n is 0
        if ( (n & 1) == 1 )
            n <<= 1;
        float x = n;
        x = ( (int)(x * 0.1) ) * 10;
        // If last digit of n is 0 then n
        // will be equal to (int)x
        if ( (int)x == n )
            return true;
        return false;
    }
I understand only some parts of the logic. I haven't even tested this code, so I need to understand it before I can use it freely.
As the mentioned article says, this function multiplies the number by 2 if the last bit is set, then checks for the last bit being 0 and returns true in that case. But after checking the binary representations of numbers I got confused, as the last bit is 1 for any odd number and 0 for any even number. So...
Actual question is
What's the logic of this function?
Any answer is appreciated!
Thanks for all!
The most straightforward way to check if a number is a multiple of 5 is simply:
    if (n % 5 == 0) {
        // logic...
    }
What the bit manipulation code does is:
1. If the number is odd, multiply it by two. Notice that for multiples of 5 the ones digit is either 0 or 5, so after this step it is always 0.
2. Create a number x that is n with its ones digit set to 0. (This is done by multiplying n by 0.1 and truncating, which removes the ones digit, and then multiplying by 10 to append a 0; the total effect is just changing the ones digit to 0.)
3. If n was originally a multiple of 5, it has a ones digit of 0 after step 1. So we check whether x equals n, and if so, we can say n was a multiple of 5.
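The same three steps can be mirrored in a few lines of Python. This is a sketch rather than a translation of the original function: it assumes a non-negative integer and uses integer division (`n // 10`) in place of the float multiplication by 0.1, which sidesteps rounding concerns. The name `is_multiple_of_5` is just for illustration.

```python
def is_multiple_of_5(n):
    """Check divisibility by 5 for a non-negative int via the doubling trick."""
    if n & 1:             # odd: double it so a trailing 5 becomes a trailing 0
        n <<= 1
    x = (n // 10) * 10    # n with its ones digit replaced by 0
    return x == n         # equal only if the ones digit was already 0


print([i for i in range(31) if is_multiple_of_5(i)])
# [0, 5, 10, 15, 20, 25, 30]
```

In practice `n % 5 == 0` says the same thing more directly; the sketch is only meant to make the logic of the trick visible.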

Prove XOR doesn't work for finding a missing number (interview question)?

Interview question: you're given a file of roughly one billion unique numbers, each of which is a 32-bit quantity. Find a number not in the file.
When I was approaching this question, I tried a few examples with 3-bit and 4-bit numbers. For the examples I tried, I found that when I XOR'd the set of numbers, I got a correct answer:
    import functools

    a = [0,1,2] # missing 3
    b = [1,2,3] # missing 0
    c = [0,1,2,3,4,5,6] # missing 7
    d = [0,1,2,3,5,6,7] # missing 4
    functools.reduce((lambda x, y: x^y), a) # returns 3
    functools.reduce((lambda x, y: x^y), b) # returns 0
    functools.reduce((lambda x, y: x^y), c) # returns 7
    functools.reduce((lambda x, y: x^y), d) # returns 4
However, when I coded this up and submitted it, it failed the test cases.
My question is: in an interview setting, how can I confirm or rule out with certainty that an approach like this is not a viable solution?
In all your examples, the array is missing exactly one number. That's why XOR worked. Try not to test with the same property.
For the problem itself, you can construct a number by taking the minority of each bit.
EDIT
Why XOR worked on your examples:
When you take the XOR for all the numbers from 0 to 2^n - 1 the result is 0 (there are exactly 2^(n-1) '1' in each bit). So if you take out one number and take XOR of all the rest, the result is the number you took out because taking XOR of that number with the result of all the rest needs to be 0.
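A quick experiment makes this concrete. The sketch below (the helper name `xor_all` and the chosen values are arbitrary) checks that the XOR over a full range of 4-bit numbers is 0, that removing one number yields exactly that number, and that removing two numbers yields the XOR of the two, which may well be a number that is still present.

```python
from functools import reduce
from operator import xor


def xor_all(values):
    """XOR of every value in the iterable (0 for an empty one)."""
    return reduce(xor, values, 0)


n = 4
full = list(range(2 ** n))
print(xor_all(full))          # 0

one_missing = [v for v in full if v != 11]
print(xor_all(one_missing))   # 11

two_missing = [v for v in full if v not in (3, 11)]
print(xor_all(two_missing))   # 8, which is 3 ^ 11 and is still in the list
```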
Assuming a 64-bit system with more than 4 GB of free memory, I would read the numbers into an array of 32-bit integers. Then I would loop through the numbers up to 32 times.
Similar to an inverse "Mastermind" game, I would construct a missing number bit by bit. In every loop, I count all numbers that match the bits I have chosen so far followed by a 0 or a 1. Then I append the bit that occurs less frequently. Once the count reaches zero, I have a missing number.
Example:
The numbers in decimal/binary are
1 = 01
2 = 10
3 = 11
There is one number with most-significant-bit 0 and two numbers with 1. Therefore, I take 0 as most significant bit.
In the next round, I have to match 00 and 01. This immediately leads to 00 as missing number.
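The bit-by-bit construction could look roughly like this in Python. It is a sketch under stated assumptions: the numbers are unique, there are fewer than 2^bits of them, and they fit in memory. Instead of rescanning the whole array in every round, it keeps only the numbers that still match the chosen prefix, which gives the same counts. The name `find_missing` is made up for this example.

```python
def find_missing(numbers, bits=32):
    """Return some value in [0, 2**bits) that does not occur in numbers."""
    candidates = list(numbers)   # numbers matching the prefix chosen so far
    result = 0
    for bit in range(bits - 1, -1, -1):
        zeros = [x for x in candidates if not (x >> bit) & 1]
        ones = [x for x in candidates if (x >> bit) & 1]
        # Follow the less frequent bit: that half cannot be completely full.
        if len(zeros) <= len(ones):
            candidates = zeros
        else:
            result |= 1 << bit
            candidates = ones
    return result


print(find_missing([1, 2, 3], bits=2))  # 0
```

Because the smaller half is taken each time, the number of matching candidates stays below the number of values still possible for the remaining bits, so nothing matches the full result at the end.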
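Another approach would be to use a random number generator. With roughly one billion of the 2^32 (about 4.3 billion) possible values in the file, a random guess has about a 77% chance of being a missing number, although each guess still has to be checked against the file.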
Proof by counterexample: 3^4^5^6=4.

Quick way to compute n-th sequence of bits of size b with k bits set?

I want to develop a way to represent all combinations of b bits with k bits set (equal to 1). Given an index, it should quickly produce the corresponding binary sequence, and the other way around too. For instance, the traditional approach I thought of would be to generate the numbers in order, like:
For b=4 and k=2:
0- 0011
1- 0101
2- 0110
3- 1001
4- 1010
5- 1100
If I am given the sequence '1010', I want to be able to quickly generate the number 4 as a response, and if I give the number 4, I want to be able to quickly generate the sequence '1010'. However I can't figure out a way to do these things without having to generate all the sequences that come before (or after).
It is not necessary to generate the sequences in that order, you could do 0-1001, 1-0110, 2-0011 and so on, but there has to be no repetition between 0 and the (combination of b choose k) - 1 and all sequences have to be represented.
How would you approach this? Is there a better algorithm than the one I'm using?
pkpnd's suggestion is on the right track, essentially process one digit at a time and if it's a 1, count the number of options that exist below it via standard combinatorics.
nCr() can be replaced by a table precomputation requiring O(n^2) storage/time. There may be another property you can exploit to reduce the number of nCr's you need to store by leveraging the absorption property along with the standard recursive formula.
Even with 1000's of bits, that table shouldn't be intractably large. Storing the answer also shouldn't be too bad, as 2^1000 is ~300 digits. If you meant hundreds of thousands, then that would be a different question. :)
    import math

    def nCr(n,r):
        return math.factorial(n) // math.factorial(r) // math.factorial(n-r)

    def get_index(value):
        b = len(value)
        k = sum(c == '1' for c in value)
        count = 0
        for digit in value:
            b -= 1
            if digit == '1':
                if b >= k:
                    count += nCr(b, k)
                k -= 1
        return count

    print(get_index('0011')) # 0
    print(get_index('0101')) # 1
    print(get_index('0110')) # 2
    print(get_index('1001')) # 3
    print(get_index('1010')) # 4
    print(get_index('1100')) # 5
Nice question, btw.
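The answer above covers the sequence-to-index direction. The reverse direction, which the question also asks for, can be sketched with the same counting idea: at each position, C(remaining, k) sequences have a 0 there, so an index at or beyond that count must have a 1. The function name `get_value` is an assumption made for this sketch, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def get_value(index, b, k):
    """Return the index-th b-bit string (in increasing order) with k ones."""
    bits = []
    for remaining in range(b - 1, -1, -1):
        # Sequences that put a 0 here and still fit k ones in the rest.
        with_zero = comb(remaining, k)
        if k > 0 and index >= with_zero:
            bits.append('1')
            index -= with_zero
            k -= 1
        else:
            bits.append('0')
    return ''.join(bits)


print([get_value(i, 4, 2) for i in range(6)])
# ['0011', '0101', '0110', '1001', '1010', '1100']
```

The sketch assumes 0 <= index < C(b, k); it does not validate out-of-range indices.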

Finding all n digit binary numbers with r adjacent digits as 1 [closed]

Closed 10 years ago.
Let me explain with an example. If n=4 and r=2, that means all 4-digit binary numbers in which two adjacent digits are 1, so the answer is 0011 0110 1011 1100 1101.
Q. I am unable to figure out a pattern or an algorithm.
Hint: The 11 can start in position 0, 1, or 2. On either side, the digit must be zero, so the only "free" digits are in the remaining position and can cycle through all possible values.
For example, if there are n=10 digits and you're looking for r=3 adjacent ones, the pattern is
x01110y
Where x and y can cycle through all possible prefixes and suffixes for the remaining five free digits. Note that at the two ends the leading or trailing zero gets dropped, leaving six free digits in x0111 and 1110y.
Here's an example using Python:
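    from itertools import product

    def gen(n, r):
        'Generate all n-length sequences with r fixed adjacent ones'
        result = set()

        # Run at the very start: 1...10 followed by free digits.
        fixed = tuple([1] * r + [0])
        for suffix in product([0,1], repeat=n-r-1):
            result.add(fixed + suffix)

        # Run in the interior: 01...10 with free digits on both sides.
        # leadsize goes from 0 to rem inclusive so that runs starting at
        # position 1 or ending at position n-2 (e.g. 0110) are not skipped.
        fixed = tuple([0] + [1] * r + [0])
        rem = n - r - 2
        for leadsize in range(rem + 1):
            for digits in product([0,1], repeat=rem):
                result.add(digits[:leadsize] + fixed + digits[leadsize:])

        # Run at the very end: free digits followed by 01...1.
        fixed = tuple([0] + [1] * r)
        for prefix in product([0,1], repeat=n-r-1):
            result.add(prefix + fixed)

        return sorted(result)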
I would start by simplifying the problem. Once you have a solution for the simplest case, generalize it and then try to optimize it.
First design an algorithm that finds out whether a given number has 'r' adjacent 1s. Once you have it, the brute-force way is to go through all the numbers with 'n' digits, checking each with the algorithm you just developed.
Now you can look at optimizing it. For example, if you know whether 'r' is even or odd, you can reduce the set of numbers to look at. The bit-counting algorithm given in K&R runs in time proportional to the number of set bits. Thus, you can rule out half of the cases at lower cost than an actual bit-by-bit comparison. There might be a better way to reduce this as well.
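The brute-force starting point described above might look like this in Python. Since the question leaves the exact rule open, the sketch assumes one particular reading: a number qualifies if it contains a run of exactly r adjacent 1s (longer runs do not count, other shorter runs are allowed). Under that assumption it reproduces the n=4, r=2 list from the question. The names `has_run_of_exactly` and `brute_force` are illustrative only.

```python
import re


def has_run_of_exactly(bits, r):
    """True if some maximal run of 1s in the bit string has length r."""
    return any(len(run) == r for run in re.findall(r"1+", bits))


def brute_force(n, r):
    """Check every n-digit binary number and keep the ones that qualify."""
    candidates = (format(i, "0{}b".format(n)) for i in range(2 ** n))
    return [bits for bits in candidates if has_run_of_exactly(bits, r)]


print(brute_force(4, 2))  # ['0011', '0110', '1011', '1100', '1101']
```

This is exponential in n, so it is mainly useful as a reference against which a smarter generator can be checked.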
Funny problem with very simple recursive solution. Delphi.
    procedure GenerateNLengthWithROnesTogether(s: string;
      N, R, Len, OnesInRow: Integer; HasPatternAlready: Boolean);
    begin
      if Len = N then
        Output(s)
      else
      begin
        HasPatternAlready := HasPatternAlready or (OnesInRow >= R);
        if HasPatternAlready or (N - Len > R) //there is chance to make pattern
        then
          GenerateNLengthWithROnesTogether('0' + s, N, R, Len + 1, 0, HasPatternAlready);
        if (not HasPatternAlready) or (OnesInRow < R - 1) //only one pattern allowed
        then
          GenerateNLengthWithROnesTogether('1' + s, N, R, Len + 1, OnesInRow + 1, HasPatternAlready);
      end;
    end;

    begin
      GenerateNLengthWithROnesTogether('', 5, 2, 0, 0, False);
    end;
program output:
N=5,R=2
11000 01100 11010 00110
10110 11001 01101 00011
10011 01011
N=7, R=3
1110000 0111000 1110100 0011100
1011100 1110010 0111010 1110110
0001110 1001110 0101110 1101110
1110001 0111001 1110101 0011101
1011101 1110011 0111011 0000111
1000111 0100111 1100111 0010111
1010111 0110111
As I've stated in the comment above, I am still unclear about the full restrictions of the output set. However, the algorithm below can be refined to cover your final case.
Before I can describe the algorithm, there is an observation: let S be 1 repeated m times, and D be the set of all possible suffixes we can use to generate valid outputs. So, the bit string S0D0 (S followed by the 0 bit, followed by the bit string D followed by the 0 bit) is a valid output for the algorithm. Also, all strings ror(S0D0, k), 0<=k<=n-m are valid outputs (ror is the rotate right function, where bits that disappear on the right side come in from left). These will generate the bit strings S0D0 to 0D0S. In addition to these rotations, the solutions S0D1 and 1D0S are valid bit strings that can be generated by the pair (S, D).
So, the algorithm is simply enumerating all valid D bit strings, and generating the above set for each (S, D) pair. If you allow more than m 1s together in the D part, it is simple bit enumeration. If not, it is a recursive definition, where D is the set of outputs of the same algorithm with n'=n-(m+2) and m' is each of {m, m-1, ..., 1}.
Of course, this algorithm will generate some duplicates. The cases I can think of are when ror(S0D0,k) matches one of the patterns S0E0, S0E1 or 1E0S. For the first case, you can stop generating more outputs for larger k values. D=E generator will take care of those. You can also simply drop the other two cases, but you need to continue rotating.
I know there is an answer, but I wanted to see the algorithm at work, so I implemented a crude version. It turned out to have more edge cases than I realized. I haven't added duplication check for the two last yields of the family() function, which causes duplication for outputs like 11011, but the majority of them are eliminated.
    def ror(str, n):
        return str[-n:]+str[:-n]

    def family(s, d, r):
        root = s + '0' + d + '0'
        yield root # root is always a solution
        for i in range(1, len(d)+3):
            sol=ror(root, i)
            if sol[:r]==s and sol[r]=='0' and sol[-1]=='0':
                break
            yield sol
        if d[-r:]!=s: # Make sure output is valid
            yield s + '0' + d + '1'
        if d[:r]!=s: # Make sure output is valid (todo: duplicate check)
            yield '1' + d + '0' + s

    def generate(n, r):
        s="1"*r
        if r==0: # no 1's allowed
            yield '0'*n
        elif n==r: # only one combination
            yield s
        elif n==r+1: # two cases. Cannot use family() for this
            yield s+'0'
            yield '0'+s
        else:
            # generate all sub-problem outputs
            for rr in range(r+1):
                if n-r-2>=rr:
                    for d in generate(n-r-2, rr):
                        for sol in family(s, d, r):
                            yield sol
You use it either as [s for s in generate(6,2)], or in a loop as
    for s in generate(6,3):
        print(s)

String to Number and back algorithm

This is a hard one (for me) I hope people can help me. I have some text and I need to transfer it to a number, but it has to be unique just as the text is unique.
For example:
The word 'kitty' could produce 12432, but only the word kitty produces that number. The text could be anything and a proper number should be given.
One problem: the result must be a 32-bit unsigned integer, that means the largest possible number is 2147483647. I don't mind if there is a text length restriction, but I hope it can be as large as possible.
My attempts: you have the letters A-Z and 0-9, so one character can have a number between 1-36. But if A = 1 and B = 2 and the text is A(1)B(2) and you add them, you get 3. The problem is that the text BA produces the same result, so this algorithm won't work.
Any ideas to point me in the right direction or is it impossible to do?
Your idea is generally sane, only needs to be developed a little.
Let f(c) be a function converting character c to a unique number in range [0..M-1]. Then you can calculate result number for the whole string like this.
f(s[0]) + f(s[1])*M + f(s[2])*M^2 + ... + f(s[n])*M^n
You can easily prove that number will be unique for particular string (and you can get string back from the number).
Obviously, you can't use very long strings here (up to 6 characters for your case), as 36^n grows fast.
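As a rough illustration of the positional formula above, here is a Python sketch. It makes one deliberate variation, which is an assumption of this example and not part of the answer: characters map to 1..36 and the base is M = 37, so that no character acts as a zero digit and strings of different lengths cannot collide. The names `ALPHABET`, `encode`, and `decode` are made up for the sketch.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
M = len(ALPHABET) + 1  # 37: digit value 0 is never used by any character


def encode(text):
    """f(s[0]) + f(s[1])*M + f(s[2])*M^2 + ... with f in 1..36."""
    return sum((ALPHABET.index(ch) + 1) * M ** i for i, ch in enumerate(text))


def decode(num):
    """Invert encode() by peeling off base-M digits, lowest first."""
    chars = []
    while num > 0:
        num, digit = divmod(num, M)
        if digit == 0:
            raise ValueError("not a value produced by encode()")
        chars.append(ALPHABET[digit - 1])
    return "".join(chars)


print(encode("AB"), encode("BA"))   # 75 39
print(decode(encode("KITTY")))      # KITTY
print(encode("999999") < 2 ** 32)   # True
```

With this variation, every string of up to 6 characters from the 36-character set fits in an unsigned 32-bit value, since the largest such value is 37^6 - 1 = 2565726408.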
Imagine you were trying to store Strings from the character set "0-9" only in a number (the equivalent of obtaining a number of a string of digits). What would you do?
    Char  9 8 7 6 5 4 3 2 1 0
    Str   0 5 2 1 2 5 4 1 2 6
    Num = 6 * 10^0 + 2 * 10^1 + 1 * 10^2...
Apply the same thing to your characters.
    Char  5 4 3 2 1 0
    Str   A B C D E F
    L = 36
    C(I): transforms character to number: C(0)=0, C(A)=10, C(B)=11, ...
    Num = C(F) * L ^ 0 + C(E) * L ^ 1 + ...
Build a dictionary out of words mapped to unique numbers and use that, that's the best you can do.
I doubt there are more than 2^32 number of words in use, but this is not the problem you're facing, the problem is that you need to map numbers back to words.
If you were only mapping words over to numbers, some hash algorithm might work, although you'd have to work a bit to guarantee that you have one that won't produce collisions.
However, for numbers back to words, that's quite a different problem, and the easiest solution to this is to just build a dictionary and map both ways.
In other words:
AARDUANI = 0
AARDVARK = 1
...
If you want to map numbers to base 26 characters, you can only store 6 characters (or 5 or 7 if I miscalculated), but not 12 and certainly not 20.
Unless you only count actual words, and they don't follow any good countable rules. The only way to do that is to just put all the words in a long list, and start assigning numbers from the start.
If it's correctly spelled text in some language, you can have a number for each word. However you'd need to consider all possible plurals, place and people names etc. which is generally impossible. What sort of text are we talking about? There's usually going to be some existing words that can't be coded in 32 bits in any way without prior knowledge of them.
Can you build a list of words as you go along? Just give the first word you see the number 1, second number 2 and check if a word has a number already or it needs a new one. Then save your newly created dictionary somewhere. This would likely be the only workable solution if you require 100% reliable, reversible mapping from the numbers back to original words given new unknown text that doesn't follow any known pattern.
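The build-as-you-go dictionary described above can be sketched as a small two-way mapping. The class name `WordIds` and its methods are hypothetical, chosen only for this example; persistence (saving the dictionary somewhere) is left out.

```python
class WordIds:
    """Assign 1, 2, 3, ... to words in the order they are first seen."""

    def __init__(self):
        self._ids = {}     # word -> number
        self._words = []   # number - 1 -> word

    def id_for(self, word):
        """Return the word's number, assigning the next free one if new."""
        if word not in self._ids:
            self._words.append(word)
            self._ids[word] = len(self._words)
        return self._ids[word]

    def word_for(self, number):
        """Map a previously assigned number back to its word."""
        return self._words[number - 1]


ids = WordIds()
print(ids.id_for("kitty"))      # 1
print(ids.id_for("aardvark"))   # 2
print(ids.id_for("kitty"))      # 1
print(ids.word_for(2))          # aardvark
```

The mapping is exactly reversible for every word it has seen, at the cost of having to store and share the dictionary.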
With 64 bits and a sufficiently good hash like MD5 it's extremely unlikely to have collisions, but for 32 bits it doesn't seem likely that a safe hash would exist.
Just treat each character as a digit in base 36, and calculate the decimal equivalent?
So:
'A' = 0
'B' = 1
[...]
'Z' = 25
'0' = 26
[...]
'9' = 35
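'BA' = 36
'BB' = 37
[...]
'CAB' = 2593
(With 'A' = 0, a leading 'A' behaves like a leading zero, so 'AB' and 'B' give the same number unless the text length is fixed.)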
