An efficient algorithm to compute the number of '1' bits in a long decimal integer that is represented as a string?

I came across this interesting question today. (Note that this is not for homework, an interview, etc.)
Given a decimal number that is represented as a string, we want to compute the number of '1' bits in the binary representation of that number. The string can have thousands of characters, so the number cannot be stored in a single int or long long variable.
For example, countBits("10") = 2, as '10' in decimal is '1010' in binary. Similarly, we have countBits("12") = 2 and countBits("7") = 3.
What is an efficient algorithm for this? One possible solution is to convert the decimal string to another string in binary format and count the '1's. Can we do better?

When converting from a decimal representation to an integer, the nth digit from the end of the string (counting from 0) represents the number of 10^n (ten to the power of n) that is added to the total integer value. If you then want to represent that integer in binary, you have to raise 10 (decimal), which is 1010_2, to that power and multiply that value by the digit's value.
Because one of the factors of the base you are translating from, 5, is relatively prime to 2, the powers of 10 have increasingly long representations in base 2: 1_2, 1010_2, 1100100_2, 1111101000_2.
Note that these powers have trailing zeros (10 = 2 × 5, and 2 is not relatively prime to the base we are translating into), so they will only affect 1, 3, 5 and 7 bits of the answer instead of all 1, 4, 7 and 10 bits. But the number of bits they affect still grows as O(N), where N is the length of the input, so calculating the affected bits takes O(N^2) operations.
If the base you were translating from did not have factors that were relatively prime to the base you are translating to (say, translating base 16 to base 2, or base 9 to base 3, and counting non-zero digits), then there would be an O(N) algorithm, as the count of non-zero digits in the target base would equal the sum of the counts for each input digit translated individually. Since that is not the case here, you are stuck with an O(N^2) algorithm where you translate the decimal representation into binary and then count the bits in the binary representation.
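The convert-then-count approach described above can be sketched as follows. This is a minimal illustration, not a tuned implementation: the name count_bits is made up for this example, and the digit-by-digit loop is the straightforward O(N^2) conversion (it also avoids the default digit limit that recent Python versions place on int() for very long decimal strings).

```python
def count_bits(decimal_string):
    """Count the '1' bits of a non-negative integer given as a decimal string."""
    value = 0
    for ch in decimal_string:
        if ch not in "0123456789":
            raise ValueError("not a decimal digit: %r" % ch)
        # Horner's rule: shift the running value one decimal place, add the digit.
        value = value * 10 + (ord(ch) - ord("0"))
    # bin() produces the binary string; count its '1' characters.
    return bin(value).count("1")


print(count_bits("10"), count_bits("12"), count_bits("7"))
```

The three calls correspond to the examples in the question.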

You convert it to binary and use the Hamming weight algorithm.
How does it work? Suppose you have the number 8, which is 00001000.
The algorithm takes chunks of 2 bits, so it'll have 00 00 10 00.
Now it sums the two bits in each chunk (by masking with 01010101, shifting and adding), which results in 00 00 01 00.
Now it does the same for each group of 4 bits (using the mask 00110011), so it starts from 0000 0100. After adding the two halves of each group, you'll have 0000 0001.
The last stage is adding the two 4-bit numbers, 0 + 1, which is 1, and that's the final result.
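The three stages above can be sketched for an 8-bit value as follows. The function name popcount8 is made up for this illustration; wider words follow the same pattern with wider masks and more stages.

```python
def popcount8(x):
    """Hamming weight of an 8-bit value using the mask/shift/add stages."""
    # Stage 1: each 2-bit chunk becomes the count of ones in that chunk.
    x = (x & 0x55) + ((x >> 1) & 0x55)
    # Stage 2: each 4-bit group becomes the sum of its two 2-bit counts.
    x = (x & 0x33) + ((x >> 2) & 0x33)
    # Stage 3: add the two 4-bit counts to get the total.
    x = (x & 0x0F) + ((x >> 4) & 0x0F)
    return x


print(popcount8(0b00001000))
```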

Related

Why was this exponent calculated this way in this example?

Number: 0.110111_2 × 2^–3 (the first bit is included in this example in the mantissa)
where 8 bits are used for the characteristic, and the exponent bias is
2^7 – 1
Their solution:
The sign bit is 0. The characteristic is –3 + 2^7 – 1, represented as an
8-bit binary number. The simplest way to calculate the characteristic
here is to find the 7-bit 2’s complement of the binary representation
of 4 (= 3 + 1), and adjoin a leading zero:
Binary representation of 4: 0000100
2’s complement: 1111100
Characteristic: 0111 1100
My question: my solution was to take the 8-bit complement instead of the 7-bit one,
1111 1100, and add it to the 8-bit representation of 128, 1000 0000,
which gives 1 0111 1100. Ignoring the ninth bit, I got the same answer,
but I did not understand the author's approach.
Your explanation is highly appreciated.
Thanks
The idea behind the original approach is to rewrite the expression
–3 + 2^7 – 1
as
2^7 - 4
The lower seven bits of this expression are the 7-bit two's complement of 4 (i.e. the representation of -4 in 7 bits). Since the number is obviously in the range 0-127, the eighth bit must be zero.
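As a quick check of the rewrite above, the following sketch computes the characteristic directly and compares it with the 7-bit two's complement of 4 prefixed by a zero. The variable names are chosen only for this illustration.

```python
EXPONENT = -3
BIAS = 2**7 - 1                      # exponent bias from the example

characteristic = EXPONENT + BIAS     # equals 2**7 - 4
as_8_bits = format(characteristic, "08b")

# 7-bit two's complement of 4, i.e. -4 reduced modulo 2**7.
twos_complement_7bit = format((-4) & 0x7F, "07b")

print(as_8_bits, "0" + twos_complement_7bit)
```

Both printed strings are the same 8-bit pattern, which is the author's shortcut.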

Minimum bits required on a chess board

This is an interview question:
What is the minimum representation in bits of two positions on an 8x8 chessboard?
I found the answer http://www.careercup.com/question?id=4981467352399872
But I am unable to understand what the author is trying to convey when she says:
You can represent 2^n values with n bits. However, you can represent
2^n + 2^(n-1) + 2^(n-2) + ... + 1 = 2^(n+1) - 1 values with at most n
bits. So you can represent 2^11 - 1 = 2047 different values using just
10 bits.
I am not seeking an explanation of what the author is suggesting in his answer, but I am more interested in solving the problem itself. As far as I can think, since there are 64C2 = 2016 ways to represent two pieces on an 8x8 board, the minimum number of bits required should be 11. But someone suggested that one can use just 10 bits to represent the board. How?
The author is saying that you can represent the positions using 5, 6, 7, 8, 9 and 10 bit values.
In binary 2016 is 11111100000 (1024 + 512+ 256 + 128 + 64 + 32)
5 bits (00000 - 11111) represent 32 positions
6 bits (000000 - 111111) represent 64 positions
7 bits (0000000 - 1111111) represent 128 positions
8 bits (00000000 - 11111111) represent 256 positions
9 bits (000000000 - 111111111) represent 512 positions
10 bits (0000000000 - 1111111111) represent 1024 positions
A total of 2016 positions.
This could be implemented in languages with bit collections, e.g. C++ bitset, which has a size function to get the length.
Here's an example for a 2x2 board which will hopefully explain this better.
For a 2x2 board, there are 4C2 (6) positions
.x x. .. xx .x x.
.x x. xx .. x. .x
so you could use 3 bits: 000, 001, 010, 011, 100 and 101
But 6 is binary 110 (4+2) so you could use 1 bit (0-1) for 2 of the positions and 2 bits (00, 01, 10, 11) for the remaining 4. So the positions are:
0, 1, 00, 01, 10, 11.
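The variable-length scheme described in this answer can be sketched as follows for the 8x8 case. The function name encode and the numbering of the 2016 positions by an index 0..2015 are assumptions made for this illustration. Note that the scheme only works if the length of each code is known separately (for example from a container's size), since the length itself carries information.

```python
def encode(index):
    """Map a position index in 0..2015 to a bit string of 5 to 10 bits."""
    for length in range(5, 11):
        count = 1 << length          # number of distinct codes of this length
        if index < count:
            return format(index, "0%db" % length)
        index -= count               # skip past all codes of this length
    raise ValueError("index out of range")


print(encode(0), encode(32), encode(2015))
```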
To answer the question and get an integer solution, you must evaluate the following expression:
bits = ceil(log2(combination(64,2)));
bits = ceil(log2(64!/(62!*2!)));
bits = ceil(log2(64*63/2));
bits = ceil(log2(32*63));
bits = ceil(log2(32)+log2(63));
bits = ceil(5+log2(63));
bits = ceil(5+5.97728);
bits = 11;
Deriving the equation requires a working knowledge of combinatorics.
combination(64,2) represents the number of ways to choose 2 of 64 possible unique spaces.
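The calculation above can be reproduced in a few lines. This sketch assumes Python 3.8 or later for math.comb; the bit_length form is included as an integer-only cross-check.

```python
import math

positions = math.comb(64, 2)                  # ways to choose 2 of 64 squares
bits = math.ceil(math.log2(positions))        # fixed-width bits needed
bits_exact = (positions - 1).bit_length()     # same result without floats

print(positions, bits, bits_exact)
```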

Algorithm in hardware to find out if number is divisible by five

I am trying to think of an algorithm to implement this for a given n bit binary number. I tried out many examples, but am unable to find out any pattern. So how shall I proceed?
How about this:
Convert the number to base 4 (this is trivial by simply combining pairs of bits). 5 in base 4 is 11. The values base 4 that are divisible by 11 are somewhat familiar: 11, 22, 33, 110, 121, 132, 203, ...
The rule for divisibility by 11 is that you add all the odd digits and all the even digits and subtract one from the other. If the result is divisible by 11 (which, remember, is 5), then the number is divisible by 11 (that is, by 5).
For example:
123456d = 1 1110 0010 0100 0000b = 132021000_4
The even digits are 1 2 2 0 0 : sum = 5d
The odd digits are 3 0 1 0 : sum = 4d
Difference is 1, which is not divisible by 5
Or another one:
123455d = 1 1110 0010 0011 1111b = 132020333_4
The even digits are 1 2 2 3 3 : sum = 11d
The odd digits are 3 0 0 3 : sum = 6d
Difference is 5, which is divisible by 5
This should have a fairly efficient HW implementation because it's mostly bit-slicing, followed by N/2 adders, where N is the number of bits in the number you're interested in.
Note that after adding the digits and subtracting, the maximum value is 3/4 * N, so if you have 16-bit numbers max, you can get at most 12 as a result, so you only need to check for 0, ±5 and ±10 explicitly. If you're using 32-bit numbers then you can get at most 24 as a result, so you need to also check if the result is ±15 or ±20.
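A software model of the base-4 rule above might look like the sketch below. The function name is made up for this illustration, and digit positions are counted from the right rather than from the left as in the worked examples, which can only flip the sign of the difference.

```python
def divisible_by_5(n):
    """Base-4 alternating digit sum test for divisibility by 5."""
    even_sum = 0
    odd_sum = 0
    position = 0
    while n:
        digit = n & 0b11             # one base-4 digit is a pair of bits
        if position % 2 == 0:
            even_sum += digit
        else:
            odd_sum += digit
        n >>= 2
        position += 1
    # The difference is small, so hardware can compare it against
    # 0, ±5, ±10, ... instead of taking a remainder.
    return (even_sum - odd_sum) % 5 == 0


print(divisible_by_5(123456), divisible_by_5(123455))
```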
Make a Deterministic Finite Automaton (DFA) to implement the divisibility check and implement the DFA in hardware.
Creating a DFA for divisibility by 5 is easy. You just need to notice the remainders and check what 2r (mod 5) and 2r + 1(mod 5) map to. There are many websites that discuss this. For example this one.
There are well-known examples to convert DFA to a hardware representation as well.
Well, I just figured out ...
number mod 5 = a0 * 2^0 mod 5 + a1 * 2^1 mod 5 + a2 * 2^2 mod 5 + a3 * 2^3 mod 5 + a4 * 2^4 mod 5 + ...
= a0 (1) + a1 (2) + a2 (-1) + a3 (-2) + a4 (1) + ... (the weights 1, 2, -1, -2 repeat)
Hence the number is divisible by 5 when the difference of the odd digits (a0, a2, a4, ..., i.e. the 1st, 3rd, 5th bits from the right) + 2 times the difference of the even digits (a1, a3, a5, ...) is divisible by 5.
For example, consider 110010:
odd digits difference = 0-0+1 = 1 or 01
even digits difference = 1-0+1 = 2 or 10
difference of odd digits + 2 times difference of even digits = 01 + 2*(10) = 01 + 100 = 101, which is divisible by 5.
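The repeating weights 1, 2, -1, -2 from the derivation above can be checked with a short sketch. The names WEIGHTS and weighted_sum are made up for this illustration; the input is a binary string with the most significant bit first.

```python
WEIGHTS = (1, 2, -1, -2)             # 2^k mod 5 for k = 0, 1, 2, 3, repeating


def weighted_sum(bits):
    """Sum the weights of the set bits; congruent to the value modulo 5."""
    total = 0
    for k, bit in enumerate(reversed(bits)):   # k = 0 is the rightmost bit
        if bit == "1":
            total += WEIGHTS[k % 4]
    return total


print(weighted_sum("110010"))
```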
The contribution of each bit toward being divisible by five is a four bit pattern 3421.
You could shift through any binary number 4 bits at a time adding the corresponding value for positive bits.
Example:
100011
take 0011
apply the pattern 0021
sum 3
next four bits 0010
apply the pattern 0020
sum = 5
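The 3421 pattern can be modelled as below: every group of four bits uses the same weights because 16 leaves remainder 1 when divided by 5. The names PATTERN and pattern_sum are made up for this illustration.

```python
PATTERN = (1, 2, 4, 3)               # weights for bits 0..3 of each nibble


def pattern_sum(n):
    """Add the 3421 weight of every set bit, four bits at a time."""
    total = 0
    while n:
        nibble = n & 0xF
        for i in range(4):
            if (nibble >> i) & 1:
                total += PATTERN[i]
        n >>= 4
    return total


print(pattern_sum(0b100011))
```

The resulting sum is divisible by 5 exactly when the original number is.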
We can design a Deterministic Finite Automaton (DFA) for the same. The DFA, then can be implemented in Hardware. This is similar to this answer.
We will simulate a Deterministic Finite Automaton (DFA) that accepts Binary Representation of Integers which are divisible by 5
Now, by accept, we mean that when we are done with scanning string, we should be in one of the multiple possible Final States.
Approach to Design DFA : Essentially, we need to divide the Binary Representation of Integer by 5, and track the remainder. If after consuming/scanning [From Left to Right] the entire string, remainder is Zero, then we should end up in Final State, and if remainder isn't zero we should be in Non-Final States.
Now, DFA is defined by Quintuple/5-Tuple (Q,q₀,F,Σ,δ). We will obtain these five components step-by-step.
Q : Finite Set of States
We need to track remainder. On dividing any integer by 5, we can get remainder as 0,1, 2, 3 or 4. Hence, we will have Five States Z, O, T, Th and F for each possible remainder.
Q={Z, O, T, Th, F}
If after scanning certain part of Binary String, we are in state Z, this means that integer defined from Left to this part will give remainder Zero when divided by 5. Similarly, O for remainder One, and so on.
Now, we can write these five states by the Euclidean Division Algorithm as
Z : 5m
O : 5m+1
T : 5m+2
Th : 5m+3
F : 5m+4
where m is Integer.
q₀ : an initial/start state from set Q
Now, the start state can be thought of in terms of the empty string (ɛ). An ɛ directly gets into q₀.
What remainder does ɛ give when divided by 5?
We can prepend any number of 0s to the left of a Binary Number without changing its value. In a similar fashion, we can prepend ɛ to the left of a Binary String. Thus, ɛ on the left can be thought of as 0. And 0 when divided by 5 gives remainder 0. Hence, ɛ should end in State Z. But ɛ ends up in q₀.
Thus, q₀=Z
F : a set of accept states
Now we want all strings which are divisible by 5, or which gives remainder 0 when divided by 5, or which after complete scanning should end up in state Z, and gets accepted.
Hence,
F={Z}
Σ : Alphabet (a finite set of input symbols)
Since we are scanning/reading a Binary String. Hence,
Σ={0,1}
δ : Transition Function (δ : Q × Σ → Q)
Now this δ tells us that if we are in state x (in Q) and next input to be scanned is y (in Σ), then at which state z (in Q) should we go.
If the string upto this point gives remainder 3/Th when divided by 5, and if we append 1 to string, then what remainder will resultant string give.
Now, this can be analyzed by observing how magnitude of a binary string changes on appending 0 and 1.
a.
In Decimal (Base-10), if we append 0, then the magnitude gets multiplied by 10. Example: 53, on appending 0, becomes 530.
Also, if we append 8 to a decimal number, then the magnitude gets multiplied by 10, and then we add 8 to the multiplied magnitude.
b.
In Binary (Base-2), if we append 0, then the magnitude gets multiplied by 2 (the Positional Weight of each Bit gets multiplied by 2).
Example: (1010)_2 [which is (10)_10], on appending 0, becomes (10100)_2 [which is (20)_10]
Similarly, in Binary, if we append 1, then the magnitude gets multiplied by 2, and then we add 1.
Example: (10)_2 [which is (2)_10], on appending 1, becomes (101)_2 [which is (5)_10]
Thus, we can say that for Binary String x,
x0=2|x|
x1=2|x|+1
We will use these relation to analyze Five States
Any string in Z can be written as 5m
- On 0, it becomes 2(5m), which is 5(2m), nothing but state Z.
- On 1, it becomes 2(5m)+1, which is 5(2m)+1, that is O. [This can be read as if a Binary String is presently divisible by 5, and we append 1, then resultant string will give remainder as 1]
Any string in O can be written as 5m+1
- On 0, it becomes 2(5m+1) = 10m+2, which is 5(2m)+2, state T.
- On 1, it becomes 2(5m+1)+1 = 10m+3, which is 5(2m)+3, that is state Th.
Any string in T can be written as 5m+2
- On 0, it becomes 2(5m+2) = 10m+4, which is 5(2m)+4, state F.
- On 1, it becomes 2(5m+2)+1 = 10m+5, which is 5(2m+1), state Z. [If m is integer, so is (2m+1)]
Any string in Th can be written as 5m+3
- On 0, it becomes 2(5m+3) = 10m+6, which is 5(2m+1)+1, state O.
- On 1, it becomes 2(5m+3)+1 = 10m+7, which is 5(2m+1)+2, that is state T.
Any string in F can be written as 5m+4
- On 0, it becomes 2(5m+4) = 10m+8, which is 5(2m+1)+3, state Th.
- On 1, it becomes 2(5m+4)+1 = 10m+9, which is 5(2m+1)+4, that is state F.
Hence, the final DFA combining Everything (creating using Tool)
We can even write code [in High Level Language] for the same. But it would go beyond main aim of this question. If readers wish to see the same, they can check here.
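A minimal sketch of the DFA derived above, using the same state names (Z, O, T, Th, F). The dictionary layout and the function name accepts are choices made for this illustration only.

```python
# delta: state -> input symbol -> next state, as derived from x0 = 2|x|, x1 = 2|x| + 1
TRANSITIONS = {
    "Z":  {"0": "Z",  "1": "O"},
    "O":  {"0": "T",  "1": "Th"},
    "T":  {"0": "F",  "1": "Z"},
    "Th": {"0": "O",  "1": "T"},
    "F":  {"0": "Th", "1": "F"},
}


def accepts(binary_string):
    """Scan left to right; accept if the final state is Z (remainder 0)."""
    state = "Z"                      # q0 = Z
    for symbol in binary_string:
        state = TRANSITIONS[state][symbol]
    return state == "Z"              # F = {Z}


print(accepts("1010"), accepts("1011"))
```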
As any assignment this would have been an answer for is bound to be way overdue a year later:
in the binary representation of a natural divisible by five the parities of bits 4n and 4n+2 equal, as well as those for bits 4n+1 and 4n+3.
(This is entirely equivalent to the answers of JoshG79, notsogeek, or james: 4≡-1(mod 5), 3≡-2(mod 5) (with reduced hand-waving about recursion in argumentation, and no dispensable handling of carries in circuitry))

Confusion regarding genetic algorithms

My book (Artificial Intelligence: A Modern Approach) says that genetic algorithms begin with a set of k randomly generated states, called the population. Each state is represented as a string over a finite alphabet, most commonly a string of 0s and 1s. For example, an 8-queens state must specify the positions of 8 queens, each in a column of 8 squares, and so requires 8 * log2(8) = 24 bits. Alternatively, the state could be represented as 8 digits, each in the range from 1 to 8.
[ http://en.wikipedia.org/wiki/Eight_queens_puzzle ]
I don't understand the expression 8 * log2(8) = 24 bits. Why log2(8)? And what are these 24 bits supposed to be for?
If we take the first example on the Wikipedia page, the solution can be encoded as [2,4,6,8,3,1,7,5]: the first digit gives the row number for the queen in column A, the second for the queen in column B, and so on. Now, instead of starting the row numbering at 1, we will start at 0. The solution is then encoded as [1,3,5,7,2,0,6,4]. Any position can be encoded this way.
We only have digits between 0 and 7, so if we write them in binary, 3 bits (= log2(8)) are enough:
000 -> 0
001 -> 1
...
110 -> 6
111 -> 7
A position can be encoded using 8 times 3 bits, e.g. from [1,3,5,7,2,0,6,4] we get [001,011,101,111,010,000,110,100] or, more briefly, 001011101111010000110100: 24 bits.
In the other direction, the bitstring 000010001011100101111110 decodes as 000.010.001.011.100.101.111.110, then [0,2,1,3,4,5,7,6], and gives [1,3,2,4,5,6,8,7]: the queen in column A is on row 1, the queen in column B is on row 3, etc.
The number of bits needed to store the possible squares (8 possibilities, 0-7) is log2(8) = 3. Note that 111 in binary is 7 in decimal. You have to specify the square for 8 columns, so you need 3 bits 8 times.
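The encoding and decoding described above can be sketched as follows. The function names encode and decode are made up for this illustration, and rows are numbered from 0 as in the answer.

```python
def encode(rows):
    """Turn 8 row numbers (0..7, one per column) into a 24-bit string."""
    return "".join(format(row, "03b") for row in rows)


def decode(bitstring):
    """Split a 24-bit string into 3-bit groups and read each as a row number."""
    return [int(bitstring[i:i + 3], 2) for i in range(0, len(bitstring), 3)]


print(encode([1, 3, 5, 7, 2, 0, 6, 4]))
print(decode("000010001011100101111110"))
```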

about number of bits required for Fibonacci number

I am reading an algorithms book by S. Dasgupta. The following is a snippet from the text regarding the number of bits required for the nth Fibonacci number.
It is reasonable to treat addition as
a single computer step if small
numbers are being added, 32-bit
numbers say. But the nth Fibonacci
number is about
0.694n bits long, and this can far exceed 32 as n grows. Arithmetic
operations on arbitrarily large
numbers cannot possibly be performed
in a single, constant-time step.
My question is this: the Fibonacci numbers are F1 = 1, F2 = 1, F3 = 2, and so on. Substituting n into the formula above, 0.694n gives approximately 1 bit for F1 and approximately 2 bits for F2, but for F3 and beyond the formula seems to fail. I think I didn't properly understand what the author means here. Can anyone please help me understand this?
Thanks
Well,
n           3     4     5     6     7     8
0.694n      2.08  2.78  3.47  4.16  4.86  5.55
F(n)        2     3     5     8     13    21
bits        2     2     3     4     4     5
log2(F(n))  1     1.58  2.32  3     3.7   4.39
The number of bits required is floor(log2(F(n))) + 1, which is roughly the base-2 log rounded up, so this is close enough for me.
The value 0.694 comes from the fact that F(n) is the closest integer to φ^n/√5. So log2(F(n)) is approximately n * log2(φ) - log2(√5), and log2(φ) is about 0.694. As n gets bigger, the log2(√5) term and the rounding rapidly become insignificant.
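To see how close the estimate is, the sketch below compares the exact bit length of F(n) with n * log2(φ). The helper name fib and the sample values of n are chosen only for this illustration.

```python
import math

LOG2_PHI = math.log2((1 + math.sqrt(5)) / 2)   # about 0.6942


def fib(n):
    """Return F(n) with F(1) = F(2) = 1, using exact integer arithmetic."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


for n in (3, 8, 50, 1000):
    # exact number of bits versus the 0.694n estimate
    print(n, fib(n).bit_length(), round(LOG2_PHI * n, 2))
```

The gap between the two columns stays bounded (it comes from the log2(√5) term and rounding), so it matters less and less as n grows.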
private static int nobFib(int n) // number of bits in Fib(n)
{
    // 0.69424... is log2(phi); 0.16096... is log2(sqrt(5)) - 1.
    return n < 6 ? ++n/2 : (int)(0.69424191363061738 * n - 0.1609640474436813);
}
Checked it for n from 0 to 500.000, n=500.000.000, n=1.000.000.000
It's based on Binet's formula.
Needed it for: Fibonacci Sequence Binary Plot.
See: http://bigintegers.blogspot.com/2012/09/fibonacci-sequence-binary-plot-edd-peg.html
First of all, the word about is very important, as in the nth Fibonacci number is about 0.694n bits long. Second, I think the author means when n->infinity. Try some big number and check :)
You can't have, say, half a bit; the number of bits must be rounded.
So it means
number of bits = Math.ceil(Math.max(0.694*n,32));
so it's rounded up once 0.694*n exceeds 32 (n > 46), and 32 below that,
for 32-bit systems, that is,
and the number may not be exact.
I think he's just using the Fibonacci numbers to illustrate his point that for large numbers (> 32 bits) addition cannot be assumed to be constant-time anymore, because it involves more than a single instruction on the CPU.
Why does the formula fail? For F3 = 2 the binary representation needs 2 bits (3 * 0.694 = 2.082). Take F50 = 12586269025, which can be represented using 34 bits (50 * 0.694 = 34.7), so the estimate is still reasonably close to the true value.
N F(N) 0.694*N
1 0 1
2 1 1
3 1 1
4 2 2
5 3 2
6 5 3
7 8 4
8 13 4
etc. That's my interpretation. But then, that means that you have to get to f(47) = 1,836,311,903 before you exceed 32 bits.
The author is basically describing how large numbers affect the performance of the algorithm. To put it very simply, a processor can add numbers of the register size very quickly; if the numbers exceed the register size, more low-level processor instructions need to be executed.

Resources