Can someone explain the rationale that makes the two expressions x % 64 and x & 63 equivalent? I know it only works because 64 is a power of two, but how can I logically or mathematically go from division to bitwise AND?
The operation x % 64 returns the remainder when x is divided by 64, which (assuming x>0) must be a number between 0 and 63. Let's look at this in binary:
63dec = 0011 1111b
64dec = 0100 0000b
You can see that the binary representation of any multiple of 64 must end with 6 zeroes. So the remainder when dividing any number by 64 is the original number, with all of the bits removed except for the 6 rightmost ones.
If you take the bitwise AND of a number with 63, the result is exactly those 6 bits.
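To make the equivalence concrete, here is a minimal check in C (my own illustration, not part of the original answer) that x % 64 and x & 63 agree for non-negative x:

#include <stdio.h>

int main(void)
{
    for (unsigned x = 0; x < 1000; x++) {
        /* & 63 keeps only the six lowest bits, which is the remainder */
        if (x % 64 != (x & 63)) {
            printf("mismatch at %u\n", x);
            return 1;
        }
    }
    printf("x %% 64 == (x & 63) for every tested x\n");
    return 0;
}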
Each time you do a right bit-shift, it is the same as dividing by two, because the binary representation is base 2. It is the same way that removing the 3 from 123 in base 10 gives you 12, and that is like dividing 123 by 10.
% is the mod operator, which means the remainder of a division. 64 is 2 to the sixth power, so dividing by 64 is like shifting out six bits. The remainder of the division is those six bits that you shifted out. You can find the value of the six bits by doing a bitwise-and with only the lower six bits set, which is 63.
The first one gives the remainder.
The second one is a bitwise AND (not a short-circuit operator).
In the bitwise AND, 63 (binary 111111) is ANDed with whatever is on the LHS (x), which keeps the six lowest bits of x and clears everything above them. That is exactly what % with 64 (binary 1000000) leaves behind: the division discards the low six bits, and the remainder is those same six bits.
I have looked for a while to find an algorithm that converts integers to strings. My requirement is to do this manually, as I am using my own large-number type. I have +, -, *, and / (with remainder) defined, but I need to find a way to print a single number from a double int (high and low halves; if int is 64 bits, that is 128 bits total).
I have seen some answers such as
Convert integer to string without access to libraries
Converting a big integer to decimal string
but was wondering if a faster algorithm is possible. I am open to working with bits directly (e.g. a base-2 to base-10 string conversion; I could not find such an algorithm, however), but I was just hoping to avoid repeated division by 10 for numbers possibly as large as 2^128.
You can use divide-and-conquer in such a way that the parts can be converted to string using your standard library (which will typically be quite efficient at that job).
So instead of dividing by 10 in every iteration, you can e.g. divide by 10**15, and have your library convert the chunks to 15-digit strings. After at most three steps, you're finished.
Of course you have to do some string manipulation regarding the zero-padding. But maybe your library can help you here as well, if you use something like a %015d zero-padding format for all the lower parts, and for the highest non-zero part use a non-padding %d format.
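As a hedged sketch of this in C (assuming GCC/Clang's __uint128_t in place of the asker's custom type, and 10^18-digit chunks rather than 10^15), three divisions and the library's formatting do all the work:

#include <stdio.h>
#include <inttypes.h>

void u128_to_string(__uint128_t v, char *out)
{
    const uint64_t CHUNK = 1000000000000000000ULL;   /* 10^18 */
    uint64_t lo  = (uint64_t)(v % CHUNK); v /= CHUNK;
    uint64_t mid = (uint64_t)(v % CHUNK); v /= CHUNK;
    uint64_t hi  = (uint64_t)v;           /* fits: 2^128 < 10^39 */
    if (hi)                               /* zero-pad only the lower chunks */
        sprintf(out, "%" PRIu64 "%018" PRIu64 "%018" PRIu64, hi, mid, lo);
    else if (mid)
        sprintf(out, "%" PRIu64 "%018" PRIu64, mid, lo);
    else
        sprintf(out, "%" PRIu64, lo);
}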
You may try your luck with a contrived method, as follows.
Numbers can be represented using the Binary-Coded Decimal representation. In this representation, every decimal digit is stored on 4 bits, and when performing an addition, if the sum of two digits exceeds 9, you add 6 and carry to the left.
If you have pre-stored the BCD representation of all powers of 2, then it takes at most 128 additions to perform the conversion. You can spare a little by the fact that for low powers, you don't need full length addition (39 digits).
But this sounds like a lot of operations. You can "parallelize" them by packing several BCD digits into a single integer: an integer addition on 32 bits is equivalent to 8 simultaneous BCD digit additions. But we have a problem with the carries. To work around it, we can store the digits on 5 bits instead of 4, and the carries will appear in the fifth bit. Then we can obtain the carries by masking, add them to the next digits (shift left 5), and adjust the digit sums (multiply by 10 and subtract).
    2  3  4  5  6
 +  7  6  9  2  1
 =  9  9 13  7  7
Carries:
    0  0  1  0  0
Adjustments:
    9  9 13  7  7
 -  0  0 10  0  0
 =  9  9  3  7  7
Actually, you have to handle possible cascaded carries, so the sum will involve the two addends and carries in, and generate a sum and carries out.
32-bit operations allow you to process 6 digits at a time (7 rounds for 39 digits), and 64-bit operations 12 digits (4 rounds for 39 digits).
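For concreteness, here is a hedged sketch of that packed-digit addition for six digits in a 32-bit word (the masks and the function name are mine, not from the answer); the outer loop handles the cascaded carries:

#include <stdint.h>

#define ONES  0x02108421u   /* 1 in the low bit of each 5-bit field */
#define SIXES (6u * ONES)   /* 6 in every field                     */
#define HIBIT (16u * ONES)  /* the fifth bit of each field          */

/* Add two packed values of six decimal digits, 5 bits per digit. */
uint32_t packed_add(uint32_t a, uint32_t b)
{
    while (b) {
        uint32_t t = a + b + SIXES;  /* bias: digit sums >= 10 now set bit 4 */
        uint32_t c = t & HIBIT;      /* per-digit carry flags                */
        /* carrying fields keep sum - 10; the rest must drop the +6 bias */
        a = (t - c) - 6u * ((c >> 4) ^ ONES);
        b = c << 1;                  /* carries move to the next digit up    */
    }
    return a;
}

With the example above, adding the packings of 23456 and 76921 yields the packing of 100377 after a few trips around the loop.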
If you want to just encode your numbers as a string,
use hex numbers; that is fast, as you can produce all the digits just by bit operations ... using Base64 encoding is also doable with just bit operations plus the translation table. Both representations can be done with small-int arithmetic only, in O(n) where n is the count of printed digits.
If you need base 10,
then print a hex string and convert it to decimal on strings, like this:
str_hex2dec
This is much slower than #1 but still doable with small-int arithmetic ... You can also do this in reverse (input a number from a string) by using dec2hex ...
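As an illustration of option #1, a hedged C sketch (the high/low two-word layout and the function name are my assumptions): every hex digit comes from shifts and masks alone.

#include <stdio.h>
#include <stdint.h>

void print_hex128(uint64_t hi, uint64_t lo)
{
    const char digit[] = "0123456789ABCDEF";
    for (int i = 124; i >= 0; i -= 4) {        /* 32 nibbles, MSB first */
        uint64_t part = (i >= 64) ? hi : lo;
        putchar(digit[(part >> (i & 63)) & 0xF]);
    }
    putchar('\n');
}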
For bigint libs there are also other ways of easing the string/integer conversions:
BCD
binary coded decimal ... the number printed as hex is the decadic number, so each digit takes 4 bits. This wastes some memory, but many CPUs have BCD support and can do operations on such integers natively.
Base 10^n
sometimes base 10^n is used instead of 2^m, where
10^n <= 2^m
Here m is the bit width of your atomic integer and n is the number of decadic digits that fit inside it.
For example, if your atomic unsigned integer is 16 bits, it can hold up to 65536 values in base 2. If you use base 10000 instead, you can print each atom as a decadic number, zero-padded from the left, and simply stack all such prints together.
This also wastes some memory, but usually not too much (if the bit width is reasonably selected), and you can use standard instructions on the integers. Only the carry propagation changes a bit...
For example, for 32-bit words:
2^32 = 4294967296 >= 10^9 = 1000000000
so we waste log2(4.2949...) = ~2.1 bits per 32 bits. This is much better than BCD's log2(16/10)*(32/4) = ~5.42 bits, and it is usually even better at higher bit widths.
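A minimal sketch of printing in this representation (the limb-array layout is my assumption, not from the answer): the top limb prints without padding, every lower limb zero-padded to 9 digits.

#include <stdio.h>
#include <inttypes.h>

/* limb[n-1] is the most significant base-10^9 digit */
void print_base1e9(const uint32_t *limb, int n)
{
    printf("%" PRIu32, limb[n - 1]);      /* top limb: no zero-pad         */
    for (int i = n - 2; i >= 0; --i)
        printf("%09" PRIu32, limb[i]);    /* lower limbs: exactly 9 digits */
    printf("\n");
}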
Let's say we want to represent a signed number with 5 bits where the first bit is used for the sign (+ or -) of the number. Then the zero can be represented by two bit representations (10000 and 00000).
How is this problem solved?
Okay. There are only two possible bit values in binary: 1 or 0.
A representation can use any number of bits, for example 1 bit up to 64 bits.
Since the question asks about a 5-bit string, it has the form XXXXX, where each X can be either bit (1 or 0).
With the first bit as the sign bit, we can have either +0 or -0 (thanks @machinery).
So if the number is positive, we put 0 in the first position, and if it is negative, we put 1 in the first position.
Four Bits
Now that we have our first bit, we are left with the other 4 bits, 0XXXX or 1XXXX. As the question asked for 0, the remaining bits will all be zero.
therefore the answer is 00000 or 10000
Look up how to convert decimal to binary and binary to decimal.
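A tiny hedged demo (my own illustration, not from the answer) of decoding 5-bit sign-magnitude values; it shows that 00000 and 10000 both decode to zero:

#include <stdio.h>

int sign_magnitude_5bit(unsigned v)       /* only the low 5 bits are used */
{
    int magnitude = v & 0x0F;             /* low 4 bits                   */
    return (v & 0x10) ? -magnitude : magnitude;  /* bit 4 is the sign     */
}

int main(void)
{
    printf("%d %d\n", sign_magnitude_5bit(0x00), sign_magnitude_5bit(0x10));
    /* prints: 0 0 -- the two encodings of zero */
    return 0;
}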
I saw a question on a math forum a while back where a person was discussing adding up the digits in a number over and over again until a single digit is achieved (i.e. "362" would become "3+6+2", which would become "11"; then "11" would become "1+1", which would become "2"; therefore "362" would return 2). I wrote some nice code to get an answer to this and posted it, only to be outdone by a user who suggested that any number modulo 9 is equal to this "infinite digit sum". I checked it and he was right... well, almost right: if zero was returned you had to switch it out with a "9", but that was a very quick fix...
362 = 3+6+2 = 11 = 1+1 = 2
or...
362%9 = 2
Anyway, the mod-9 method works fantastically for repeatedly adding the digits until you are left with just a single digit... but what about doing it only once (i.e. 362 would just return "11")? Can anyone think of fast algorithms?
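For reference, the asker's mod-9 observation (including the swap-zero-for-9 fix) collapses to one line; a hedged sketch in C:

int digital_root(unsigned n)
{
    /* repeated digit sum of n > 0; 1 + (n - 1) % 9 maps multiples of 9
       to 9 instead of 0, which is exactly the asker's quick fix */
    return n == 0 ? 0 : 1 + (n - 1) % 9;
}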
There's a cool trick for summing the 1 digits in binary, with a fixed-width integer. At each iteration, you separate the digits into two values, bit-shift one value down, then add. In the first iteration, you separate every other digit; in the second, pairs of digits; and so on.
Given that 27 is 00011011 as 8-bit binary, the process is...
00010001 + 00000101 = 00010110 <- every other digit step
00010010 + 00000001 = 00010011 <- pairs of digits
00000011 + 00000001 = 00000100 <- quads, giving final result 4
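In C, that binary version is the classic parallel bit count; a hedged rendering for an 8-bit value:

#include <stdint.h>

unsigned count_bits8(uint8_t b)
{
    unsigned x = b;
    x = (x & 0x55) + ((x >> 1) & 0x55);  /* every other digit */
    x = (x & 0x33) + ((x >> 2) & 0x33);  /* pairs of digits   */
    x = (x & 0x0F) + ((x >> 4) & 0x0F);  /* quads             */
    return x;                            /* e.g. 27 -> 4      */
}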
You could do a similar trick with decimal, but it would be less efficient than a simple loop unless you had a direct representation of decimal numbers with fast operations to zero out selected digits and to do digit-shifting. So for 12345678 you get...
02040608 + 01030507 = 03071115 <- every other digit
00070015 + 00030011 = 00100026 <- pairs
00000026 + 00000010 = 00000036 <- quads, final result
So 1+2+3+4+5+6+7+8 = 36, which is correct, but you can only do this efficiently if your number representation is fixed-width decimal. It always takes lg(n) iterations, where lg means the base two logarithm, and you round upwards.
To expand on this a little (based on in-comments discussions), let's pretend this was sane, for a bit...
If you count single-digit additions, there's actually more work than a simple loop here. The idea, as with the bitwise trick for counting bits, is to re-order those additions (using associativity) and then to compute as many as possible in parallel, using a single full-width addition to implement two half-width additions, four quarter-width additions etc. There's significant overhead for the digit-clearing and digit-shifting operations, and even more if you implement this as a loop (calculating or looking up the digit-masking and shift-distance values for each step). The "loop" should probably be fully unrolled and those masks and shift-distances be included as constants in the code to avoid that.
A processor with support for Binary Coded Decimal (BCD) could handle this. Digit masking and digit shifting would be implemented using bit masking and bit shifting, as each decimal digit would be encoded in 4 (or more) bits, independent of the encoding of other digits.
One issue is that BCD support is quite rare these days. It used to be fairly common in the 8 bit and 16 bit days, but as far as I'm aware, processors that still support it now do so mainly for backward compatibility. Reasons include...
Very early processors didn't include hardware multiplication and division. Hardware support for these operations means it's easier and more efficient to convert binary to decimal now. Binary is used for almost everything now, and BCD is mostly forgotten.
There are decimal number representations around in libraries, but few if any high level languages ever provided portable support to hardware BCD, so since assembler stopped being a real-world option for most developers BCD support simply stopped being used.
As numbers get larger, even packed BCD is quite inefficiently packed. Number representations base 10^x have the most important properties of base 10, and are easily decoded as decimal. Base 1000 only needs 10 bits per three digits, not 12, because 2^10 is 1024. That's enough to show you get an extra decimal digit for 32 bits - 9 digits instead of 8 - and you've still got 2 bits left over, e.g. for a sign bit.
The thing is, for this digit-totalling algorithm to be worthwhile at all, you need to be working with fixed-width decimal of probably at least 32 bits (8 digits). That gives 12 operations (6 masks, 3 shifts, 3 additions) rather than 15 additions for the (fully unrolled) simple loop. That's a borderline gain, though - and other issues in the code could easily mean it's actually slower.
The efficiency gain is clearer at 64 bits (16 decimal digits) as there's still only 16 operations (8 masks, 4 shifts, 4 additions) rather than 31, but the odds of finding a processor that supports 64-bit BCD operations seems slim. And even if you did, how often do you need this anyway? It seems unlikely that it could be worth the effort and loss of portability.
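For concreteness, a hedged sketch (mine, not from the answer) of the fully unrolled digit-totalling on a 32-bit packed-BCD value; it uses exactly the 12 operations (6 masks, 3 shifts, 3 additions) counted above, and no field can overflow into its neighbour since the total is at most 72:

#include <stdint.h>

unsigned bcd_digit_sum(uint32_t x)   /* 8 BCD digits, 4 bits each */
{
    x = (x & 0x0F0F0F0Fu) + ((x >>  4) & 0x0F0F0F0Fu);  /* pairs of digits */
    x = (x & 0x00FF00FFu) + ((x >>  8) & 0x00FF00FFu);  /* quads           */
    x = (x & 0x0000FFFFu) + ((x >> 16) & 0x0000FFFFu);  /* all eight       */
    return x;
}

For the 0x12345678 encoding of 12345678, the intermediate values match the decimal walk-through above, ending in 36.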
Here's something in Haskell:
sumDigits n =
  if n == 0
    then 0
    else let a = mod n 10
         in a + sumDigits (div n 10)
Oh, but I just read you're doing that already...
(then there's also the obvious:
sumDigits n = sum $ map (read . (:[])) . show $ n
)
For short code, try this:
int digit_sum(int n) {
    if (n < 10) return n;
    return n % 10 + digit_sum(n / 10);
}
Or, in words:
- If the number is less than ten, then the digit sum is the number itself.
- Otherwise, the digit sum is the current last digit (a.k.a. n mod 10, or n % 10), plus the digit sum of everything to the left of that digit (n divided by 10, using integer division).
- This algorithm can also be generalized to any base by substituting that base for 10, as shown in the sketch below.
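As a hedged illustration of that last point (the extra parameter is my own addition, not part of the original answer):

int digit_sum_base(int n, int base) {
    if (n < base) return n;                            /* single digit: done */
    return n % base + digit_sum_base(n / base, base);  /* last digit + rest  */
}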
The same thing as an explicit loop instead of recursion:

int digit_sum(int n) {
    int sum = 0;
    do {
        sum += n % 10;   /* take the last digit */
        n /= 10;         /* drop the last digit */
    } while (n > 0);
    return sum;
}
While working on ruby I came across:
> "cc".unpack('b8B8')
=> ["11000110", "01100011"]
Then I tried Googling to find a good answer on "least significant bit", but could not find any.
Would anyone care to explain, or point me in the right direction, so I can understand the difference between "LSB first" and "MSB first"?
It has to do with the direction of the bits. Notice that in this example it's unpacking two ascii "c" characters, and yet the bits are mirror images of each other. LSB means the rightmost (least-significant) bit is the first bit. MSB means the leftmost (most-significant) is the first bit.
As a simple example, consider the number 5, which in "normal" (readable) binary looks like this:
00000101
The least significant bit is the rightmost 1, because that is the 2^0 position (or just plain 1). It doesn't impact the value too much. The one next to it is the 2^1 position (or just plain 0 in this case), which is a bit more significant. The bit to its left (2^2 or just plain 4) is more significant still. So we say this is MSB notation because the most significant bit (2^7) comes first. If we change it to LSB, it simply becomes:
10100000
Easy right?
(And yes, for all you hardware gurus out there I'm aware that this changes from one architecture to another depending on endianness, but this is a simple answer for a simple question)
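A hedged C analogue of the Ruby unpack('b8B8') example from the question (my own translation): print the bits of 'c' LSB-first, then MSB-first.

#include <stdio.h>

int main(void)
{
    unsigned char ch = 'c';             /* 0x63 = 01100011 */
    for (int i = 0; i < 8; i++)         /* LSB first, like 'b8' */
        putchar('0' + ((ch >> i) & 1));
    putchar(' ');
    for (int i = 7; i >= 0; i--)        /* MSB first, like 'B8' */
        putchar('0' + ((ch >> i) & 1));
    putchar('\n');                      /* prints: 11000110 01100011 */
    return 0;
}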
The term "significance" of a bit or byte only makes sense in the context of interpreting a sequence of bits or bytes as an integer. The bigger the impact of the bit or byte on the value of the resulting integer - the higher its significance. The more "significant" it is to the value.
So, for example, when we talk about a sequence of four bytes having the least significant byte first (aka little-endian), what we mean is that when we interpret those four bytes as a 32-bit integer, the first byte denotes the lowest eight binary digits of the integer, the second byte the 9th through 16th, the third the 17th through 24th, and the last byte denotes the highest eight bits of the integer.
Likewise, if we say a sequence of 8 bits is in most significant bit first order, what we mean is that if we interpret the 8 bits as an 8-bit integer, the first bit in the sequence denotes the highest binary digit of the integer, the second bit the second highest, and so on, until the last bit denotes the lowest binary digit of the integer.
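A hedged C illustration of the byte case (the byte values are my own example): assembling a 32-bit integer from four bytes given least significant byte first.

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    const uint8_t bytes[4] = { 0x78, 0x56, 0x34, 0x12 };  /* LSB first */
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--)
        v = (v << 8) | bytes[i];        /* later bytes are more significant */
    printf("0x%08" PRIX32 "\n", v);     /* prints 0x12345678 */
    return 0;
}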
Another thing to think about is that one can say that the usual decimal notation has a convention that is most significant digit first. For example, a decimal number like:
1250
is read to mean:
1 x 1000 +
2 x 100 +
5 x 10 +
0 x 1
Right? Now imagine a different convention that is least significant digit first. The same number would be written:
0521
and would be read as:
0 x 1 +
5 x 10 +
2 x 100 +
1 x 1000
Another thing you should observe in passing is that in the C family of languages (and most modern programming languages), the shift-left operator (<<) and shift-right operator (>>) are named for a most-significant-bit-first view: shifting a bit left increases its significance and shifting it right decreases it, meaning that the left side is the most significant (and the left side is usually what we mean by first, at least in the West).
I need to represent a number n using ONLY x bits. Usually, I can choose a suitable base and find the number of digits needed. But here my constraint is that I have only 'x' bits available. I can have more than one set of 'x' bits, however.
I am trying to understand how numbers can be represented in any given system like this one.
Not sure if I understood your problem correctly, but assuming you have a natural number x that can be represented with m (e.g., 20) bits, but you only have arrays of n bits at your disposal (say, bytes, i.e. 8-bit arrays), the number of arrays you need is simply m/n rounded up to the next natural number. For a number that has 20 digits in binary format, that would be 3 bytes.
E.g. if your number is
1001 01101100 10110100,
you could store it as
00001001
01101100
10110100.
What you have done is to
(integer-) divide your number by 100000000 (that is 10^8 read as binary, or 2^8 = 256 in the decimal system), write down the remainder, truncate the result
(integer-) divide the result of 1. by 100000000, write down the remainder, truncate the result
(integer-) divide the result of 2. by 100000000, write down the remainder, truncate the result
nothing interesting to do anymore, because the result of step 3 was 0.
Assuming we talk about natural numbers here, in the decimal system the above would look like this:
1. 617652/256 = 2412 remainder 180 (10110100 in binary system)
2. 2412/256 = 9 remainder 108 (01101100 in binary system)
3. 9/256 = 0 remainder 9 (00001001 in binary system)
So what you are doing is
while (number > 0) {
    divide number by 2^n
    remember remainder
    truncate number
}
Restoring the original number is left as an exercise :)
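A hedged concrete version of that loop (values chosen to match the example above): split an unsigned value into 8-bit chunks, least significant chunk first.

#include <stdio.h>

int main(void)
{
    unsigned long number = 617652;      /* the example value above   */
    while (number > 0) {
        unsigned chunk = number % 256;  /* remainder: the low 8 bits */
        printf("%02X\n", chunk);        /* remember the remainder    */
        number /= 256;                  /* truncate: drop 8 bits     */
    }
    return 0;
}

This prints B4, 6C, 09 - the bytes 10110100, 01101100, 00001001 from the example, lowest first.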
This is actually a problem that comes up whenever you want to deal with very large integer numbers on the computer. I guess a good place to start looking for further information might be http://en.wikipedia.org/wiki/Positional_notation.