How can one byte be more significant than another? - endianness

The difference between little endian and big endian was explained to me like this: "In little endian the least significant byte goes into the low address, and in big endian the most significant byte goes into the low address". What exactly makes one byte more significant than the other?

In the number 222, you could regard the first 2 as most significant because it represents the value 200; the second 2 is less significant because it represents the value 20; and the third 2 is the least significant digit because it represents the value 2.
So, although the digits are the same, the magnitude of the number they represent is used to determine the significance of a digit.
It is the same as when a value is rounded to a number of significant figures ("S.F." or "SF"): 123.321 to 3SF is 123, to 2SF it is 120, to 4SF it is 123.3. That terminology has been used since before electronics were invented.

In any positional numeric system, each digit has a different weight in creating the overall value of the number.
Consider the number 51354 (in base 10): the first 5 is more significant than the second 5, as it stands for 5 multiplied by 10000, while the second 5 is just 5 multiplied by 10.
In computers, numbers are generally fixed-width: for example, a 16-bit unsigned integer can be thought of as a sequence of exactly 16 binary digits, with the leftmost one being unambiguously the most significant (it is worth exactly 32768, more than any other bit in the number), and the rightmost the least significant (it is worth just one).
As long as integers are in the CPU registers we don't really need to care about their representation - the CPU will happily perform operations on them as required. But when they are saved to memory (which generally is some random-access bytes store), they need to be represented as bytes in some way.
If we consider "normal" computers, representing a number (bigger than one byte) as a sequence of bytes means essentially representing it in base 256, each byte being a digit in base 256, and each base-256 digit being more or less significant.
Let's see an example: take the value 54321 as a 16 bit integer. If you write it in base 256, it'll be two base-256 digits: the digit 0xD4 (which is worth 0xD4 multiplied by 256) and the digit 0x31 (which is worth 0x31 multiplied by 1). It's clear that the first one is more significant than the second one, as indeed the leftmost "digit" is worth 256 times more than the one at its right.
Now, little endian machines will write in memory the least significant digit first, big endian ones will do the opposite.
Incidentally, there's a nice relationship between binary, hexadecimal and base-256: 4 bits map straight to a hexadecimal digit, and 2 hexadecimal digits map straight to a byte. For this reason you can also see that 54321, which in binary is
1101010000110001 = 0xD431
can be split straight into two groups of 8 bits
11010100 00110001
which are the 0xD4 and 0x31 above. So you can see as well that the most significant byte is the one that contains the most significant bits.
Here I'm using the corresponding hexadecimal values to represent each base-256 digit, as there's no good way to represent them symbolically. I could use their ASCII character value, but 0xD4 is outside ASCII, and 0x31 is the character "1", which would only add confusion.
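As a small illustration of the example above (a sketch in Python, which is not the language of any answer here), the two byte orders of 54321 can be produced with the built-in int.to_bytes:

```python
# Illustration only: 54321 (0xD431) stored as a 16-bit unsigned integer.
value = 54321

big = value.to_bytes(2, byteorder="big")        # most significant byte at the lowest address
little = value.to_bytes(2, byteorder="little")  # least significant byte at the lowest address

print(big.hex())     # d431
print(little.hex())  # 31d4

# Either byte sequence decodes to the same number when read with the matching convention.
assert int.from_bytes(big, "big") == int.from_bytes(little, "little") == value
```

The bytes themselves are identical in both cases; only their order in memory differs.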

Related

What is the output of '%b' verb when it is floating number

According to the Go documentation, %b used with a floating-point number means:
decimalless scientific notation with exponent a power of two,
in the manner of strconv.FormatFloat with the 'b' format,
e.g. -123456p-78
As the code shows below, the program output is
8444249301319680p-51
I'm a little confused about %b with floating-point numbers. Can anybody tell me how this result is calculated? Also, what does p- mean?
f := 3.75
fmt.Printf("%b\n", f)
fmt.Println(strconv.FormatFloat(f, 'b', -1, 64))
Decimalless scientific notation with a power-of-two exponent means the following:
8444249301319680*(2^-51) = 3.75 or 8444249301319680/(2^51) = 3.75
p-51 means 2^-51 which can also be calculated as 1/(2^51)
Nice article on Floating-Point Arithmetic.
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
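The arithmetic above can be checked with a few lines of Python (used here only as a calculator; the names mantissa, exponent and small are illustrative):

```python
import math

mantissa, exponent = 8444249301319680, -51

# ldexp(m, e) computes m * 2**e, which is what "<m>p<e>" denotes.
value = math.ldexp(mantissa, exponent)
print(value)  # 3.75

# The same number written with a much smaller integer part: 15 * 2**-2.
small = math.ldexp(15, -2)
print(small == value)  # True
```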
The five rules of scientific notation are given below:
The base is always 10
The exponent must be a non-zero integer, which means it can be either positive or negative
The absolute value of the coefficient is greater than or equal to 1 but it should be less than 10
The coefficient carries the sign (+) or (-)
The mantissa carries the rest of the significant digits
p marks the exponent:
%b scientific notation with exponent a power of two (the p)
%e scientific notation with exponent a power of ten (the e)
It is worth pointing out that the %b output is particularly easy for the runtime system to generate as well, due to the internal storage format for floating point numbers.
If we ignore "denormalized" floating point numbers (we can add them back later), a floating point number is stored, internally, as 1.bbbbbb...bbb x 2^exp for some set of bits ("b" here), e.g., the value four is stored as 1.000...000 <exp> 2. The value six is stored as 1.100...000 <exp> 2, the value seven is stored as 1.110...000 <exp> 2, and eight is stored as 1.000...000 <exp> 3. The value seven-and-a-half is 1.111 <exp> 2, seven and three quarters is 1.1111 <exp> 2, and so on. Each bit here, in the 1.bbbb, represents the next power of two lower than the exponent.
To print out 1.111 <exp> 2 with the %b format, we simply note that we need four 1 bits in a row, i.e., the value 15 decimal or 0xf or 1111 binary, which causes the exponent to need to be decreased by 3, so that instead of multiplying by 2^2 or 4, we want to multiply by 2^-1 or ½. So we can take the actual exponent (2), subtract 3 (because we moved the "point" three times to print 1111 binary or 15), and hence print out the string 15p-1.
That's not what Go's %b prints though: it prints 8444249301319680p-50. This is the same value (so either one would be correct output)—but why?
Well, 8444249301319680 is, in hexadecimal, 1E000000000000. Expanded into full binary, this is 1 1110 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000. That's 53 binary digits. Why 53 binary digits, when four would suffice?
The answer to that is found in the link in Nick's answer: IEEE 754 floating point format uses a 53-digit "mantissa" or "significand" (the latter is the better term and the one I usually try to use, but you'll see the former pop up very often). That is, the 1.bbb...bbb has 52 bs, plus that forced-in leading 1. So there are always exactly 53 binary digits (for IEEE "double precision").
If we just treat this 53-binary-digit number as an integer, we can always print it out without a decimal point. That means we just adjust the power-of-two exponent.
In IEEE754 format, the exponent itself is already stored in "excess form", with 1023 added (for double precision again). That means that 1.111000...000 <exp> 2 is actually stored with an exponent value of 2+1023 = 1025. What this means is that to get the actual power of two, the machine code formatting the number is already going to have to subtract 1023. We can just have it subtract 52 more at the same time.
Last, because the implied 1 is always there, the internal IEEE754 number doesn't actually store the 1 bit. So to read out the value and convert it, the code internally does:
decimalPart := machineDependentReinterpretation1(&doubleprec_value)
expPart := machineDependentReinterpretation2(&doubleprec_value)
where the machine-dependent-reinterpretation simply extracts the correct bits, puts in the implied 1 bit as needed in the decimal part, subtracts the offset (1023+52) for the exponent part, and then does:
fmt.Sprintf("%dp%d", decimalPart, expPart)
When printing a floating-point number in decimal, the base conversion (from base 2 to base 10) is problematic, requiring a lot of code to get the rounding right. Printing it in binary like this is much easier.
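The bit extraction described above can be sketched in Python. This is not Go's actual implementation; format_b is a hypothetical helper that is only intended to produce output of the same shape, and it assumes an IEEE 754 double (which is what Python's float is on common platforms):

```python
import math
import struct

def format_b(x: float) -> str:
    """Format a finite float as '<integer>p<exponent>', meaning integer * 2**exponent."""
    if math.isinf(x) or math.isnan(x):
        raise ValueError("only finite values are handled in this sketch")
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = "-" if bits >> 63 else ""
    stored_exp = (bits >> 52) & 0x7FF   # 11-bit exponent, stored with 1023 added
    fraction = bits & ((1 << 52) - 1)   # the 52 stored significand bits
    if stored_exp == 0:
        # Zero or denormal: no implied leading 1, exponent pinned at the minimum.
        significand, exp = fraction, 1 - 1023 - 52
    else:
        # Normal number: put the implied 1 back, remove the offset (1023 + 52).
        significand, exp = fraction | (1 << 52), stored_exp - 1023 - 52
    return f"{sign}{significand}p{exp:+d}"

print(format_b(3.75))  # 8444249301319680p-51
print(format_b(7.5))   # 8444249301319680p-50
```

No base conversion of the fraction is needed: the significand is printed as a plain integer and the exponent is just adjusted.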
Exercises for the reader, to help with understanding this:
1. Compute 1.10 (binary) x 2^2. Note: 1.1 in binary is 1½ decimal.
2. Compute 11.0 (binary) x 2^1. (11.0 in binary is 3.)
3. Based on the above, what happens as you "slide the binary point" left and right?
4. (more difficult) Why can we assume a leading 1? If necessary, read on.
Why can we assume a leading 1?
Let's first note that when we use scientific notation in decimal, we can't assume a leading 1. A number might be 1.7 x 10^3, or 5.1 x 10^5, or whatever. But when we use scientific notation "correctly", the first digit is never zero. That is, we do not write 0.3 x 10^0 but rather 3.0 x 10^-1. In this kind of notation, the number of digits tells us about the precision, and the first digit never has to be zero and generally isn't supposed to be zero. If the first digit were zero, we just move the decimal point and adjust the exponent (see exercises 1 and 2 above).
The same rules apply with floating-point numbers. Instead of storing 0.01, for instance, we just slide the binary point over two positions and get 1.00, and decrease the exponent by 2. If we wanted to store 11.1, we slide the binary point one position the other way and increase the exponent. Whenever we do this, the first digit always winds up being a one.
There is one big exception here, which is: when we do this, we can't store zero! So we don't do this for the number 0.0. In IEEE754, we store 0.0 as all-zero-bits (except for the sign, which we can set to store -0.0). This has an all-zero exponent, which the computer hardware handles as a special case.
Denormalized numbers: when we can't assume a leading 1
This system has one notable flaw (which isn't entirely fixed by denorms, but nonetheless, IEEE has denorms). That is: the smallest number we can store "abruptly underflows" to zero. Kahan has a 15 page "brief tutorial" on gradual underflow, which I am not going to attempt to summarize, but when we hit the minimum allowed exponent (2^-1022 for normalized double-precision values) and want to "get smaller", IEEE lets us stop using these "normalized" numbers with the leading 1 bit.
This doesn't affect the way that Go itself formats floating point numbers, because Go just takes the entire significand "as is". All we have to do is stop inserting the "implied 1" (the 2^52 bit of the 53-bit significand) when the input value is a denormalized number, and everything else Just Works. We can hide this magic inside the machine-dependent float64 reinterpretation code, or do it explicitly in Go, whichever is more convenient.

Algorithm for Converting large integer to string without modulo base

I have looked for a while to find an algorithm which converts integers to string. My requirement is to do this manually as I am using my own large number type. I have + - * /(with remainder) defined, but need to find a way to print a single number from a double int (high and low, if int is 64bits, 128bits total).
I have seen some answers such as
Convert integer to string without access to libraries
Converting a big integer to decimal string
but was wondering if a faster algorithm was possible. I am open to working with bits directly (e.g. base 2 to base-10 string, though I could not find such an algorithm), but I was hoping to avoid repeated division by 10 for numbers possibly as large as 2^128.
You can use divide-and-conquer in such a way that the parts can be converted to string using your standard library (which will typically be quite efficient at that job).
So instead of dividing by 10 in every iteration, you can e.g. divide by 10**15, and have your library convert the chunks to 15-digit strings. After at most three steps, you're finished.
Of course you have to do some string manipulation regarding the zero-padding. But maybe your library can help you here as well, if you use something like a %015d zero-padding format for all the lower parts, and for the highest non-zero part use a non-padding %d format.
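A minimal sketch of the chunking idea, in Python. Python already has arbitrary-precision integers, so divmod here merely stands in for the asker's own big-number division with remainder; u128_to_decimal and CHUNK are illustrative names:

```python
CHUNK = 10**15  # fits comfortably in a 64-bit integer

def u128_to_decimal(high: int, low: int) -> str:
    """Convert a 128-bit value given as two 64-bit halves to a decimal string."""
    n = (high << 64) | low
    chunks = []
    while True:
        n, rem = divmod(n, CHUNK)  # one big division per 15 decimal digits
        chunks.append(rem)
        if n == 0:
            break
    # The most significant chunk is printed unpadded, the rest zero-padded to 15 digits.
    head, *tail = reversed(chunks)
    return "%d" % head + "".join("%015d" % c for c in tail)

print(u128_to_decimal(0, 54321))  # 54321
```

For a 128-bit input the loop runs at most three times, compared with up to 39 divisions by 10.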
You may try your luck with a contrived method, as follows.
Numbers can be represented using the Binary-Coded Decimal representation. In this representation, every decimal digit is stored on 4 bits, and when performing an addition, if the sum of two digits exceeds 9, you add 6 and carry to the left.
If you have pre-stored the BCD representation of all powers of 2, then it takes at most 128 additions to perform the conversion. You can spare a little by the fact that for low powers, you don't need full length addition (39 digits).
But this sounds like a lot of operations. You can "parallelize" them by packing several BCD digits in a single integer: an integer addition on 32 bits is equivalent to 8 simultaneous BCD digit additions. But we have a problem with the carries. To work around it, we can store the digits on 5 bits instead of 4, and the carries will appear in the fifth bit. Then we can obtain the carries by masking, add them to the next digits (shift left 5), and adjust the digit sums (multiply by 10 and subtract).
    2  3  4  5  6
+   7  6  9  2  1
=   9  9 13  7  7
Carries:
    0  0  1  0  0
Adjustments:
    9  9 13  7  7
-   0  0 10  0  0
=   9  9  3  7  7
Actually, you have to handle possible cascaded carries, so the sum will involve the two addends and carries in, and generate a sum and carries out.
32 bits operations allow you to process 6 digits at a time (7 rounds for 39 digits), and 64 bits operations, 12 digits (4 rounds for 39 digits).
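For illustration, here is a sketch of the precomputed-powers idea in Python. It is deliberately simplified: digits are kept in plain lists (one decimal digit per element) rather than packed 4 or 5 bits apiece, so it shows the at-most-128-additions structure but not the parallel packed addition. All names are hypothetical:

```python
DIGITS = 39  # enough for any value below 2**128

def add_decimal(a, b):
    """Add two little-endian digit lists of equal length, propagating carries."""
    out, carry = [], 0
    for x, y in zip(a, b):
        s = x + y + carry
        carry = 1 if s > 9 else 0
        out.append(s - 10 * carry)
    return out

# Precompute the decimal digits of 2**0 .. 2**127 by repeated doubling.
POWERS = [[1] + [0] * (DIGITS - 1)]
for _ in range(127):
    POWERS.append(add_decimal(POWERS[-1], POWERS[-1]))

def to_decimal(n: int) -> str:
    """Sum the stored powers of two for each set bit of n (0 <= n < 2**128)."""
    total = [0] * DIGITS
    for bit in range(128):
        if (n >> bit) & 1:
            total = add_decimal(total, POWERS[bit])
    text = "".join(str(d) for d in reversed(total)).lstrip("0")
    return text or "0"
```

Only additions and comparisons are used in the conversion itself; no division or modulo by 10 is needed.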
1. If you just want to encode your numbers as a string
use hex; that is fast, as you can extract all the digits with bit operations alone. Base64 encoding is also doable with bit operations plus a translation table. Both representations can be produced using small-integer arithmetic only, in O(n), where n is the number of printed digits.
2. If you need base 10
then print a hex string and convert it to decimal on strings like this:
str_hex2dec
This is much slower than #1 but still doable with small-integer arithmetic. You can also do this in reverse (input a number from a string) by using dec2hex.
For bigint libraries there are also other ways of easing string/integer conversions:
BCD
Binary-coded decimal: the number printed as hex is the decimal number, so each digit takes 4 bits. This wastes some memory, but many CPUs have BCD support and can operate on such integers natively.
Base 10^n
Sometimes base 10^n is used instead of 2^m, where
10^n <= 2^m
Here m is the bit width of your atomic integer and n is the number of decimal digits that fit inside it.
For example, if your atomic unsigned integer is 16 bits, it can hold 65536 distinct values in base 2. If you use base 10000 instead, you can print each atom as a decimal number zero-padded on the left and simply concatenate all such prints.
This also wastes some memory, but usually not too much (if the bit width is reasonably selected), and you can use standard instructions on the integers. Only the carry propagation changes a bit.
for example for 32bit words:
2^32 = 4294967296 >= 1000000000
so we waste log2(4.2949...) = ~2.1 bits per 32 bits. This is much better than BCD, which wastes log2(16/10)*(32/4) = ~5.42 bits per 32 bits, and the ratio usually improves further with higher bit widths.
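A short sketch of the base 10^n idea with n = 4, assuming limbs are stored least significant first in a Python list (add_limbs and limbs_to_string are illustrative names, not part of any library):

```python
BASE = 10_000  # 10**4 fits in a 16-bit word, since 10**4 <= 2**16

def add_limbs(a, b):
    """Add two little-endian limb lists; the carry happens at BASE instead of 2**16."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        carry, limb = divmod(s, BASE)
        out.append(limb)
    if carry:
        out.append(carry)
    return out

def limbs_to_string(limbs):
    """Print the top limb unpadded and every lower limb zero-padded to 4 digits."""
    head, *tail = reversed(limbs)
    return str(head) + "".join(f"{limb:04d}" for limb in tail)

print(limbs_to_string([5678, 1234]))  # 12345678
```

Printing becomes a per-limb operation with no big-number division at all; the price is the modified carry rule in every arithmetic routine.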

Best way to represent numbers of unbounded length?

What's the most optimal (space efficient) way to represent integers of unbounded length?
(The numbers range from zero to positive-infinity)
Some sample number inputs can be found here (each number is shown on its own line).
Is there a compression algorithm that is specialized in compressing numbers?
You've basically got two alternatives for variable-length integers:
1. Use 1 bit of every k as an end terminator. That's the way Google protobuf does it, for example (in their case, one bit from every byte, so there are 7 useful bits in every byte).
2. Output the bit-length first, and then the bits. That's how ASN.1 works, except for OIDs which are represented in form 1.
If the numbers can be really big, Option 2 is better, although it's more complicated and you have to apply it recursively, since you may have to output the length of the length, and then the length, and then the number. A common technique is to use Option 1 (bit markers) for the length field.
For smallish numbers, option 1 is better. Consider the case where most numbers would fit in 64 bits. The overhead of storing them 7 bits per byte is 1/7; with eight bytes, you'd represent 56 bits. Using even the 7/8 representation for length would also represent 56 bits in eight bytes: one length byte and seven data bytes. Any number shorter than 48 bits would benefit from the self-terminating code.
"Truly random numbers" of unbounded length are, on average, infinitely long, so that's probably not what you've got. More likely, you have some idea of the probability distribution of number sizes, and could choose between the above options.
Note that none of these "compress" (except relative to the bloated ASCII-decimal format). The asymptote of log n/n is 0, so as the numbers get bigger, the length field (the size of the size of the number) tends to occupy no (relative) space. But it still needs to be represented somehow, so the total representation will always be a bit bigger than log2 of the number.
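Option 1 can be sketched as follows, assuming the protobuf-style layout of 7 data bits per byte, least significant group first, with the high bit of each byte signalling that more bytes follow. The function names are illustrative:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 data bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # high bit set: more bytes follow
        else:
            out.append(group)         # high bit clear: this is the last byte
            return bytes(out)

def decode_varint(data: bytes) -> int:
    """Decode one varint from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")

print(encode_varint(300).hex())  # ac02
```

Small values cost a single byte, and there is no upper bound on the size of the integer that can be encoded.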
You cannot compress per se, but you can encode, which may be what you're looking for. You have files with sequences of ASCII decimal digits separated by line feeds. You should simply Huffman encode the characters. You won't do much better than about 3.5 bits per character.

Fastest method for adding/summing the individual digit components of a number

I saw a question on a math forum a while back where a person was discussing adding up the digits in a number over and over again until a single digit is reached (i.e. "362" would become "3+6+2", which would become "11"; then "11" would become "1+1", which would become "2"; therefore "362" would return 2). I wrote some nice code to get an answer to this and posted it, only to be outdone by a user who suggested that any number modulo 9 is equal to this "infinite digit sum". I checked it and he was right... well, almost right: if zero was returned you had to switch it out with a "9", but that was a very quick fix...
362 = 3+6+2 = 11 = 1+1 = 2
or...
362%9 = 2
Anyway, the mod 9 method works fantastically for repeatedly summing the digits until you are left with just a single digit... but what about only doing it once (i.e. 362 would just return "11")? Can anyone think of fast algorithms?
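The mod 9 shortcut, including the "replace 0 with 9" fix, can be written as a one-liner. The sketch below (Python, non-negative integers only, illustrative names) also includes the slow repeated-sum version for comparison:

```python
def digital_root(n: int) -> int:
    """Repeated digit sum of a non-negative integer, via the modulo-9 shortcut."""
    # 1 + (n - 1) % 9 maps nonzero multiples of 9 to 9 instead of 0.
    return 0 if n == 0 else 1 + (n - 1) % 9

def digital_root_slow(n: int) -> int:
    """Reference version: sum the decimal digits until one digit remains."""
    while n >= 10:
        n = sum(int(d) for d in str(n))
    return n

print(digital_root(362))  # 2
```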
There's a cool trick for summing the 1 digits in binary, and with a fixed-width integer. At each iteration, you separate out half the digits each into two values, bit-shift one value down, then add. First iteration, separate every other digit. Second iteration, pairs of digits, and so on.
Given that 27 is 00011011 as 8-bit binary, the process is...
00010001 + 00000101 = 00010110 <- every other digit step
00010010 + 00000001 = 00010011 <- pairs of digits
00000011 + 00000001 = 00000100 <- quads, giving final result 4
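The three steps above correspond to three mask-shift-add lines. A sketch for 8-bit values in Python (popcount8 is an illustrative name):

```python
def popcount8(x: int) -> int:
    """Count the 1 bits in an 8-bit value with three mask-shift-add steps."""
    x = (x & 0b01010101) + ((x >> 1) & 0b01010101)  # sums of every other bit
    x = (x & 0b00110011) + ((x >> 2) & 0b00110011)  # sums of pairs
    x = (x & 0b00001111) + ((x >> 4) & 0b00001111)  # sum of the two quads
    return x

print(popcount8(27))  # 4
```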
You could do a similar trick with decimal, but it would be less efficient than a simple loop unless you had a direct representation of decimal numbers with fast operations to zero out selected digits and to do digit-shifting. So for 12345678 you get...
02040608 + 01030507 = 03071115 <- every other digit
00070015 + 00030011 = 00100026 <- pairs
00000026 + 00000010 = 00000036 <- quads, final result
So 1+2+3+4+5+6+7+8 = 36, which is correct, but you can only do this efficiently if your number representation is fixed-width decimal. It always takes lg(n) iterations, where lg means the base two logarithm, and you round upwards.
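A sketch of the decimal variant, assuming the number is held as 32-bit packed BCD (one decimal digit per 4-bit nibble, so 12345678 is the integer 0x12345678). One difference from the worked example: the intermediate field sums here are ordinary binary values rather than decimal digit pairs, which is fine because only the final total is read. The function name is illustrative:

```python
def bcd_digit_sum(bcd: int) -> int:
    """Sum the 8 decimal digits of a 32-bit packed-BCD value."""
    x = (bcd & 0x0F0F0F0F) + ((bcd >> 4) & 0x0F0F0F0F)  # every other digit
    x = (x & 0x00FF00FF) + ((x >> 8) & 0x00FF00FF)      # pairs
    x = (x & 0x0000FFFF) + ((x >> 16) & 0x0000FFFF)     # quads
    return x

print(bcd_digit_sum(0x12345678))  # 36
```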
To expand on this a little (based on in-comments discussions), let's pretend this was sane, for a bit...
If you count single-digit additions, there's actually more work than a simple loop here. The idea, as with the bitwise trick for counting bits, is to re-order those additions (using associativity) and then to compute as many as possible in parallel, using a single full-width addition to implement two half-width additions, four quarter-width additions etc. There's significant overhead for the digit-clearing and digit-shifting operations, and even more if you implement this as a loop (calculating or looking up the digit-masking and shift-distance values for each step). The "loop" should probably be fully unrolled and those masks and shift-distances be included as constants in the code to avoid that.
A processor with support for Binary Coded Decimal (BCD) could handle this. Digit masking and digit shifting would be implemented using bit masking and bit shifting, as each decimal digit would be encoded in 4 (or more) bits, independent of the encoding of other digits.
One issue is that BCD support is quite rare these days. It used to be fairly common in the 8 bit and 16 bit days, but as far as I'm aware, processors that still support it now do so mainly for backward compatibility. Reasons include...
Very early processors didn't include hardware multiplication and division. Hardware support for these operations means it's easier and more efficient to convert binary to decimal now. Binary is used for almost everything now, and BCD is mostly forgotten.
There are decimal number representations around in libraries, but few if any high level languages ever provided portable support to hardware BCD, so since assembler stopped being a real-world option for most developers BCD support simply stopped being used.
As numbers get larger, even packed BCD is quite inefficiently packed. Number representations base 10^x have the most important properties of base 10, and are easily decoded as decimal. Base 1000 only needs 10 bits per three digits, not 12, because 2^10 is 1024. That's enough to show you get an extra decimal digit for 32 bits - 9 digits instead of 8 - and you've still got 2 bits left over, e.g. for a sign bit.
The thing is, for this digit-totalling algorithm to be worthwhile at all, you need to be working with fixed-width decimal of probably at least 32 bits (8 digits). That gives 12 operations (6 masks, 3 shifts, 3 additions) rather than 15 additions for the (fully unrolled) simple loop. That's a borderline gain, though - and other issues in the code could easily mean it's actually slower.
The efficiency gain is clearer at 64 bits (16 decimal digits) as there's still only 16 operations (8 masks, 4 shifts, 4 additions) rather than 31, but the odds of finding a processor that supports 64-bit BCD operations seems slim. And even if you did, how often do you need this anyway? It seems unlikely that it could be worth the effort and loss of portability.
Here's something in Haskell:
sumDigits n =
  if n == 0
    then 0
    else let a = mod n 10
         in a + sumDigits (div n 10)
Oh, but I just read you're doing that already...
(then there's also the obvious:
sumDigits n = sum $ map (read . (:[])) . show $ n
)
For short code, try this:
int digit_sum(int n){
    if (n<10) return n;
    return n%10 + digit_sum(n/10);
}
Or, in words,
-If the number is less than ten, then the digit sum is the number itself.
-Otherwise, the digit sum is the current last digit (a.k.a. n mod10 or n%10), plus the digit sum of everything to the left of that number (n divided by 10, using integer division).
-This algorithm can also be generalized for any base, substituting the base in for 10.
int digit_sum(int n)
{
    /* Keep reducing until a single digit remains. */
    while (n >= 10)
        n = n%10 + digit_sum(n/10);
    return n;
}

Least significant bit first

While working on Ruby I came across:
> "cc".unpack('b8B8')
=> ["11000110", "01100011"]
Then I tried Googling to find a good answer on "least significant bit", but could not find any.
Anyone care to explain, or point me in the right direction where I can understand the difference between "LSB first" and "MSB first"?
It has to do with the direction of the bits. Notice that in this example it's unpacking two ascii "c" characters, and yet the bits are mirror images of each other. LSB means the rightmost (least-significant) bit is the first bit. MSB means the leftmost (most-significant) is the first bit.
As a simple example, consider the number 5, which in "normal" (readable) binary looks like this:
00000101
The least significant bit is the rightmost 1, because that is the 2^0 position (or just plain 1). It doesn't impact the value too much. The one next to it is the 2^1 position (or just plain 0 in this case), which is a bit more significant. The bit to its left (2^2 or just plain 4) is more significant still. So we say this is MSB notation because the most significant bit (2^7) comes first. If we change it to LSB, it simply becomes:
10100000
Easy right?
(And yes, for all you hardware gurus out there I'm aware that this changes from one architecture to another depending on endianness, but this is a simple answer for a simple question)
The term "significance" of a bit or byte only makes sense in the context of interpreting a sequence of bits or bytes as an integer. The bigger the impact of the bit or byte on the value of the resulting integer - the higher its significance. The more "significant" it is to the value.
So, for example, when we talk about a sequence of four bytes having the least significant byte first (aka little-endian), what we mean is that when we interpret those four bytes as a 32-bit integer, the first byte denotes the lowest eight binary digits of the integer, the second byte denotes the 9th through 16th binary digits of the integer, the third denotes the 17th through 24th, and the last byte denotes the highest eight bits of the integer.
Likewise, if we say a sequence of 8 bits is in most significant bit first order, what we mean is that if we interpret the 8 bits as an 8-bit integer, the first bit in the sequence denotes the highest binary digit of the integer, the second bit the second highest, and so on, until the last bit denotes the lowest binary digit of the integer.
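Both orderings can be reproduced in a few lines of Python (a sketch with illustrative helper names; it mirrors the Ruby unpack output from the question for the character "c"):

```python
def bits_msb_first(byte: int) -> str:
    """Most significant bit first: the usual way binary is written."""
    return format(byte, "08b")

def bits_lsb_first(byte: int) -> str:
    """Least significant bit first: the same bits in reverse order."""
    return format(byte, "08b")[::-1]

c = ord("c")  # 0x63
print(bits_lsb_first(c), bits_msb_first(c))  # 11000110 01100011

# The same idea one level up: four bytes, least significant byte first.
word = int.from_bytes(bytes([0x78, 0x56, 0x34, 0x12]), "little")
print(hex(word))  # 0x12345678
```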
Another thing to think about is that one can say that the usual decimal notation has a convention that is most significant digit first. For example, a decimal number like:
1250
is read to mean:
1 x 1000 +
2 x 100 +
5 x 10 +
0 x 1
Right? Now imagine a different convention that is least significant digit first. The same number would be written:
0521
and would be read as:
0 x 1 +
5 x 10 +
2 x 100 +
1 x 1000
Another thing you should observe in passing is that in the C family of languages (most modern programming languages), the shift-left operator (<<) and shift-right operator (>>) are pointing in a most significant bit direction. That is shifting a bit left increases its significance and shifting it right decreases it, meaning that left is most significant (and the left side is usually what we mean by first, at least in the west).

Resources