Approach for determining max and min values of various types - c++11

I would like to know if my approach to determining the max and min values of the types is correct. I have googled around and could not find an exact methodology for determining this.
This is my approach :
To confirm the sizes of the types, I am using this link.
Now the link states that the size of int_32_t (which by default is signed) will be 16 bits in LP32. So the max 16-bit number is 65535, but since it's signed we will get a max of 65535/2 = 32767.5, so I am assuming its range will be -32767 to 32767? Am I correct? And similarly for uint32_t, the size will be 16 bits in LP32, so the max 16-bit number is 65535 and the range will be 0 to 65535? Am I correct? Also, what is the difference between LP32 and ILP32, and which one should I be following?

You don't have to make any assumptions. Just use the standard traits:
http://en.cppreference.com/w/cpp/types/numeric_limits
E.g.:
std::numeric_limits<std::int32_t>::min();
std::numeric_limits<std::int32_t>::max();
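For instance, a complete little program using these traits might look like this (an illustrative sketch, not part of the original answer):

#include <cstdint>
#include <iostream>
#include <limits>

int main() {
    // The standard traits give the exact range of the implementation's types.
    std::cout << std::numeric_limits<std::int32_t>::min() << '\n';   // -2147483648
    std::cout << std::numeric_limits<std::int32_t>::max() << '\n';   //  2147483647
    std::cout << std::numeric_limits<std::uint32_t>::max() << '\n';  //  4294967295
    return 0;
}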
To address some of your points:
int_32_t (which by default is signed)
Not by default, but mandated by the language standard. It's a signed integer. The unsigned equivalent is std::uint32_t.
int32_t [...] will be 16 bits
Umm... nope. It's a signed integer type with a width of exactly 32 bits, no padding bits, and 2's complement representation for negative values (provided only if the implementation directly supports the type).
I am assuming its range will be -32767 to 32767? Am I correct ?
No. For a 16-bit signed integer using 2's complement (std::int16_t) the range is
-32,768 .. 32,767
Here is how you can reach these numbers:
A std::int16_t has 16 bits (and no padding). With 16 bits you can encode 2^16 = 65,536 distinct values. In two's complement these values are distributed as follows:
[0, 32,767]: 32,768 non-negative values
[-32,768, -1]: 32,768 negative values
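As a quick check of this counting (an illustrative sketch, not part of the original answer):

#include <cassert>
#include <cstdint>
#include <limits>

int main() {
    // 2^16 patterns split into 32,768 negative and 32,768 non-negative values.
    assert(std::numeric_limits<std::int16_t>::min() == -(1 << 15));      // -32768
    assert(std::numeric_limits<std::int16_t>::max() ==  (1 << 15) - 1);  //  32767
    assert(std::numeric_limits<std::uint16_t>::max() == (1 << 16) - 1);  //  65535
    return 0;
}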

Related

Binary problem: how to find the integers A that conform to an integer B

I want to write an algorithm (in Python) that gets all the integers that conform to another integer B, written in binary.
When A conforms to B, it means that in all positions where B has bits set to 1, A has the corresponding bits set to 1.
For example:
If we have 1001, the conforming numbers are: 1111, 1011, 1101.
We can assume that the solution should work with very large numbers (so it has to be quite efficient).
I have thought about many solutions involving binary operations, but I cannot get to a complete solution.
Do you have any ideas?
As shown in your example:
An integer with z zero bits has 2**z conforming integers. We can subtract one, because one of these is the integer itself.
Accordingly, your algorithm has to count from 1 up to (but not including) 2**z and replace the z zero bits in the original integer with the z bits of your counter.
In python, you can use bitwise operators to test or change bit positions within an integer.
Examples for bitwise operations:
x & 1 returns 1 if the least significant bit is set, otherwise 0.
x = x | 4 will set the 3rd bit (the bit with value 4).
Sketch of your algorithm (see the code sketch below):
1. Loop through the integer to find and count the z zero bits.
2. Loop from 1 up to (but not including) 2**z.
   Inner loop: scan the z bits of the counter.
   Transfer those bits to a copy of the original integer.
   Record/output the resulting conforming integer.
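A minimal sketch of that outline (written in C++ for illustration since this thread mixes languages; Python has the same bitwise operators, and all names here are my own):

#include <cstdint>
#include <iostream>
#include <vector>

// For each counter value, its bits are distributed into the zero-bit positions
// of b, producing one conforming integer (assumes fewer than 32 zero bits).
std::vector<std::uint32_t> conforming(std::uint32_t b, int width) {
    // Step 1: collect the positions where b has a 0.
    std::vector<int> zero_positions;
    for (int pos = 0; pos < width; ++pos) {
        if (((b >> pos) & 1u) == 0) zero_positions.push_back(pos);
    }
    int z = static_cast<int>(zero_positions.size());

    // Step 2: count from 1 up to (but not including) 2**z; skipping 0 excludes
    // the original integer itself.
    std::vector<std::uint32_t> result;
    for (std::uint32_t counter = 1; counter < (1u << z); ++counter) {
        std::uint32_t candidate = b;
        for (int i = 0; i < z; ++i) {                       // inner loop: scan counter bits
            if ((counter >> i) & 1u) candidate |= 1u << zero_positions[i];
        }
        result.push_back(candidate);
    }
    return result;
}

int main() {
    for (std::uint32_t v : conforming(9, 4)) {   // 9 == 0b1001
        std::cout << v << ' ';                   // prints 11 13 15 (0b1011 0b1101 0b1111)
    }
    std::cout << '\n';
    return 0;
}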

Generate n different floats that aren't ∞ or NaN (in Go)

I want a function getNthFloat(n uint32) float32 such that for each n, m < 2³²-4 with n≠m, getNthFloat(n) and getNthFloat(m) return distinct floats that are real numbers (neither NaN nor ±∞). 2³²-4 is chosen because, if I understand IEEE 754 correctly, there are two binary representations of NaN, one for ∞ and one for -∞.
I imagine I should convert my uint32 into bits and convert bits into float32, but I can't figure out how to avoid the four values efficiently.
You can't get 2^32-4 valid floating point numbers in a float32. IEEE 754 binary32 numbers have two infinities (negative and positive) and 2^24-2 possible NaN values.
A 32-bit floating point number has the following bits:
bit     31      30...23     22...0
field   sign    exponent    mantissa
All exponents with the value 0xff are either infinity (when mantissa is 0) or NaN (when mantissa isn't 0). So you can't generate those exponents.
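To make the layout concrete, here is a small sketch (my own, in C++ for illustration; the equivalent bit reinterpretation in Go is math.Float32bits) that extracts the three fields:

#include <cstdint>
#include <cstring>
#include <iostream>

// Pull the sign, exponent and mantissa fields out of a binary32 value.
void dump_fields(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);          // reinterpret the 32 bits
    std::uint32_t sign     = bits >> 31;          // bit 31
    std::uint32_t exponent = (bits >> 23) & 0xff; // bits 30..23
    std::uint32_t mantissa = bits & 0x7fffff;     // bits 22..0
    std::cout << sign << ' ' << exponent << ' ' << mantissa << '\n';
}

int main() {
    dump_fields(1.0f);    // prints: 0 127 0
    dump_fields(-2.5f);   // prints: 1 128 2097152
    return 0;
}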
Then it's just a simple matter of mapping your allowed integers into this format and then using math.Float32frombits to generate a float32. How you do that is your choice. I'd probably be lazy and just use the lowest bit for the sign, reject all numbers higher than 2^32 - 2^24 - 1, and then shift the bits around.
So something like this (untested):
func foo(n uint32) float32 {
    if n >= 0xff000000 {
        panic("xxx")
    }
    return math.Float32frombits((n & 1) << 31 | (n >> 1))
}
N.B. I'd probably also avoid denormal numbers, that is, numbers with exponent 0 and a non-zero mantissa. They can be slow and might not be handled correctly. For example, they could all be mapped to zero; there's nothing in the Go spec that talks about how denormal numbers are handled, so I'd be careful.
I think you are looking for the math.Float32frombits function: https://golang.org/pkg/math/#Float32frombits

Understanding two's complement

I'm trying to understand two's complement:
Does two's complement mean that this number is invalid:
1000
Does two's complement disallow the use of the most significant bit for positive numbers? I.e., could
1000
ever represent 2^3? Or would it represent -0?
I'm also confused about why you need to add 1 to a one's complement.
In two's complement the MSB (most significant bit) is set to one for negatives. To multiply a two's complement number by -1 you do the following:
reverse all the bits.
add one to the result.
For example:
the number 10010: after reversing you get 01101, and after adding one you get 01110. That means that 10010 is negative 14.
the number 1000: after reversing you get 0111, and after adding one you get 1000 again. That means that 1000 is negative 8 (its positive counterpart, +8, is not representable in 4 bits).
Now, to your last question: no. If you work with two's complement you can't use the MSB for positive numbers. But you could define that you are not using two's complement and use higher positive numbers.
Two's complement is based on two requirements:
numbers are represented by a fixed number of bits;
x + -x = 0.
Assuming a four bit representation, say, we have
0 + -0 = 0000 + -0000 (base 2) = 0000 => -0000 = 0000
1 + -1 = 0001 + -0001 (base 2) = 0000 => -0001 = 1111 (carry falls off the end)
Now we have our building blocks, a drop of induction will show you the "flip the bits and add 1" algorithm is exactly what you need to convert a positive number to its twos-complement negative representation.
2's complement is mostly a matter of how you interpret the value; most math* doesn't care whether you view a number as signed or not. If you're working with 4 bits, 1000 is 8 as well as -8. This "odd symmetry" arises here because adding it to a number is the same as xoring it with that number (since only the high bit is set, there is no carry into any bits). It also arises from the definition of two's complement: negation maps this number to itself.
In general, any number k represents the set of numbers { a | a ≡ k (mod n) }, where n is 2 to the power of how many bits you're working with. This perhaps somewhat odd effect is a direct result of using modular arithmetic and is true whether you view the number as signed or unsigned. The only difference between the signed and unsigned interpretations is which number you take to be the representative of such a set. For unsigned, the representative is the only such a that lies between 0 and n-1. For signed numbers, the representative is the only such a that lies between -(n/2) and (n/2)-1.
As for why you need to add one, the goal of negation is to find an x' such that x' + x = 0. If you only complemented the bits in x but didn't add one, x' + x would not have carries at any position and just sum to "all ones". "All ones" plus 1 is zero, so adding one fixes x' so that the sum will go to zero. Alternatively (well it's not really an alternative), you can take ~(x - 1), which gives the same result as ~x + 1.
*Signedness affects the result of division, right shift, and the high half of multiplication (which is rarely used and, in many programming languages, unavailable anyway).
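A quick way to check these identities over all 8-bit values (an illustrative sketch, not from the answers above):

#include <cassert>
#include <cstdint>

int main() {
    for (int x = 0; x < 256; ++x) {
        std::uint8_t v   = static_cast<std::uint8_t>(x);
        std::uint8_t inv = static_cast<std::uint8_t>(~v);          // flip the bits
        assert(static_cast<std::uint8_t>(v + inv) == 0xFF);        // x + ~x is "all ones"
        assert(static_cast<std::uint8_t>(v + inv + 1) == 0);       // one more wraps to zero
        // "flip the bits and add 1" equals "subtract 1, then flip the bits"
        assert(static_cast<std::uint8_t>(inv + 1) ==
               static_cast<std::uint8_t>(~static_cast<std::uint8_t>(v - 1)));
    }
    return 0;
}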
It depends on how many bits you use to represent numbers.
The leftmost (largest) bit has a value of -(2**(N-1)), or in this case -8 (N being the number of bits). Subsequent bits have their normal values.
So
1000
is -8
1111
is -1
0111
is 7.
However, if you have 8 bits these become different values!
0000 1000
is positive 8. Only the leftmost bit adds a negative value to the answer.
In either case, the range of numbers is from
1000....0
for -2**(N-1) with N bits
to
0111....1
Which is 2**(N-1) -1. (This is just normal base 2 since the leftmost bit is 0.)
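A small sketch (my own, not from the answer) that evaluates a bit string with the leftmost bit weighted as -(2**(N-1)):

#include <iostream>
#include <string>

// Place-value reading of an N-bit two's complement number: the MSB contributes
// -(2^(N-1)), every other set bit contributes its usual positive value.
long long value_of(const std::string& bits) {
    long long result = 0;
    int n = static_cast<int>(bits.size());
    for (int i = 0; i < n; ++i) {
        if (bits[i] == '1') {
            long long weight = 1LL << (n - 1 - i);
            result += (i == 0) ? -weight : weight;   // leftmost bit is the negative weight
        }
    }
    return result;
}

int main() {
    std::cout << value_of("1000") << '\n';      // -8 with 4 bits
    std::cout << value_of("1111") << '\n';      // -1
    std::cout << value_of("00001000") << '\n';  //  8 with 8 bits
    return 0;
}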

Hashing a small number to a random-looking 64 bit integer

I am looking for a hash-function which operates on a small integer (say in the range 0...1000) and outputs a 64 bit int.
The result-set should look like a random distribution of 64 bit ints: a uniform distribution with no linear correlation between the results.
I was hoping for a function that only takes a few CPU cycles to execute. (The code will be in C++.)
I considered multiplying the input by a big prime number and taking the modulo 2**64 (something like a linear congruent generator), but there are obvious dependencies between the outputs (in the lower bits).
Googling did not show up anything, but I am probably using wrong search terms.
Does such a function exist?
Some Background-info:
I want to avoid using a big persistent table with pseudo random numbers in an algorithm, and calculate random-looking numbers on the fly.
Security is not an issue.
I tested the 64-bit finalizer of MurmurHash3 (suggested by #aix and this SO post). This gives zero if the input is zero, so I increased the input parameter by 1 first:
typedef unsigned long long uint64;

inline uint64 fasthash(uint64 i)
{
    i += 1ULL;
    i ^= i >> 33ULL;
    i *= 0xff51afd7ed558ccdULL;
    i ^= i >> 33ULL;
    i *= 0xc4ceb9fe1a85ec53ULL;
    i ^= i >> 33ULL;
    return i;
}
Here the input argument i is a small integer, for example an element of {0, 1, ..., 1000}. The output looks random:
i    fasthash(i) decimal:      fasthash(i) hex:
0    12994781566227106604      0xB456BCFC34C2CB2C
1     4233148493373801447      0x3ABF2A20650683E7
2      815575690806614222      0x0B5181C509F8D8CE
3     5156626420896634997      0x47900468A8F01875
...  ...                       ...
There is no linear correlation between subsequent elements of the series (scatter plot omitted; both axes range over 0..2^64-1).
Why not use an existing hash function, such as MurmurHash3 with a 64-bit finalizer? According to the author, the function takes tens of CPU cycles per key on current Intel hardware.
Given: input i in the range of 0 to 1,000.
const MaxInt, which is the maximum value that can be contained in a 64 bit int. (You did not say if it is signed or unsigned; 2^64 = 18446744073709551616.)
and a function rand() that returns a value between 0 and 1 (most languages have such a function)
compute hashvalue = i * rand() * ( MaxInt / 1000 )
1,000 * 1,000 = 1,000,000. That fits well within an Int32.
Subtract the low bound of your range from the number.
Square it, and use it as a direct subscript into some sort of bitmap.

How do I detect overflow while multiplying two 2's complement integers?

I want to multiply two numbers, and detect if there was an overflow. What is the simplest way to do that?
Multiplying two 32-bit numbers results in a 64-bit answer, two 8-bit numbers give a 16-bit result, etc. Binary multiplication is simply shifting and adding. So if you had, say, two 32-bit operands with bit 17 set in operand A and any of the bits above 15 or 16 set in operand B, you will overflow a 32-bit result: bit 17 shifted left by 16 is bit 33, which does not fit in 32 bits.
So the question again is: what are the sizes of your inputs and of your result? If the result is the same size as the inputs, then you have to find the most significant 1 of both operands and add those bit locations; if that sum is bigger than your result space, you will overflow.
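As a rough sketch of that bit-counting idea for unsigned 32-bit operands (names and code are mine, not from the answer; the boundary case where the bit lengths sum to exactly 33 still needs one division):

#include <cstdint>
#include <iostream>

// Position of the highest set bit, so bit_length(1) == 1 and bit_length(0) == 0.
int bit_length(std::uint32_t x) {
    int n = 0;
    while (x != 0) { ++n; x >>= 1; }
    return n;
}

bool mul_overflows_u32(std::uint32_t a, std::uint32_t b) {
    if (a == 0 || b == 0) return false;
    int bits = bit_length(a) + bit_length(b);
    if (bits <= 32) return false;         // the product always fits in 32 bits
    if (bits >= 34) return true;          // the product always needs 33+ bits
    return a > UINT32_MAX / b;            // bits == 33: one division decides
}

int main() {
    std::cout << mul_overflows_u32(65536, 65536) << '\n';   // 1: 2^32 does not fit
    std::cout << mul_overflows_u32(65535, 65537) << '\n';   // 0: 4294967295 just fits
    return 0;
}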
EDIT
Yes, multiplying two 3-bit numbers will result in either a 5-bit number, or a 6-bit number if there is a carry in the add. Likewise a 2-bit and a 5-bit number can result in 6 or 7 bits, etc. If the reason for the poster's question is to see whether you have space in your result variable for an answer, then this solution will work and is relatively fast for most languages on most processors. It can be significantly faster on some and significantly slower on others. It is generically fast (depending on how it is implemented, of course) to just look at the number of bits in the operands. Doubling the size of the largest operand is a safe bet if you can do it within your language or processor. Divides are downright expensive (slow), and most processors don't have one, much less at an arbitrary doubling of operand sizes. The fastest, of course, is to drop to assembler, do the multiply, and look at the overflow bit (or compare one of the result registers with zero). If your processor can't do the multiply in hardware then it is going to be slow no matter what you do. I am guessing that asm is not the right answer to this post despite being by far the fastest and having the most accurate overflow status.
Binary makes multiplication trivial compared to decimal. For example, take the binary numbers
0b100 *
0b100
Just like decimal math in school, you (can) start with the least significant bit of the lower operand and multiply it against all the positions in the upper operand, except that with binary there are only two choices: you multiply by zero, meaning you don't have to add to the result, or you multiply by one, which means you just shift and add; no actual multiplication is necessary like you would have in decimal.
000 : 0 * 100
000 : 0 * 100
100 : 1 * 100
Add up the columns and the answer is 0b10000
Same as in decimal math, a 1 in the hundreds column means copy the top number and add two zeros; it works the same in any other base as well. So 0b100 times 0b110 is 0b1000 (a one in the second column over, so copy and add a zero) + 0b10000 (a one in the third column over, so copy and add two zeros) = 0b11000.
This leads to looking at the most significant bits in both numbers. 0b1xx * 0b1xx guarantees a 1xxxx is added to the answer, and that is the largest bit location in the add; no other single input to the final add has that column or a more significant column populated. From there you need only one more bit in case the other bits being added up cause a carry.
Which happens with the worst case all ones times all ones, 0b111 * 0b111
0b00111 +
0b01110 +
0b11100
This causes a carry bit in the addition, resulting in 0b110001: 6 bits. A 3-bit operand times a 3-bit operand, 3+3=6, so 6 bits worst case.
So size of the operands using the most significant bit (not the size of the registers holding the values) determines the worst case storage requirement.
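Before looking at negative operands, here is a small sketch of the shift-and-add multiplication described above (my code, not the poster's):

#include <cassert>
#include <cstdint>

// For every set bit in b, add a copy of a shifted up to that bit position.
std::uint64_t shift_add_multiply(std::uint32_t a, std::uint32_t b) {
    std::uint64_t result = 0;
    for (int pos = 0; pos < 32; ++pos) {
        if ((b >> pos) & 1u) {                            // a one: shift and add
            result += static_cast<std::uint64_t>(a) << pos;
        }                                                 // a zero: nothing to add
    }
    return result;
}

int main() {
    assert(shift_add_multiply(4, 6) == 24);   // 0b100 * 0b110 == 0b11000
    assert(shift_add_multiply(7, 7) == 49);   // 0b111 * 0b111 == 0b110001, 6 bits
    return 0;
}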
Well, that is true assuming positive operands. If you consider some of these numbers to be negative it changes things but not by much.
Minus 4 times 5, 0b1111...111100 * 0b0000....000101 = -20 or 0b1111..11101100
It takes 4 bits to represent a minus 4 and 4 bits to represent a positive 5 (don't forget your sign bit). Our result requires 6 bits if you strip off all the repeated sign bits.
Let's look at the 4-bit corner cases:
-8 * 7 = -56
0b1000 * 0b0111 = 0b1001000
-1 * 7 = -7 = 0b1001
-8 * -8 = 64 = 0b01000000
-1 * -1 = 1 = 0b01
-1 * -8 = 8 = 0b01000
7 * 7 = 49 = 0b0110001
Let's say we count positive numbers as the position of the most significant 1 plus one, and negative numbers as the position of the most significant 0 plus one.
-8 * 7 is 4+4=8 bits actual 7
-1 * 7 is 1+4=5 bits, actual 4 bits
-8 * -8 is 4+4=8 bits, actual 8 bits
-1 * -1 is 1+1=2 bits, actual 2 bits
-1 * -8 is 1+4=5 bits, actual 5 bits
7 * 7 is 4+4=8 bits, actual 7 bits.
So this rule works; note that I counted a minus one as one bit (for negative numbers, find the most significant zero and add one). Anyway, I argue that if this were a 4 bit * 4 bit machine as defined, you would have at least 4 bits of result, and I interpret the question as how many more than 4 bits do I need to safely store the answer. So this rule serves to answer that question for 2's complement math.
If your question was how to accurately determine overflow, with speed secondary, then, well, it is going to be really slow on some systems, for every multiply you do. If this is the question you are asking, then to get some of the speed back you need to tune it for the language and/or processor. Double up the biggest operand if you can, and check for non-zero bits above the result size, or use a divide and compare. If you can't double the operand sizes, divide and compare. Check for zero before the divide.
Actually, your question doesn't specify what size of overflow you are talking about either. Good old 8086: 16 bit times 16 bit gives a 32 bit result (in hardware); it can never overflow. What about some of the ARMs that have a multiply: 32 bit times 32 bit, 32 bit result, easy to overflow. What are the sizes of your operands and result for this question: is the result the same size as the operands, or double their size? Are you willing to perform multiplies that the hardware cannot do (without overflowing)? Are you writing a compiler library and trying to determine whether you can feed the operands to the hardware for speed, or whether you have to perform the math without a hardware multiply? That is the kind of thing you get if you cast up the operands: the compiler library will try to cast the operands back down before doing the multiply, depending on the compiler and its library of course, and it will use the count-the-bits trick to determine whether to use the hardware multiply or a software one.
My goal here was to show how binary multiply works in a digestible form, so you can see how much maximum storage you need by finding the location of a single bit in each operand. Now, how fast you can find that bit in each operand is the trick. If you were looking for minimum storage requirements rather than maximum, that is a different story, because it involves every single one of the significant bits in both operands, not just one bit per operand; you have to do the multiply to determine minimum storage. If you don't care about maximum or minimum storage, you have to just do the multiply and look for non-zero bits above your defined overflow limit, or use a divide if you have the time or hardware.
Your tags imply you are not interested in floating point. Floating point is a completely different beast; you cannot apply any of these fixed-point rules to floating point, they DO NOT work.
Check if one is less than a maximum value divided by the other. (All values are taken as absolute).
2's complementness hardly has anything to do with it, since the multiplication overflows if x*(2^n - x) > 2^M, which is equal to (x*2^n - x^2) > 2^M, or x^2 < (x*2^n - 2^M), so you'll have to compare overflowing numbers anyway (x^2 may overflow, while the result may not).
If your numbers are not of the largest integral data type, then you might just cast them up, multiply and compare with the maximum of the numbers' original type. E.g. in Java, when multiplying two int, you can cast them to long and compare the result to Integer.MAX_VALUE or Integer.MIN_VALUE (depending on sign combination), before casting the result down to int.
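As an illustration of that cast-up idea in C++ (a sketch of my own, not from the Java answer; it assumes a 64-bit type is available and the function name is mine):

#include <cstdint>

// Widen to 64 bits, multiply exactly, then check the 32-bit range.
bool mul_overflows_i32(std::int32_t a, std::int32_t b) {
    std::int64_t wide = static_cast<std::int64_t>(a) * b;   // exact: |a*b| <= 2^62
    return wide > INT32_MAX || wide < INT32_MIN;
}

// e.g. mul_overflows_i32(46341, 46341) is true, while 46340 * 46340 still fits.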
If the type already is the largest, then check if one is less than the maximum value divided by the other. But do not take the absolute value! Instead you need separate comparison logic for each of the sign combinations neg*neg, pos*pos and pos*neg (neg*pos can obviously be reduced to pos*neg, and pos*pos might be reduced to neg*neg). First test for 0 arguments to allow safe divisions.
For actual code, see the Java source of MathUtils class of the commons-math 2, or ArithmeticUtils of commons-math 3. Look for public static long mulAndCheck(long a, long b). The case for positive a and b is
// check for positive overflow with positive a, positive b
if (a <= Long.MAX_VALUE / b) {
    ret = a * b;
} else {
    throw new ArithmeticException(msg);
}
I want to multiply two (2's complement) numbers, and detect if there was an overflow. What is the simplest way to do that?
Various languages do not specify a valid way to check for overflow after it occurs, and so prior tests are required.
With some types, a wider integer type may not exist, so a general solution should limit itself to a single type.
The below (Ref) only requires compares and known limits to the integer range. It returns 1 if a product overflow will occur, else 0.
int is_undefined_mult1(int a, int b) {
    if (a > 0) {
        if (b > 0) {
            return a > INT_MAX / b;       // a positive, b positive
        }
        return b < INT_MIN / a;           // a positive, b not positive
    }
    if (b > 0) {
        return a < INT_MIN / b;           // a not positive, b positive
    }
    return a != 0 && b < INT_MAX / a;     // a not positive, b not positive
}
Is this the simplest way?
Perhaps, yet it is complete and handles all cases known to me, including rare non-2's-complement systems.
Alternatives to Pavel Shved's solution ...
If your language of choice is assembler, then you should be able to check the overflow flag. If not, you could write a custom assembler routine that sets a variable if the overflow flag was set.
If this is not acceptable, you can find the most significant set bit of both values (as absolutes). If the sum of those bit positions exceeds the number of bits in the integer (or unsigned) type, then you will have an overflow if they are multiplied together.
Hope this helps.
In C, here's some maturely optimized code that handles the full range of corner cases:
int
would_mul_exceed_int(int a, int b) {
    int product_bits;

    if (a == 0 || b == 0 || a == 1 || b == 1) return (0); /* always okay */
    if (a == INT_MIN || b == INT_MIN) return (1); /* always underflow */
    a = ABS(a);
    b = ABS(b);
    product_bits  = significant_bits_uint((unsigned)a);
    product_bits += significant_bits_uint((unsigned)b);
    if (product_bits == BITS(int)) { /* cases where the more expensive test is required */
        return (a > INT_MAX / b); /* remember that IDIV and similar are very slow (dozens - hundreds of cycles) compared to bit shifts, adds */
    }
    return (product_bits > BITS(int));
}
Full example with test cases here
The benefit of the above approach is that it doesn't require casting up to a larger type, so it could work on larger integer types as well.
