#include <iostream>
int main()
{
    for (int i = 0; i < 4; ++i)
        std::cout << i*1000000000 << std::endl;
}
I get a warning from gcc whenever I compile this:
warning: iteration 3u invokes undefined behavior [-Waggressive-loop-optimizations]
std::cout << i*1000000000 << std::endl;
What's the cause of this warning?
Signed integer overflow (strictly speaking, there is no such thing as "unsigned integer overflow") is undefined behaviour.
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.
I suspect that it's something like: (1) because every iteration with i of any value larger than 2 has undefined behavior -> (2) the compiler can assume that i <= 2 for optimization purposes -> (3) the loop condition is always true -> (4) it's optimized away into an infinite loop.
What is going on is a case of strength reduction, more specifically, induction variable elimination. The compiler eliminates the multiplication by emitting code that instead increments i by 1e9 each iteration (and changing the loop condition accordingly). This is a perfectly valid optimization under the "as if" rule, as a well-behaved program could not observe the difference. Alas, this program is not well-behaved, and the optimization "leaks".
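One way to avoid the undefined behaviour is to widen the operand before multiplying, so the product is computed in 64 bits. The following is a minimal sketch, assuming the 1e9 step mentioned above; the helper name scaled_values is hypothetical and not part of the question.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the question): returns i * 1000000000 for
// every i in [0, count), computed in 64 bits so the product cannot overflow.
std::vector<std::int64_t> scaled_values(int count) {
    assert(count >= 0);
    std::vector<std::int64_t> out;
    for (int i = 0; i < count; ++i) {
        // Widen *before* multiplying; int * int would still overflow at i == 3.
        out.push_back(static_cast<std::int64_t>(i) * 1000000000);
    }
    return out;
}
```

Calling scaled_values(4) and printing each element gives the four values the original loop was presumably meant to print, without relying on signed overflow.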
I have two functions to calculate CRC32:
1)
for (loop = 0u; loop < len; ++loop)
{
crc = lut[((uint8_t)(crc >> 24) ^ data[loop])] ^ (crc << 8u);
}
2)
for (i = 0u; i < len; i++)
{
crc = lut[((uint32_t)data[i] ^ crc) & 0xFFu] ^ (crc >> 8u);
}
Both can calculate the same result, but:
the lookup table for the second one has values with different endianness
after calculation, the result also has swapped endianness
The question is: why are there two different implementations? Is there a specific name for a calculation like the one in the second example?
They are clearly equivalent after byte-swapping the results of one of them, if the table is also byte swapped.
Normally I see those two forms of CRC calculation for different CRCs, because one is using a bit-reflected polynomial (the code with >>), and the other with a normal polynomial ("<<").
However I have not seen a case where someone took one of those and then byte-swapped the table and switched from "<<" to ">>" or vice versa. I don't think there is a name for that.
The application I might imagine for such a thing is that someone had to byte swap the result at the end in order to make it easier to put the CRC in some predefined format. They then realized they could avoid the byte swap if instead they built that into the table and the direction of shifting when computing.
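The equivalence claimed above can be checked with a small sketch. It assumes a 32-bit CRC with a normal (non-reflected) polynomial; the function names and the choice of polynomial are illustrative assumptions, not taken from the question. The second form uses the first form's table with every entry byte-swapped, and starts from the byte-swapped initial value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Table = std::array<std::uint32_t, 256>;

// Reverses the byte order of a 32-bit value.
constexpr std::uint32_t bswap32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Builds the usual MSB-first table for a non-reflected polynomial.
Table make_msb_table(std::uint32_t poly) {
    Table t{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t r = i << 24;
        for (int bit = 0; bit < 8; ++bit)
            r = (r & 0x80000000u) ? (r << 1) ^ poly : (r << 1);
        t[i] = r;
    }
    return t;
}

// Same table with every entry byte-swapped.
Table make_swapped_table(const Table& msb) {
    Table t{};
    for (std::size_t i = 0; i < t.size(); ++i) t[i] = bswap32(msb[i]);
    return t;
}

// First form from the question: shift left, index with the top byte.
std::uint32_t crc_shift_left(const Table& lut, std::uint32_t crc,
                             const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    for (std::size_t i = 0; i < len; ++i)
        crc = lut[static_cast<std::uint8_t>(crc >> 24) ^ data[i]] ^ (crc << 8);
    return crc;
}

// Second form from the question: shift right, index with the bottom byte.
std::uint32_t crc_shift_right(const Table& lut, std::uint32_t crc,
                              const std::uint8_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    for (std::size_t i = 0; i < len; ++i)
        crc = lut[(data[i] ^ crc) & 0xFFu] ^ (crc >> 8);
    return crc;
}
```

With a table from make_msb_table(0x04C11DB7u) for the first form and its make_swapped_table counterpart for the second, bswap32 of the second result should equal the first result for any input, because the top byte of one register is the bottom byte of the other at every step.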
while(true) {
int x(0), y(0);
std::thread t0([&x, &y]() {
x=1;
y=3;
});
std::thread t1([&x, &y]() {
std::cout << "(" << y << ", " <<x <<")" << std::endl;
});
t0.join();
t1.join();
}
Firstly, I know that it is UB because of the data race.
But, I expected only the following outputs:
(3,1), (0,1), (0,0)
I was convinced that it was not possible to get (3,0), but I did. So I am confused: after all, x86 doesn't allow StoreStore reordering.
So x = 1 should be globally visible before y = 3.
I suppose that from a theoretical point of view the output (3,0) is impossible because of the x86 memory model. I suppose that it appeared because of the UB. But I am not sure. Please explain.
What else besides StoreStore reordering could explain getting (3,0)?
You're writing in C++, which has a weak memory model. You didn't do anything to prevent reordering at compile-time.
If you look at the asm, you'll probably find that the stores happen in the opposite order from the source, and/or that the loads happen in the opposite order from what you expect.
The loads don't have any ordering in the source: the compiler can load x before y if it wants to, even if they were std::atomic types:
t2 <- x(0)
t1 -> x(1)
t1 -> y(3)
t2 <- y(3)
This isn't even "re"ordering, since there was no defined order in the first place:
std::cout << "(" << y << ", " <<x <<")" << std::endl; doesn't necessarily evaluate y before x. The << operator has left-to-right associativity, and operator overloading is just syntactic sugar for
op<<( op<<(op<<(y),x), endl); // omitting the string constants.
Since the order of evaluation of function arguments is unspecified (even if we're talking about nested function calls), the compiler is free to evaluate x before evaluating op<<(y), at least before C++17, which sequences the operands of << left to right. IIRC, gcc often just evaluates right to left, matching the order of pushing args onto the stack if necessary. The answers on the linked question indicate that that's often the case. But of course that behaviour is in no way guaranteed by anything.
The order they're loaded in is unspecified even if they were std::atomic. I'm not sure if there's a sequence point between the evaluation of x and y. If not, then it would be the same as if you evaluated x+y: the compiler is free to evaluate the operands in any order because they're unsequenced. If there is a sequence point, then there is an order but it's unspecified which order (i.e. they're indeterminately sequenced).
Slightly related: gcc doesn't reorder non-inline function calls in expression evaluation, to take advantage of the fact that C leaves the order of evaluation unspecified. I assume after inlining it does optimize better, but in this case you haven't given it any reason to favour loading y before x.
How to do it correctly
The key point is that it doesn't matter exactly why the compiler decided to reorder, just that it's allowed to. If you don't impose all the necessary ordering requirements, your code is buggy, full-stop. It doesn't matter if it happens to work with some compilers with some specific surrounding code; that just means it's a latent bug.
See http://en.cppreference.com/w/cpp/atomic/atomic for docs on how/why this works:
// Safe version, which should compile to the asm you expected.
while(true) {
int x(0); // should be atomic, too, because it can be read+written at the same time. You can use memory_order_relaxed, though.
std::atomic<int> y(0);
std::thread t0([&x, &y]() {
x=1;
// std::atomic_thread_fence(std::memory_order_release); // A StoreStore fence is an alternative to using a release-store
y.store(3, std::memory_order_release);
});
std::thread t1([&x, &y]() {
int tx, ty;
ty = y.load(std::memory_order_acquire);
// std::atomic_thread_fence(std::memory_order_acquire); // A LoadLoad fence is an alternative to using an acquire-load
tx = x;
std::cout << ty + tx << "\n"; // Don't use endl, we don't need to force a buffer flush here.
});
t0.join();
t1.join();
}
For Acquire/Release semantics to give you the ordering you want, the last store has to be the release-store, and the acquire-load has to be the first load. That's why I made y a std::atomic, even though you're setting x to 0 or 1 more like a flag.
If you don't want to use release/acquire, you could put a StoreStore fence between the stores and a LoadLoad fence between the loads. On x86, this would just prevent compile-time reordering, but on ARM you'd get a memory-barrier instruction. (Note that y still technically needs to be atomic to obey C++'s data-race rules, but you can use std::memory_order_relaxed on it.)
Actually, even with Release/Acquire ordering for y, x should be atomic as well. The load of x still happens even if we see y==0. So reading x in thread 2 is not synchronized with writing y in thread 1, so it's UB. In practice, int loads/stores on x86 (and most other architectures) are atomic. But remember that std::atomic implies other semantics, like the fact that the value can be changed asynchronously by other threads.
The hardware-reordering test could run a lot faster if you looped inside one thread storing i and -i or something, and looped inside the other thread checking that abs(y) is always >= abs(x). Creating and destroying two threads per test is a lot of overhead.
Of course, to get this right, you have to know how to use C++ to generate the asm you want (or write in asm directly).
In the following code (C language), is 1 << n calculated again on each iteration?
Could the overhead be significant in competitive programming for
larger inputs?
#define for(i,n) for(int i=0;i<(n);++i)
for(i,1<<n){
...
}
Your question is not very clear, but this is a compiler optimisation issue rather than a programming issue.
In C, most compilers can see whether n is modified inside the loop and, if it is not, evaluate the bound only once.
So as a programmer, don't worry about this unless you have very specific constraints.
So if you do this:
int n=10;
for(i=0;i<(1<<n);i++){
    n=func(i);
}
1 << n will be recomputed at each iteration, while in this case:
int n=10;
for(i=0;i<(1<<n);i++){
    printf("%d\n", i);
}
it's highly probable that 1 << n will be computed only once.
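If you would rather not depend on the optimizer, the bound can be hoisted by hand. This is a minimal sketch in C++ (the same idea applies in C); the function count_iterations is hypothetical and only counts how often the loop body runs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical example: runs a loop with bound 1 << n, computing the bound
// exactly once, and returns how many iterations were executed.
std::uint64_t count_iterations(unsigned n) {
    assert(n < 64);  // shifting a 64-bit value by 64 or more is undefined
    const std::uint64_t limit = std::uint64_t{1} << n;  // evaluated once
    std::uint64_t iterations = 0;
    for (std::uint64_t i = 0; i < limit; ++i) {
        ++iterations;  // stand-in for the real loop body; n is not modified
    }
    return iterations;
}
```

For example, count_iterations(10) runs the body 1024 times. Storing the bound in a const local also documents that the loop body is not expected to change it.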
At http://tour.golang.org/#14 they show an example where the number 1 is shifted by 64 bits. This of course would result in an overflow, but then 1 is subtracted from it and all is well. How does half of the expression result in a failure while the entire expression as a whole works just fine?
Thoughts:
I would assume that the setting of the unsigned to a number larger than what it allows is what causes the explosion. It would seem that memory is allocated more loosely on the right hand side of the expression than on the left? Is this true?
The result of your expression is a (compile time) constant and the expression is therefore evaluated during compilation. The language specification mandates that
Constant expressions are always evaluated exactly; intermediate values
and the constants themselves may require precision significantly
larger than supported by any predeclared type in the language. The
following are legal declarations:
const Huge = 1 << 100 // Huge == 1267650600228229401496703205376 (untyped integer constant)
const Four int8 = Huge >> 98 // Four == 4 (type int8)
https://golang.org/ref/spec#Constant_expressions
That is because the Go compiler handles constant expressions as numeric constants. Contrary to the data-types that have to obey the law of range, storage bits and side-effects like overflow, numeric constants never lose precision.
Numeric constants only get deduced down to a data-type with limited precision and range at the moment when you assign them to a variable (which has a known type and thus a numeric range and a defined way to store the number into bits). You can also force them to get deduced to an ordinary data-type by using them as part of an expression that contains non-constant operands.
Confused? So was I..
Here is a longer write-up on the data-types and how constants are handled: http://www.goinggo.net/2014/04/introduction-to-numeric-constants-in-go.html?m=1
I decided to try it. For reasons that are subtle, executing the expression as a constant expression (1 << 64 -1) or piece by piece at run time gives the same answer. This is because of 2 different mechanisms. A constant expression is fully evaluated with infinite precision before being assigned to the variable. The step by step execution explicitly allows overflows and underflows through addition, subtraction and shift operations, and thus the result is the same.
See https://golang.org/ref/spec#Integer_overflow for a description of how integers are supposed to overflow.
However, doing it in groups, i.e. 1<<64 and then -1, causes overflow errors!
You can make a variable overflow through arithmetic, but you cannot assign an overflowing constant to a variable.
Try it yourself. Paste the code below into http://try.golang.org/
This one works:
// You can edit this code!
// Click here and start typing.
package main
import "fmt"
func main() {
var MaxInt uint64 = 1
MaxInt = MaxInt << 64
MaxInt = MaxInt - 1
fmt.Printf("%d\n", MaxInt)
}
This one doesn't work:
// You can edit this code!
// Click here and start typing.
package main
import "fmt"
func main() {
var MaxInt uint64 = 1 << 64
MaxInt = MaxInt - 1
fmt.Printf("%d\n", MaxInt)
}
Actually 1 << 64 - 1 does not always result in a left shift of 64 and minus 1. The - operator is applied before the << operator in most languages, at least in any I know (like C++, Java, ...). Therefore 1 << 64 - 1 <=> 1 << 63.
But Go behaves differently: https://golang.org/ref/spec#Operator_precedence
The - operator comes after the << operator.
The result of a 64-bit left shift depends on the data type. It's just like appending 64 zero bits on the right while cutting off any bits that extend past the data type on the left. In some languages such an overflow is valid, in others it is not.
Languages may also behave differently when the shift count is greater than or equal to the size of the data type. In Java, the shift count is reduced modulo the size of the data type before the shift is applied.
That sounds difficult, but here is an easy example for a long data type with a size of 64 bits.
so i << 64 <=> i << 0 <=> i
or i << 65 <=> i << 1
or i << 130 <=> i << 66 <=> i << 2.
As said, this may differ between compilers and languages. There is never a solid answer without referring to a specific language.
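To make the shift-count reduction above concrete, here is a small sketch in C++. In C++ itself, shifting a 64-bit value by 64 or more is undefined, so the reduction has to be written out explicitly; the function name shl_masked is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: left shift on a 64-bit value where the shift count is
// first reduced modulo 64, mimicking the reduction described above.
std::uint64_t shl_masked(std::uint64_t value, unsigned count) {
    const unsigned effective = count & 63u;  // 64 -> 0, 65 -> 1, 130 -> 2
    assert(effective < 64);                  // always holds after masking
    return value << effective;
}
```

With this helper, shl_masked(i, 64) returns i unchanged, shl_masked(i, 65) equals i << 1, and shl_masked(i, 130) equals i << 2, matching the three cases listed above.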
For learning I would suggest a more common language than Go, maybe like something from the C family.
var MaxInt uint64 = 1<<64 - 1
BITSHIFTING
In binary, 1, with 64 0s after it (10000...).
Same as 2^64.
This maxes out 64-bit unsigned integer (positive numbers only).
So, we subtract 1 to prevent the error.
Therefore, this is the maximum value unsigned integer we can write.
In Go, an integer constant is effectively an arbitrary-precision integer (a big int), so the constant 1 << 63 won't overflow. But var a int64 = 1 << 63 will overflow, because you are assigning a value bigger than an int64 can hold.
This code works perfectly up to 100000, but if you input 1000000 it starts to give the error C++ 0xC0000094: Integer division by zero. I am sure it is something about floating point. I tried all the combinations of (/fp:precise), (/fp:strict), (/fp:except) and (/fp:except-) but had no positive result.
#include "stdafx.h"
#include "time.h"
#include "math.h"
#include "iostream"
#define unlikely(x)(x)
int main()
{
using namespace std;
begin:
int k;
cout<<"Please enter the nth prime you want: ";
cin>>k;
int cloc=clock();
int*p;p=new int [k];
int i,j,v,n=0;
for(p[0]=2,i=3;n<k-1;i+=2)
for(j=1;unlikely((v=p[j],pow(v,2)>i))?!(p[++n]=i):(i%v);++j);
cout <<"The "<<k<<"th prime is "<<p[n]<<"\nIt took me "<<clock()-cloc<<" milliseconds to find your prime.\n";
goto begin;
}
The code displayed in the question does not initialize p[1] or assign a value to it. In the for loop that sets j=1, p[j] is used in an assignment to v. This results in an unknown value for v. Apparently, it happens to be zero, which causes a division by zero in the expression i%v.
As this code is undocumented, poorly structured, and unreadable, the proper solution is to discard it and start from scratch.
Floating point has no bearing on the problem, although the use of pow(v, 2) to calculate v^2 is a poor choice; v*v would serve better. However, some systems print the misleading message “Floating exception” when an integer division by zero occurs. In spite of the message, this is an error in an integer operation.
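As an illustration of starting from scratch, here is a minimal sketch of an nth-prime function using trial division against the primes found so far. It is one possible rewrite, not the asker's algorithm restored; the name nth_prime is hypothetical. It uses an integer v*v-style test instead of pow, and it never reads an element that has not been written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the n-th prime, counting from 1 (nth_prime(1) == 2).
std::uint32_t nth_prime(std::size_t n) {
    assert(n >= 1);
    std::vector<std::uint32_t> primes{2};
    for (std::uint32_t candidate = 3; primes.size() < n; candidate += 2) {
        bool is_prime = true;
        for (std::uint32_t p : primes) {
            // Integer square test; widened so p * p cannot overflow.
            if (static_cast<std::uint64_t>(p) * p > candidate) break;
            if (candidate % p == 0) {  // p is never zero here
                is_prime = false;
                break;
            }
        }
        if (is_prime) primes.push_back(candidate);
    }
    return primes.back();
}
```

Because every divisor comes from the vector of primes already found, the modulo operation cannot divide by an uninitialized or zero value.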