I am using Go 1.14 with the linux/riscv64 target, and while compiling a hello world program I am seeing this in the assembly:
1b078: 04813183 ld gp,72(sp)
1b07c: 00018003 lb zero,0(gp)
1b080: 00313423 sd gp,8(sp)
As you can see, there is a load to the zero register from [GP+0], which should still raise an exception according to the specification:
Loads with a destination of x0 must still raise any exceptions and cause any other side effects even though the load value is discarded.
What exactly is going on here? Is the compiler producing erroneous output?
I don't know anything about Go on RISC-V, but this is a common pattern.
The memory access only checks that [gp+0] is accessible and readable; the loaded value is discarded (the destination is the hardwired zero register).
This is useful for cases like:
func f(a *[0x100001]byte) {
    (*a)[0x100000] = 1
}
The compiler must generate the following pseudocode:
check_not_null(a)
store(a + 0x100000, 1)
The null check can be implemented using the same construct that you’ve discovered, without branches.
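For a concrete illustration (my own minimal sketch, not code from the question), the implicit nil check can be observed by calling such a function with a nil pointer; the probing load faults and the Go runtime turns the fault into a panic:

package main

func f(a *[0x100001]byte) {
    // The offset is larger than a page, so a nil `a` would not necessarily
    // be caught by the store's own fault; the compiler emits a probing load
    // (the construct seen in the assembly above) as an explicit nil check.
    (*a)[0x100000] = 1
}

func main() {
    f(nil) // panics: runtime error: invalid memory address or nil pointer dereference
}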
Related
For the code append(slice1, 1), the Go compiler gives the error "append(...) evaluated but not used", and we have to write slice1 = append(slice1, 1), because append doesn't modify slice1 and instead returns a new slice.
I think this is a good hint, since it prevents a lot of bugs: we often don't know whether a function like append will change the original array or not. In JavaScript, array1.push('item') changes array1 in place and returns the new length of the array.
I want to get this kind of compile-time checking for my own code:
func appendStr(str string, tail string) string {
    b := str + tail
    return b
}
a := "a"
appendStr(a, "b")
But the Go compiler doesn't give an error here. So does the compiler do some special checking on the append method? Since Go passes parameters by value, the compiler should know that appendStr has no way to modify the passed-in parameter.
append() is special because it's a built-in function, and the compiler does extra checks on it. It is very rarely useful not to use the return value of append(), so the Go authors decided to make it a compile-time error when the result is not used.
On the other hand, "ordinary" functions that have return values often also have side effects, and it's common to call a function only for its side effects and to ignore its return values. A very common example is fmt.Println(): you often print something to the console, and you rarely (if ever) check whether that succeeds or how many bytes were actually written.
The language spec allows you to not use the return values of functions, so doing so is valid, and you can't force the compiler to turn it into an error or warning: you can't make the compiler "mark" valid code as an error.
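A minimal sketch contrasting the two cases:

package main

import "fmt"

func main() {
    s := []int{1}
    // append(s, 2) // compile-time error: append(s, 2) evaluated but not used
    s = append(s, 2) // OK: the result is used
    fmt.Println(s)   // fine: the (n int, err error) results may be ignored
}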
See related question: Return map like 'ok' in Golang on normal functions
The way this is typically done in Go is by using an extra tool, a linter if you will. go vet is commonly used to point out things in the code that "don't look right" and which are probably bugs. It seems that you cannot make it check your own functions out of the box; only things like fmt.Sprintf will be checked.
Another tool is errcheck which reports ignored errors.
You could fork one of these tools and insert your own function(s) there, then make everyone check their code before committing it to source control or check it automatically.
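As a sketch of what such a tool could look like, here is a custom analyzer built on the golang.org/x/tools/go/analysis framework (my own example: the checked function name appendStr is taken from the question, and the analyzer name is made up) that reports calls whose result is discarded:

package main

import (
    "go/ast"

    "golang.org/x/tools/go/analysis"
    "golang.org/x/tools/go/analysis/singlechecker"
)

var Analyzer = &analysis.Analyzer{
    Name: "mustuse",
    Doc:  "reports calls to appendStr whose result is discarded",
    Run:  run,
}

func run(pass *analysis.Pass) (interface{}, error) {
    for _, file := range pass.Files {
        ast.Inspect(file, func(n ast.Node) bool {
            // A call appearing as a bare expression statement means
            // its result is discarded.
            stmt, ok := n.(*ast.ExprStmt)
            if !ok {
                return true
            }
            call, ok := stmt.X.(*ast.CallExpr)
            if !ok {
                return true
            }
            if id, ok := call.Fun.(*ast.Ident); ok && id.Name == "appendStr" {
                pass.Reportf(call.Pos(), "result of appendStr is not used")
            }
            return true
        })
    }
    return nil, nil
}

func main() { singlechecker.Main(Analyzer) }

Built with singlechecker, this compiles to a standalone binary that you can run over your packages much like go vet.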
I've profiled my program with Valgrind and Callgrind and found that most of the time is spent in the nearbyint$fenv_access_off function.
I've found that it's an LLVM intrinsic, but which Rust language construct uses it? How can I avoid it?
Doing a search for nearbyint finds the related symbols nearbyintf32 and nearbyintf64. These are documented as returning the nearest integer to a floating-point value. However, there appear to be no calls to that specific function.
fenv_access_off appears to be an OS X specific aspect of the math library.
The other thing in your trace is round. I can believe that round could use nearbyint. I also don't see any cases of round in the standard library that seem like they would occur in a tight loop.
Beyond this, anything is pure guessing.
I've reproduced it with:
fn main() {
    // the pipeline from the profile: powf, then round, then cast to u8
    let _data: Vec<u8> = (0..999999)
        .map(|x| (x as f64).powf(2.2).round() as u8)
        .collect();
}
so it seems the .round() call is what ends up in nearbyint (the as u8 cast itself compiles to an inline conversion).
It's the same speed as the C equivalent uchar = round(pow(i, 2.2)), so I'll have to replace it with a good ol' lookup table…
For the following statement inside function func(), I'm trying to figure out the variable name (which is 'dictionary' in the example) that points to the malloc'ed memory region.
#include <stdint.h>
#include <stdlib.h>

void func(void) {
    uint64_t *dictionary = (uint64_t *)malloc(sizeof(uint64_t) * 128);
}
The instrumented malloc() can record the start address and size of the allocation. However, it has no knowledge of the variable 'dictionary' that the result is assigned to. Are there any features on the compiler side that can help solve this problem, without modifying the compiler to instrument such assignment statements?
One approach I've been considering is to exploit the fact that the variable 'dictionary' and the call to 'malloc' are on the same source line (or adjacent lines), since DWARF provides line information.
One thing you can do with Clang and LLVM is emit the code with debug information and then look for malloc calls. These will be assigned to LLVM values, which can be traced (when not compiled with optimizations, that is) to the original C/C++ source code via the debug information metadata.
I am working on a JIT compiler based on LLVM, and I have a real issue with performance. I have been reading a lot about this, and I know it is a known problem in LLVM. However, I am wondering whether there are other bottlenecks. Hence, I want to use in my JIT the same mechanism offered by -time-passes, but saving the result to a specific file. That way, I can do some simple math like:
real_execution_time = total_time - time_passes
I added the option to the command line, but it does not work:
// Disable branch fold for accurate line numbers.
llvm_argv[arrayIndex++] = "-disable-branch-fold";
llvm_argv[arrayIndex++] = "-stats";
llvm_argv[arrayIndex++] = "-time-passes";
llvm_argv[arrayIndex++] = "-info-output-file";
llvm_argv[arrayIndex++] = "pepe.txt";
cl::ParseCommandLineOptions(arrayIndex, const_cast<char**>(llvm_argv));
Any solution?
OK, I found the solution. I am posting it because it may be useful to someone else.
Before any exit(code) in your program you must include a call to
llvm::llvm_shutdown();
This call flushes the information to the file.
My problem was twofold:
1 - Other threads called exit() without the call mentioned above.
2 - There is a handy struct llvm::llvm_shutdown_obj whose destructor calls the method mentioned above. I had declared a variable in the main function as follows:
llvm::llvm_shutdown_obj X();
You would expect the compiler to call the destructor when main returns, but it was not happening. The reason is that this line does not actually declare a variable: with the empty parentheses it is parsed as the declaration of a function X returning llvm::llvm_shutdown_obj (the "most vexing parse"), so no object is ever constructed. Dropping the parentheses (llvm::llvm_shutdown_obj X;) fixes it.
No object => no destructor => no flush to the file
AFAIK there are pthread functions that act as memory barriers (see e.g. "Clarifications on full memory barriers involved by pthread mutexes"). But what about a compile-time barrier, i.e. is the compiler (especially gcc) aware of this?
In other words: is pthread_create() a reason for gcc not to perform reordering?
For example, in this code:
a = 1;
pthread_create(...);
Is it certain that reordering will not take place?
What about invocations from different functions:
void fun(void) {
    pthread_create(...);
    ...
}
a = 1;
fun();
Is fun() also a compile-time barrier (assuming pthread_create() is)?
What about functions in different translation units?
Please note that I am interested in the general gcc and pthreads behavior specification, not necessarily x86-specific behavior (various different embedded platforms are in focus).
I am also not interested in other compilers/thread libraries behavior.
Because functions such as pthread_create() are external functions, the compiler must ensure that any side effects that could be visible to an external function (such as a write to a global variable) are completed before calling the function. The compiler can't reorder the write to a until after the function call in the first case (assuming a is global or otherwise potentially accessible externally).
This is behavior that is necessary for any C compiler, and really has little to do with threads.
However, if the variable a is a local variable, the compiler might be able to defer the write until after the function call (a might not even end up in memory at all, for that matter), unless something like the address of a was taken and made available externally somehow (like passing it as the thread parameter).
For example:
int a;

void foo(void)
{
    a = 1;
    pthread_create(...); // the compiler can't reorder the write to `a` past
                         // the call to `pthread_create()`
    // ...
}

void bar(void)
{
    int b;
    b = 1;
    pthread_create(...); // `b` can be initialized after calling `pthread_create()`
                         // `b` might not even exist except as something passed
                         // on the stack or in a register to `printf()`
    printf("%d\n", b);
}
I'm not sure if there's a document that outlines this in more detail; it is covered largely by C's "as if" rule. In C99 that's 5.1.2.3/3 "Program execution". C is specified in terms of an abstract machine with sequence points at which side effects must be complete, and programs must behave as if they follow that abstract machine model, except where the compiler can deduce that the side effects aren't needed.
In my foo() example above, the compiler would generally not be able to deduce that setting a = 1; isn't needed by pthread_create(), so the side effect of setting a to the value 1 must be completed before calling pthread_create(). Note that if there are compilers that perform global optimizations that can deduce that a isn't used elsewhere, they could delay or elide the assignment. However, in that case nothing else is using the side effect, so there would be no problem with that.