Im using the GCC Compiler for Tricore and PPC since a while.
On the PPC side, i didn't figured out until now, how i can define TOC & SDA addresses for the compiler to use to shorten the code.
Im adding code into some existing binary, which is using TOC 0x5C9FF0 (%R12) and SDA 0x7FFFF0 (%R13).
If im writing my code in the linker script it works perfectly fine if i am writing it in asm directly like this, to access an variable via SDA:
But in "normal" code like this:
It will load "epc_blink_timer", which is defined #0x806F10 (in reach of SDA), it will access it via an lis/lhz combination. This takes a lot of space, which i don't have, and also unnecessary CPU usage:
Is there any way to define TOC/SDA, or at least SDA somewhere in my linker files, and tell the compiler to use them?
I literally tried it since years, but don't had luck so far.
Br. Chris
I'm trying to use gprof (I have to use gprof - no other option is available) when I get the flat file the result is empty even though everything works fine.
By the way, the code is in c so I'm using gcc.
Result:
Each sample counts as 0.01 seconds.
no time accumulated
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
What can I do to fix this? Thank you.
You can use gcc -no-pie option as a workaround.
My code doesn't depend on sm level. I can build it with sm10, If I want. But when I tried to build it with 1.3 instead of 2.0, as I did it before, I got x1.25 performance with no code changes!
sm20 -> 35ms
sm13 -> 25ms
After that gorgeous results, I tried to box/unbox every option in project settings->CUDA settings->all :) I guess, I found the stuff, which made that awesome speed:
If I use sm13 with "no fast math generation" (further fm - fast
math), I have 25ms
If I use sm13 with fm, I have 25ms
sm20 without fm = 35ms
sm20 with fm = 25ms (that is the same result)
Why is this so? Maybe sm13 forces using hardware maths, but sm20 not? Or it is only coincidence, and the latter sm level have lower performance, refer to lower sm level programs?
In addition to compiling in release mode, as pointed out by #Robert Crovella, you should also consider that when you target sm_13 the compiler is able to simplify some of the floating point maths. sm_20 and later supports precise division, precise square root, and denormals by default.
You can try disabling these features with the command line options -ftz=true -prec-div=false -prec-sqrt=false. See the best practices guide for more information.
I was looking up the pypy project (Python in Python), and started pondering the issue of what is running the outer layer of python? Surely, I conjectured, it can't be as the old saying goes "turtles all the way down"! Afterall, python is not valid x86 assembly!
Soon I remembered the concept of bootstrapping, and looked up compiler bootstrapping. "Ok", I thought, "so it can be either written in a different language or hand compiled from assembly". In the interest of performance, I'm sure C compilers are just built up from assembly.
This is all well, but the question still remains, how does the computer get that assembly file?!
Say I buy a new cpu with nothing on it. During the first operation I wish to install an OS, which runs C. What runs the C compiler? Is there a miniature C compiler in the BIOS?
Can someone explain this to me?
Say I buy a new cpu with nothing on it. During the first operation I wish to install an OS, which runs C. What runs the C compiler? Is there a miniature C compiler in the BIOS?
I understand what you're asking... what would happen if we had no C compiler and had to start from scratch?
The answer is you'd have to start from assembly or hardware. That is, you can either build a compiler in software or hardware. If there were no compilers in the whole world, these days you could probably do it faster in assembly; however, back in the day I believe compilers were in fact dedicated pieces of hardware. The wikipedia article is somewhat short and doesn't back me up on that, but never mind.
The next question I guess is what happens today? Well, those compiler writers have been busy writing portable C for years, so the compiler should be able to compile itself. It's worth discussing on a very high level what compilation is. Basically, you take a set of statements and produce assembly from them. That's it. Well, it's actually more complicated than that - you can do all sorts of things with lexers and parsers and I only understand a small subset of it, but essentially, you're looking to map C to assembly.
Under normal operation, the compiler produces assembly code matching your platform, but it doesn't have to. It can produce assembly code for any platform you like, provided it knows how to. So the first step in making C work on your platform is to create a target in an existing compiler, start adding instructions and get basic code working.
Once this is done, in theory, you can now cross compile from one platform to another. The next stages are: building a kernel, bootloader and some basic userland utilities for that platform.
Then, you can have a go at compiling the compiler for that platform (once you've got a working userland and everything you need to run the build process). If that succeeds, you've got basic utilities, a working kernel, userland and a compiler system. You're now well on your way.
Note that in the process of porting the compiler, you probably needed to write an assembler and linker for that platform too. To keep the description simple, I omitted them.
If this is of interest, Linux from Scratch is an interesting read. It doesn't tell you how to create a new target from scratch (which is significantly non trivial) - it assumes you're going to build for an existing known target, but it does show you how you cross compile the essentials and begin building up the system.
Python does not actually assemble to assembly. For a start, the running python program keeps track of counts of references to objects, something that a cpu won't do for you. However, the concept of instruction-based code is at the heart of Python too. Have a play with this:
>>> def hello(x, y, z, q):
... print "Hello, world"
... q()
... return x+y+z
...
>>> import dis
dis.dis(hello)
2 0 LOAD_CONST 1 ('Hello, world')
3 PRINT_ITEM
4 PRINT_NEWLINE
3 5 LOAD_FAST 3 (q)
8 CALL_FUNCTION 0
11 POP_TOP
4 12 LOAD_FAST 0 (x)
15 LOAD_FAST 1 (y)
18 BINARY_ADD
19 LOAD_FAST 2 (z)
22 BINARY_ADD
23 RETURN_VALUE
There you can see how Python thinks of the code you entered. This is python bytecode, i.e. the assembly language of python. It effectively has its own "instruction set" if you like for implementing the language. This is the concept of a virtual machine.
Java has exactly the same kind of idea. I took a class function and ran javap -c class to get this:
invalid.site.ningefingers.main:();
Code:
0: aload_0
1: invokespecial #1; //Method java/lang/Object."<init>":()V
4: return
public static void main(java.lang.String[]);
Code:
0: iconst_0
1: istore_1
2: iconst_0
3: istore_1
4: iload_1
5: aload_0
6: arraylength
7: if_icmpge 57
10: getstatic #2;
13: new #3;
16: dup
17: invokespecial #4;
20: ldc #5;
22: invokevirtual #6;
25: iload_1
26: invokevirtual #7;
//.......
}
I take it you get the idea. These are the assembly languages of the python and java worlds, i.e. how the python interpreter and java compiler think respectively.
Something else that would be worth reading up on is JonesForth. This is both a working forth interpreter and a tutorial and I can't recommend it enough for thinking about "how things get executed" and how you write a simple, lightweight language.
In the interest of performance, I'm sure C compilers are just built up from assembly.
C compilers are, nowadays, (almost?) completely written in C (or higher-level languages - Clang is C++, for instance). Compilers gain little to nothing from including hand-written assembly code. The things that take most time are as slow as they are because they solve very hard problems, where "hard" means "big computational complexity" - rewriting in assembly brings at most a constant speedup, but those don't really matter anymore at that level.
Also, most compilers want high portability, so architecture-specific tricks in the front and middle end are out of question (and in the backends, they' not desirable either, because they may break cross-compilation).
Say I buy a new cpu with nothing on it. During the first operation I wish to install an OS, which runs C. What runs the C compiler? Is there a miniature C compiler in the BIOS?
When you're installing an OS, there's (usually) no C compiler run. The setup CD is full of readily-compiled binaries for that architecture. If there's a C compiler included (as it's the case with many Linux distros), that's an already-compiled exectable too. And those distros that make you build your own kernel etc. also have at least one executable included - the compiler. That is, of course, unless you have to compile your own kernel on an existing installation of anything with a C compiler.
If by "new CPU" you mean a new architecture that isn't backwards-compatible to anything that's yet supported, self-hosting compilers can follow the usual porting procedure: First write a backend for that new target, then compile yourself for it, and suddenly you got a mature compiler with a battle-hardened (compiled a whole compiler) native backend on the new platform.
If you buy a new machine with a pre-installed OS, it doesn't even need to include a compiler anywhere, because all the executable code has been compiled on some other machine, by whoever provides the OS - your machine doesn't need to compile anything itself.
How do you get to this point if you have a completely new CPU architecture? In this case, you would probably start by writing a new code generation back-end for your new CPU architecture (the "target") for an existing C compiler that runs on some other platform (the "host") - a cross-compiler.
Once your cross-compiler (running on the host) works well enough to generate a correct compiler (and necessary libraries, etc.) that will run on the target, then you can compile the compiler with itself on the target platform, and end up with a target-native compiler, which runs on the target and generates code which runs on the target.
It's the same principle with a new language: you have to write code in an existing language that you do have a toolchain for, which will compile your new language into something that you can work with (let's call this the "bootstrap compiler"). Once you get this working well enough, you can write a compiler in your new language (the "real compiler"), and then compile the real compiler with the bootstrap compiler. At this point you're writing the compiler for your new language in the new language itself, and your language is said to be "self-hosting".
I try to compile an old program (which was compiled by cc) using gcc. In the makefile there is one line like this:
CFLAGS = -O2 -Olimit 2000 -w
There is no '-Olimit 2000' in gcc. I am wondering what does it really mean. Whether it is safe to just delete this option when using gcc.
As far as I can tell, this was only supported by IRIX's C compiler. I can't even find a solid reference as to what it was used for. Since it doesn't do anything with GCC, its definitely safe to remove it.
A little more detail, it was used to disable optimization on routines that were larger than the "Olimit". This limit is to make it so the amount of time doing optimization is limited. If you specify 0 for the Olimit, it means an "infinite Olimit" and will optimization every routine. Here's a man page for MIPSpro: http://cimss.ssec.wisc.edu/~gumley/modis/old/mips_64.pdf