I have been using the GCC compiler for TriCore and PPC for a while.
On the PPC side, I have not yet figured out how to tell the compiler which TOC and SDA base addresses to use so that it can shorten the code.
I am adding code to an existing binary, which uses TOC 0x5C9FF0 (%R12) and SDA 0x7FFFF0 (%R13).
If I define these addresses in my linker script, accessing a variable via the SDA works perfectly fine when I write the assembly directly.
But in "normal" code, the compiler loads "epc_blink_timer", which is located at 0x806F10 (within reach of the SDA), via a lis/lhz combination instead. That takes a lot of space, which I don't have, and also costs unnecessary CPU cycles.
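For illustration, here is a rough sketch of the difference (the relocation syntax and register numbers shown in the comments are assumptions about typical PowerPC EABI output, not taken from the binary in question):

/* epc_blink_timer is the 16-bit variable named above; everything else here is illustrative. */
extern volatile unsigned short epc_blink_timer;

unsigned short read_blink_timer(void)
{
    /* Desired SDA-relative access, a single load off %r13, roughly:
     *     lhz %r3, epc_blink_timer@sda21(0)    # linker resolves the base register to %r13
     *
     * What GCC emits when it knows nothing about the SDA, building the
     * full 32-bit address with two instructions:
     *     lis %r3, epc_blink_timer@ha
     *     lhz %r3, epc_blink_timer@l(%r3)
     */
    return epc_blink_timer;
}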
Is there any way to define the TOC/SDA bases, or at least the SDA base, somewhere in my linker files and tell the compiler to use them?
I have literally been trying this for years, but have had no luck so far.
Br. Chris
I've read this tutorial: http://www.bravegnu.org/gnu-eprog/c-startup.html
I was able to follow the guide and run the code, but I have some questions.
1) Why do we need both a load address and a run-time address? As I understand it, this is because we have put .data in flash too; so why don't we run the application from there, rather than needing start-up code to copy it into RAM?
2) Why do we need a linker script and start-up code here? Can I not just build the C source as below and run it with QEMU?
arm-none-eabi-gcc -nostdlib -o sum_array.elf sum_array.c
Many thanks
Your first question was answered in the guide.
When you load a program on an operating system, your .data section (basically the non-zero initialized globals) is loaded from the "binary" into the right place in memory for you, so that when your program starts, the memory locations that represent your variables already hold those values.
unsigned int x=5;
unsigned int y;
As a C programmer you write the above code and you expect x to be 5 when you first start using it, yes? Well, if you are booting from flash, bare metal, you don't have an operating system to copy that value into RAM for you; somebody has to do it. Furthermore, all of the .data contents have to be in flash: that number 5 has to live somewhere in flash so that it can be copied to RAM. So you need a flash address for it and a RAM address for it: two addresses for the same thing.
And that begins to answer your second question. For every line of C code you write, you make assumptions; for example, that any function can call any other function. You would like to be able to call functions, yes? You would also like to have local variables, you would like the variable x above to be 5, and you might assume that y will be zero (although, thankfully, compilers are starting to warn about that). The startup code for generic C, at a minimum, sets up the stack pointer, which lets you call other functions, have local variables, and write functions longer than a line or two; it zeroes .bss so that the y variable above is zero; and it copies the value 5 into RAM so that x is ready to go by the time your C entry-point function runs.
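A minimal C sketch of such startup code, assuming the linker script defines symbols named _sidata (the load address of .data in flash), _sdata/_edata (the run-time bounds of .data in RAM) and _sbss/_ebss (the bounds of .bss), and that those sections are word-aligned:

/* Symbols provided by the linker script; only their addresses matter. */
extern unsigned int _sidata, _sdata, _edata, _sbss, _ebss;

extern int main(void);

void reset_handler(void)                 /* assumed entry point name */
{
    unsigned int *src = &_sidata;
    unsigned int *dst = &_sdata;

    /* Copy the initial values of .data (e.g. the 5 for x) from flash to RAM. */
    while (dst < &_edata)
        *dst++ = *src++;

    /* Zero .bss so uninitialised globals (like y) really start out as 0. */
    for (dst = &_sbss; dst < &_ebss; dst++)
        *dst = 0;

    main();
    for (;;) ;                           /* main is not expected to return */
}

The stack pointer itself has to be valid before this C code runs, so real startup usually begins with a few assembly instructions (or, on Cortex-M parts, the hardware loads the initial stack pointer from the vector table).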
If you don't have an operating system, then you have to have code to do this yourself. And yes, there are many, many sandboxes and toolchains set up for various platforms that already provide the startup code and linker script, so that you can just run
gcc -o myprog.elf myprog.c
Now, that doesn't mean you can make system calls (printf, fopen, etc.) without a... system. But if you download one of these toolchains, it does mean that you don't actually have to write the linker script or the bootstrap yourself.
But it is still valuable information. Note that startup code and a linker script are required for operating-system-based programs too; it is just that the native compiler for your operating system assumes you are mostly going to write programs for that operating system, and as a result the toolchain already provides a linker script and startup code.
1) The .data section contains variables. Variables are, well, variable -- they change at run time. The variables need to be in RAM so that they can be easily changed at run time. Flash, unlike RAM, is not easily changed at run time. The flash contains the initial values of the variables in the .data section. The startup code copies the .data section from flash to RAM to initialize the run-time variables in RAM.
2) Linker-script: The object code created by your compiler has not yet been located in the microcontroller's memory map. That is the job of the linker, which is why you need a linker script. The linker script is input to the linker and describes the location and extent of the system's memories.
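For example, a minimal sketch of the relevant part of a linker script might look like this (the memory names, origins, sizes, and symbol names are placeholders, not values from the tutorial):

MEMORY
{
  FLASH (rx)  : ORIGIN = 0x00000000, LENGTH = 256K
  RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 64K
}

SECTIONS
{
  .text : { *(.text*) *(.rodata*) } > FLASH

  /* Run-time address in RAM (> RAM), load address in FLASH (AT> FLASH). */
  .data : {
    _sdata = .;
    *(.data*)
    _edata = .;
  } > RAM AT> FLASH
  _sidata = LOADADDR(.data);   /* where the startup code copies .data from */

  .bss : {
    _sbss = .;
    *(.bss*) *(COMMON)
    _ebss = .;
  } > RAM
}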
Startup code: Your C program that begins at main does not run in a vacuum but makes some assumptions about the environment. For example, it assumes that the initialized variables are already initialized before main executes. The startup code is necessary to put in place all the things that are assumed to be in place when main executes (i.e., the "run-time environment"). The stack pointer is another example of something that gets initialized in the startup code, before main executes. And if you are using C++ then the constructors of static objects are called from the startup code, before main executes.
1) Why do we need both a load address and a run-time address?
While it is in most cases possible to run code from memory-mapped ROM, code will often execute faster from RAM. In some cases there may also be much more RAM than ROM, and the application code may be stored compressed in ROM, so that the executable code is not simply copied from ROM but also decompressed, allowing a much larger application than the available ROM alone would hold.
In situations where the code is stored on non-memory mapped mass-storage media such as NAND flash, it cannot be executed directly in any case and must be loaded into RAM by some sort of bootloader.
2) Why do we need a linker script and start-up code here? Can I not just build the C source as below and run it with QEMU?
The linker script defines the memory layout of your target and your application. Since this tutorial is about bare-metal programming, there is no OS to handle that for you. Similarly, the start-up code is required to at least set an initial stack pointer, initialise static data, and jump to main. On an embedded system it is also often necessary to initialise various hardware, such as the PLL and memory controllers.
I'd like to use Ada with the STM32F103 microcontroller, but here is the problem: there is no built-in runtime system for it in GNAT 2016. There is an RTS included for another Cortex-M3 microcontroller (TI's LM3S), zfp-lm3s, but it seems to need more than trivial changes; simply adjusting the memory size/origin doesn't work.
So, I have some questions:
Does somebody have an RTS for the STM32F103?
Are there any good books about the low-level details of the Cortex-M3 or other ARM microcontrollers?
PS: Using zfp-lm3s raises this error when I try to run the program via GPS:
Loading section .text, size 0x140 lma 0x0
Load failed
The STM32F series is from STMicroelectronics, not TI, so the stm32f4 might seem to be a better starting point.
In particular, the clock code in bsp/setup_pll.adb should need only minor tweaking; use STM’s STM32CubeMX tool (written in Java) to find the magic numbers to set up the clock properly.
You will also find that the assembler code used in bsp/start*.S needs simplifying/porting to the Cortex-M3 part.
My Cortex GNAT Run Time Systems project includes an Arduino Due version (also Cortex-M3), which has startup code written entirely in Ada. I don’t suppose the rest of the code would help a lot, being based on FreeRTOS - you’d have to be very very careful about memory usage.
I stumbled upon this question while looking for a zfp runtime specific to the stm32l0xx boards. It doesn't look like one exists from what I can see, but I did come across this guide from AdaCore on creating a new runtime, which might help anyone stuck with the same issue:
https://blog.adacore.com/porting-the-ada-runtime-to-a-new-arm-board
ELF executables have a fixed load address (0x8048000 for 32-bit x86 Linux binaries, and 0x400000 for 64-bit x86_64 binaries).
I read the SO answers (e.g., this one) about the historical reasons for those specific addresses. What I still don't understand is why use a fixed load address and not a randomized one (chosen from some allowed range)?
why use a fixed load address and not a randomized one
Traditionally that is simply how executables worked. If you want a randomized load address, build a PIE binary (which is really a special case of a shared library that has startup code in it): compile with -fPIE and link with -pie.
Building with -fPIE introduces runtime overhead, in some cases as bad as 10% performance degradation, which may not be tolerable if you have a large cluster or you need every last bit of performance.
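For example, assuming a Linux host, a reasonably recent GCC, and a source file prog.c, you can see the difference like this:

gcc -fPIE -pie -o prog_pie prog.c      # position independent; load address randomized under ASLR
gcc -no-pie -o prog_fixed prog.c       # traditional fixed-load-address executable
readelf -h prog_pie | grep Type        # reports DYN  (shared object / PIE)
readelf -h prog_fixed | grep Type      # reports EXEC (fixed load address)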
I am not sure I understood your question correctly, but assuming I did: this is sort of a legacy/historical issue. ELF is the executable format used by Unix-derived and Unix-like operating systems such as Linux and the BSDs.
The classic ELF executable format simply states that there is a resolved, absolute virtual address at which the code is loaded and from which it begins running.
That is simply how the file format is, and for historical reasons it cannot be changed: you could not just "throw" such an executable at any memory address and have it run successfully. Back in the 90s, when the ELF format was introduced, problems such as calling functions through virtual tables were raised, and it was decided that the ELF format would carry absolute addresses within it.
Also, think about it; take a look at the ELF format: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
How would you design an OS executable loader that could take such an executable, load it at ANY desired virtual address, and have the code run successfully without actually changing the binary itself? If you wanted something like that, you would either have to vastly change the output compilers generate or the format itself, which again isn't possible.
As time passed, the requirement for position-independent execution (PIE/PIC) arose, and shared objects were introduced to allow it, together with ASLR (Address Space Layout Randomization). ASLR means the code can be loaded at essentially any memory address and still execute. This is implemented by making sure that all references within the code itself are relative to the address of the currently executing instruction, and by having the OS loader patch parts of the binary at load time; what gets patched is not the executable instructions (R/X) but actual data (RW, e.g. the .data segment), for example the "jump tables" (PLT/GOT) through which functions are called and which are filled in at load time. Shared objects therefore allow full randomization of the addresses the code is loaded at, and if you want your own code to be loaded that way, you have to compile it as a shared object (position independent) and link it dynamically.
(I hope I've cleared some things up :) )
I want to write some inline ARM assembly in my C code. For this code, I need to use a register or two more than just the ones declared as inputs and outputs to the function. I know how to use the clobber list to tell GCC that I will be using some extra registers to do my computation.
However, I am sure that GCC enjoys the freedom to shuffle around which registers are used for what when optimizing. That is, I get the feeling it is a bad idea to use a fixed register for my computations.
What is the best way to use some extra register that is neither input nor output of my inline assembly, without using a fixed register?
P.S. I was thinking that using a dummy output variable might do the trick, but I'm not sure what kind of weird other effects that will have...
Ok, I've found a source that backs up the idea of using dummy outputs instead of hard registers:
4.8 Temporary registers:
People also sometimes erroneously use clobbers for temporary registers. The right way is to make up a dummy output, and use "=r" or "=&r" depending on the permitted overlap with the inputs. GCC allocates a register for the dummy value. The difference is that GCC can pick a convenient register, so it has more flexibility.
from page 20 of this pdf.
For anyone who is interested in more info on inline assembly with GCC this website turned out to be very instructive.
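Here is a minimal sketch of the dummy-output approach for an ARM target (the function, operand names, and the instruction sequence are made up for illustration):

#include <stdint.h>

/* Computes (a + b) * 2, using one scratch register that GCC picks for us. */
static inline uint32_t double_sum(uint32_t a, uint32_t b)
{
    uint32_t result;
    uint32_t scratch;                     /* dummy output; its value is never used */

    __asm__ ("add %[tmp], %[ina], %[inb]\n\t"
             "add %[out], %[tmp], %[tmp]"
             : [out] "=r" (result),
               [tmp] "=&r" (scratch)      /* "=&r" (early-clobber) keeps the
                                             temporary out of the input registers */
             : [ina] "r" (a),
               [inb] "r" (b)
             /* no clobber-list entry needed for the temporary */);

    return result;
}

Because GCC allocates the register for scratch itself, it stays free to assign the remaining registers however the surrounding code needs.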
I'd like to generate pseudo-random ARM instructions. Via assembler directives, I can tell gcc what mode I'm in, and it will complain if I try a set of opcodes and operands that's not legal in that mode, so it must have some internal listing of what can be done in which mode. Where does that live? Would it be easier to extract that info from LLVM?
Is this question "not even wrong"? Should I try a different approach entirely?
To answer my own question, this is actually really easy to do from arm.md and constraints.md in gcc/config/arm/. I probably spent more time asking this question and answering comments on it than I did figuring this out. It turns out I just need to look for 'TARGET_THUMB1', until I get around to implementing thumb2.
For the ARM family, the buck stops at the ARM ARM (ARM Architecture Reference Manual). There is an ARM instruction set section and a Thumb instruction set section. In both, each instruction tells you which architecture generation it belongs to (ARMvX, where X is some number like 4 (ARM7) or 5 (ARM9 time frame), etc.). Since the opcode and pseudo-code are listed for each instruction, you should be able to figure out which are real instructions and which, if any, are just syntax to save typing for another instruction (push and pop, for example).
With the Cortex-M3 and Thumb-2 in particular, you also need to look at the TRM (Technical Reference Manual) as well. ARM has a universal syntax (the Unified Assembler Language, UAL) that they are trying to get everyone to use, and which should work for both Thumb and ARM. For example, in ARM you have three-register instructions:
add r1,r1,r2
In Thumb there are only two-register operations:
add r1,r2
The desire is basically to meet in the middle, or, more accurately, to encourage ARM assemblers to parse Thumb instructions and encode them as the equivalent ARM instruction without complaining. This may have started with Thumb rather than Thumb-2; I have always separated the two syntaxes in my code until recently (and I still generally use ARM syntax for ARM and Thumb syntax for Thumb).
And then, yes, you have to look at what the specific assembler tool implements, in your case binutils. And it sounds like you have found the binutils/GNU secret decoder ring.