Understanding go tool compile and link commands

I understand that there are three steps involved in converting high-level language code to machine language or executable code, namely compiling, assembling, and linking.
As per the Go docs, go tool compile does the following:
It then writes a single object file named for the basename of the first source file with a .o suffix
So, the final object file must contain the machine language code (after compiling and assembling are run) of each file. If I pass go tool compile -S on the Go file, it shows the assembly language Go generates.
After this, when I run go tool link on the object file, it must link all required object files (if there are multiple) and then generate the final machine code (based on GOOS and GOARCH), producing a file a.out.
I have a few basic questions here:
How do I know which variables will be allocated on the stack and which on the heap, and when? Does it matter if I generate an executable for one machine and run it on another with a different architecture?
My test program
package main

func f(a *int, b *int) *int {
	var c = *a + *b
	var d = c
	return &d
}

func main() {
	var a = 2
	var b = 6
	f(&a, &b)
}
Result of go tool compile -m -l test.go
test.go:6: moved to heap: d
test.go:7: &d escapes to heap
test.go:3: f a does not escape
test.go:3: f b does not escape
test.go:14: main &a does not escape
test.go:14: main &b does not escape

In which step is the memory getting allocated?
It depends: some during linking, some during compiling, most during runtime (and some during loading).
How do I know which variables will be allocated on the stack and which on the heap, and when?
Ad hoc you do not know at all. The compiler decides this. If the compiler can prove that a variable does not escape, it may keep it on the stack. Google for "golang escape analysis". There is a flag, -m, which makes the compiler output its decisions if you are interested in them.
Does it matter if I generate an executable for one machine and run it on another with a different architecture?
No, but only because this does not work at all: executables are tied to an architecture and won't run on a different one.
It seems you are mixing up compilation/linking and memory allocation. The latter is quite different from the former two. (Technically your linked program may contain pre-allocated memory, and during loading it might get even more, but this is highly technical, architecture specific, and really nothing to be concerned about.)
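To see the escape analysis at work, here is a small sketch (a hypothetical file esc.go; the exact diagnostic text varies between compiler versions). Returning d by value lets it stay on the stack, while returning &d forces it to the heap:
package main

// byValue returns a copy, so the compiler can keep d on the stack.
func byValue(a, b int) int {
	d := a + b
	return d
}

// byPointer returns d's address, so d must outlive the call;
// escape analysis moves it to the heap.
func byPointer(a, b int) *int {
	d := a + b
	return &d
}

func main() {
	_ = byValue(2, 6)
	_ = byPointer(2, 6)
}
Compiling with go build -gcflags='-m -l' esc.go should report "moved to heap: d" only for byPointer; byValue's d never outlives the call, so it stays on the stack.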

Related

How to understand the relation between uintptr and struct?

I have come across code like the following:
func str2bytes(s string) []byte {
	x := (*[2]uintptr)(unsafe.Pointer(&s)) // string header: {data pointer, length}
	h := [3]uintptr{x[0], x[1], x[1]}      // slice header: {data pointer, length, capacity}
	return *(*[]byte)(unsafe.Pointer(&h))
}
This function converts a string to a []byte without copying the data.
I tried to convert a Num to a ReverseNum the same way:
type Num struct {
	name  int8
	value int8
}
type ReverseNum struct {
	value int8
	name  int8
}
func main() {
	n := Num{100, 10}
	z := (*[2]uintptr)(unsafe.Pointer(&n))
	h := [2]uintptr{z[1], z[0]}
	fmt.Println(*(*ReverseNum)(unsafe.Pointer(&h))) // prints {0 0}
}
This code doesn't give the result I want. Can anybody tell me why?
That's too complicated. A simpler version:
package main

import (
	"fmt"
	"unsafe"
)

type Num struct {
	name  int8
	value int8
}

type ReverseNum struct {
	value int8
	name  int8
}

func main() {
	n := Num{name: 42, value: 12}
	p := (*ReverseNum)(unsafe.Pointer(&n))
	fmt.Println(p.value, p.name)
}
This outputs "42 12".
But the real question is: why on Earth would you want to go for such trickery instead of copying two freaking bytes, which is done instantly on any sensible CPU Go programs run on?
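For comparison, the plain copying version is a one-liner (a sketch reusing the Num and ReverseNum types above):
func toReverse(n Num) ReverseNum {
	// Copies two bytes explicitly; any compiler turns this into a
	// couple of register moves.
	return ReverseNum{value: n.value, name: n.name}
}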
Another problem with your approach is that, if I understand correctly, nothing in the Go language specification guarantees that two types with seemingly identical fields must have identical memory layouts. I believe they do on most implementations, but I do not think they are required to.
Also consider that seemingly innocuous things, such as adding an extra field (even of type struct{}!) to your data type, may do interesting things to the memory layout of variables of that type, so it may be outright dangerous to assume you can reinterpret the memory of Go variables however you want.
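If you do go down this road anyway, you can at least turn the layout assumption into a startup check. A minimal sketch using unsafe.Sizeof and unsafe.Offsetof (it panics at program start if the assumptions the pointer cast relies on do not hold on this implementation):
package main

import "unsafe"

type Num struct {
	name  int8
	value int8
}

type ReverseNum struct {
	value int8
	name  int8
}

// init verifies the layout assumptions behind the pointer-cast trick:
// equal sizes, and Num.name overlaying ReverseNum.value (and vice versa).
func init() {
	if unsafe.Sizeof(Num{}) != unsafe.Sizeof(ReverseNum{}) ||
		unsafe.Offsetof(Num{}.name) != unsafe.Offsetof(ReverseNum{}.value) ||
		unsafe.Offsetof(Num{}.value) != unsafe.Offsetof(ReverseNum{}.name) {
		panic("Num/ReverseNum layout assumption violated")
	}
}

func main() {}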
... I just want to learn about the principle behind the package unsafe.
It's an escape hatch.
All strongly-typed but compiled languages have a basic problem: the actual machines on which the compiled programs will run do not have the same typing system as the compiler.[1] That is, the machine itself probably has a linear address space where bytes are assembled into machine words that are grouped into pages, and so on. The operating system may also provide access at, say, page granularity: if you need more memory, the OS will give you one page (4096 bytes, or 8192 bytes, or 65536 bytes, or whatever the page size is) of additional memory at a time.
There are many ways to attack this problem. For instance, one can write code directly in machine (or assembly) language, using the hardware's instruction set, to talk to the OS to achieve OS-level things. This code can then talk to the compiled program, acting as the go-between. If the compiled program needs to allocate a 40-byte data structure, this machine-level code can figure out how to do that within the strictures of the OS's page-size allocations.
But writing machine code is difficult and time-consuming. That's precisely why we have high-level languages and compilers in the first place. What if we had a way to, within the high-level language, violate the normal rules imposed by the language? By violating specific requirements in specific ways, carefully coordinating those ways with all other code that also violates those requirements, we can, in code we keep away from the usual application programming, write much of our memory-management, process-management, and so on in our high-level language.
In other words, we can use unsafe (or something similar in other languages) to deliberately break the type-safety provided by our high level language. When we do this—when we break the rules—we must know what all the rules are, and that our specific violations here will function correctly when combined with all the normal code that does obey the normal rules and when combined with all the special, unsafe code that breaks the rules.
This often requires help from the compiler itself. If you inspect the runtime source distributed with Go, you will find routines with annotations like go:noescape, go:noinline, go:nosplit, and go:nowritebarrier. You need to know when and why these are required if you are going to make much use of some of the escape-hatch programming.
A few of the simpler uses, such as tricks to gain access to string or slice headers, are ... well, they are still unsafe, but they are unsafe in more-predictable ways and do not require this kind of close coordination with the compiler itself.
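For what it's worth, recent Go releases have added sanctioned helpers for exactly these header tricks: unsafe.Slice (Go 1.17) and unsafe.String, unsafe.StringData and unsafe.SliceData (Go 1.20). A sketch of the string-to-[]byte conversion from the earlier question, restated with them (assuming a Go 1.20+ toolchain):
package main

import (
	"fmt"
	"unsafe"
)

// bytesOfString aliases s's bytes as a []byte without copying.
// The caller must treat the result as read-only: string data is
// immutable, and writing through this slice is undefined behaviour.
func bytesOfString(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	b := bytesOfString("hello")
	fmt.Println(b) // [104 101 108 108 111]
}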
To understand how, when, and why they work, you need to understand how the compiler and runtime allocate and work with strings and slices, and in some cases, how memory is laid out on the hardware, and some of the rules about Go's garbage collector. In particular, the GC code is aware of unsafe.Pointer but not of uintptr. Much of this is pretty tricky: see, e.g., https://utcc.utoronto.ca/~cks/space/blog/programming/GoUintptrVsUnsafePointer and the link to https://github.com/golang/go/issues/19135, in which writing nil to a Go pointer value caused Go's garbage collector to complain, because the write caused the GC to inspect the previously stored value, which was invalid.
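To make the unsafe.Pointer-versus-uintptr distinction concrete, here is a small sketch; the commented anti-pattern is exactly the kind of stale-integer bug those links describe:
package main

import (
	"fmt"
	"unsafe"
)

func main() {
	buf := [4]uint64{10, 20, 30, 40}

	// unsafe.Pointer is a real pointer: the GC sees it and keeps buf alive.
	p := unsafe.Pointer(&buf[0])

	// Sanctioned pattern: pointer -> uintptr -> pointer within a single
	// expression, here stepping 8 bytes forward to buf[1].
	q := (*uint64)(unsafe.Pointer(uintptr(p) + unsafe.Sizeof(buf[0])))
	fmt.Println(*q) // 20

	// Anti-pattern: a stored uintptr is just a number. The GC does not
	// treat u as a reference, so it neither keeps the object alive nor
	// updates u if the object is moved.
	u := uintptr(p)
	_ = u
}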
[1] See this Wikipedia article on the Intel 432 for a notable attempt at designing hardware to run compiled high-level languages. There have been others in the past as well, often with the same fate, though some IBM projects have been more successful.

How does Go make system calls?

As far as I know, in CPython, open() and read() (the API for reading a file) are implemented in C code. The C code probably calls some C library which knows how to make system calls.
What about a language such as Go? Isn't Go itself now written in Go? Does Go call C libraries behind the scenes?
The short answer is "it depends".
Go compiles for multiple combinations of H/W and OS, and they all have different approaches to how syscalls are to be made when working with them.
For instance, Solaris does not provide a stable supported set of syscalls, so Go goes through the system's libc, just as required by the vendor.
Windows does support a rather stable set of syscalls but it is defined as a C API provided by a set of standard DLLs.
The functions exposed by those DLLs are mostly shims which use a single "make a syscall by number" function, but these numbers are not documented and differ between kernel flavours and releases (perhaps intentionally).
Linux does provide a stable and documented set of numbered syscalls and hence there Go just calls the kernel directly.
Now keep in mind that for Go to "call the kernel directly" means following the so-called ABI of the H/W and OS combo. For instance, on modern Linux on amd64, making a syscall requires filling a set of CPU registers with certain values, doing some other arrangements, and then issuing the SYSCALL CPU instruction.
On Windows, you have to use its native calling convention (which is stdcall, not cdecl).
Yes, Go is now written in Go. But you don't need C to make syscalls.
An important thing to call out is that syscalls aren't "written in C." You can make syscalls from C on Unix because of <unistd.h>. In particular, how Linux defines this header is a little convoluted, but you can see the general idea from this file. Syscalls are defined with a name and a number. When you call read, for example, what really happens behind the scenes is that the parameters are set up in the proper registers/memory, followed by the instruction that enters the kernel: on 32-bit x86 Linux the syscall number goes in eax and the kernel is entered via interrupt 0x80, while on x86-64 the number goes in rax and the dedicated syscall instruction is used. The OS has already set up the proper handlers that receive this trap, and it goes about doing whatever is needed for that syscall. So you don't need something written in C (or a standard library, for that matter) to make syscalls. You just need to understand the calling ABI and know the syscall numbers.
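You can demonstrate this from Go by issuing a syscall by number yourself, bypassing the higher-level wrappers. A linux/amd64 sketch using syscall.Syscall, which does the register setup described above:
package main

import (
	"syscall"
	"unsafe"
)

func main() {
	msg := []byte("hello via raw syscall\n")

	// write(2) is syscall.SYS_WRITE on linux/amd64. syscall.Syscall
	// loads the number and arguments into the registers the kernel
	// ABI expects and executes the SYSCALL instruction.
	syscall.Syscall(
		syscall.SYS_WRITE,
		uintptr(1), // fd 1 = stdout
		uintptr(unsafe.Pointer(&msg[0])),
		uintptr(len(msg)),
	)
}
Running it writes the message straight to stdout via write(2), with no libc involved.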
However, as #retgits points out, on platforms where Go goes through libc, the approach is to piggyback on the fact that libc already has all of the logic for handling syscalls. mksyscall.go is a CLI script that parses those libc files to extract the necessary information.
You can actually trace the life of a syscall if you compile a Go program like:
package main

import (
	"syscall"
)

func main() {
	var buf []byte
	syscall.Read(9, buf)
}
Run objdump -D on the resulting binary. The Go runtime is rather large, so your best bet is to find the main function, see where it calls syscall.Read, and then search for the offsets from there: syscall.Read calls syscall.syscall, syscall.syscall calls runtime.libcCall (which switches from the Go ABI to C ABI compatibility so that arguments are located where the OS expects them; you can see this in the runtime, for darwin for example), runtime.libcCall calls runtime.asmcgocall, etc.
For extra fun, run that binary with gdb and continue stepping in until you hit the syscall.
The sys package takes care of the syscalls to the underlying OS. Depending on the OS you're using, different packages are used to generate the appropriate calls. Here is a link to the README for Go running on Unix systems: https://github.com/golang/sys/blob/master/unix/README.md. The parts on mksyscall.go, on the hand-written Go files which implement system calls that need special handling, and on the type files should walk you through how it works.
The Go compiler (which translates Go code to target CPU code) is written in Go, but that is different from the runtime support code, which is what you are talking about. The standard library is mainly written in Go and probably knows how to make system calls directly, with no C code involved. However, there may be a bit of C support code, depending on the target platform.

Why do we need a linker script and startup code?

I've read this tutorial
I could follow the guide and run the code, but I have questions.
1) Why do we need both a load address and a run-time address? As I understand it, this is because we have put .data in flash too; so why don't we run the app there, instead of needing start-up code to copy it into RAM?
http://www.bravegnu.org/gnu-eprog/c-startup.html
2) Why do we need a linker script and start-up code here? Can I not just build the C source as below and run it with QEMU?
arm-none-eabi-gcc -nostdlib -o sum_array.elf sum_array.c
Many thanks
Your first question was answered in the guide.
When you load a program on an operating system, your .data section (basically your non-zero initialized globals) is loaded from the "binary" into the right place in memory for you, so that when your program starts, the memory locations that represent your variables hold those values.
unsigned int x=5;
unsigned int y;
As a C programmer you write the above code and you expect x to be 5 when you first start using it, yes? Well, if you are booting from flash, bare metal, you don't have an operating system to copy that value into RAM for you; somebody has to do it. Further, all of the .data stuff has to be in flash: that number 5 has to be somewhere in flash so that it can be copied to RAM. So you need a flash address for it and a RAM address for it. Two addresses for the same thing.
And that begins to answer your second question. For every line of C code you write, you assume things: for example, that any function can call any other function. You would like to be able to call functions, yes? And you would like to have local variables, you would like the variable x above to be 5, and you might assume that y will be zero (although, thankfully, compilers are starting to warn about that). The startup code, at a minimum for generic C, sets up the stack pointer, which allows you to call other functions, have local variables, and have functions longer than one or two lines of code; it zeros the .bss so that the y variable above is zero; and it copies the value 5 over to RAM so that x is ready to go when your entry-point C function runs.
If you don't have an operating system then you have to have code to do this. And yes, there are many, many sandboxes and toolchains set up for various platforms that already have the startup code and linker script, so that you can just
gcc -o myprog.elf myprog.c
Now that doesn't mean you can make system calls without a...system...printf, fopen, etc. But if you download one of these toolchains, it does mean that you don't actually have to write the linker script or the bootstrap.
But it is still valuable information. Note that startup code and a linker script are required for operating-system-based programs too; it is just that native compilers for your operating system assume you are mostly going to write programs for that operating system, and as a result they provide a linker script and startup code in that toolchain.
1) The .data section contains variables. Variables are, well, variable -- they change at run time. The variables need to be in RAM so that they can be easily changed at run time. Flash, unlike RAM, is not easily changed at run time. The flash contains the initial values of the variables in the .data section. The startup code copies the .data section from flash to RAM to initialize the run-time variables in RAM.
2) Linker-script: The object code created by your compiler has not been located into the microcontroller's memory map. This is the job of the linker and that is why you need a linker script. The linker script is input to the linker and provides some instructions on the location and extent of the system's memory.
Startup code: Your C program that begins at main does not run in a vacuum but makes some assumptions about the environment. For example, it assumes that the initialized variables are already initialized before main executes. The startup code is necessary to put in place all the things that are assumed to be in place when main executes (i.e., the "run-time environment"). The stack pointer is another example of something that gets initialized in the startup code, before main executes. And if you are using C++ then the constructors of static objects are called from the startup code, before main executes.
1) Why do we need both a load address and a run-time address?
While it is in most cases possible to run code from memory-mapped ROM, code will often execute faster from RAM. In some cases there may also be much more RAM than ROM, and the application code may be stored compressed in ROM, so the executable code is not simply copied from ROM but also decompressed, allowing a much larger application than would fit in the available ROM.
In situations where the code is stored on non-memory-mapped mass-storage media such as NAND flash, it cannot be executed directly in any case and must be loaded into RAM by some sort of bootloader.
2) Why do we need a linker script and start-up code here? Can I not just build the C source as below and run it with QEMU?
The linker script defines the memory layout of your target and application. Since this tutorial is for bare-metal programming, there is no OS to handle that for you. Similarly, the start-up code is required to at least set an initial stack pointer, initialise static data, and jump to main. On an embedded system it is also necessary to initialise various hardware such as the PLL, memory controllers, etc.

Building a two-part firmware image using GCC toolchain

I have some firmware built with GCC that runs on an ARM Cortex M0 based microcontroller. The build currently generates a single binary image that can be written into the program memory of the microcontroller.
For reasons to do with field update, I need to split this image into two parts that can be updated separately. I'll call these Core and App.
Core: contains the interrupt vector table, main() routine, and various drivers and library routines. It will be located in the first half of the program memory.
App: contains application-specific code. It will be located in the second half of the program memory. It will have a single entry point, at a known address, which is called by the core to start the application. It will access functions and data in the core via known addresses.
There are some obvious limitations here, which I'm well aware of:
When building the app, the addresses of symbols in the core will need to be known. So the core must be built first, and must be available when linking the app.
An app image will only be compatible with the specific core image it was built against.
It will be possible to update the app without updating the core, but not vice versa.
All of that is OK.
My question is simply, how can I build these images using GCC and the GNU binutils?
Essentially I want to build the core like a normal firmware image, and then build the app image, with the app treating the core like a library. But neither shared linking (which would require a dynamic linking mechanism) nor static linking (which would copy the core functions used into the app binary) is applicable here. What I'm trying to do is actually a lot simpler: link against an existing binary using its known, fixed addresses. It's just not clear to me how to do so with the tools.
We have this working now so I am going to answer my own question. Here is what was necessary to do this, starting from a normal single image build, turning that into the "core" and then setting up the build for the "app".
Decide how to split up both the flash and the RAM into separate areas for the core and the app. Define the start address and size of each area.
Create a linker script for the core. This will be the same as the standard linker script for the platform except that it must only use the areas reserved for the core. This can be done by changing the ORIGIN and LENGTH of the flash & RAM entries in the MEMORY section of the linker script.
Create a header file declaring the entry point for the app. This just needs a prototype e.g.:
void app_init(void);.
Include this header from the core C code and have the core call app_init() to start the app.
Create a symbol file declaring the address of the entry point, which will be the start address of the flash area for the app. I'll call this app.sym. It can just be one line in the following format:
app_init = 0x00010000;
Build the core, using the core linker script and adding --just-symbols=app.sym to the linker parameters to give the address of app_init. Retain the ELF file from the build, which I'll call core.elf.
Create a linker script for the app. This will again be based on the standard linker script for the platform, but with the flash & RAM memory ranges changed to those reserved for the app. Additionally, it will need a special section to ensure that app_init is placed at the start of the app flash area, before the rest of the code in the .text section:
SECTIONS
{
    .text :
    {
        KEEP(*(.app_init))
        *(.text*)
    }
    /* ...the remaining sections follow the standard linker script... */
}
Write the app_init function. This will need to be in assembly, as it must do some low level work before any C code in the app can be called. It will need to be marked with .section .app_init so that the linker puts it in the correct place at the start of the app flash area. The app_init function needs to:
Populate variables in the app's .data section with initial values from flash.
Set variables in the app's .bss section to zero.
Call the C entry point for the app, which I'll call app_start().
Write the app_start() function that starts the app.
Build the app, using the app linker script. This link step should be passed the object files containing app_init, app_start, and any code called by app_start that is not already in the core. The linker parameter --just-symbols=core.elf should be passed to link functions in the core by their addresses. Additionally, -nostartfiles should be passed to leave out the normal C runtime startup code.
It took a while to figure all this out but it is now working nicely.
First of all... if this is just for field updating, you don't need to rely on the interrupt vector table in the core space for the app. Many ARM Cortex-M parts can relocate the vector table (via the VTOR register), though note that the plain Cortex-M0 does not support this, while the M0+ optionally does. I know it can be done on some (all?) of the STM32Fx parts, but this is an ARM Cortex-M thing, not an ST thing. Look into this before committing yourself to the decision to make your application ISRs all be hooks called from the core.
If you plan on having a lot of interaction with your core (btw, I always call the piece that does self-updating a "bootloader" on MCUs), here's an alternate suggestion:
Have the Core pass a pointer to a struct / table of functions that describes its capabilities into the App entry point.
This would allow complete separation of the code for the app vs core except for a shared header (assuming your ABI doesn't change) and prevent name collisions.
It also provides a reasonable way to prevent GCC from optimizing away any functions that you might call only from the App without messing up your optimization settings or screwing around with pragmas.
core.h:
struct core_functions
{
    int  (*pcore_func1)(int a, int b);
    void (*pcore_func2)(void);
};
core.c:
#include "core.h"

int core_func1(int a, int b) { return a + b; }
void core_func2(void) { /* do something here */ }

static const struct core_functions cfuncs =
{
    core_func1,
    core_func2
};

void core_main(void)
{
    // do setup here, then jump to the app's entry point, passing
    // the function table (ENTRY_POINT is the app's fixed flash address)
    void (*app_entry)(const struct core_functions *) =
        (void (*)(const struct core_functions *))ENTRY_POINT;
    app_entry(&cfuncs);
}
app.c:
#include "core.h"

void app_main(const struct core_functions *core)
{
    int res;
    res = core->pcore_func1(20, 30);
    (void)res;
}
The downside / cost is a slight runtime & memory overhead and more code.

Delete dynamically allocated memory twice?

First I'd like to point out that I'm using the GNU GCC compiler. I'm using Code::Blocks as my IDE so I don't have to type all the compiler junk into the Windows command prompt. To be more specific about my compiler, the line that shows up at the bottom of Code::Blocks when I successfully compile is
mingw32-g++.exe -std=c++11 -g
Anyways, my question involves using the delete operator to release dynamically allocated memory. When I compile this code snippet:
int* x;
x = new int;
delete x;
delete x;
I don't get any warnings or errors or crashes. From the book I'm learning C++ from, releasing a pointer to a dynamically allocated memory chunk can only be done once; then the pointer is invalid. If you use delete on the same pointer again, there will be problems. However, I don't get this problem.
Likewise, if I pass an object by value to a function, so that it is shallow copied, I get no error even though I have no copy constructor to ensure a deep copy (the object holds raw pointers). This means that when the function returns, the shallow copy goes out of scope and invokes its destructor (where I use delete on a pointer). When main returns, the original object goes out of scope, its destructor is invoked, and that same shallow-copied pointer is deleted again. But I have no problems.
I tried finding documentation online about the compiler I'm using and couldn't find any. Does this mean that the mingw32 compiler has some sort of default copy constructor it uses? Thus, I don't have to worry about creating copy constructors?
The compiler documentation is not likely to be helpful in this case: If it exists, it is likely to list exceptions to the C++ spec. It's the C++ spec that you need here.
When you delete the same pointer twice, the result--according to the C++ spec--is undefined. The compiler could do anything at all and have it meet spec. The compiler is allowed to recognize the fault and give an error message, or not recognize the fault and blow up immediately or sometime later. That your compiler appeared to work this time is no indication that the double delete is safe. It could be mucking up the heap in a way that results in a seg fault much later.
If you do not define a copy constructor, C++ defines one for you. The default copy constructor does a memberwise copy.
When you have the same object pointed to by multiple pointers, as you do here, consider using a smart pointer such as std::shared_ptr.
