What does the "aw" flag in the section attribute mean? - gcc

In the following line of code (which declares a global variable),
unsigned int __attribute__((section(".myVarSection,\"aw\",#nobits#"))) myVar;
what does the "aw" flag mean?
My understanding is that the nobits flag will prevent the variable from being initialised to zero, but I am struggling to find info about the "aw" flag.
Also, what meaning do the # and # have around the nobits flag?

The section("section-name") attribute places a variable in a specific section by producing the following assembler line:
.section section-name,"aw",#progbits
When you set section-name to ".myVarSection,\"aw\",#nobits#" you exploit a kind of "code injection" in GCC to produce:
.section .myVarSection,"aw",#nobits#,"aw",#progbits
Note that # sign starts a one-line comment.
See GNU Assembler manual for the full description of .section directive. A general syntax is
.section name [, "flags"[, #type[,flag_specific_arguments]]]
so "aw" are flags:
a: section is allocatable
w: section is writable
and #nobits is a type:
#nobits: section does not contain data (i.e., section only occupies space)
All the above is also applicable to functions, not just variables.

what does the "aw" flag mean?
It means that the section is allocatable (i.e. it's loaded to the memory at runtime) and writable (and readable, of course).
My understanding is that the nobits flag will prevent the variable from being initialised to zero, but I am struggling to find info about the "aw" flag.
Also, what meaning do the # and # have around the nobits flag?
#nobits (# is just a part of the name) means that the section isn't stored in the image on disk, it only exists in runtime (and it's filled with zeros at the startup).
# character begins the comment, so whatever the compiler will put in addition to what you have specified will be ignored.

Related

gcc linker script difference between *(.rodata*) and *(.rodata.*)

We have a linker script where a part of the .text section has the read only data input section specified as:
.text:
{
...
*(.rodata .rodata* .gnu.linkonce.r.*)
...
}
However, other input sections that have a trailing wildcard usually have a . between name and wildcard. Like: *(.text .text.* .gnu.linkonce.t.*)
Is there a difference in how the linker treats .'s and wildcard combinations or is there no difference?
Should the .rodata* actually be .rodata.*?
.text* is simply shorter than the more explicit .text .text.*, though not equivalent. It would pick up other sections like .text_foo. This can be intended (to make it more independent of the compiler convention maybe?) or not.
https://sourceware.org/binutils/docs/ld/Input-Section-Wildcards.html
The .* sections are generated by gcc if you pass -ffunction-sections. The same holds for data and -fdata-sections.

objcopy is removing a section unless I declare a static volatile variable in that section (using attribute)

I have a linker script in which I have defined a section for containing the checksum of a software image. Something like:
...
.my_checksum :
{
__checksum_is_here = .;
KEEP (*(.my_checksum))
. = ALIGN(4);
_sw_image_code_end = .;
} > IMAGE
...
The checksum is placed into that section by using objcopy --update-section.
I build an elf file by using the arm gcc compiler, and I can see this section and its value within it:
> arm-none-eabu-objdumph -h my_elf_file.elf
...
0 .text 0001496c 08010000 08010000 00010000 2**4
...
7 .my_checksum 00000004 080250c0 080250c0 000350c0 2**2
...
// Notice that 000350c0 is the file offset and 080250c0 is the LMA.
// The starting LMA is 08010000
And I can retrieve its value:
> xxd -s 0x000350c0 -l 4 my_elf_file.elf
000350c0: 015e 028e // I have checked this value and it is correct.
Now I generate a bin file by executing
> arm-none-eabi-objcopy -O binary --gap-fill 0xFF -S my_elf_file.elf my_elf_file.bin
Now, if I try to read the checksum value again, using the difference between the checksum LMA and the first section LMA (see above):
> xxd -s 0x150c0 -l 4 my_elf_file.bin
The result I obtain here is different from the one obtained in the elf file, that is, the checksum section has been removed by objcopy. (That's what I think at least).
Nevertheless, If I define this in my main.c file:
static volatile unsigned int __aux_checksum __attribute__((section(".my_checksum")));
...
int main() {
...
((void)__aux_checksum); // Avoid compiler/linker optimizations.
...
}
Now, if I replicate the same steps as above with the elf and bin files (using the proper offsets), I can retrieve the checksum from the bin file (elf and bin give the same result).
Questions
My first question is: I know that you can define a section using __attribute__((section)), but if you use a section already defined within the linker script, does this command changes its behaviour for placing the variable within the section, instead of creating a new one?
My second question is: Is this the only way for preventing objcopy of removing this particular section?
Lets answer your 2nd question first,
Is this the only way for preventing objcopy of removing this particular section?
You need a concept as documented in the gnu LD manual under SECTIONS.
4.6.8.1. Output Section Type
Each output section may have a type. The type is a keyword in parentheses. The following types are defined:
NOLOAD
The section should be marked as not loadable, so that it will not be loaded into memory when the program is run.
DSECT, COPY, INFO, OVERLAY
These type names are supported for backward compatibility, and are rarely used. They all have the same effect: the section should be marked as not allocatable, so that no memory is allocated for the section when the program is run.
The linker normally sets the attributes of an output section based on the input sections which map into it. You can override this by using the section type. For example, in the script sample below, the ROM section is addressed at memory location 0 and does not need to be loaded when the program is run. The contents of the ROM section will appear in the linker output file as usual.
SECTIONS {
ROM 0 (NOLOAD) : { … }
…
}
So what does that mean? Say you have debugging info in your objects. If you are burning a ROM image you probably don't want to place the debugging info in the object. As well, the BSS segment is all zero and there is no need to store it to ROM, but you need to clear our RAM (at the load address) to make way for it. The 'init value' for the .data section is initialized from ROM but resides in RAM. The concepts are 'loadable' and 'allocatable' and they have flags for them in an ELF file. By default your .my_checksum gets no flags. Ie, not allocated and not loadable like debug info.
I know that you can define a section using attribute((section)), but if you use a section already defined within the linker script, does this command changes its behaviour for placing the variable within the section, instead of creating a new one?
From the above,
The linker normally sets the attributes of an output section based on the input sections which map into it.
Your input sections flags get inherited by your output section. So you have put in at least allocatable as a flag.
I would suggest that you just put your checksum at the end of either .text or .data. For instance, input secttions .rodata (constant values) usually get put with the output .text. There is usually no need to invent another output sections unless you want some book keeping that wont get to the final image. Your __checksum_is_here label is sufficient to find it and you can look at this question on CRCs.

Is there a way to use math expressions in gnu assembly constants?

What is the correct gnu assembly syntax for doing the following:
.section .data2
.asciz "******* Output Data ********"
total_sectors_written: .word 0x0
max_buffer_sectors: .word ((0x9fc00 - $data_buffer) / 512) # <=== need help here
.align 512
data_buffer: .asciz "<The actual data will overwrite this>"
Specifically, I'm writing a toy OS. The code above is in 16-bit real mode. I'm setting up a data buffer that will be dumped back to the boot disk. I want to calculate the number of sectors there are between where data_buffer gets placed in memory, and the upper bound of that data buffer. (Address 0x9fc00 is where the buffer would run into RAM reserved for other purposes.)
I know I could write assembly code to calculate this; but, since it is a constant known at build time, I'm curious if I can get the assembler to calculate it for me.
I'm running into three specific problems:
(1) If I use $data_buffer I get this error:
os_src/boot.S: Assembler messages:
os_src/boot.S:497: Error: missing ')'
os_src/boot.S:497: Error: can't resolve `L0' {*ABS* section} - `$data_buffer' {*UND* section}
which I find confusing, because I should use $ when I want the memory address of a label, correct?
(2) If I use data_buffer instead of $data_buffer, I get this error:
os_src/boot.S: Assembler messages:
os_src/boot.S:497: Error: missing ')'
os_src/boot.S:497: Error: value of 653855 too large for field of 2 bytes at 31
make: *** [obj/boot/dd_test.o] Error 1
which seems to suggest that the assembler is complaining about the size of the intermediate value (which does not need to fit in a 16-bit word).
(3) And, of course, what's up with the missing ')'?
When you use expressions in GNU assembler they have to resolve to absolute values. GNU assembler isn't aware of what the origin point of the code will actually be at. That is what the linker is for. Because of that data_buffer absolute address isn't known until linking is done so it is considered relocatable. If you take an absolute value like 0x9fc00 and subtract a relocatable value from it you get a relocatable value. Relocatable values can't be used in constant (absolute) expressions.
All is not lost. The linker itself will know the absolute address once it arranges everything in memory. You seem to suggest you already use a linker script which means the work you have to do is minimal. You can use the linker to compute the value of max_buffer_sectors.
Your linker script will have a SECTIONS directive like:
SECTIONS
{
[your section contents here]
}
You can create a linker symbol max_buffer_sectors with something like:
SECTIONS
{
max_buffer_sectors = (0x9fc00 - (data_buffer)) / 512;
[your section contents here]
}
This will allow the linker to compute the size since it will know data_buffer absolute address in memory.
Your GNU assembly file will need a bit of tweaking:
.globl data_buffer
.section .data2
.asciz "******* Output Data ********"
total_sectors_written: .word 0x0
.align 512
data_buffer: .asciz "<The actual data will overwrite this>"
You'll notice I used .globl data_buffer. This exports the symbol and makes it global so that the linker can use it.
You can then use the symbol max_buffer_sectors in code like:
mov $max_buffer_sectors, %ax

What is the syntax for directives in gcc asm command?

I'm trying to use .ascii directive in the gcc extended asm command but I keep getting compiler errors. What is the exact syntax for directives inside extended asm?
I tried the following options but none of the worked:
asm ("NOP;"
".ASCII ""ABC"""
);
I got "Error: junk at end of line, first unrecognized character is `/'"
asm ("NOP;"
".ASCII "ABC""
);
I got Error: junk at end of line, first unrecognized character is `/'"
asm ("NOP;"
.ASCII "ABC"
);
I got "error: expected ‘:’ or ‘)’ before ‘/’ token"
The syntax for directives inside the asm is identical to writing GNU Assembler, so you can reference the GNU Assembler manual for the relevant syntax.
Example:
#include <stdio.h>
int
main (void)
{
char *string;
asm (".pushsection .rodata\n"
"0:\n"
" .ascii \"Testing 1 2 3!\"\n"
" .popsection\n"
" mov $0b, %0\n":"=rm" (string));
puts (string);
}
In the example we use an extended asm to copy the address of a string to a char * and then pass that to puts to print the string.
The string needs to be placed into the appropriate linker section, not just added to the current (usually the code section i.e. .text). So you begin by pushing the section you want the string stored to into the assembler's section stack. In this example I give it's the read only data section (.rodata) where most strings live. Then you pop the section off the section stack to get back to whatever section the compiler left you in, and do your operation with the string address. The trick is to use a local label like 0 to reference the string and let the assembler and linker compute the offset for you. This may require more work if you're PIE or PIC depending on how much more complicated your references become or if they require relocations.

MinGW's ld cannot perform PE operations on non PE output file

I know there are some other similar questions about this out there, be it StackOverflow or not. I've researched a lot for this, and still didn't find a single solution.
I'm doing an operative system as a side project. I've been doing all in Assembly, but now I wanna join C code.
To test, I made this assembly code file (called test.asm):
[BITS 32]
GLOBAL _a
SECTION .text
_a:
jmp $
Then I made this C file (called main.c):
extern void a(void);
int main(void)
{
a();
}
To link, I used this file (called make.bat):
"C:\minGW\bin\gcc.exe" -ffreestanding -c -o c.o main.c
nasm -f coff -o asm.o test.asm
"C:\minGW\bin\ld.exe" -Ttext 0x100000 --oformat binary -o out.bin c.o asm.o
pause
I've been researching for ages, and I'm still struggling to find an answer. I hope this won't be flagged as duplicate. I acknowledge about the existence of similar questions, but all have different answers, and none work for me.
Question: What am I doing wrong?
Old MinGW versions had the problem that "ld" was not able to create non-PE files at all.
Maybe current versions have the same problem.
The work-around was creating a PE file with "ld" and then to transform the PE file to binary, HEX or S19 using "objcopy".
--- EDIT ---
Thinking about the question again I see two problems:
As I already said some versions of "ld" have problems creating "binary" output (instead of "PE", "ELF" or whatever format is used).
Instead of:
ld.exe --oformat binary -o file.bin c.o asm.o
You should use the following sequence to create the binary file:
ld.exe -o file.tmp c.o asm.o
objcopy -O binary file.tmp file.bin
This will create an ".exe" file named "binary.tmp"; then "objcopy" will create the raw data from the ".exe" file.
The second problem is the linking itself:
"ld" assumes a ".exe"-like file format - even if the output file is a binary file. This means that ...
... you cannot even be sure if the object code of "main.o" is really placed at the first address of the resulting object code. "ld" would also be allowed to put the code of "a()" before "main()" or even put "internal" code before "a()" and "main()".
... addressing works a bit differently which means that a lot of padding bytes will be created (maybe at the start of the file!) if you do something wrong.
The only possibility I see is to create a "linker script" (sometimes called "linker command file") and to create a special section in the assembler code (because I normally use another assembler than "nasm" I do not know if the syntax here is correct):
[BITS 32]
GLOBAL _a
SECTION .entry
jmp _main
SECTION .text
_a:
jmp $
In the linker script you can specify which sections appear in which order. Specify that ".entry" is the first section of the file so you can be sure it is the first instruction of the file.
In the linker script you may also say that multiple sections (e.g. ".entry", ".text" and ".data") should be combined into a single section. This is useful because sections are normally 0x1000-byte-aligned in PE files! If you do not combine multiple sections into one you'll get a lot of stub bytes between the sections!
Unfortunately I'm not the expert for linker scripts so I cannot help you too much with that.
Using "-Ttext" is also problematic:
In PE files the actual address of a section is calculated as "image base" + "relative address". The "-Ttext" argument will influence the "relative address" only. Because the "relative address" of the first section is typically fixed to 0x1000 in Windows a "-Ttext 0x2000" would do nothing but filling 0x1000 stub bytes at the start of the first section. However you do not influence the start address of ".text" at all - you only fill stub bytes at the start of the ".text" section so that the first useful byte is located at 0x2000. (Maybe some "ld" versions behave differently.)
If you wish that the first section of your file is located at address 0x100000 you should use the equivalent of "-Ttext 0x1000" in the linker script (-Ttext is not used if a linker script is used) and define the "image base" to 0xFF000:
ld.exe -T linkerScript.ld --image-base 0xFF000 -o binary.tmp a.o main.o
The memory address of the ".text" section will be 0xFF000 + 0x1000 = 0x100000.
(And the first byte of the binary file generated by "objcopy" will be the first byte of the first section - representing memory address 0x100000.)

Resources