gcc / ld: overlapping sections (.tbss, .init_array) in statically-linked ELF binary


I'm compiling a very simple hello-world one-liner statically on a Debian 7 system on an x86_64 machine with gcc version 4.8.2 (Debian 4.8.2-21):
gcc test.c -static -o test
and I get an executable ELF file that includes the following sections:
[17] .tdata PROGBITS 00000000006b4000 000b4000
0000000000000020 0000000000000000 WAT 0 0 8
[18] .tbss NOBITS 00000000006b4020 000b4020
0000000000000030 0000000000000000 WAT 0 0 8
[19] .init_array INIT_ARRAY 00000000006b4020 000b4020
0000000000000010 0000000000000000 WA 0 0 8
[20] .fini_array FINI_ARRAY 00000000006b4030 000b4030
0000000000000010 0000000000000000 WA 0 0 8
[21] .jcr PROGBITS 00000000006b4040 000b4040
0000000000000008 0000000000000000 WA 0 0 8
[22] .data.rel.ro PROGBITS 00000000006b4060 000b4060
00000000000000e4 0000000000000000 WA 0 0 32
Note that the .tbss section is allocated at addresses 0x6b4020..0x6b4050 (0x30 bytes), and it overlaps the .init_array section at 0x6b4020..0x6b4030 (0x10 bytes), the .fini_array section at 0x6b4030..0x6b4040 (0x10 bytes) and the .jcr section at 0x6b4040..0x6b4048 (8 bytes).
Note that it does not overlap the sections after those, for example .data.rel.ro, but that's probably because .data.rel.ro has an alignment of 32 and thus can't be placed any earlier than 0x6b4060.
The resulting file runs OK, but I still don't quite understand how it works. From what I read in the glibc documentation, .tbss is just a .bss section for thread-local storage (i.e. allocated scratch memory that is not actually stored in the file). Is the .tbss section so special that it can overlap other sections? Are .init_array, .fini_array and .jcr so useless (for example, no longer needed once TLS-related code runs) that they can be overwritten by bss? Or is it some sort of bug?
Basically, what do I read or write if I access address 0x6b4020 in my application: the .tbss contents or the .init_array pointers? Why?

The virtual address of .tbss is meaningless, as that section only serves as a template for the TLS storage allocated by the threading implementation in glibc.
The way this virtual address comes about is that .tbss follows .tdata in the default linker script:
...
.gcc_except_table : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
/* Thread Local Storage sections */
.tdata : { *(.tdata .tdata.* .gnu.linkonce.td.*) }
.tbss : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) }
.preinit_array :
{
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array))
PROVIDE_HIDDEN (__preinit_array_end = .);
}
.init_array :
{
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array))
PROVIDE_HIDDEN (__init_array_end = .);
}
...
Therefore its virtual address is simply the virtual address of the preceding section (.tdata) plus the size of that section (possibly with some padding to reach the desired alignment). .init_array (or .preinit_array, if present) comes next, and its location should be determined the same way, but .tbss is so special that it is given deeply hard-coded treatment inside GNU ld:
/* .tbss sections effectively have zero size. */
if ((os->bfd_section->flags & SEC_HAS_CONTENTS) != 0
|| (os->bfd_section->flags & SEC_THREAD_LOCAL) == 0
|| link_info.relocatable)
dotdelta = TO_ADDR (os->bfd_section->size);
else
dotdelta = 0; // <----------------
dot += dotdelta;
This is not a relocatable (-r) link, .tbss has the SEC_THREAD_LOCAL flag set, and it does not have contents (NOBITS), therefore the else branch is taken. In other words, no matter how large .tbss is, the linker does not advance the location counter (also known as "the dot"), so the section that follows it starts at the same address.
Note also that .tbss sits in a non-loadable ELF segment:
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000b1f24 0x00000000000b1f24 R E 200000
LOAD 0x00000000000b2000 0x00000000006b2000 0x00000000006b2000
0x0000000000002288 0x00000000000174d8 RW 200000
NOTE 0x0000000000000158 0x0000000000400158 0x0000000000400158
0x0000000000000044 0x0000000000000044 R 4
TLS 0x00000000000b2000 0x00000000006b2000 0x00000000006b2000 <---+
0x0000000000000020 0x0000000000000060 R 8 |
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 |
0x0000000000000000 0x0000000000000000 RW 8 |
|
Section to Segment mapping: |
Segment Sections... |
00 .note.ABI-tag ... |
01 .tdata .ctors ... |
02 .note.ABI-tag ... |
03 .tdata .tbss <---------------------------------------------------+
04

This is fairly simple once you understand two things:
1) what SHT_NOBITS means
2) what the .tbss section is
SHT_NOBITS means that the section occupies no space inside the file.
Normally, NOBITS sections like .bss are placed after all PROGBITS sections, at the end of the loaded segments.
.tbss is a special section that holds uninitialized thread-local data that contributes to the program's memory image. Note this carefully: this section must hold separate data for each program thread.
Now let's talk about overlapping. There are two places where an overlap could happen: inside the binary file and in memory.
1) File offsets:
There is no data stored for this section in the binary. Inside the file it takes up no space, so the linker starts the next section, .init_array, right where .tbss is declared. Think of its size not as a real size but as bookkeeping information for code like:
if (isTLSSegment) tlsStartAddr += section->memSize();
So it doesn't overlap anything inside the file.
2) Memory addresses:
The .tdata and .tbss sections may be modified at startup time by the dynamic linker performing relocations, but after that the section data is kept around as the initialization image and not modified anymore. For each thread, including the initial one, new memory is allocated, and the content of the initialization image is then copied into it. This ensures that all threads get the same starting conditions.
This is what makes .tbss (and .tdata) so special.
Do not think of their memory addresses as statically known: they are more like templates for per-thread storage. So they cannot really overlap with "normal" memory addresses either; they are handled in a different way.
You may consult this paper to learn more.

Related

GNU LD for ARM produces section alignment to unwanted bound

I'm building embedded software for an STM32 microcontroller with the GNU Tools for STM32 toolchain, and I need the binary output without gaps.
The linker produces a gap between the sections .text and .rodata. The problem is the alignment of the .rodata section. The issue appears when using GNU Tools for STM32 version 9-2020-q2-update. The previous version I had used (7-2018-q2-update) did not produce that issue.
Excerpt from the linker script (it's the same for both GNU Tools versions):
SECTIONS
{
.text :
{
. = ALIGN(4);
*(.text) /* .text sections (code) */
*(.text*) /* .text* sections (code) */
*(.glue_7) /* glue arm to thumb code */
*(.glue_7t) /* glue thumb to arm code */
*(.eh_frame)
KEEP (*(.init))
KEEP (*(.fini))
. = ALIGN(4); /* PaulV: change that to ALIGN(8) eliminates the gap */
} >FLASH
/* Constant data into "FLASH" Rom type memory */
.rodata :
{
. = ALIGN(4);
*(.rodata) /* .rodata sections (constants, strings, etc.) */
*(.rodata*) /* .rodata* sections (constants, strings, etc.) */
. = ALIGN(4);
} >FLASH
}
More details:
The version 7-2018-q2-update produces the output without gap.
The .lst file (note that section .rodata is aligned to a 4-byte boundary):
K4_G1.elf: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
....
3 .text 0001a20c 08100800 08100800 00010800 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
4 .rodata 00009b54 0811aa0c 0811aa0c 0002aa0c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
and .map file (no gap between the non-empty sections .fini and .rodata):
.fini 0x000000000811aa04 0x8 c:/st/stm32cubeide_1.4.0/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.7-2018-q2-update.win32_1.5.0.202011040924/tools/bin/../lib/gcc/arm-none-eabi/7.3.1/thumb/v7e-m/fpv5/hard/crtn.o
0x000000000811aa0c . = ALIGN (0x4)
0x000000000811aa0c _etext = .
.vfp11_veneer 0x000000000811aa0c 0x0
.vfp11_veneer 0x000000000811aa0c 0x0 linker stubs
.v4_bx 0x000000000811aa0c 0x0
.v4_bx 0x000000000811aa0c 0x0 linker stubs
.iplt 0x000000000811aa0c 0x0
.iplt 0x000000000811aa0c 0x0 c:/st/stm32cubeide_1.4.0/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.7-2018-q2-update.win32_1.5.0.202011040924/tools/bin/../lib/gcc/arm-none-eabi/7.3.1/thumb/v7e-m/fpv5/hard/crtbegin.o
.rodata 0x000000000811aa0c 0x9b54
0x000000000811aa0c . = ALIGN (0x4)
*(.rodata)
.rodata 0x000000000811aa0c 0x8c Src/app_composer/init.o
The version 9-2020-q2-update produces the output with gap.
The .lst file (note that section .rodata is aligned to an 8-byte boundary, but why?):
K4_G1.elf: file format elf32-littlearm
Sections:
Idx Name Size VMA LMA File off Algn
...
3 .text 0001923c 08100800 08100800 00010800 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
4 .rodata 000061f0 08119a40 08119a40 00029a40 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
and .map file (there is a gap between the non-empty sections .fini and .rodata):
.fini 0x0000000008119a34 0x8 c:/st/stm32cubeide_1.4.0/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.9-2020-q2-update.win32_1.5.0.202011040924/tools/bin/../lib/gcc/arm-none-eabi/9.3.1/thumb/v7e-m+dp/hard/crtn.o
.vfp11_veneer 0x0000000008119a3c 0x0
.vfp11_veneer 0x0000000008119a3c 0x0 linker stubs
.v4_bx 0x0000000008119a3c 0x0
.v4_bx 0x0000000008119a3c 0x0 linker stubs
.iplt 0x0000000008119a3c 0x0
.iplt 0x0000000008119a3c 0x0 c:/st/stm32cubeide_1.4.0/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.9-2020-q2-update.win32_1.5.0.202011040924/tools/bin/../lib/gcc/arm-none-eabi/9.3.1/thumb/v7e-m+dp/hard/crtbegin.o
.rodata 0x0000000008119a40 0x61f0
0x0000000008119a40 . = ALIGN (0x4)
*(.rodata)
.rodata 0x0000000008119a40 0x96 Src/app_composer/init.o
Edit 03/16/2021
There are no *(.rodata) sections in the input object files with an alignment of 8 or greater.
Changing the section name .rodata to .text eliminates the gap (the same happens if I merge the contents of .text and .rodata into a single .text section):
SECTIONS
{
.text :
{
. = ALIGN(4);
*(.text) /* .text sections (code) */
/* ... */
. = ALIGN(4);
} >FLASH
/* Constant data into "FLASH" Rom type memory */
.text : /* <-- the same name as the previous section instead of .rodata */
{
. = ALIGN(4);
*(.rodata) /* .rodata sections (constants, strings, etc.) */
*(.rodata*) /* .rodata* sections (constants, strings, etc.) */
. = ALIGN(4);
} >FLASH
}
The source code and build settings are also the same for both variants.
What could be the reason for the problem, and how could it be solved? Am I missing something?
P.S. Of course I can change the alignment at the end of the .text section to 8. That would be treating the symptoms, but I want to understand the cause.
Thanks in advance for your help!

Run objdump -h on the input object files. I suspect that you will find that the compiler is putting a minimum alignment of 8 on one of the input .rodata sections. The linker then sets the output alignment to the maximum of the input sections.

RAM section is part of the binary firmware

I am trying to use a custom RAM section to pass information across reboots. This section will not be erased at boot, so the variables placed in it will be kept across reboots (as long as there is no power loss, of course).
I use the GNU toolchain and a Cortex-M0 (STM32) MCU.
So I added a new memory area before RAM in the linker script:
RAM_PERSIST (xrw) : ORIGIN = 0x20000000, LENGTH = 0x0040
RAM (xrw) : ORIGIN = 0x20000040, LENGTH = 0x0FD0
Then a section to go in there:
.pds :
{
KEEP(*(.pds))
} >RAM_PERSIST
Finally, in the C code, I declare some data in this section:
data_t __attribute__((section(".pds"))) data;
It does compile, but I could not upload the generated binary to my target. Using objdump, I discovered that my firmware got a new section ".sec2" beginning at 0x20000000:
> (...)/arm-none-eabi-objdump -s ./obj/firmware.hex | tail
8006d20 f8bc08bc 9e467047 f8b5c046 f8bc08bc .....FpG...F....
8006d30 9e467047 e9000008 c1000008 00127a00 .FpG..........z.
8006d40 19000000 e0930400 409c0000 400d0300 ........@...@...
8006d50 c0c62d00 30750000 ffffffff 01000000 ..-.0u..........
8006d60 04000000 ....
Contents of section .sec2:
20000000 00000000 00000000 00000000 00000000 ................
20000010 00000000 00000000 00000000 00000000 ................
20000020 00000000 00000000 00000000 00000000 ................
20000030 00000000 00000000 00000000 00000000 ................
So I think I have to tell the linker that this section is not in flash and therefore must not be part of the firmware.
Am I right? If so, how do I do that?
Thanks in advance.

MEMORY
{
rom : ORIGIN = 0x00000000, LENGTH = 0x40000
ram : ORIGIN = 0x20000000, LENGTH = 0x4000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.bss : { *(.bss*) } > ram
}
I had more control and success once I stopped using xrw etc. in the memory definitions and instead controlled placement through .text, .bss, .data, etc.; if you then also want a specific object somewhere, you add a section for it.

I achieved what I wanted by adding the NOLOAD attribute to my custom section:
.pds (NOLOAD): { KEEP(*(.pds)) } >RAM
Here is the NOLOAD description (from the GNU ld documentation):
(NOLOAD)
The (NOLOAD) directive will mark a section to not be loaded at run time. The linker will process the section normally, but will mark it so that a program loader will not load it into memory. For example, in the script sample below, the ROM section is addressed at memory location 0 and does not need to be loaded when the program is run. The contents of the ROM section will appear in the linker output file as usual.
SECTIONS {
ROM 0 (NOLOAD) : { ... }
...
}
I found a similar post that helped me; I'm adding a link here for reference: GCC (NOLOAD) directive loads memory into section anyway

static C variable not getting initialized

I have one file-level static C variable that isn't getting initialized.
const size_t VGA_WIDTH = 80;
const size_t VGA_HEIGHT = 25;
static uint16_t* vgat_buffer = (uint16_t*)0x62414756; // VGAb
static char vgat_initialized= '\0';
In particular, vgat_initialized isn't always 0 the first time it is accessed. (Of course, the problem only appears on certain machines.)
I'm playing around with writing my own OS, so I'm pretty sure this is a problem with my linker script, but I'm not clear on how exactly the variables are supposed to be organized in the image produced by the linker (i.e., I'm not sure whether this variable is supposed to go in .data, .bss, some other section, etc.).
VGA_WIDTH and VGA_HEIGHT get placed in the .rodata section as expected.
vgat_buffer is placed in the .data section, as expected (by initializing this variable to 0x62414756, I can clearly see where the linker places it in the resulting image file).
I can't figure out where vgat_initialized is supposed to go. I've included the relevant parts of the assembly file below. From what I understand, the .comm directive is supposed to allocate space for the variable in the data section; but, I can't tell where. Looking in the linker's map file didn't provide any clues either.
Interestingly enough, if I change the initialization to
static char vgat_initialized= 'x';
everything works as expected: I can clearly see where the variable is placed in the resulting image file (i.e., I can see the x in the hexdump of the image file).
Assembly code generated from the C file:
.text
.LHOTE15:
.local buffer.1138
.comm buffer.1138,100,64
.local buffer.1125
.comm buffer.1125,100,64
.local vgat_initialized
.comm vgat_initialized,1,1
.data
.align 4
.type vgat_buffer, #object
.size vgat_buffer, 4
vgat_buffer:
.long 1648445270
.globl VGA_HEIGHT
.section .rodata
.align 4
.type VGA_HEIGHT, #object
.size VGA_HEIGHT, 4
VGA_HEIGHT:
.long 25
.globl VGA_WIDTH
.align 4
.type VGA_WIDTH, #object
.size VGA_WIDTH, 4
VGA_WIDTH:
.long 80
.ident "GCC: (GNU) 4.9.2"

Compilers can certainly use their own names for sections, but with the common .data, .text, .rodata and .bss names we know from specific compilers, this variable should land in .bss.
But that doesn't in any way automatically zero it out. There needs to be a mechanism. Depending on your toolchain, sometimes the toolchain takes care of it and creates a binary that, in addition to filling in .data and .rodata (and naturally .text), also fills in .bss. But that depends on a few things, primarily whether this is a simple RAM-only image and whether everything lives under one memory region definition in the linker script.
You could, for example, put .data after .bss in the linker script, and depending on the binary format you use and/or the tools that convert it, you could end up with zeroed memory in the binary without any other work.
Normally, though, you should expect to use a toolchain-specific mechanism (linker scripts are linker-specific, not to be assumed universal to all tools) to define where .bss is from your perspective, and then some form of communication from the linker as to where it starts and what size it is. That information is used by the bootstrap, whose job it is to zero it; one can assume it is always the bootstrap's job to zero .bss, naturally with some exceptions. Likewise, if the binary is meant to live on read-only media (ROM, flash, etc.) but .data and .bss are read/write, you need to have .data in its entirety on this media, and then someone has to copy it to its runtime position in RAM. .bss is either part of that, depending on the toolchain and how you used it, or its start address and size are on the read-only media and someone has to zero that space at some point before main(). Here again this is the job of the bootstrap. Set the stack pointer, copy .data if needed, zero .bss: those are the typical minimal jobs of the bootstrap. You can shortcut them in special cases or avoid using .data or .bss.
Since it is the linker's job to take all the little .data and .bss (and other) definitions from the objects being linked and combine them per the directions from the user (linker script, command line, whatever that tool uses), the linker ultimately knows.
In the case of gcc, you use what I would call variables defined in the linker script. The linker script can fill in these values with matching variable/label names for the assembler, so that a generic bootstrap can be used and you don't have to do any more work than that.
Something like this, but possibly more complicated:
MEMORY
{
bob : ORIGIN = 0x8000, LENGTH = 0x1000
ted : ORIGIN = 0xA000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > bob
__data_rom_start__ = .;
.data : {
__data_start__ = .;
*(.data*)
} > ted AT > bob
__data_end__ = .;
__data_size__ = __data_end__ - __data_start__;
.bss : {
__bss_start__ = .;
*(.bss*)
} > ted
__bss_end__ = .;
__bss_size__ = __bss_end__ - __bss_start__;
}
then you can pull these into the assembly language bootstrap
.globl bss_start
bss_start: .word __bss_start__
.globl bss_end
bss_end: .word __bss_end__
.word __bss_size__
.globl data_rom_start
data_rom_start:
.word __data_rom_start__
.globl data_start
data_start:
.word __data_start__
.globl data_end
data_end:
.word __data_end__
.word __data_size__
Then write some code to operate on those as needed for your design.
You can simply put things like that in an assembly language file that gets linked in, without any other code using them, then assemble, compile the other code and link; the disassembly or whatever other tools you prefer will show you what the linker generated. Tweak that until you are satisfied, then you can write, borrow or steal bootstrap code to use them.
For bare metal I prefer not to conform completely to the standard with my code: I don't have any .data and don't expect .bss to be zero, so my bootstrap sets the stack pointer and calls main, done. For an operating system, you should conform. The toolchains already have this solved for the native platform, but if you are taking that over with your own linker script and bootstrap, then you need to deal with it. If you want to use an existing toolchain's solution for an existing operating system, then... done, just do that.

This answer is simply an extension of the others. As has been mentioned, the C standard has rules about initialization:
10) If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate. If an object that has static storage duration is not initialized explicitly, then:
if it has pointer type, it is initialized to a null pointer;
if it has arithmetic type, it is initialized to (positive or unsigned) zero;
if it is an aggregate, every member is initialized (recursively) according to these rules;
if it is a union, the first named member is initialized (recursively) according to these rules.
The problem in your code is that a computer's memory may not always be initialized to zero. It is up to you to make sure the BSS section is initialized to zero in a freestanding environment (like your OS and bootloader).
The BSS sections usually don't (by default) take up space in a binary file and usually occupy memory in the area beyond the limits of the code and data that appears in the binary. This is done to reduce the size of the binary that has to be read into memory.
I know you are writing an OS for x86 booting with legacy BIOS. I know that you are using GCC from your other recent questions. I know you are using GNU assembler for part of your bootloader. I know that you have a linker script, but I don't know what it looks like. The usual mechanism to do this is via a linker script that places the BSS data at the end, and creates start and end symbols to define the address extents of the section. Once these symbols are defined by the linker they can be used by C code (or assembly code) to loop through the region and set it to zero.
I present a reasonably simple MCVE that does this. The code reads an extra sector with the kernel with Int 13h/AH=2h; enables the A20 line (using fast A20 method); loads a GDT with 32-bit descriptors; enables protected mode; completes the transition into 32-bit protected mode; and then calls a kernel entry point in C called kmain. kmain calls a C function called zero_bss that initializes the BSS section based on the starting and ending symbols (__bss_start and __bss_end) generated by a custom linker script.
boot.S:
.extern kmain
.globl mbrentry
.code16
.section .text
mbrentry:
# If trying to create USB media, a BPB here may be needed
# At entry DL contains boot drive number
# Segment registers to zero
xor %ax, %ax
mov %ax, %ds
mov %ax, %es
# Set stack to grow down from area under the place the bootloader was loaded
mov %ax, %ss
mov $0x7c00, %sp
cld # Ensure forward direction of MOVS/SCAS/LODS instructions
# which is required by generated C code
# Load kernel into memory
mov $0x02, %ah # Disk read
mov $1, %al # Read 1 sector
xor %ch, %ch # Cylinder 0
xor %dh, %dh # Head 0
mov $2, %cl # Start reading from second sector
mov $0x7e00, %bx # Load kernel at 0x7e00
int $0x13
# Quick and dirty A20 enabling. May not work on all hardware
a20fast:
in $0x92, %al
or $2, %al
out %al, $0x92
loadgdt:
cli # Turn off interrupts until a Interrupt Vector
# Table (IVT) is set
lgdt (gdtr)
mov %cr0, %eax
or $1, %al
mov %eax, %cr0 # Enable protected mode
jmp $0x08,$init_pm # FAR JMP to next instruction to set
# CS selector with a 32-bit code descriptor and to
# flush the instruction prefetch queue
.code32
init_pm:
# Set remaining 32-bit selectors
mov $DATA_SEG, %ax
mov %ax, %ds
mov %ax, %es
mov %ax, %fs
mov %ax, %gs
mov %ax, %ss
# Start executing kernel
call kmain
cli
loopend: # Infinite loop when finished
hlt
jmp loopend
.align 8
gdt_start:
.long 0 # null descriptor
.long 0
gdt_code:
.word 0xFFFF # limit low
.word 0 # base low
.byte 0 # base middle
.byte 0b10011010 # access
.byte 0b11001111 # granularity/limit high
.byte 0 # base high
gdt_data:
.word 0xFFFF # limit low (Same as code)
.word 0 # base low
.byte 0 # base middle
.byte 0b10010010 # access
.byte 0b11001111 # granularity/limit high
.byte 0 # base high
end_of_gdt:
gdtr:
.word end_of_gdt - gdt_start - 1
# limit (Size of GDT)
.long gdt_start # base of GDT
CODE_SEG = gdt_code - gdt_start
DATA_SEG = gdt_data - gdt_start
kernel.c:
#include <stdint.h>
extern uintptr_t __bss_start[];
extern uintptr_t __bss_end[];
/* Zero the BSS section 4-bytes at a time */
static void zero_bss(void)
{
uint32_t *memloc = __bss_start;
while (memloc < __bss_end)
*memloc++ = 0;
}
int kmain(){
zero_bss();
return 0;
}
link.ld
ENTRY(mbrentry)
SECTIONS
{
. = 0x7C00;
.mbr : {
boot.o(.text);
boot.o(.*);
}
. = 0x7dfe;
.bootsig : {
SHORT(0xaa55);
}
. = 0x7e00;
.kernel : {
*(.text*);
*(.data*);
*(.rodata*);
}
.bss : SUBALIGN(4) {
__bss_start = .;
*(COMMON);
*(.bss*);
}
. = ALIGN(4);
__bss_end = .;
/DISCARD/ : {
*(.eh_frame);
*(.comment);
}
}
To compile, link and generate a binary file that can be used in a disk image from this code, you could use commands like:
as --32 boot.S -o boot.o
gcc -c -m32 -ffreestanding -O3 kernel.c
gcc -ffreestanding -nostdlib -Wl,--build-id=none -m32 -Tlink.ld \
-o boot.elf -lgcc boot.o kernel.o
objcopy -O binary boot.elf boot.bin

The C standard says that static variables must be zero-initialized, even in absence of explicit initializer, so static char vgat_initialized= '\0'; is equivalent to static char vgat_initialized;.
In ELF and other similar formats, the zero-initialized data, such as this vgat_initialized goes to the .bss section. If you load such an executable yourself into memory, you need to explicitly zero the .bss part of the data segment.

The other answers are very complete and very helpful. It turns out that, in my specific case, I just needed to know that static variables initialized to 0 are put in .bss and not .data. Adding a .bss section to the linker script placed a zeroed-out section of memory in the image, which solved the problem.

What is fill section shows in the link map file?

Yesterday I created my own U-Boot module and wanted to set the text base address to 0xd0020010.
But after compiling, the .map file generated by the linker shows this:
Linker script and memory map
0x00000000 . = 0x0
0x00000000 . = ALIGN (0x4)
.text 0xd0020010 0x1f0
0xd0020010 __image_copy_start = .
*(.vectors)
*fill* 0xd0020010 0x10 00
.vectors 0xd0020020 0x60 arch/arm/lib/built-in.o
0xd0020020 _start
0xd0020044 _undefined_instruction
0xd0020048 _software_interrupt
0xd002004c _prefetch_abort
0xd0020050 _data_abort
0xd0020054 _not_used
0xd0020058 _irq
0xd002005c _fiq
As you can see above, before the .vectors section there are 16 bytes of 0x00 named "*fill*".
And my linker script looks like this:
SECTIONS
{
. = 0x00000000;
. = ALIGN(4);
.text :
{
__image_copy_start = .;
*(.vectors)
CPUDIR/start.o (.text*)
*(.text*)
}
.........
I tried removing ALIGN(4), but it stays the same. And 0xd0020010 is an aligned address, right? So it should have nothing to do with ALIGN.
Although the 16 bytes of memory are filled with 0x00, which are nop instructions, I still wonder why there is a "*fill*" section.

Stack allocation to process and its occupancy by the data segment

Sorry if the questions are dumb, but they are really confusing me!
According to the ELF standard, the binary is divided into segments, such as the text segment (containing code and read-only data) and the data segment (containing RW data and BSS), which are loaded into memory when the program is executed and the process is created, with the segments providing the information needed to prepare the environment for process execution.
The question is: how is it decided how much stack to allocate to a process when I am not providing a stack size during process creation?
Also, using the data segment we can determine how much memory the process requires (for global variables), but once this memory is allocated, how are the variables mapped to addresses inside this allocated memory?
Lastly, is there any relation between this and scatter loading? I think there isn't, as scatter loading is done when the image is loaded into memory, and once control is passed to the OS, the memory allocated to executables or applications is taken care of by the OS itself.
I know these are a lot of questions, but any help will be greatly appreciated.
If you can provide any reference books or links where I can study this in detail, that would also be appreciated.
Thanks a tonne! :)

The question is: how is it decided how much stack to allocate to a process when I am not providing a stack size during process creation?
When a new program is run, the execve() system call loads it into memory as the new process image, replacing the image of the currently running process. This means that execve() replaces the old .text and .data segments and the heap, and resets the stack. The ELF executable file is then mapped into the process address space, and the stack is initialized with the environment array and the argument array passed to main().
Inside do_execve_common(), the subroutine bprm_mm_init() handles tasks such as:
Creating a new mm_struct instance to manage the process address space, using mm_alloc().
Initializing this instance with init_new_context().
Initializing the stack.
The search_binary_handler() routine then searches for a suitable binary format handler, whose load_binary and load_shlib routines load programs or dynamic libraries respectively. This is followed by mapping memory into the virtual address space and making the process ready to run when the scheduler picks it.
Therefore, the stack memory finally looks like the diagram below when main() starts executing. After that, the context of each function call, including its parameters and local variables, is pushed onto the stack dynamically as the calls happen.
-----------------
| | <--- Top of the Stack
| environmental |
| variables and |
| the other |
| parameters to |
| main() |
_________________ <--- Stack Pointer
| |
| Stack Space |
| |
Also, using the data segment we can determine how much memory the process requires (for global variables), but once this memory is allocated, how are the variables mapped to addresses inside this allocated memory?
Let's try to figure out how variables are mapped to different parts of the memory segments by debugging a simple C program, as follows:
/* File Name: elf.c : Demonstrating Global variables */
#include <stdio.h>
int add_numbers(void);
int value1 = 10; // Global Initialized: .data section
int value2; // Global Uninitialized: .bss section
int add_numbers(void)
{
int result; // Local Uninitialized: Stack section
result = value1 + value2;
return result;
}
int main(void)
{
int final_result; // Local Uninitialized: Stack section
value2 = 20;
final_result = add_numbers();
printf("The sum of %d + %d is %d\n",
value1, value2, final_result);
}
Using readelf to display the .data section header:
$readelf -a elf
...
Section Headers:
[26] .data PROGBITS 00000000006c2060 000c2060
00000000000016b0 0000000000000000 WA 0 0 32
[27] .bss NOBITS 00000000006c3720 000c3710
0000000000002bc8 0000000000000000 WA 0 0 32
...
$readelf -x 26 elf
Hex dump of section '.data':
0x006c2060 00000000 00000000 00000000 00000000 ................
0x006c2070 0a000000 00000000 00000000 00000000 ................
...
Let's use GDB to look at what these sections contain:
(gdb) disassemble 0x006c2060
Dump of assembler code for function `data_start`:
0x00000000006c2060 <+0>: add %al,(%rax)
0x00000000006c2062 <+2>: add %al,(%rax)
0x00000000006c2064 <+4>: add %al,(%rax)
0x00000000006c2066 <+6>: add %al,(%rax)
End of assembler dump.
The first address of the .data section is the data_start symbol; GDB treats it like a function and disassembles the data bytes as instructions.
(gdb) disassemble 0x006c2070
Dump of assembler code for function `value1`:
0x00000000006c2070 <+0>: or (%rax),%al
0x00000000006c2072 <+2>: add %al,(%rax)
End of assembler dump.
....
The disassembly above dumps the address of the global variable value1, initialized to 10 (0x0a). But we don't see the uninitialized global variable value2 at the following addresses.
Let's look at printing the address of value2,
(gdb) p &value2
$1 = (int *) 0x6c5eb0
(gdb) info symbol 0x6c5eb0
value2 in section .bss
(gdb) disassemble 0x6c5eb0
Dump of assembler code for function `value2`:
0x00000000006c5eb0 <+0>: add %al,(%rax)
0x00000000006c5eb2 <+2>: add %al,(%rax)
End of assembler dump.
Tada! Disassembling at the address of value2 reveals that the variable is stored in the .bss section. This explains how uninitialized global variables are mapped into the process memory space.
Lastly, is there any relation of this with scatter loading?
No.
