Usage of as + gcc vs gcc only for ARM assembly

Usage of as + gcc vs gcc only for ARM assembly - gcc

I am using as and gcc to assemble and create executables of ARM assembly programs, as recommended by this tutorial, as follows:
Given an assembly source file, program.s, I run:
as -o program.o program.s
Then:
gcc -o program program.o
However, running gcc on the assembly source directly like so:
gcc -o program program.s
yields the same result.
Does gcc call as behind the scenes? Is there any reason at all to use both as and gcc given that gcc alone is able to produce an executable from source?
I am running this on a Raspberry Pi 3, Raspbian Jessie (a Debian derivative), gcc 4.9.2, as 2.25.

GCC calls all kinds of things behind the scenes; not just as but ld as well. This is pretty easy to instrument if you want to prove it (replace the CORRECT as and ld and other binaries with ones that say print out their command line, then run GCC and see that binary gets called).
When you use GCC as an assembler, it goes through a C preprocessor, so you can do some fairly disgusting things like this:
start.s
//this is a comment
#this is a comment
#define FOO BAR
.globl _start
_start:
mov sp,#0x80000
bl hello
b .
.globl world
world:
bx lr
And to see more of what is going on, here are other files:
so.h
unsigned int world ( unsigned int, unsigned int );
#define FIVE 5
#define SIX 6
so.c
#include "so.h"
unsigned int hello ( void )
{
unsigned int a,b,c;
a=FIVE;
b=SIX;
c=world(a,b);
return(c+1);
}
build
arm-none-eabi-gcc -save-temps -nostdlib -nostartfiles -ffreestanding -O2 start.s so.c -o so.elf
arm-none-eabi-objdump -D so.elf
producing
00008000 <_start>:
8000: e3a0d702 mov sp, #524288 ; 0x80000
8004: eb000001 bl 8010 <hello>
8008: eafffffe b 8008 <_start+0x8>
0000800c <world>:
800c: e12fff1e bx lr
00008010 <hello>:
8010: e92d4010 push {r4, lr}
8014: e3a01006 mov r1, #6
8018: e3a00005 mov r0, #5
801c: ebfffffa bl 800c <world>
8020: e8bd4010 pop {r4, lr}
8024: e2800001 add r0, r0, #1
8028: e12fff1e bx lr
being a very simple project. Here is so.i after the pre-processor, which goes and gets the include files and replaces the defines:
# 1 "so.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "so.c"
# 1 "so.h" 1
unsigned int world ( unsigned int, unsigned int );
# 4 "so.c" 2
unsigned int hello ( void )
{
unsigned int a,b,c;
a=5;
b=6;
c=world(a,b);
return(c+1);
}
Then GCC calls the actual compiler (whose program name is not GCC).
That produces so.s:
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global hello
.syntax unified
.arm
.fpu softvfp
.type hello, %function
hello:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
push {r4, lr}
mov r1, #6
mov r0, #5
bl world
pop {r4, lr}
add r0, r0, #1
bx lr
.size hello, .-hello
.ident "GCC: (GNU) 6.3.0"
Which is then fed to the assembler to make so.o. Then the linker is called to turn these into so.elf.
Now, you can do most of the calls directly. That doesn't mean that these programs have other programs they call. GCC still calls one or more programs to actually do the compile.
arm-none-eabi-as start.s -o start.o
arm-none-eabi-gcc -O2 -S so.c
arm-none-eabi-as so.s -o so.o
arm-none-eabi-ld start.o so.o -o so.elf
arm-none-eabi-objdump -D so.elf
Giving the same result:
00008000 <_start>:
8000: e3a0d702 mov sp, #524288 ; 0x80000
8004: eb000001 bl 8010 <hello>
8008: eafffffe b 8008 <_start+0x8>
0000800c <world>:
800c: e12fff1e bx lr
00008010 <hello>:
8010: e92d4010 push {r4, lr}
8014: e3a01006 mov r1, #6
8018: e3a00005 mov r0, #5
801c: ebfffffa bl 800c <world>
8020: e8bd4010 pop {r4, lr}
8024: e2800001 add r0, r0, #1
8028: e12fff1e bx lr
Using -S with GCC does feel a bit wrong. Using it like this instead feels more natural:
arm-none-eabi-gcc -O2 -c so.c -o so.o
Now there is a linker script that we didn't provide which the toolchain has a default for. We can control that, and depending on what this is being aimed at, perhaps we should.
I am not happy to see that the new/current version of as is tolerant of C comments, etc... Didn't used to be that way, must be a new thing with the latest release.
Thus the term "toolchain" it is a number of tools chained together, one linked to the next in order.
Not all compilers take the assembly language step. Some compile to intermediate code, and then there is another tool that turns that compiler specific intermediate code into assembly language. Then some assembler is called (GCC's intermediate code is inside tables in the compile step, where clang/llvm you can ask it to compile to this code then go from there to assembly language for one of the targets).
Some compilers go straight to machine code and don't stop at assembly language. This is likely one of those "climb the mountain just because it is there" things vs "go around". Like writing an operating system purely in assembly language.
For any decent sized project and a tool that can support it, you are going to have a linker, and an assembler the first tool you make to support a new to you target. The processor (chip or ip or both) vendor is going to have an assembler and then other tools available as well.
Try compiling even the above simple C program by hand using assembly language. Then try it again without using assembly language, by hand, using just machine code. You will find that using assembly language as an intermediate step is far more sane for compiler developers, along with the fact that it has been done this way forever, which is also a good reason to keep doing it this way.
If you wander about in the gnu toolchain directory you are using, you may find programs like cc1:
./libexec/gcc/arm-none-eabi/6.3.0/cc1 --help
The following options are specific to just the language Ada:
None found. Use --help=Ada to show *all* the options supported by the Ada front-end.
The following options are specific to just the language AdaSCIL:
None found. Use --help=AdaSCIL to show *all* the options supported by the AdaSCIL front-end.
The following options are specific to just the language AdaWhy:
None found. Use --help=AdaWhy to show *all* the options supported by the AdaWhy front-end.
The following options are specific to just the language C:
None found. Use --help=C to show *all* the options supported by the C front-end.
The following options are specific to just the language C++:
-Wplacement-new
-Wplacement-new= 0xffffffff
The following options are specific to just the language Fortran:
Now if you run that cc1 program against the so.i file you saved with -save-temps, you get so.s the assembly language file.
You could probably continue to dig into the directory or the gnu tools sources to find even more goodies.
Note this question has been asked before many times here at Stack Overflow in various ways.
Also note main() isn't anything special as I have demonstrated. In some compilers it might be, but I can make programs that don't require that function name.

Related

Fully understanding how .exe file is executed

Goal
I want to understand how executables work. I hope that understanding one very specific example in full detail will enable me to do so. My final (perhaps too ambitious) goal is to take a hello-world .exe file (compiled with a C compiler and linked) and understand in full detail how it is loaded into memory and executed by a x86 processor. If I succeed in doing so, I want to write an article and/or make a video about it, since I have not found something like this on the internet.
Specific questions I want to ask are marked in bold. Of course any further suggestions and sources doing something similar are very welcome. Thanks a lot in advance for any help!
What I need
This Answer gives an overview of the process that C code goes through until it gets into physical memory as a programm. I'm not sure yet how much I want to look into how the C code is compiled. Is there a way to view the Assembly code a C compiler generates before assembling it? I may decide it's worth the effort to understand the processes of loading and linking. In the meantime the most important parts I need to understand are
the PA executable file format
the relation between assembler code and x86 byte-code
the process of loading (i.e. how the process RAM is prepared for execution using information from the executable file).
I have a very basic understanding of the PA format (this understanding will be outlined in the section "What I have learned so far") and I think the sources given there should be sufficient, I just need to look into it some more until I know enough to understand a basic Hello-World programm. Further sources on this topic are of course very welcome.
The translation of byte-code into assembler code (disassembly) seems to be quite difficult for x86. Nonetheless, I would love to learn more about it. How would you go about disassembling a short byte code segment?
I'm still looking for a way to view the contents of a process' memory (the virtual memory assigned to it). I've already looked into windows-kernel32.dll functions such as ReadProcessMemory but couldn't get it to work yet. Also it's strange to me that there don't seem to be (free) tools available for this. Together with an understanding of loading, I may then be able to understand how a process is run from RAM. Also I'm looking for debugging tools for assembly programmers that allow to view the entire process virtual memory conents. My current starting point of this search is this question. Do you have further advice on how I can see and understand loading and process execution from RAM?
What I have learned so far
The rest of this StackOverflow question describes what I have learned so far in some detail and giving various sources. It is meant to be reproducible and help anyone trying to understand this. However, I still do have some questions about the example I looked at so far.
PA format
In Windows, an executable file follows the PA format. The official documentation and this article give a good overview of the format. The format describes what the individual bytes in an .exe file mean. The beginning is a DOS programm (included for legacy reasons) that I will not worry about. Then comes a bunch of headers, which give information about the executable. The actual file contents are split into sections that have names, such as '.rdata'. After the file headers, there are also section headers, which tell you which parts of the file are which section and what each section does (e.g. if it contains executable code).
The headers and sections can be parsed using tools such as dumpbin (microsoft tool to look at binary files). For comparison with dumpbin output, the hex code of a file can be viewed directly with a Hex editor or even using the Powershell (command Format-Hex -Path <Path to file>).
Specific example
I performed these steps for a very simple programm, which does nothing. This is the code:
; NASM assembler programm. Does nothing. Stores string in code section.
; Adapted from stackoverflow.com/a/1029093/9988487
global _main
section .text
_main:
hlt
db 'Hello, World'
I assembled it with NASM (command nasm -fwin32 filename.asm) and linked it with the linker that comes with VS2019 (link /subsystem:console /nodefaultlib /entry:main test.obj). This is adapted from this answer, which demonstrates how to make a hello-world programm for Windows using WinAPI call. The programm runs on Windows 10 and terminates with no output. It takes about 2 sec to run, which seems very long and makes me think there may be some error somehwere?
I then looked at the dumpbin output:
D:\ASM>dumpbin test.exe /ALL
Microsoft (R) COFF/PE Dumper Version 14.22.27905.0
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file test.exe
PE signature found
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
14C machine (x86)
2 number of sections
5E96C000 time date stamp Wed Apr 15 10:04:16 2020
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
102 characteristics
Executable
32 bit word machine
OPTIONAL HEADER VALUES
10B magic # (PE32)
14.22 linker version
200 size of code
200 size of initialized data
0 size of uninitialized data
1000 entry point (00401000)
1000 base of code
2000 base of data
400000 image base (00400000 to 00402FFF)
1000 section alignment
200 file alignment
<further header values omitted ...>
SECTION HEADER #1
.text name
E virtual size
1000 virtual address (00401000 to 0040100D)
200 size of raw data
200 file pointer to raw data (00000200 to 000003FF)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
60000020 flags
Code
Execute Read
RAW DATA #1
00401000: F4 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 0A ôHello, World.
SECTION HEADER #2
.rdata name
58 virtual size
2000 virtual address (00402000 to 00402057)
200 size of raw data
400 file pointer to raw data (00000400 to 000005FF)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
40000040 flags
Initialized Data
Read Only
RAW DATA #2
00402000: 00 00 00 00 00 C0 96 5E 00 00 00 00 0D 00 00 00 .....À.^........
00402010: 3C 00 00 00 1C 20 00 00 1C 04 00 00 00 00 00 00 <.... ..........
00402020: 00 10 00 00 0E 00 00 00 2E 74 65 78 74 00 00 00 .........text...
00402030: 00 20 00 00 1C 00 00 00 2E 72 64 61 74 61 00 00 . .......rdata..
00402040: 1C 20 00 00 3C 00 00 00 2E 72 64 61 74 61 24 7A . ..<....rdata$z
00402050: 7A 7A 64 62 67 00 00 00 zzdbg...
Debug Directories
Time Type Size RVA Pointer
-------- ------- -------- -------- --------
5E96C000 coffgrp 3C 0000201C 41C
Summary
1000 .rdata
1000 .text
The file header field "characteristics" is a combination of flags. In particular 102h = 1 0000 0010b and the two set flags (according to the PE format doc) are IMAGE_FILE_EXECUTABLE_IMAGE and IMAGE_FILE_BYTES_REVERSED_HI. The latter has description
IMAGE_FILE_BYTES_REVERSED_HI:
Big endian: the MSB precedes the LSB in memory. This flag is deprecated and should be zero.
I ask myself: Why does a modern assembler and a modern linker produce a deprecated flag?
There are 2 sections in the file. The section .text was defined in the assembler code (and is the only one containing executable code, as specified in its header). I don't know what the second section '.rdata' (name seems to refer to "readable data") is or does here. Why was it created? How could I find out?
Disassembly
I used dumpbin to diassemble the .exe file (command dumpbin test.exe /DISASM). It gets the hlt correct, the 'Hello, World.' string is (perhaps unfortunately) interpreted as executable commands. I guess the disassembler can hardly be blamed for this. However, if I understand correctly (I have no practical experience in assembly programming), putting data into a code section is not unheard of (it was done in several examples that I found while looking into assembly programming). Is there a better way to disassemle this, that would be able to reproduce my assembly code better? Also, do compilers sometimes put data into code sections in this way?

In some respects this is a massively broad question that may not survive for that reason. The information is all out there on the internet, keep looking, it is not complicated, and not worthy of a paper or video.
So you have a rough idea that a compiler takes a program written in one language and converts it to another language be that assembly language or machine code or whatever.
Then there are file formats and there are many different ones that we all use the term "binary" for but again, different formats. Ideally they contain, using some form of encoding, the machine code and data or information about the data.
Going to use ARM for now, fixed length instructions easy to disassemble and read, etc.
#define ONE 1
unsigned int x;
unsigned int y = 5;
const unsigned int z = 7;
unsigned int fun ( unsigned int a )
{
return(a+ONE);
}
and gnu gcc/binutils because it is very well know, widely used, you can use it to make programs on your wintel machine. I run Linux so you will see elf not exe, but that is just a file format for what you are asking.
arm-none-eabi-gcc -O2 -c so.c -save-temps -o so.o
This toolchain (chain of tools that are linked for example compiler -> assembler -> linker) is Unix style and modular. You are going to have an assembler for a target so not sure why you would want to re-invent that, and it is so much easier to debug a compiler by looking at the assembly output than trying to go straight to machine code. But there are folks that like to climb the mountain just because it is there rather than go around and some tools go straight for machine code just because its there.
This specific compiler has this save temps feature, gcc itself is a front end program that preps for the real compiler then if asked for (if you don't say not to) will call the assembler and linker.
cat so.i
# 1 "so.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "so.c"
unsigned int x;
unsigned int y = 5;
const unsigned int z = 7;
unsigned int fun ( unsigned int a )
{
return(a+1);
}
So at this point defines and includes are taken care of and its one big file to be sent to the compiler.
The compiler does its thing and turns it onto assembly language
cat so.s
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global fun
.arch armv4t
.syntax unified
.arm
.fpu softvfp
.type fun, %function
fun:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
add r0, r0, #1
bx lr
.size fun, .-fun
.global z
.global y
.comm x,4,4
.section .rodata
.align 2
.type z, %object
.size z, 4
z:
.word 7
.data
.align 2
.type y, %object
.size y, 4
y:
.word 5
.ident "GCC: (GNU) 9.3.0"
which then gets put into an object file, in this case, binutils, linux default, etc
file so.o
so.o: ELF 32-bit LSB relocatable, ARM, EABI5 version 1 (SYSV), not stripped
It is using an elf file format which is easy to find info on, easy to write programs to parse, etc.
I can disassemble this, note that because I am using the disassembler it tries to disassemble everything even if it isn't machine code, sticking to 32 bit arm stuff It can grind through that and when there are real instructions they are shown (aligned and not variable length as used here, so you can disassemble linearly which you cannot with a variable length instruction set and have a hope of success (like x86) you need to disassemble in execution order and then you often miss some due to the nature of the program)
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: e2800001 add r0, r0, #1
4: e12fff1e bx lr
Disassembly of section .data:
00000000 <y>:
0: 00000005 andeq r0, r0, r5
Disassembly of section .rodata:
00000000 <z>:
0: 00000007 andeq r0, r0, r7
Disassembly of section .comment:
00000000 <.comment>:
0: 43434700 movtmi r4, #14080 ; 0x3700
4: 4728203a ; <UNDEFINED> instruction: 0x4728203a
8: 2029554e eorcs r5, r9, lr, asr #10
c: 2e332e39 mrccs 14, 1, r2, cr3, cr9, {1}
10: Address 0x0000000000000010 is out of bounds.
Disassembly of section .ARM.attributes:
00000000 <.ARM.attributes>:
0: 00002941 andeq r2, r0, r1, asr #18
4: 61656100 cmnvs r5, r0, lsl #2
8: 01006962 tsteq r0, r2, ror #18
c: 0000001f andeq r0, r0, pc, lsl r0
10: 00543405 subseq r3, r4, r5, lsl #8
14: 01080206 tsteq r8, r6, lsl #4
18: 04120109 ldreq r0, [r2], #-265 ; 0xfffffef7
1c: 01150114 tsteq r5, r4, lsl r1
20: 01180317 tsteq r8, r7, lsl r3
24: 011a0119 tsteq r10, r9, lsl r1
28: Address 0x0000000000000028 is out of bounds.
and yes the tool put extra stuff in there, but note primarily that I created. some code, some initialized read/write data, some initialized read/write data and some initialized read only data. The toolchain authors can use whatever names they want, they don't even have to use the term section. But from decades of history and communication and terminology .text is generally used for code (as in read only machine code AND related data), .bss for zeroed read/write data although I have seen other names, .data for initialized read/write data and this generation of this tool .rodata for read only initialized data (technically that could land in .text)
And note that they all have an address of zero. they are not linked yet.
Now this is ugly but to avoid adding any more code and if the tool lets me do it, let's link it to make a completely unusable binary (no bootstrap, etc, etc):
arm-none-eabi-ld -Ttext=0x1000 -Tdata=0x2000 so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
arm-none-eabi-objdump -D so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00001000 <fun>:
1000: e2800001 add r0, r0, #1
1004: e12fff1e bx lr
Disassembly of section .data:
00002000 <y>:
2000: 00000005 andeq r0, r0, r5
Disassembly of section .rodata:
00001008 <z>:
1008: 00000007 andeq r0, r0, r7
Disassembly of section .bss:
00002004 <x>:
2004: 00000000 andeq r0, r0, r0
And now it is linked. The read only items .text and .rodata landed in the .text address space in the order found in the file. The read/write items landed in the .data address space in the order found in the file.
Yes, where was .bss in the object? It is in there, it has no actual data as in bytes that are part of the object, instead it has a name and size and that it is .bss. And for whatever reason the tool does show it from the linked binary.
So back on the term binary. The so.elf binary has the bytes that go in memory that make up the program, but also file format infrastructure plus a symbol table to make the disassembly and debugging easier plus other stuff. Elf is a flexible file format gnu can use it and you get one result some other tool or version of a tool can use it and have a different file. And obviously two compilers can generate different machine code from the same source program not just due to optimizations, the job is to make a functional program in the target language and functional is the opinion of the compiler/tool author.
What about a memory image type file:
arm-none-eabi-objcopy so.elf so.bin -O binary
hexdump -C so.bin
00000000 01 00 80 e2 1e ff 2f e1 07 00 00 00 00 00 00 00 |....../.........|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000 05 00 00 00 |....|
00001004
Now how the objcopy tool works is that it starts with the first defined loadable or whatever term you want to use byte and ends with the last one and uses (zero) padding to make the file size match so that the memory image matches from an address perspective. The asterisk means essentially 0 padding. Because we started at 0x1000 with .text and 0x2000 for .data but the first byte of this file (offset 0) is the beginning of .text and 0x1000 byte later which is offset 0x1000 in the file but we know it goes to 0x2000 in memory is the read/write stuff. Also note that the bss zeros are not in the output. The bootstrap is expected to zero those.
There is no information like where in memory this data from this file goes, etc. And if you think a bit about it what if I have one byte at a section I define goes to 0x00000000 and one byte at a section I define goes to 0x80000000 and output this file, yes that is a 0x80000001 byte file even though there are only two useful bytes of relevant information. A 2GB file to hold two bytes. This is why you don't want to output this file format until you have sorted out your linker script and tools.
Same data and two other equally old school formats with a little history of intel vs motorola
arm-none-eabi-objcopy so.elf so.hex -O ihex
cat so.hex
:08100000010080E21EFF2FE158
:0410080007000000DD
:0420000005000000D7
:0400000300001000E9
:00000001FF
arm-none-eabi-objcopy so.elf so.srec -O srec
cat so.srec
S00A0000736F2E7372656338
S10B1000010080E21EFF2FE154
S107100807000000D9
S107200005000000D3
S9031000EC
now these contain the relevant bytes, plus addresses, but not much other information, takes more than two bytes for every byte of data, but compared to a huge file with padding, a worthy trade-off. Both of these formats can be found in use today, not as much as the old days but still there.
And countless other binary file formats and a tool like objdump has a decent list of formats it can generate as well as other linkers and/or tools out there.
What is relevant about all of this is that there is a binary file format of some form that contains the bytes we need to run the program.
What format and what addresses you might ask...That is part of the operating system or the system design. In the case of Windows there are specific file formats and variations perhaps of those formats that are supported by the windows operating system, the specific version you are using. Windows has determined what the address space looks like. Operating systems like this take advantage of the MMU both for virtualizing addresses and protection. Having a virtual address space means every program can live in the same space. All programs can have an address that is zero based for example....
test.c
int main ( void )
{
return 1;
}
hello.c
int main ( void )
{
return 2;
}
gcc test.c -o test
objdump -D test
Disassembly of section .text:
00000000004003e0 <_start>:
4003e0: 31 ed xor %ebp,%ebp
4003e2: 49 89 d1 mov %rdx,%r9
4003e5: 5e pop %rsi
...
gcc hello.c -o hello
objdump -D hello
Disassembly of section .text:
00000000004003e0 <_start>:
4003e0: 31 ed xor %ebp,%ebp
4003e2: 49 89 d1 mov %rdx,%r9
same address, how is that possible won't they sit on top of each other? no virtual machine. And note this is built for a specific Linux on a specific day, etc. The toolchain has a default linker script (notice I didn't specify how to link) for this platform when the compiler was built for this target/platform.
arm-none-eabi-gcc -O2 test.c -c -o test.o
arm-none-eabi-ld test.o -o test.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objdump -D test.elf
test.elf: file format elf32-littlearm
Disassembly of section .text:
00008000 <main>:
8000: e3a00001 mov r0, #1
8004: e12fff1e bx lr
same source code, same compiler, built for a different target and system different address.
So for Windows there are definitely going to be rules for the supported binary formats and rules for address spaces that can be used, how to define those spaces in the file.
Then it is a simple matter of the operating systems launcher to read the binary file and put the loadable items into memory at those addresses (in the virtual space that the os has created for this specific program) It is very possible that a feature of the loader is to zero bss for you since the information is there. The low level programmer needs to know that to possibly deal with zeroing .bss or not.
If not you will see and may need to create a solution, unfortunately this is where you get deeper into tool specific items. While C may be somewhat standardized there are tool specific things that are not or at least are standardized by the tool/authors but no reason to assume those cross over to other tools.
.globl _start
_start:
ldr sp,sp_init
bl fun
b .
.word __bss_start__
.word __bss_end__
sp_init:
.word 0x8000
Everything about assembly language is tool specific, the mnemonics for sanity reasons no doubt will resemble the ip/processor vendors documentation which uses syntax that the tool they paid to have developed uses. But beyond that assembly language is wholly defined by the tool not the target, x86 because of its age and other things is really bad about that and this is not the Intel vs AT&T thing, just in general. Gnu assembler is well known for I would assume perhaps intentionally not making compatible languages with other assembly languages. The above is gnu assembler for arm.
Using the fun() function above, C says it should be main() but the tool doesn't care I am already typing enough here.
add a simple ram based linker script
MEMORY
{
ram : ORIGIN = 0x1000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.bss : {
__bss_start__ = .;
*(.bss*)
} > ram
__bss_end__ = .;
}
build it all
arm-none-eabi-as start.s -o start.o
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-ld -T sram.ld start.o so.o -o so.elf
examine
arm-none-eabi-nm so.elf
0000102c B __bss_end__
00001028 B __bss_start__
00001018 T fun
00001014 t sp_init
00001000 T _start
00001028 B x
00001024 D y
00001020 R z
arm-none-eabi-objdump -D so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00001000 <_start>:
1000: e59fd00c ldr sp, [pc, #12] ; 1014 <sp_init>
1004: eb000003 bl 1018 <fun>
1008: eafffffe b 1008 <_start+0x8>
100c: 00001028 andeq r1, r0, r8, lsr #32
1010: 0000102c andeq r1, r0, r12, lsr #32
00001014 <sp_init>:
1014: 00008000 andeq r8, r0, r0
00001018 <fun>:
1018: e2800001 add r0, r0, #1
101c: e12fff1e bx lr
Disassembly of section .rodata:
00001020 <z>:
1020: 00000007 andeq r0, r0, r7
Disassembly of section .data:
00001024 <y>:
1024: 00000005 andeq r0, r0, r5
Disassembly of section .bss:
00001028 <x>:
1028: 00000000 andeq r0, r0, r0
So now it is possible to add to the bootstrap a memory zeroing loop (do not use C/memset you don't create chicken and egg problems you write the bootstrap in asm) based on the start and end addresses.
Fortunately or unfortunately because the linker script is tool specific and assembly language is tool specific and they need to work together if you are letting the tools do the work for you (the sane way to do it, have fun figuring out where .bss is otherwise).
This can be done on an operating system but when you get into say microcontrollers where it all has to be on non-volatile storage (flash) well it is possible to have one that is downloaded from elsewhere (like your mouse firmware sometimes, sometimes keyboard, etc) into ram, assume flash, so how do you deal with .data??
MEMORY
{
rom : ORIGIN = 0x0000, LENGTH = 0x1000
ram : ORIGIN = 0x1000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.data : {
*(.data*)
} > ram AT > rom
.bss : {
__bss_start__ = .;
*(.bss*)
} > ram
__bss_end__ = .;
}
With gnu ld this basically says that .data's home is in ram, but the output binary formats will put it in flash/rom
so.elf so.srec -O srec
cat so.srec
S00A0000736F2E7372656338
S11300000CD09FE5030000EBFEFFFFEA04100000A4
S11300100810000000800000010080E21EFF2FE1B4
S107002007000000D1 <- z variable at address 0020
S107002405000000CF <- y variable at 0024
S9030000FC
and you have to play with the linker script more to get the tool to tell you both the ram and flash starting addresses and ending addresses or length. then add code in the bootstrap (asm not C) to copy .data from flash to ram.
Also note here per another one of your many questions.
.word __bss_start__
.word __bss_end__
sp_init:
.word 0x8000
These items are technically data. but they live in .text first and foremost because they were defined in the code that was assumed to be .text (I didn't need to state that in the asm, but could have). you will see this in x86 as well, but for fixed length like arm, mips, risc-v, etc where you cant put any old immediate/constant/linked value you want in the instruction itself you put it nearby in a "pool" and do a pc relative read to get it. You will see this for linking externals too:
extern unsigned int x;
int main ( void )
{
return x;
}
arm-none-eabi-gcc -O2 -c test.c -o test.o
arm-none-eabi-objdump -D test.o
test.o: file format elf32-littlearm
Disassembly of section .text.startup:
00000000 <main>:
0: e59f3004 ldr r3, [pc, #4] ; c <main+0xc>
4: e5930000 ldr r0, [r3]
8: e12fff1e bx lr
c: 00000000 andeq r0, r0, r0 <--- the code gets the address of the
variable from here and then reads it from memory
once linked
Disassembly of section .text:
00008000 <main>:
8000: e59f3004 ldr r3, [pc, #4] ; 800c <main+0xc>
8004: e5930000 ldr r0, [r3]
8008: e12fff1e bx lr
800c: 00018010 andeq r8, r1, r0, lsl r0
Disassembly of section .data:
00018010 <x>:
18010: 00000005 andeq r0, r0, r5
for x86
gcc -c -O2 test.c -o test.o
dwelch-desktop so # objdump -D test.o
test.o: file format elf64-x86-64
Disassembly of section .text.startup:
0000000000000000 <main>:
0: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 6 <main+0x6>
6: c3 retq
00000000004003e0 <main>:
4003e0: 8b 05 4a 0c 20 00 mov 0x200c4a(%rip),%eax # 601030 <x>
4003e6: c3 retq
If you squint is it really different? there is data nearby that the processor reads to load into a register and or use. either way, due to the nature of the instruction sets the linker modifies the instruction or nearby pool data or both.
last one:
arm-none-eabi-gcc -S test.c
cat test.s
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 6
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "test.c"
.text
.align 2
.global main
.arch armv4t
.syntax unified
.arm
.fpu softvfp
.type main, %function
main:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 1, uses_anonymous_args = 0
# link register save eliminated.
str fp, [sp, #-4]!
add fp, sp, #0
ldr r3, .L3
ldr r3, [r3]
mov r0, r3
add sp, fp, #0
# sp needed
ldr fp, [sp], #4
bx lr
.L4:
.align 2
.L3:
.word x
.size main, .-main
.ident "GCC: (GNU) 9.3.0"
So can you see the assembly language, yes some tools will let you save the intermediate files and/or let you generate the assembly output of the file when compiling.
Can you have data in the code, yes there are times and reasons to have data values in the .text area not just target specific you will see this for various reasons and some toolchains put read only data there.
There are many file formats the ones used by modern operating systems have features not just for defining the bytes that make up the machine code and data values but also will include symbols and other debug information.
The file format and memory space for a program is operating system specific not language nor even target specific (Linux, Windows, MacOS on the same laptop are not expected to have the same rules despite the exact same target computer). A native toolchain for that platform has a default linker script and whatever other information required to build usable/working programs for that target. Including the supported file format.
The machine code and data items can be represented in different file formats in different ways, whether or not the operating system or loader of the target system can use that format depends on that target system.
Programs have bugs and nuances. File formats have versions and inconsistencies, you might find some elf file format reader only to find it doesn't work or prints out strange stuff when fed a perfectly good elf file that works on some system. Why are some flags being set? Perhaps those bytes got re-used or the flag to repurposed or the data structure changed or a tool is using it differently or in a non-standard way (think mov 20h,ax) and another tool that is not compatible can't understand or gets lucky and gets close enough.
Asking "why" questions at Stack Overflow is not very useful, the odds of finding the individual that wrote the thing are very low, better odds of asking the place you got the tool from and following that hoping the person is still alive and willing to be bothered. And 99.999(lots of 9s)% there is no global set of godly rules that the thing was written under/for. General it was some dude just felt like it that is why they did what they did, no real reason, laziness, a bug, intentionally trying to break someone else's tool. All the way up to a large committee of people with an opinion voted on it on a particular day in a particular room and that's why (and we know what we get when we design by committee or try to write specs that nobody conforms to).
I know you are on Windows and I don't have a Windows machine handy and am on Linux. But the gnu/binutils and clang/llvm tools are readily available and have a rich set of tools like readelf, nm, objdump, etc. That assist in examining things, a good tool is going to have that at least internally for the developers so they can debug the output of the tool to a certain quality level. gnu folks made tools and made them available for everyone, and while it takes time to sort through them and their features they are very powerful for the things you are trying to understand.
You are NOT going to find a good x86 disassembler, they are all crap simply because of the nature of the beast. It is a variable length instruction set, so by definition unless you are executing you cant sort it out correctly. You must disassemble in execution order from a known good entry point to have half a chance, and then for various reasons there are code paths you cannot see that way (think jump tables for example, or dlls or so files). The BEST solution is to have a very accurate/perfect emulator/simulator and run the code and perform all the actions/gyrations you need to do to get it to cover all the code paths, and have that tool record instructions from data and where each is located or each linear section without a branch.
The good side of this is that a lot of code is compiled today using tools that are not trying to hide anything. In the old days for various reasons you would see hand written asm that intentionally tried to prevent disassembly or due to other factors (hand editing a binary rom image for a video game the day before the trade show, go disassemble some of the classic roms).
mov r0,#0
cmp r0,#0
jz somewhere
.word 0x12345678
A disassembler isn't going to figure this out, some might add a case for that then
mov r0,#0
nop
nop
xor r0,#1
nop
nop
xor r0,#3
xor r0,#2
cmp r0,#0
jz somewhere
.word 0x12345678
and it thinks that data is an instruction, for variable length that is super hard for a disassembler to resolve a decent one will at least detect collisions where the non opcode part of the instruction is branched to and/or an opcode part of an instruction shows up later as additional bytes in some other instruction. The tool cant resolve it a human has to.
Even with arm and mips and having 32 and 16 bit instructions, risc-v with variable sized instructions, etc...
Very often gnu's disassembler will get tripped up with x86.

I don't think I'll be able to answer to everything. I am a beginner too so I may say some things not exact. But, I'll try my best and I think I can bring you some things.
No, compilers do not put data in code sections (correct me if I am wrong). There is the section .data (for initialized data) and section .bss (for uninitialized data).
I think, I'll better show you an example of a program which prints hello world (for linux because it's much simpler and I don't know how to do with windows. in x64 but it's like x86. Just the names of the syscalls and the registers that are different. x64 is for 64 bits and x86 for 32 bits).
BITS 64 ;not obligatory but I prefer
section .data
msg db "hello world" ;the message
len equ $-msg ;the length of msg
section .text
global _start
_start: ;the entry point
mov rax, 1 ;syscall 1 to print something
mov rdi, 1 ;1 for stdout
mov rsi, msg ;the message
mov rdx, len ;length in rdx
syscall
mov rax, 60 ;exit syscall
mov rdi, 0 ;exit with 0
syscall
(https://tio.run/#assembly-nasm if you don't want to use a VM. I advise you to look for WSL + vscode if you are using windows. you will have linux in your windows and vscode has an extension to have an access to the files in windows) but
If you wanna disassemble the code or see what is the memory, you can use gdb or radare2 in linux. For windows, there are other tools such as ghidra, IDA, olly dbg..
I don't know any way to make the compiler create a better assembly code. but it doesn't mean it doesn't exist.
I have never made anything for windows. However, to link my object file, I use ld (I don't know if it will be helpful).
ld object.o -o compiledprogram
I don't have time right now to continue writing so I can't advise you any courses right now.. I'll see later.
Hope it has helped you.

Answers to questions in your text:
1. You can see process execution step by step and process memory with debugger. I used OllyDbg for learning assembly, it's free and powerful debugger.
2. Process is loaded by Windows kernel after calling NtCreateUserProcess so I think that you would need kernel debugging to see how it is done.
3. Code that is debugged in OllyDbg is automatically disassembled.
4. You can put read-only data in ".text" section. You can change section flags to make it writable, then code and data can be mixed. Some compilers may merge ".text" and ".rdata" sections.
I would recommend that you read about PE imports, exports, relocations and resources in that order. If you want to see easiest possible i386 PE helloworld you can check my hello_world_pe_i386_dynamic.exe program here: https://github.com/pajacol/hello-world. I wrote it entirely in binary file editor. It contains only required data structures. This executable is position independent and can be loaded at any address without relocations.

How to use inline assembly instructions that are supported by GNU GAS but are newer than the latest -march my GCC has?

To test the performance feature of an assembly language feature, I would like to use it inside a C program with inline assembly to replace existing functionality.
My GNU GAS assembler already recognizes the instruction with an appropriate -march.
How to make GCC generate code that accepts such inline assembly instructions that is cannot yet emit and does not recognize?
For example, the ARM SVE extensions were more or less recently added to the ARM architecture.
So in a minimal useless inline SVE assembly example I try:
main.c
int main(void) {
__asm__ __volatile__ ("ld1d z1.d, p0/z, [x0, x4, lsl 3]");
}
and I try to compile with:
aarch64-linux-gnu-gcc -c -o main.o --save-temps -v main.c
but assembly fails with:
main.s: Assembler messages:
main.s:10: Error: selected processor does not support `ld1d z1.d,p0/z,[x0,x4,lsl 3]'
The verbose output contains:
/usr/lib/gcc-cross/aarch64-linux-gnu/7/../../../../aarch64-linux-gnu/bin/as -v -EL -mabi=lp64 -o main.o main.s
and main.s contains:
.arch armv8-a
.file "main.c"
.text
.align 2
.global main
.type main, %function
main:
#APP
// 2 "main.c" 1
ld1d z1.d, p0/z, [x0, x4, lsl 3]
// 0 "" 2
#NO_APP
mov w0, 0
ret
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 7.4.0-1ubuntu1~18.04.1) 7.4.0"
.section .note.GNU-stack,"",#progbits
However, my assembler is new enough to assemble that, since I managed to get it to assemble by either:
replacing .arch armv8-a in the generated main.s file manually with .arch armv8-a+sve
removing .arch armv8-a in main.s and adding -march=all or -march=armv8-a+sve on the GAS CLI (TODO .arch all does not exist unfortunately, I wonder why?)
However, if I try to add -march=armv8-a+sve or -march=all to the GCC CLI, it complains respectively with:
cc1: error: invalid feature modifier in ‘-march=armv8-a+sve’
cc1: error: unknown value ‘all’ for -march
I also tried to add -Xassembler -march=all to the GCC CLI:
aarch64-linux-gnu-gcc -Xassembler -march=all -c -o main.o --save-temps -v main.c
and the option does get forwarded to the assembler according to -v, but it still fails, because the .arch armv8-a that GCC puts into the assembly file takes precedence.
Out of desperation, I finally tried to modify my C example to:
int main(void) {
__asm__ __volatile__ (".arch armv8-a+sve\nld1d z1.d, p0/z, [x0, x4, lsl 3]");
}
and that did work, but I think it is too insane to be usable. Notably, the .arch armv8-a+sve stays in effect even after my inline assembly statement, and I don't know how to revert to the previous default GCC .arch.
I also looked for GCC -masm options which might be useful like this one but could not find anything interesting.
Tested in Ubuntu 18.04, aarch64-linux-gnu-gcc 7.4.0, aarch64-linux-gnu-as 2.30.

cannot access memory at address 0x10084 when trying to set breakpoint via gdb

I wrote this simple assembly-program (based on a tutorial, only slightly changed.)
# p = q + r + s
# let q=2, r=4, s=5
# this version of the simple-equation stores in memory
p: .space 4 #reserve 4 bytes in memory for variable p
q: .word 2 #create 32-bit variable q with initial value of 2
r: .word 4
s: .word 5
.global _start
_start:
ldr r1,q #load r1 with q
ldr r2,r #load r2 with r
ldr r3,s #load r3 with s
add r0,r1,r2
add r0,r0,r3
mov r7,#1 #syscall to terminate the program
svc 0
.end
I assemble the program using as -g -o main.o main.s
Then i link the object-file usind ld main.o -o main
Then i execute gdb main
Now, when trying to insert a breakpoint at any line-number, i get the error that is the title of this post (cannot access memory at address 0x10084).
As this program's code is based off of a tutorial, and the teacher in the tutorial uses a codeblocks-project and
.global main
main:
instead of
.global _start
_start:
i assume that this is where my error might come from (although not understanding how this results in not being able to set a breakpoint via gdb, while not getting any error while assembling and linking).
I would be very greatful if anyone could shed some light on this for me.
Thanks in advance!
edit:
having been asked what the output of objdump -d main might look like, i add the output of the command here:
main: file format elf32-littlearm
Disassembly of section .text:
00010054 <p>:
10054: 00000000 .word 0x00000000
00010058 <q>:
10058: 00000002 .word 0x00000002
0001005c <r>:
1005c: 00000004 .word 0x00000004
00010060 <s>:
10060: 00000005 .word 0x00000005
00010064 <_start>:
10064: e51f1014 ldr r1, [pc, #-20] ; 10058 <q>
10068: e51f2014 ldr r2, [pc, #-20] ; 1005c <r>
1006c: e51f3014 ldr r3, [pc, #-20] ; 10060 <s>
10070: e0810002 add r0, r1, r2
10074: e0800003 add r0, r0, r3
10078: e3a07001 mov r7, #1
1007c: ef000000 svc 0x00000000
readelf -a main told me (among other things), that my entry point is 0x10064.
I already tried to use main instead of _start, however then during disassembling and linking i get an error telling me that no entry point has been found.
edit:
Given the address of the entry-point, i ran the program again using gdb, then set a breakpoint to the specified address. It did so without complaining, and when running, execution indeed stops at the breakpoint. So the issue seems to be that the address 0x10084 that gdb wants to use for my breakpoint linenum command just doesn't correspond to the addresses that the instructions at the corresponding lines really have.
Using the gdb command info line 'linenumber' just confirms my
assumption. It prints out memory addresses and i can indeed set breakpoints to the printed addresses, but when i try to set a breakpoint specifying the line-number, gdb always wants to use 0x10084 and fails.
Does anybody have an idea, how this behaviour comes about, and what might be ways to fix it?

Fix relocations for global variables in position-independent executables with GCC

I'm looking for a gcc command-line flag or other settings to produce GOTOFF relocations rather than GOT relocations for my statically linked, position-independent i386 executable. More details on what I was trying below.
My source file g1.s looks like this:
extern int answer;
int get_answer1() { return answer; }
My other source file g2.s looks like this:
extern int answer;
int get_answer2() { return answer; }
I compile them with gcc -m32 -fPIE -Os -static -S -ffreestanding -fomit-frame-pointer -fno-unwind-tables -fno-asynchronous-unwind-tables g1.c for i386.
I get the following assembly output:
.file "g1.c"
.text
.globl get_answer1
.type get_answer1, #function
get_answer1:
call __x86.get_pc_thunk.cx
addl $_GLOBAL_OFFSET_TABLE_, %ecx
movl answer#GOT(%ecx), %eax
movl (%eax), %eax
ret
.size get_answer1, .-get_answer1
.section .text.__x86.get_pc_thunk.cx,"axG",#progbits,__x86.get_pc_thunk.cx,comdat
.globl __x86.get_pc_thunk.cx
.hidden __x86.get_pc_thunk.cx
.type __x86.get_pc_thunk.cx, #function
__x86.get_pc_thunk.cx:
movl (%esp), %ecx
ret
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",#progbits
Here is how to reproduce this behavior online with GCC 7.2: https://godbolt.org/g/XXkxJh
Instead of GOT above, I'd like to get GOTOFF, and the movl %(eax), %eax should disappear, so the assembly code for the function should look like this:
get_answer1:
call __x86.get_pc_thunk.cx
addl $_GLOBAL_OFFSET_TABLE_, %ecx
movl answer#GOTOFF(%ecx), %eax
ret
I have verified that this GOTOFF assembly version is what works, and the GOT version doesn't work (because it has an extra pointer indirection).
How can I convince gcc to generate the GOTOFF version? I've tried various combinations of -fPIC, -fpic, -fPIE, -fpie, -pie, -fno-plt. None of them worked, all of them made gcc produce the GOT version.
I couldn't find any i386-specific flag on https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html or any generic flag here: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html
In fact, I'm getting GOTOFF relocations for "..." string literals, and I also want to get them for extern variables.
The final output is a statically linked executable in a custom binary format (for which I've written a GNU ld linker script). There is no dynamic linking and no shared libraries. The address randomization is performed by a custom loader, which is free to load the executable to any address. So I do need position-independent code. There is no per-segment memory mapping: the entire executable is loaded as is, contiguously.
All the documentation I've been able to find online talk about position-independent executables which are dynamically linked, and I wasn't able to find anything useful there.

I wasn't able to solve this with gcc -fPIE, so I solved it manually, by processing the output file.
I use gcc -Wl,-q, with an output ELF executable file containing the relocations. I post-process this ELF executable file, and I add the following assembly instructions to the beginning:
call next
next:
pop ebx
add [ebx + R0 + (after_add - next)], ebx
add [ebx + R1 + (after_add - next)], ebx
add [ebx + R2 + (after_add - next)], ebx
...
after_add:
, where R0, R1, R2 ... are the addresses of R_386_32 relocations in the ELF executable. The In use objdump -O binary prog.elf prog.bin', and nowprog.bin' contains position-independent code, because it starts with the `add [ebx + ...], ebx' instructions, which do the necessary relocations to the code when the code starts running.
Depending on the execution environment, the gcc flag -Wl,-N is needed, to make the .text section writable (the `add [ebx + ...], ebx' instructions need that).

Nasm Dwarf Error Bad Offset

I have a simple Hello World program for Windows in pure x86 assembly code that I have compiled and linked with nasm and ld. The problem I am running into is that I can't get DWARF debugging to work. I am using gdb from Mingw64 (i686-posix-dwarf-rev1). This same problem happens if I use gcc to link instead of ld. But, the program builds fine, and if I use STABS debugging, then everything is fine and dandy.
EDIT: Oops, I completely forgot to give the error that gdb shows.
...Dwarf Error: bad offset (0x407000) in compilation unit header (offset 0x0
+ 6) [in module C:\Projects\AsmProjects\HelloWorldWin32\bin\x86\hello32.exe]
(no debugging symbols found)...done
The versions of each program are:
gdb 7.10.1
nasm 2.12.02
ld 2.25
gcc 6.2.0
These are the flags I'm sending to nasm: -f elf32 -Fdwarf -g
These are the flags for gcc link: -o $(BDIR)/x86/$#.exe $^ -L$(Mingw64-x86libs) -lkernel32 -luser32
And these are from ld link:
-mi386pe -o $(BDIR)/x86/$#.exe $^ -L$(Mingw64-x86libs) -lkernel32 -luser32
I have a pretty big makefile, so I'm trying to give the least information that is absolutely neccessary.
Here is the source code for the program:
global _main
extern _GetStdHandle#4
extern _WriteFile#20
extern _ExitProcess#4
section .text
_main:
push ebp
mov ebp,esp
; GetstdHandle( STD_OUTPUT_HANDLE)
push -11
call _GetStdHandle#4
mov ebx, eax
; WriteFile( hstdOut, message, length(message), &bytes, 0);
push 0
push esp
push message_end
push message
push ebx
call _WriteFile#20
; ExitProcess(0)
push 0
call _ExitProcess#4
section .data
message db 'Hello, World',10
message_end equ $ - message

This is not a proper answer but was too long for the comment section.
I compiled on Ubuntu and then ran dwarfdump
It gave an error that may be related to the offset error.
dwarfdump ERROR: dwarf_get_globals: DW_DLE_PUBNAMES_VERSION_ERROR (123)
From a similar error on LLVM, I conclude that the dwarf version information is possibly corrupt or unsupported.
This post indicates that the dwarf information is sensitive to the proper section names. The example appears to have the section names right however.
Have you tried a 64-bit version? Perhaps a clue will appear.
This program appears to work fine Ubuntu. Can you try it on Mingw64?
section .text
global _start ;must be declared for linker (ld)
_start: ;tell linker entry point
mov edx,len ;message length
mov ecx,msg ;message to write
mov ebx,1 ;file descriptor (stdout)
mov eax,4 ;system call number (sys_write)
int 0x80 ;call kernel
mov eax,1 ;system call number (sys_exit)
int 0x80 ;call kernel
section .data
msg db 'Hello, world!',0xa ;our dear string
len equ $ - msg ;length of our dear string

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio