transforming srec checksums


Assume that I have a binary file that contains:
uint32_t data[] = {
0x00000000, 0x11111111, 0x22222222, 0x33333333, 0x44444444,
0x55555555, 0x66666666, 0x77777777, 0x88888888, 0x99999999};
The IAR linker generates the checksum 55 1D 81 96, which srec_cat can replicate this way:
srec_cat data.srec -crop 0x00 0x28 -Bit_Reverse -CRC32BE 0x28 -Bit_Reverse -XOR 0xff -crop 0x28 0x2c -o - -hex-dump
Given the same data, the CRC32 accelerator (IEEE-802.3 style) in my target hardware generates F8 EE 40 0B, which srec_cat can replicate this way:
srec_cat data.srec -crop 0x00 0x28 -CRC32BE 0x28 -crop 0x28 0x2c -o - -hex-dump
I cannot change what the IAR linker has generated, nor can I change the algorithm that the hardware accelerator uses. Given that, is there a way to transform the IAR-style checksum (55 1D 81 96) post-facto into the IEEE-802.3-style checksum (F8 EE 40 0B)?
I've stared at it, and I don't see anything obvious. (If need be, I can use srec_cat to replace the IAR-style checksum with the IEEE-802.3-style checksum as a post-build step...)

No, there is no way to transform one of those CRCs to get the other. You have to compute the desired CRC on the message.
For reference, the first CRC is CRC-32/ISO-HDLC. The second is CRC-32/MPEG-2.
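To make the difference concrete, here is a minimal Python sketch of the two catalogued algorithms named above. It is not taken from srec_cat or the IAR tools, and the function names are made up for illustration. It assumes the usual catalogue parameters: CRC-32/MPEG-2 uses polynomial 0x04C11DB7, initial value 0xFFFFFFFF, no bit reflection and no final XOR, while CRC-32/ISO-HDLC is what zlib.crc32 computes. The last function mirrors the shape of the first srec_cat command (bit-reverse the input, run the unreflected CRC, bit-reverse the result, invert it).

```python
import zlib


def crc32_mpeg2(data: bytes) -> int:
    """Bitwise CRC-32/MPEG-2: MSB first, init 0xFFFFFFFF, no final XOR."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def reflect(value: int, width: int) -> int:
    """Reverse the order of the lowest `width` bits of value."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def crc32_iso_hdlc_via_mpeg2(data: bytes) -> int:
    """CRC-32/ISO-HDLC rebuilt from the unreflected CRC plus bit reversals."""
    reversed_input = bytes(reflect(b, 8) for b in data)
    return reflect(crc32_mpeg2(reversed_input), 32) ^ 0xFFFFFFFF


if __name__ == "__main__":
    message = b"123456789"
    print(hex(crc32_mpeg2(message)))   # hardware-accelerator style
    print(hex(zlib.crc32(message)))    # IAR style
    print(crc32_iso_hdlc_via_mpeg2(message) == zlib.crc32(message))
```

Both values are computed from the message bytes. Since the two algorithms consume the message bits in opposite order, two messages that share one CRC generally do not share the other, which is why neither checksum can be derived from the other alone.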

Related

How do I implement an FFI from Rust in assembler?

My Rust code needs to make winapi FFIs and I see winapi-rs is very popular. What I need now, is to see the actual instructions of these FFIs. The binary object files are available on github (for example GLU32).
Just as an example, it contains a 663 bytes object file dsjfbs00001.o, which I'd like to disassemble and see the instructions. I've tried without giving an offset (which means it starts at 0):
objdump -b binary -Mintel,x86-64 -m i386 -D dsjfbs00001.o
This line comes from the similar question Disassembling A Flat Binary File Using objdump and I get this output (I show just the first 16 lines, it goes on for 247 lines):
dsjfbs00001.o: file format binary
Disassembly of section .data:
00000000 <.data>:
0: 64 86 07 xchg BYTE PTR fs:[rdi],al
3: 00 00 add BYTE PTR [rax],al
5: 00 00 add BYTE PTR [rax],al
7: 00 84 01 00 00 0a 00 add BYTE PTR [rcx+rax*1+0xa0000],al
e: 00 00 add BYTE PTR [rax],al
10: 00 00 add BYTE PTR [rax],al
12: 04 00 add al,0x0
14: 2e 74 65 cs je 0x7c
17: 78 74 js 0x8d
...
I have some knowledge about assembler, but here I'm at a loss. The executable code obviously doesn't start at 0 so I wonder how can I discover the correct offset?
The output shows that this is the .data section. But how does it tell this? Is this a guess? A hexdump returns exactly the same bytes with no header bytes (i.e. such as an ELF file would have):
0000000 8664 0007 0000 0000 0184 0000 000a 0000
0000010 0000 0004 742e 7865 0074 0000 0000 0000
Endianness aside, it starts with 0x64, 0x86, 0x07, as seen above for the xchg opcode. So how can it tell it's a .data section? And then... where's the .text section I'm interested in? It never says there's one.
From all of this I deduce that without an actual offset it's impossible to tell where the entry point is. Actually, the initial ~600 bytes contain many zeroes, while the last ~60 bytes have the typical entropy you'd expect from executable code. But I don't know how to determine this offset exactly by searching in the winapi-rs repo (the *.def files look useless to me, they just list the available routine names).
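For what it is worth, the first bytes of the hexdump can be read as something other than code. A hedged sketch in Python: assuming the file is a COFF object (the format used for Windows .o/.obj files), the first 20 bytes are a file header and the 8 bytes after it begin the first section header. The byte string below is transcribed from the dump above; the helper name and dictionary keys are made up for this example.

```python
import struct

# First 28 bytes of dsjfbs00001.o, transcribed from the hexdump above.
HEAD = bytes.fromhex(
    "6486 0700 00000000 84010000 0a000000 0000 0400"
    "2e74657874000000"
)


def parse_coff_file_header(buf: bytes) -> dict:
    """Decode a 20-byte COFF file header and the name of the first section."""
    (machine, nsections, timestamp, symtab_ptr,
     nsymbols, opt_size, characteristics) = struct.unpack_from("<HHIIIHH", buf, 0)
    # Section headers start right after the (possibly empty) optional header.
    first = 20 + opt_size
    name = buf[first:first + 8].rstrip(b"\0").decode("ascii", "replace")
    return {
        "machine": machine,
        "sections": nsections,
        "timestamp": timestamp,
        "symbol_table_offset": symtab_ptr,
        "symbols": nsymbols,
        "optional_header_size": opt_size,
        "characteristics": characteristics,
        "first_section_name": name,
    }


if __name__ == "__main__":
    for key, value in parse_coff_file_header(HEAD).items():
        print(key, hex(value) if isinstance(value, int) else value)
```

On that reading the machine field is 0x8664 (x86-64), the file has 7 sections and a symbol table at offset 0x184, and the first section is named .text. With -b binary, objdump is told to ignore any file format, so it treats the whole file as one blob and disassembles these header bytes as if they were instructions.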
And as an additional question, would it be feasible to create those files on my own? Can't I just take/write some assembly code, produce an object file with NASM or similar, and use that for FFIs from my Rust code? Is this even possible?
Where would I start doing something like this, if I don't even have C/C++ WinAPI header files or Visual Studio?
BTW: I really need just some ~10 functions of GLU32, not the whole winapi.

Fully understanding how .exe file is executed

Goal
I want to understand how executables work. I hope that understanding one very specific example in full detail will enable me to do so. My final (perhaps too ambitious) goal is to take a hello-world .exe file (compiled with a C compiler and linked) and understand in full detail how it is loaded into memory and executed by a x86 processor. If I succeed in doing so, I want to write an article and/or make a video about it, since I have not found something like this on the internet.
Specific questions I want to ask are marked in bold. Of course any further suggestions and sources doing something similar are very welcome. Thanks a lot in advance for any help!
What I need
This Answer gives an overview of the process that C code goes through until it gets into physical memory as a program. I'm not sure yet how much I want to look into how the C code is compiled. Is there a way to view the assembly code a C compiler generates before assembling it? I may decide it's worth the effort to understand the processes of loading and linking. In the meantime the most important parts I need to understand are
the PE executable file format
the relation between assembler code and x86 byte-code
the process of loading (i.e. how the process RAM is prepared for execution using information from the executable file).
I have a very basic understanding of the PE format (this understanding will be outlined in the section "What I have learned so far") and I think the sources given there should be sufficient; I just need to look into it some more until I know enough to understand a basic hello-world program. Further sources on this topic are of course very welcome.
The translation of byte-code into assembler code (disassembly) seems to be quite difficult for x86. Nonetheless, I would love to learn more about it. How would you go about disassembling a short byte code segment?
I'm still looking for a way to view the contents of a process' memory (the virtual memory assigned to it). I've already looked into Windows kernel32.dll functions such as ReadProcessMemory but couldn't get it to work yet. Also it's strange to me that there don't seem to be (free) tools available for this. Together with an understanding of loading, I may then be able to understand how a process is run from RAM. I'm also looking for debugging tools for assembly programmers that allow viewing the entire virtual memory contents of a process. My current starting point of this search is this question. Do you have further advice on how I can see and understand loading and process execution from RAM?
What I have learned so far
The rest of this StackOverflow question describes what I have learned so far in some detail and giving various sources. It is meant to be reproducible and help anyone trying to understand this. However, I still do have some questions about the example I looked at so far.
PE format
In Windows, an executable file follows the PE format. The official documentation and this article give a good overview of the format. The format describes what the individual bytes in an .exe file mean. The beginning is a DOS program (included for legacy reasons) that I will not worry about. Then comes a bunch of headers, which give information about the executable. The actual file contents are split into sections that have names, such as '.rdata'. After the file headers, there are also section headers, which tell you which parts of the file are which section and what each section does (e.g. if it contains executable code).
The headers and sections can be parsed using tools such as dumpbin (a Microsoft tool to look at binary files). For comparison with dumpbin output, the hex code of a file can be viewed directly with a hex editor or even using PowerShell (command Format-Hex -Path <Path to file>).
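As a rough illustration of the layout just described, the following Python sketch reads the headers by hand. It assumes the layout given in the PE format documentation: a 4-byte offset to the PE signature stored at file offset 0x3C, a 20-byte COFF file header after the signature, then the optional header, then 40-byte section headers. The function name and dictionary keys are invented for this example; dumpbin remains the reference.

```python
import struct
import sys


def parse_pe_headers(image: bytes) -> dict:
    """Return COFF file header fields and the section table of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) signature")
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    (machine, nsections, timestamp, _symtab, _nsyms,
     opt_size, characteristics) = struct.unpack_from("<HHIIIHH", image, coff)
    table = coff + 20 + opt_size          # section headers follow the optional header
    sections = []
    for i in range(nsections):
        (name, vsize, vaddr, raw_size, raw_ptr,
         _relocs, _lines, _nrelocs, _nlines, flags) = struct.unpack_from(
            "<8sIIIIIIHHI", image, table + 40 * i)
        sections.append({
            "name": name.rstrip(b"\0").decode("ascii", "replace"),
            "virtual_size": vsize,
            "virtual_address": vaddr,
            "raw_size": raw_size,
            "raw_pointer": raw_ptr,
            "flags": flags,
        })
    return {
        "machine": machine,
        "timestamp": timestamp,
        "characteristics": characteristics,
        "sections": sections,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        info = parse_pe_headers(handle.read())
    print(hex(info["machine"]), hex(info["characteristics"]))
    for section in info["sections"]:
        print(section["name"], hex(section["virtual_address"]),
              hex(section["raw_pointer"]), hex(section["flags"]))
```

Run against an .exe, this should print the same machine, characteristics and per-section values that dumpbin reports in its FILE HEADER VALUES and SECTION HEADER blocks.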
Specific example
I performed these steps for a very simple program, which does nothing. This is the code:
; NASM assembler program. Does nothing. Stores string in code section.
; Adapted from stackoverflow.com/a/1029093/9988487
global _main
section .text
_main:
hlt
db 'Hello, World', 10
I assembled it with NASM (command nasm -fwin32 filename.asm) and linked it with the linker that comes with VS2019 (link /subsystem:console /nodefaultlib /entry:main test.obj). This is adapted from this answer, which demonstrates how to make a hello-world program for Windows using a WinAPI call. The program runs on Windows 10 and terminates with no output. It takes about 2 sec to run, which seems very long and makes me think there may be some error somewhere?
I then looked at the dumpbin output:
D:\ASM>dumpbin test.exe /ALL
Microsoft (R) COFF/PE Dumper Version 14.22.27905.0
Copyright (C) Microsoft Corporation. All rights reserved.
Dump of file test.exe
PE signature found
File Type: EXECUTABLE IMAGE
FILE HEADER VALUES
14C machine (x86)
2 number of sections
5E96C000 time date stamp Wed Apr 15 10:04:16 2020
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
102 characteristics
Executable
32 bit word machine
OPTIONAL HEADER VALUES
10B magic # (PE32)
14.22 linker version
200 size of code
200 size of initialized data
0 size of uninitialized data
1000 entry point (00401000)
1000 base of code
2000 base of data
400000 image base (00400000 to 00402FFF)
1000 section alignment
200 file alignment
<further header values omitted ...>
SECTION HEADER #1
.text name
E virtual size
1000 virtual address (00401000 to 0040100D)
200 size of raw data
200 file pointer to raw data (00000200 to 000003FF)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
60000020 flags
Code
Execute Read
RAW DATA #1
00401000: F4 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 0A ôHello, World.
SECTION HEADER #2
.rdata name
58 virtual size
2000 virtual address (00402000 to 00402057)
200 size of raw data
400 file pointer to raw data (00000400 to 000005FF)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
40000040 flags
Initialized Data
Read Only
RAW DATA #2
00402000: 00 00 00 00 00 C0 96 5E 00 00 00 00 0D 00 00 00 .....À.^........
00402010: 3C 00 00 00 1C 20 00 00 1C 04 00 00 00 00 00 00 <.... ..........
00402020: 00 10 00 00 0E 00 00 00 2E 74 65 78 74 00 00 00 .........text...
00402030: 00 20 00 00 1C 00 00 00 2E 72 64 61 74 61 00 00 . .......rdata..
00402040: 1C 20 00 00 3C 00 00 00 2E 72 64 61 74 61 24 7A . ..<....rdata$z
00402050: 7A 7A 64 62 67 00 00 00 zzdbg...
Debug Directories
Time Type Size RVA Pointer
-------- ------- -------- -------- --------
5E96C000 coffgrp 3C 0000201C 41C
Summary
1000 .rdata
1000 .text
The file header field "characteristics" is a combination of flags. In particular 102h = 1 0000 0010b and the two set flags (according to the PE format doc) are IMAGE_FILE_EXECUTABLE_IMAGE and IMAGE_FILE_BYTES_REVERSED_HI. The latter has description
IMAGE_FILE_BYTES_REVERSED_HI:
Big endian: the MSB precedes the LSB in memory. This flag is deprecated and should be zero.
I ask myself: Why does a modern assembler and a modern linker produce a deprecated flag?
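A hedged cross-check of the flag reading above, as a small Python sketch. The bit values are the ones listed in the PE format documentation for the file header Characteristics field and for section flags, and are worth verifying against the current version of that document; the helper name is made up.

```python
FILE_CHARACTERISTICS = {
    0x0001: "IMAGE_FILE_RELOCS_STRIPPED",
    0x0002: "IMAGE_FILE_EXECUTABLE_IMAGE",
    0x0004: "IMAGE_FILE_LINE_NUMS_STRIPPED",
    0x0008: "IMAGE_FILE_LOCAL_SYMS_STRIPPED",
    0x0010: "IMAGE_FILE_AGGRESSIVE_WS_TRIM",
    0x0020: "IMAGE_FILE_LARGE_ADDRESS_AWARE",
    0x0080: "IMAGE_FILE_BYTES_REVERSED_LO",
    0x0100: "IMAGE_FILE_32BIT_MACHINE",
    0x0200: "IMAGE_FILE_DEBUG_STRIPPED",
    0x0400: "IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP",
    0x0800: "IMAGE_FILE_NET_RUN_FROM_SWAP",
    0x1000: "IMAGE_FILE_SYSTEM",
    0x2000: "IMAGE_FILE_DLL",
    0x4000: "IMAGE_FILE_UP_SYSTEM_ONLY",
    0x8000: "IMAGE_FILE_BYTES_REVERSED_HI",
}

# Only the section flags that appear in the dumpbin output above.
SECTION_FLAGS = {
    0x00000020: "IMAGE_SCN_CNT_CODE",
    0x00000040: "IMAGE_SCN_CNT_INITIALIZED_DATA",
    0x20000000: "IMAGE_SCN_MEM_EXECUTE",
    0x40000000: "IMAGE_SCN_MEM_READ",
}


def decode_flags(value: int, table: dict) -> list:
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & bit]


if __name__ == "__main__":
    print(decode_flags(0x102, FILE_CHARACTERISTICS))
    print(decode_flags(0x60000020, SECTION_FLAGS))
    print(decode_flags(0x40000040, SECTION_FLAGS))
```

With these values, 0x102 is 0x0002 plus 0x0100, i.e. IMAGE_FILE_EXECUTABLE_IMAGE and IMAGE_FILE_32BIT_MACHINE, which matches the "Executable" and "32 bit word machine" lines that dumpbin prints. On this reading the deprecated IMAGE_FILE_BYTES_REVERSED_HI bit (0x8000) is not set.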
There are 2 sections in the file. The section .text was defined in the assembler code (and is the only one containing executable code, as specified in its header). I don't know what the second section '.rdata' (name seems to refer to "readable data") is or does here. Why was it created? How could I find out?
Disassembly
I used dumpbin to disassemble the .exe file (command dumpbin test.exe /DISASM). It gets the hlt correct, but the 'Hello, World.' string is (perhaps unfortunately) interpreted as executable commands. I guess the disassembler can hardly be blamed for this. However, if I understand correctly (I have no practical experience in assembly programming), putting data into a code section is not unheard of (it was done in several examples that I found while looking into assembly programming). Is there a better way to disassemble this that would reproduce my assembly code better? Also, do compilers sometimes put data into code sections in this way?

In some respects this is a massively broad question that may not survive for that reason. The information is all out there on the internet, keep looking, it is not complicated, and not worthy of a paper or video.
So you have a rough idea that a compiler takes a program written in one language and converts it to another language be that assembly language or machine code or whatever.
Then there are file formats and there are many different ones that we all use the term "binary" for but again, different formats. Ideally they contain, using some form of encoding, the machine code and data or information about the data.
I am going to use ARM for now: its fixed-length instructions are easy to disassemble and read.
#define ONE 1
unsigned int x;
unsigned int y = 5;
const unsigned int z = 7;
unsigned int fun ( unsigned int a )
{
return(a+ONE);
}
I am also using gnu gcc/binutils because it is very well known and widely used, and you can use it to make programs on your wintel machine. I run Linux so you will see elf not exe, but that is just a file format for what you are asking.
arm-none-eabi-gcc -O2 -c so.c -save-temps -o so.o
This toolchain (a chain of tools that are linked together, for example compiler -> assembler -> linker) is Unix style and modular. You are going to have an assembler for a target anyway, so I am not sure why you would want to re-invent that, and it is so much easier to debug a compiler by looking at the assembly output than by trying to go straight to machine code. But there are folks that like to climb the mountain just because it is there rather than go around, and some tools go straight for machine code just because it's there.
This specific compiler has this save-temps feature. gcc itself is a front end program that preps for the real compiler and then, unless you tell it not to, calls the assembler and linker.
cat so.i
# 1 "so.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "so.c"
unsigned int x;
unsigned int y = 5;
const unsigned int z = 7;
unsigned int fun ( unsigned int a )
{
return(a+1);
}
So at this point defines and includes are taken care of and it's one big file to be sent to the compiler.
The compiler does its thing and turns it into assembly language
cat so.s
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 2
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "so.c"
.text
.align 2
.global fun
.arch armv4t
.syntax unified
.arm
.fpu softvfp
.type fun, %function
fun:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 0, uses_anonymous_args = 0
# link register save eliminated.
add r0, r0, #1
bx lr
.size fun, .-fun
.global z
.global y
.comm x,4,4
.section .rodata
.align 2
.type z, %object
.size z, 4
z:
.word 7
.data
.align 2
.type y, %object
.size y, 4
y:
.word 5
.ident "GCC: (GNU) 9.3.0"
which then gets assembled into an object file, in this case by binutils using its default format for this target:
file so.o
so.o: ELF 32-bit LSB relocatable, ARM, EABI5 version 1 (SYSV), not stripped
It is using an elf file format which is easy to find info on, easy to write programs to parse, etc.
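To give an idea of how little it takes to start parsing ELF, here is a hedged Python sketch that decodes only the fixed 52-byte header of a 32-bit little-endian file such as so.o. The field order follows the ELF specification (e_ident, e_type, e_machine, and so on); the function name and the small lookup tables are made up for this example and cover only a few values.

```python
import struct
import sys

ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")   # 52 bytes
ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object", 4: "core"}
ELF_MACHINES = {3: "x86", 40: "ARM"}


def parse_elf32_header(buf: bytes) -> dict:
    """Decode the ELF header of a 32-bit little-endian file."""
    if buf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if buf[4] != 1 or buf[5] != 1:
        raise ValueError("this sketch only handles 32-bit little-endian ELF")
    (_ident, e_type, e_machine, _version, e_entry, e_phoff, e_shoff,
     e_flags, _ehsize, _phentsize, e_phnum, _shentsize, e_shnum,
     e_shstrndx) = ELF32_HEADER.unpack_from(buf)
    return {
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": ELF_MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
        "flags": e_flags,
        "program_headers": e_phnum,
        "section_headers": e_shnum,
        "section_header_offset": e_shoff,
        "section_name_table_index": e_shstrndx,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        print(parse_elf32_header(handle.read(ELF32_HEADER.size)))
```

For the object file above one would expect it to report a relocatable ARM file with an entry address of 0, in line with what the file command prints.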
I can disassemble this. Note that because I am using the disassembler's -D option it tries to disassemble everything, even what isn't machine code. Sticking to 32-bit ARM, it can grind through that, and where there are real instructions they are shown. These instructions are aligned and fixed length, so you can disassemble linearly. You cannot do that with a variable-length instruction set like x86 and have a hope of success: there you need to disassemble in execution order, and then you often miss some code due to the nature of the program.
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <fun>:
0: e2800001 add r0, r0, #1
4: e12fff1e bx lr
Disassembly of section .data:
00000000 <y>:
0: 00000005 andeq r0, r0, r5
Disassembly of section .rodata:
00000000 <z>:
0: 00000007 andeq r0, r0, r7
Disassembly of section .comment:
00000000 <.comment>:
0: 43434700 movtmi r4, #14080 ; 0x3700
4: 4728203a ; <UNDEFINED> instruction: 0x4728203a
8: 2029554e eorcs r5, r9, lr, asr #10
c: 2e332e39 mrccs 14, 1, r2, cr3, cr9, {1}
10: Address 0x0000000000000010 is out of bounds.
Disassembly of section .ARM.attributes:
00000000 <.ARM.attributes>:
0: 00002941 andeq r2, r0, r1, asr #18
4: 61656100 cmnvs r5, r0, lsl #2
8: 01006962 tsteq r0, r2, ror #18
c: 0000001f andeq r0, r0, pc, lsl r0
10: 00543405 subseq r3, r4, r5, lsl #8
14: 01080206 tsteq r8, r6, lsl #4
18: 04120109 ldreq r0, [r2], #-265 ; 0xfffffef7
1c: 01150114 tsteq r5, r4, lsl r1
20: 01180317 tsteq r8, r7, lsl r3
24: 011a0119 tsteq r10, r9, lsl r1
28: Address 0x0000000000000028 is out of bounds.
And yes, the tool put extra stuff in there, but note primarily what I created: some code, some uninitialized read/write data, some initialized read/write data and some initialized read-only data. The toolchain authors can use whatever names they want; they don't even have to use the term section. But from decades of history and communication and terminology, .text is generally used for code (as in read-only machine code AND related data), .bss for zeroed read/write data (although I have seen other names), .data for initialized read/write data, and in this generation of this tool .rodata for read-only initialized data (technically that could land in .text).
And note that they all have an address of zero. they are not linked yet.
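The listing above shows data words such as 00000005 printed as andeq r0, r0, r5. A small Python sketch of why: with a fixed 32-bit encoding, any word can be run through the same field extraction, whether it is code or not. This is a deliberately tiny decoder written for this example (the name and tables are made up); it handles only the three-operand data-processing instructions with an immediate or an unshifted register and returns None for everything else.

```python
CONDITIONS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs", "vc",
              "hi", "ls", "ge", "lt", "gt", "le", ""]          # index 14 = always
OPCODES = ["and", "eor", "sub", "rsb", "add", "adc", "sbc", "rsc",
           "tst", "teq", "cmp", "cmn", "orr", "mov", "bic", "mvn"]
NOT_HANDLED = (8, 9, 10, 11, 13, 15)   # compare/move forms use other operand layouts


def decode_data_processing(word: int):
    """Decode a classic ARM data-processing word, or return None."""
    cond = word >> 28
    if cond == 0xF or (word >> 26) & 0x3 != 0:
        return None
    opcode = (word >> 21) & 0xF
    if opcode in NOT_HANDLED:
        return None
    set_flags = "s" if word & (1 << 20) else ""
    rn = (word >> 16) & 0xF
    rd = (word >> 12) & 0xF
    if word & (1 << 25):
        # Immediate operand: 8-bit value rotated right by twice the 4-bit field.
        rotation = ((word >> 8) & 0xF) * 2
        imm8 = word & 0xFF
        value = ((imm8 >> rotation) | (imm8 << (32 - rotation))) & 0xFFFFFFFF
        operand = f"#{value}"
    else:
        if word & 0xFF0:
            return None            # shifted registers, multiplies, etc.
        operand = f"r{word & 0xF}"
    return f"{OPCODES[opcode]}{set_flags}{CONDITIONS[cond]} r{rd}, r{rn}, {operand}"


if __name__ == "__main__":
    for word in (0xE2800001, 0x00000005, 0x00000007, 0xE12FFF1E):
        print(f"{word:08x}", decode_data_processing(word))
```

The word 0xe2800001 comes out as add r0, r0, #1, while the data values 5 and 7 happen to have all-zero condition and opcode fields, which read as and with the eq condition. The disassembler is not guessing; it is decoding whatever bits it is given.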
Now this is ugly but to avoid adding any more code and if the tool lets me do it, let's link it to make a completely unusable binary (no bootstrap, etc, etc):
arm-none-eabi-ld -Ttext=0x1000 -Tdata=0x2000 so.o -o so.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000001000
arm-none-eabi-objdump -D so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00001000 <fun>:
1000: e2800001 add r0, r0, #1
1004: e12fff1e bx lr
Disassembly of section .data:
00002000 <y>:
2000: 00000005 andeq r0, r0, r5
Disassembly of section .rodata:
00001008 <z>:
1008: 00000007 andeq r0, r0, r7
Disassembly of section .bss:
00002004 <x>:
2004: 00000000 andeq r0, r0, r0
And now it is linked. The read only items .text and .rodata landed in the .text address space in the order found in the file. The read/write items landed in the .data address space in the order found in the file.
Yes, where was .bss in the object? It is in there, but it has no actual data as in bytes that are part of the object; instead it has a name, a size, and the fact that it is .bss. And for whatever reason the tool does show it from the linked binary.
So back to the term binary. The so.elf binary has the bytes that go in memory that make up the program, but also file format infrastructure plus a symbol table to make disassembly and debugging easier, plus other stuff. Elf is a flexible file format: gnu can use it and you get one result, while some other tool or version of a tool can use it and produce a different file. And obviously two compilers can generate different machine code from the same source program, not just due to optimizations; the job is to make a functional program in the target language, and functional is the opinion of the compiler/tool author.
What about a memory image type file:
arm-none-eabi-objcopy so.elf so.bin -O binary
hexdump -C so.bin
00000000 01 00 80 e2 1e ff 2f e1 07 00 00 00 00 00 00 00 |....../.........|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001000 05 00 00 00 |....|
00001004
Now how the objcopy tool works is that it starts with the first defined loadable (or whatever term you want to use) byte and ends with the last one, and uses (zero) padding in between so that offsets in the file match the memory image from an address perspective. The asterisk in the hexdump means the previous line repeats, here essentially 0 padding. We linked .text at 0x1000 and .data at 0x2000, so the first byte of this file (offset 0) is the beginning of .text, and 0x1000 bytes later, at offset 0x1000 in the file, is the read/write stuff that we know goes to 0x2000 in memory. Also note that the bss zeros are not in the output. The bootstrap is expected to zero those.
There is no information in this file about where in memory the data goes, etc. And if you think a bit about it: what if I define one section with one byte that goes to 0x00000000 and another section with one byte that goes to 0x80000000, and output this file format? Yes, that is a 0x80000001 byte file even though there are only two useful bytes of relevant information. A 2GB file to hold two bytes. This is why you don't want to output this file format until you have sorted out your linker script and tools.
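The padding behaviour can be sketched in a few lines of Python. This is only a model of the idea described above, not how objcopy is implemented, and the function names are made up: take the lowest load address as file offset 0, and fill every gap up to the highest address with zeros.

```python
def flat_image_size(segments) -> int:
    """segments: iterable of (load_address, length). Size of the flat image."""
    segments = list(segments)
    lowest = min(address for address, _ in segments)
    highest = max(address + length for address, length in segments)
    return highest - lowest


def build_flat_image(segments) -> bytes:
    """segments: list of (load_address, data). Gaps are zero filled."""
    base = min(address for address, _ in segments)
    image = bytearray(flat_image_size((a, len(d)) for a, d in segments))
    for address, data in segments:
        start = address - base
        image[start:start + len(data)] = data
    return bytes(image)


if __name__ == "__main__":
    # The loadable bytes of so.elf: .text + .rodata at 0x1000, .data at 0x2000.
    text_and_rodata = bytes.fromhex("010080e2" "1eff2fe1" "07000000")
    data = bytes.fromhex("05000000")
    image = build_flat_image([(0x1000, text_and_rodata), (0x2000, data)])
    print(hex(len(image)))                                   # size of so.bin
    # Two single bytes 2 GB apart: compute the size only, do not allocate it.
    print(hex(flat_image_size([(0x00000000, 1), (0x80000000, 1)])))
```

For the so.elf segments this gives a 0x1004 byte image with the .data word at offset 0x1000, the same shape as the hexdump of so.bin.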
Same data and two other equally old school formats with a little history of intel vs motorola
arm-none-eabi-objcopy so.elf so.hex -O ihex
cat so.hex
:08100000010080E21EFF2FE158
:0410080007000000DD
:0420000005000000D7
:0400000300001000E9
:00000001FF
arm-none-eabi-objcopy so.elf so.srec -O srec
cat so.srec
S00A0000736F2E7372656338
S10B1000010080E21EFF2FE154
S107100807000000D9
S107200005000000D3
S9031000EC
Now these contain the relevant bytes, plus addresses, but not much other information. They take more than two bytes of file for every byte of data, but compared to a huge file with padding that is a worthy trade-off. Both of these formats can be found in use today, not as much as in the old days but still there.
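Both formats are simple enough to check by hand. A minimal Python sketch, with made-up function names, that parses one line of each format and verifies its checksum: an Intel HEX record is valid when all of its bytes sum to zero modulo 256, and an S-record checksum is the ones' complement of the low byte of the sum of the count, address and data bytes. The S-record helper only handles the 16-bit address types (S0, S1, S9) seen above.

```python
def ihex_record(line: str):
    """Parse one Intel HEX line into (record_type, address, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("Intel HEX records start with ':'")
    raw = bytes.fromhex(line[1:])
    if sum(raw) & 0xFF != 0:
        raise ValueError("bad checksum")
    count = raw[0]
    if count != len(raw) - 5:          # count, 2 address bytes, type, checksum
        raise ValueError("bad length")
    address = int.from_bytes(raw[1:3], "big")
    return raw[3], address, raw[4:4 + count]


def srec_record(line: str):
    """Parse one S0/S1/S9 line into (kind, address, data)."""
    line = line.strip()
    kind = line[:2]
    if kind not in ("S0", "S1", "S9"):
        raise ValueError("only 16-bit address records are handled here")
    raw = bytes.fromhex(line[2:])
    if raw[0] != len(raw) - 1:         # count covers address, data and checksum
        raise ValueError("bad length")
    if (sum(raw[:-1]) & 0xFF) ^ 0xFF != raw[-1]:
        raise ValueError("bad checksum")
    address = int.from_bytes(raw[1:3], "big")
    return kind, address, raw[3:-1]


if __name__ == "__main__":
    print(ihex_record(":08100000010080E21EFF2FE158"))
    print(srec_record("S10B1000010080E21EFF2FE154"))
    print(srec_record("S00A0000736F2E7372656338"))   # header record: file name
```

Fed the lines above, both parsers report the same eight bytes of machine code at address 0x1000, and the S0 header record carries the file name so.srec.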
And there are countless other binary file formats; a tool like objcopy has a decent list of formats it can generate, as do other linkers and/or tools out there.
What is relevant about all of this is that there is a binary file format of some form that contains the bytes we need to run the program.
What format and what addresses you might ask...That is part of the operating system or the system design. In the case of Windows there are specific file formats and variations perhaps of those formats that are supported by the windows operating system, the specific version you are using. Windows has determined what the address space looks like. Operating systems like this take advantage of the MMU both for virtualizing addresses and protection. Having a virtual address space means every program can live in the same space. All programs can have an address that is zero based for example....
test.c
int main ( void )
{
return 1;
}
hello.c
int main ( void )
{
return 2;
}
gcc test.c -o test
objdump -D test
Disassembly of section .text:
00000000004003e0 <_start>:
4003e0: 31 ed xor %ebp,%ebp
4003e2: 49 89 d1 mov %rdx,%r9
4003e5: 5e pop %rsi
...
gcc hello.c -o hello
objdump -D hello
Disassembly of section .text:
00000000004003e0 <_start>:
4003e0: 31 ed xor %ebp,%ebp
4003e2: 49 89 d1 mov %rdx,%r9
Same address, how is that possible, won't they sit on top of each other? No: virtual memory. Each program gets its own virtual address space. And note this is built for a specific Linux on a specific day, etc. The toolchain has a default linker script (notice I didn't specify how to link) for this platform, set when the compiler was built for this target/platform.
arm-none-eabi-gcc -O2 test.c -c -o test.o
arm-none-eabi-ld test.o -o test.elf
arm-none-eabi-ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
arm-none-eabi-objdump -D test.elf
test.elf: file format elf32-littlearm
Disassembly of section .text:
00008000 <main>:
8000: e3a00001 mov r0, #1
8004: e12fff1e bx lr
same source code, same compiler, built for a different target and system different address.
So for Windows there are definitely going to be rules for the supported binary formats and rules for address spaces that can be used, how to define those spaces in the file.
Then it is a simple matter of the operating system's loader reading the binary file and putting the loadable items into memory at those addresses (in the virtual space that the OS has created for this specific program). It is very possible that a feature of the loader is to zero .bss for you since the information is there. The low level programmer needs to know whether that is the case, to know whether to deal with zeroing .bss or not.
If not, you will see that and may need to create a solution; unfortunately this is where you get deeper into tool specific items. While C may be somewhat standardized, there are tool specific things that are not, or at least are standardized only by the tool/authors, with no reason to assume they cross over to other tools.
.globl _start
_start:
ldr sp,sp_init
bl fun
b .
.word __bss_start__
.word __bss_end__
sp_init:
.word 0x8000
Everything about assembly language is tool specific. The mnemonics, for sanity reasons, will no doubt resemble the ip/processor vendor's documentation, which uses the syntax of the tool they paid to have developed. But beyond that, assembly language is wholly defined by the tool, not the target. x86, because of its age and other things, is really bad about that, and this is not the Intel vs AT&T thing, just in general. Gnu assembler is well known for, I would assume perhaps intentionally, not being compatible with other assembly languages. The above is gnu assembler for arm.
Using the fun() function above. C says it should be main(), but the tool doesn't care and I am already typing enough here.
Add a simple ram based linker script:
MEMORY
{
ram : ORIGIN = 0x1000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.bss : {
__bss_start__ = .;
*(.bss*)
} > ram
__bss_end__ = .;
}
build it all
arm-none-eabi-as start.s -o start.o
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-ld -T sram.ld start.o so.o -o so.elf
examine
arm-none-eabi-nm so.elf
0000102c B __bss_end__
00001028 B __bss_start__
00001018 T fun
00001014 t sp_init
00001000 T _start
00001028 B x
00001024 D y
00001020 R z
arm-none-eabi-objdump -D so.elf
so.elf: file format elf32-littlearm
Disassembly of section .text:
00001000 <_start>:
1000: e59fd00c ldr sp, [pc, #12] ; 1014 <sp_init>
1004: eb000003 bl 1018 <fun>
1008: eafffffe b 1008 <_start+0x8>
100c: 00001028 andeq r1, r0, r8, lsr #32
1010: 0000102c andeq r1, r0, r12, lsr #32
00001014 <sp_init>:
1014: 00008000 andeq r8, r0, r0
00001018 <fun>:
1018: e2800001 add r0, r0, #1
101c: e12fff1e bx lr
Disassembly of section .rodata:
00001020 <z>:
1020: 00000007 andeq r0, r0, r7
Disassembly of section .data:
00001024 <y>:
1024: 00000005 andeq r0, r0, r5
Disassembly of section .bss:
00001028 <x>:
1028: 00000000 andeq r0, r0, r0
So now it is possible to add to the bootstrap a memory zeroing loop based on the start and end addresses (do not use C/memset for this; to avoid chicken and egg problems you write the bootstrap in asm).
Fortunately or unfortunately, the linker script is tool specific and the assembly language is tool specific, and they need to work together if you are letting the tools do the work for you (the sane way to do it; have fun figuring out where .bss is otherwise).
This can be done on an operating system, but what about, say, microcontrollers, where it all has to be on non-volatile storage (flash)? Well, it is possible to have firmware that is downloaded from elsewhere into ram (like your mouse firmware sometimes, sometimes keyboard, etc), but assume flash. So how do you deal with .data?
MEMORY
{
rom : ORIGIN = 0x0000, LENGTH = 0x1000
ram : ORIGIN = 0x1000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > rom
.data : {
*(.data*)
} > ram AT > rom
.bss : {
__bss_start__ = .;
*(.bss*)
} > ram
__bss_end__ = .;
}
With gnu ld this basically says that .data's home is in ram, but the output binary formats will put it in flash/rom
arm-none-eabi-objcopy so.elf so.srec -O srec
cat so.srec
S00A0000736F2E7372656338
S11300000CD09FE5030000EBFEFFFFEA04100000A4
S11300100810000000800000010080E21EFF2FE1B4
S107002007000000D1 <- z variable at address 0020
S107002405000000CF <- y variable at 0024
S9030000FC
And you have to play with the linker script more to get the tool to tell you both the ram and flash starting addresses and ending addresses or length, then add code in the bootstrap (asm not C) to copy .data from flash to ram.
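What that bootstrap has to do can be modelled in a few lines. The following Python sketch is a simulation only (the real thing is a handful of asm instructions), and the parameter names data_load, data_start and data_end are hypothetical stand-ins for symbols you would have to add to the linker script yourself. The ROM bytes are transcribed from the S1 records above; the .bss bounds are read from the two words the bootstrap source placed after the branch, and the .data addresses assume that .data is the first thing in ram (0x1000) and was loaded at 0x24 as annotated in the srec listing.

```python
ROM_BASE = 0x0000
RAM_BASE = 0x1000

# ROM contents from the S1 records above, addresses 0x00 through 0x27.
ROM = bytes.fromhex(
    "0CD09FE5030000EBFEFFFFEA04100000"   # _start, then the word __bss_start__
    "0810000000800000010080E21EFF2FE1"   # __bss_end__, sp_init, fun
    "07000000"                           # z (.rodata)
    "05000000"                           # initial value of y (.data image)
)


def bootstrap(rom, ram, data_load, data_start, data_end, bss_start, bss_end):
    """Copy .data from its load address in ROM to RAM, then zero .bss."""
    for offset in range(data_end - data_start):
        ram[data_start - RAM_BASE + offset] = rom[data_load - ROM_BASE + offset]
    for address in range(bss_start, bss_end):
        ram[address - RAM_BASE] = 0


ram = bytearray(b"\xAA" * 16)            # pretend RAM powers up holding garbage
bss_start = int.from_bytes(ROM[0x0C:0x10], "little")
bss_end = int.from_bytes(ROM[0x10:0x14], "little")
bootstrap(ROM, ram, data_load=0x24, data_start=0x1000, data_end=0x1004,
          bss_start=bss_start, bss_end=bss_end)
print(hex(bss_start), hex(bss_end), ram.hex())
```

After the call, the first RAM word holds y's initial value 5, the second (x, in .bss) is zero, and the rest of RAM is untouched, which is the state C code expects to find when it starts running.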
Also note here per another one of your many questions.
.word __bss_start__
.word __bss_end__
sp_init:
.word 0x8000
These items are technically data, but they live in .text first and foremost because they were defined in the code that was assumed to be .text (I didn't need to state that in the asm, but could have). You will see this in x86 as well, but for fixed-length instruction sets like arm, mips, risc-v, etc, where you can't put any old immediate/constant/linked value you want in the instruction itself, you put it nearby in a "pool" and do a pc-relative read to get it. You will see this for linking externals too:
extern unsigned int x;
int main ( void )
{
return x;
}
arm-none-eabi-gcc -O2 -c test.c -o test.o
arm-none-eabi-objdump -D test.o
test.o: file format elf32-littlearm
Disassembly of section .text.startup:
00000000 <main>:
0: e59f3004 ldr r3, [pc, #4] ; c <main+0xc>
4: e5930000 ldr r0, [r3]
8: e12fff1e bx lr
c: 00000000 andeq r0, r0, r0 <--- the code gets the address of the
variable from here and then reads it from memory
once linked
Disassembly of section .text:
00008000 <main>:
8000: e59f3004 ldr r3, [pc, #4] ; 800c <main+0xc>
8004: e5930000 ldr r0, [r3]
8008: e12fff1e bx lr
800c: 00018010 andeq r8, r1, r0, lsl r0
Disassembly of section .data:
00018010 <x>:
18010: 00000005 andeq r0, r0, r5
for x86
gcc -c -O2 test.c -o test.o
objdump -D test.o
test.o: file format elf64-x86-64
Disassembly of section .text.startup:
0000000000000000 <main>:
0: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 6 <main+0x6>
6: c3 retq
00000000004003e0 <main>:
4003e0: 8b 05 4a 0c 20 00 mov 0x200c4a(%rip),%eax # 601030 <x>
4003e6: c3 retq
If you squint is it really different? there is data nearby that the processor reads to load into a register and or use. either way, due to the nature of the instruction sets the linker modifies the instruction or nearby pool data or both.
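The addresses in the disassembler comments can be reproduced with a little arithmetic. A small Python sketch with made-up function names: in ARM state the program counter reads as the instruction's own address plus 8, while x86-64 RIP-relative addressing is relative to the address of the next instruction.

```python
def arm_pc_relative_target(instruction_address: int, offset: int) -> int:
    """ARM state: ldr rX, [pc, #offset] reads from (address + 8) + offset."""
    return instruction_address + 8 + offset


def x86_64_rip_relative_target(instruction_address: int,
                               instruction_length: int,
                               displacement: int) -> int:
    """x86-64: disp(%rip) is relative to the end of the instruction."""
    return (instruction_address + instruction_length + displacement) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    # ldr r3, [pc, #4] at 0x8000 in the linked ARM binary
    print(hex(arm_pc_relative_target(0x8000, 4)))
    # ldr sp, [pc, #12] at 0x1000 in the bootstrap
    print(hex(arm_pc_relative_target(0x1000, 12)))
    # mov 0x200c4a(%rip),%eax at 0x4003e0, six bytes long
    print(hex(x86_64_rip_relative_target(0x4003E0, 6, 0x200C4A)))
```

The three results are 0x800c, 0x1014 and 0x601030, matching the addresses objdump shows. In the ARM case the linker filled in the pool word at 0x800c; in the x86 case it filled in the displacement inside the instruction.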
last one:
arm-none-eabi-gcc -S test.c
cat test.s
.cpu arm7tdmi
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 1
.eabi_attribute 30, 6
.eabi_attribute 34, 0
.eabi_attribute 18, 4
.file "test.c"
.text
.align 2
.global main
.arch armv4t
.syntax unified
.arm
.fpu softvfp
.type main, %function
main:
# Function supports interworking.
# args = 0, pretend = 0, frame = 0
# frame_needed = 1, uses_anonymous_args = 0
# link register save eliminated.
str fp, [sp, #-4]!
add fp, sp, #0
ldr r3, .L3
ldr r3, [r3]
mov r0, r3
add sp, fp, #0
# sp needed
ldr fp, [sp], #4
bx lr
.L4:
.align 2
.L3:
.word x
.size main, .-main
.ident "GCC: (GNU) 9.3.0"
So can you see the assembly language? Yes, some tools will let you save the intermediate files and/or let you generate the assembly output of the file when compiling.
Can you have data in the code? Yes, there are times and reasons to have data values in the .text area. This is not just target specific; you will see it for various reasons, and some toolchains put read-only data there.
There are many file formats the ones used by modern operating systems have features not just for defining the bytes that make up the machine code and data values but also will include symbols and other debug information.
The file format and memory space for a program is operating system specific not language nor even target specific (Linux, Windows, MacOS on the same laptop are not expected to have the same rules despite the exact same target computer). A native toolchain for that platform has a default linker script and whatever other information required to build usable/working programs for that target. Including the supported file format.
The machine code and data items can be represented in different file formats in different ways, whether or not the operating system or loader of the target system can use that format depends on that target system.
Programs have bugs and nuances. File formats have versions and inconsistencies, you might find some elf file format reader only to find it doesn't work or prints out strange stuff when fed a perfectly good elf file that works on some system. Why are some flags being set? Perhaps those bytes got re-used or the flag to repurposed or the data structure changed or a tool is using it differently or in a non-standard way (think mov 20h,ax) and another tool that is not compatible can't understand or gets lucky and gets close enough.
Asking "why" questions at Stack Overflow is not very useful. The odds of finding the individual that wrote the thing are very low; you have better odds asking the place you got the tool from and following that, hoping the person is still alive and willing to be bothered. And 99.999(lots of 9s)% of the time there is no global set of godly rules that the thing was written under/for. Generally some dude just felt like it, and that is why they did what they did: no real reason, laziness, a bug, intentionally trying to break someone else's tool. All the way up to a large committee of people with an opinion who voted on it on a particular day in a particular room, and that's why (and we know what we get when we design by committee or try to write specs that nobody conforms to).
I know you are on Windows; I don't have a Windows machine handy and am on Linux. But the gnu/binutils and clang/llvm tools are readily available and have a rich set of tools like readelf, nm, objdump, etc. that assist in examining things. A good toolchain is going to have that at least internally, so the developers can debug the output of the tool to a certain quality level. The gnu folks made tools and made them available for everyone, and while it takes time to sort through them and their features, they are very powerful for the things you are trying to understand.
You are NOT going to find a good x86 disassembler; they are all crap simply because of the nature of the beast. It is a variable-length instruction set, so by definition unless you are executing you can't sort it out correctly. You must disassemble in execution order from a known good entry point to have half a chance, and then for various reasons there are code paths you cannot see that way (think jump tables for example, or DLLs or .so files). The BEST solution is to have a very accurate/perfect emulator/simulator, run the code, and perform all the actions/gyrations you need to get it to cover all the code paths, and have that tool record which bytes are instructions and which are data, and where each one is located, or each linear section without a branch.
The good side of this is that a lot of code is compiled today using tools that are not trying to hide anything. In the old days for various reasons you would see hand written asm that intentionally tried to prevent disassembly or due to other factors (hand editing a binary rom image for a video game the day before the trade show, go disassemble some of the classic roms).
mov r0,#0
cmp r0,#0
jz somewhere
.word 0x12345678
A disassembler isn't going to figure this out. Some might add a special case for that, but then
mov r0,#0
nop
nop
xor r0,#1
nop
nop
xor r0,#3
xor r0,#2
cmp r0,#0
jz somewhere
.word 0x12345678
and it thinks that data is an instruction. For variable length that is super hard for a disassembler to resolve. A decent one will at least detect collisions, where the non-opcode part of an instruction is branched to and/or an opcode part of an instruction shows up later as additional bytes in some other instruction. The tool can't resolve it; a human has to.
This happens even with ARM and MIPS having 32- and 16-bit instructions, RISC-V with variable-sized instructions, etc.
Very often gnu's disassembler will get tripped up with x86.

I don't think I'll be able to answer everything. I am a beginner too, so some of what I say may not be exact. But I'll try my best, and I think I can bring you some things.
No, compilers do not put data in code sections (correct me if I am wrong). There is the section .data (for initialized data) and section .bss (for uninitialized data).
I think I'd better show you an example of a program which prints hello world (for Linux, because it's much simpler and I don't know how to do it on Windows. It is in x64, but x86 is similar: mainly the syscalls and the registers differ. x64 is for 64 bits and x86 for 32 bits).
BITS 64 ;not obligatory but I prefer
section .data
msg db "hello world" ;the message
len equ $-msg ;the length of msg
section .text
global _start
_start: ;the entry point
mov rax, 1 ;syscall 1 to print something
mov rdi, 1 ;1 for stdout
mov rsi, msg ;the message
mov rdx, len ;length in rdx
syscall
mov rax, 60 ;exit syscall
mov rdi, 0 ;exit with 0
syscall
(Try https://tio.run/#assembly-nasm if you don't want to use a VM. If you are using Windows, I advise you to look at WSL + VS Code: you will have Linux inside Windows, and VS Code has an extension for accessing the files from Windows.)
If you want to disassemble the code or look at the memory, you can use gdb or radare2 on Linux. For Windows, there are other tools such as Ghidra, IDA and OllyDbg.
I don't know any way to make the compiler create better assembly code, but that doesn't mean it doesn't exist.
I have never made anything for Windows. However, to link my object file, I use ld (I don't know if it will be helpful).
ld object.o -o compiledprogram
I don't have time right now to continue writing, so I can't recommend any courses yet. I'll look later.
Hope this has helped.

Answers to questions in your text:
1. You can see process execution step by step, and the process memory, with a debugger. I used OllyDbg for learning assembly; it's a free and powerful debugger.
2. A process is loaded by the Windows kernel after NtCreateUserProcess is called, so I think you would need kernel debugging to see how it is done.
3. Code that is debugged in OllyDbg is automatically disassembled.
4. You can put read-only data in ".text" section. You can change section flags to make it writable, then code and data can be mixed. Some compilers may merge ".text" and ".rdata" sections.
I would recommend that you read about PE imports, exports, relocations and resources, in that order. If you want to see the simplest possible i386 PE hello world, you can check my hello_world_pe_i386_dynamic.exe program here: https://github.com/pajacol/hello-world. I wrote it entirely in a binary file editor. It contains only the required data structures. This executable is position independent and can be loaded at any address without relocations.

gcc x86-32 stack alignment and calling printf

To the best of my knowledge, x86-64 requires the stack to be 16-byte aligned before a call, while gcc with -m32 doesn't require this for main.
I have the following testing code:
.data
intfmt: .string "int: %d\n"
testint: .int 20
.text
.globl main
main:
mov %esp, %ebp
push testint
push $intfmt
call printf
mov %ebp, %esp
ret
Build with as --32 test.S -o test.o && gcc -m32 test.o -o test. I am aware that syscall write exists, but to my knowledge it cannot print ints and floats the way printf can.
After entering main, a 4 byte return address is on the stack. Then interpreting this code naively, the two push calls each put 4 bytes on the stack, so call needs another 4 byte value pushed to be aligned.
Here is the objdump of the binary generated by gas and gcc:
0000053d <main>:
53d: 89 e5 mov %esp,%ebp
53f: ff 35 1d 20 00 00 pushl 0x201d
545: 68 14 20 00 00 push $0x2014
54a: e8 fc ff ff ff call 54b <main+0xe>
54f: 89 ec mov %ebp,%esp
551: c3 ret
552: 66 90 xchg %ax,%ax
554: 66 90 xchg %ax,%ax
556: 66 90 xchg %ax,%ax
558: 66 90 xchg %ax,%ax
55a: 66 90 xchg %ax,%ax
55c: 66 90 xchg %ax,%ax
55e: 66 90 xchg %ax,%ax
I am very confused about the push instructions generated.
If two 4 byte values are pushed, how is alignment achieved?
Why is 0x2014 pushed instead of 0x14? What is 0x201d?
What does call 54b even achieve? Output of hd matches objdump. Why is this different in gdb? Is this the dynamic linker?
B+>│0x5655553d <main> mov %esp,%ebp │
│0x5655553f <main+2> pushl 0x5655701d │
│0x56555545 <main+8> push $0x56557014 │
│0x5655554a <main+13> call 0xf7e222d0 <printf> │
│0x5655554f <main+18> mov %ebp,%esp │
│0x56555551 <main+20> ret
Resources on what goes on when a binary is actually executed are appreciated, since I don't know what's actually going on and the tutorials I've read don't cover it. I'm in the process of reading through How programs get run: ELF binaries.

The i386 System V ABI does guarantee / require 16 byte stack alignment before a call, like I said at the top of my answer that you linked. (Unless you're calling a private helper function, in which case you can make up your own rules for alignment, arg-passing, and which registers are clobbered for that function.)
Functions are allowed to crash or misbehave if you violate this ABI requirement, but are not required to. e.g. scanf in x86-64 Ubuntu glibc (as compiled by recent gcc) only recently started doing that: scanf Segmentation faults when called from a function that doesn't change RSP
Functions can depend on stack alignment for performance (to align a double or array of doubles to avoid cache-line splits when accessing them).
Usually the only case where a function depends on stack alignment for correctness is when compiled to use SSE/SSE2, so it can use 16-byte alignment-required loads/stores to copy a struct or array (movaps or movdqa), or to actually auto-vectorize a loop over a local array.
I think Ubuntu doesn't compile their 32-bit libraries with SSE (except functions like memcpy that use runtime dispatching), so they can still work on ancient CPUs like Pentium II. Multiarch libraries on an x86-64 system should assume SSE2, but with 4-byte pointers it's less likely that 32-bit functions would have 16 byte structs to copy.
Anyway, whatever the reason, obviously printf in your 32-bit build of glibc doesn't actually depend on 16-byte stack alignment for correctness, so it doesn't fault even when you misalign the stack.
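The alignment arithmetic from the question can be checked with a few lines of Python. This is only a sketch: the function name padding_before_call is made up for illustration, and it assumes the stack was 16-byte aligned at the call that entered main, so the 4-byte return address is the first thing past the aligned boundary.

```python
def padding_before_call(pushed_bytes: int, alignment: int = 16,
                        return_address: int = 4) -> int:
    """Extra bytes needed so the stack is `alignment`-aligned at the next call.

    Assumes the stack was aligned right before the call that entered the
    current function, so `return_address` bytes are already in use.
    """
    used = return_address + pushed_bytes
    return (-used) % alignment


# Two 4-byte pushes (testint and $intfmt) after entering main:
print(padding_before_call(8))   # 4 more bytes would be needed
# Pushing one more 4-byte value first (12 bytes in total) would line things up:
print(padding_before_call(12))  # 0
```

With the code from the question, 4 + 4 + 4 = 12 bytes are in use at the call, so the stack is 4 bytes short of a 16-byte boundary, which is the misalignment that printf happens to tolerate here.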
Why is 0x2014 pushed instead of 0x14? What is 0x201d?
0x14 (decimal 20) is the value in memory at that location. It will be loaded at runtime, because you used push r/m32, not push $20 (or an assemble time constant like .equ testint, 20 or testint = 20).
You used gcc -m32 to make a PIE (Position Independent Executable), which is relocated at runtime, because that's the default on Ubuntu's gcc.
0x2014 is the offset relative to the start of the file. If you disassemble at runtime after running the program, you'll see a real address.
Same for call 54b. It's presumably a call to the PLT (which is near the start of the file / text segment, hence the low address).
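The relation between the objdump offsets and the gdb addresses in the question can be sketched as simple arithmetic. The numbers below are copied from the two listings in the question; the load base is derived from them, and it will differ from run to run on a system with ASLR enabled.

```python
# Values copied from the question's objdump and gdb output.
MAIN_OFFSET = 0x53D          # <main> in the objdump listing
MAIN_RUNTIME = 0x5655553D    # <main> in the gdb listing

# Where the loader placed the image for this particular run.
load_base = MAIN_RUNTIME - MAIN_OFFSET


def runtime_address(offset: int, base: int = load_base) -> int:
    """Address that a link-time offset ends up at once the PIE is loaded."""
    return base + offset


print(hex(load_base))                 # 0x56555000
print(hex(runtime_address(0x201D)))   # 0x5655701d, the pushl operand in gdb
print(hex(runtime_address(0x2014)))   # 0x56557014, the push immediate in gdb
```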
If you disassembled with objdump -drwC, you'd see symbol relocation info. (I like -Mintel as well, but beware it's MASM-like, not NASM).
You can link with gcc -m32 -no-pie to make classic position-dependent executables. I'd definitely recommend that especially for 32-bit code, and especially if you're compiling C, use gcc -m32 -no-pie -fno-pie to get non-PIE code-gen as well as linking into a non-PIE executable. (see 32-bit absolute addresses no longer allowed in x86-64 Linux? for more about PIEs.)

How to access the PC pointer (using assembly) in AVR-libc?

I am trying to write some conditional jumps in AVR assembly using AVR-gcc. According to AVR instruction set manual, the brxx instructions take in an operand k, and jumps to PC+k+1. Also, according to the tutorial PDF from http://www.avrbeginners.net/new/tutorials/jumps-calls-and-the-stack/, I should be able to use the PC operand to jump like this:
brne PC+2
However, when I write such test code:
#include <avr/io.h>
.section .text
.global main ; Note [5]
main:
sbi _SFR_IO_ADDR(DDRA), PA0
sbi _SFR_IO_ADDR(PORTA), PA0
ldi 16, 0xFF
cpi 16, 0xFF
breq PC + 2
cbi _SFR_IO_ADDR(PORTA), PA0
rjmp end
end:
rjmp end
I get this error:
avr-gcc -mmcu="atmega16" -DF_CPU="16000000UL" -O0 main.S -o main.o
/tmp/ccAa2ySf.o: In function `main':
(.text+0x8): undefined reference to `PC'
collect2: ld returned 1 exit status
make: *** [main.o] Error 1
Apparently PC is not defined in AVR-libc. Then how am I going to do such condition branch? Thanks!
Update 1
I found the question How can I jump relative to the PC using the gnu assembler for AVR? and learned that the syntax for gnu as is breq .+2. However, I get the same error as in that question. When I disassemble using avr-objdump -d main.o, I do get
74: 01 f0 breq .+0 ; 0x76
Which is the same symptom as that question. I will try using linker script, but I have no experience in that.
Update 2
Actually I found that if I use even numbers in the breq instruction, like breq .+2 or breq .+4, the objdump shows correct result. However, if I use odd numbers, it will become breq .+0. Can someone explain why?

OK, the answer is totally rewritten now. This is what I understand from the objdump of compiled C code. Firstly, binutils uses byte addressing, not word addressing, for the program counter, and counts offsets from the instruction right after the current one. This is explained in the following code:
#include <avr/io.h>
.section .text
.global main
main:
sbi _SFR_IO_ADDR(DDRA), PA0
sbi _SFR_IO_ADDR(PORTA), PA0
ldi 16, 0xFF
cpi 16, 0xFF
breq .+4 ;; If we are executing here
cbi _SFR_IO_ADDR(PORTA), PA0 ;; This is .+0, will be skipped
cbi _SFR_IO_ADDR(PORTA), PA0 ;; This is .+2, will be skipped
cbi _SFR_IO_ADDR(PORTA), PA0 ;; This is .+4, which will be executed
rjmp end
end:
rjmp end
Apparently, the PC width has nothing to do with relative addressing. It only affects the maximum PC value, either 0xFF or 0xFFF, so no matter what AVR platform I am compiling for, binutils uses two bytes for an instruction.
P.S. If the only way I can learn how a compiler works is to observe how it works, does that mean the documentation is poor? Or maybe I just don't know where to start. If someone sees this, could you point me to some useful books about 'this kind of thing'? (I don't even know how to describe it.) Thanks!
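As a rough sketch of the arithmetic described here: the .+N notation is a byte offset counted from the instruction after the branch, while the instruction set manual's operand k is a word offset (PC ← PC + k + 1). The helper names below are made up for illustration, and the -64..63 range is the 7-bit signed range the instruction set manual gives for the brxx instructions.

```python
def branch_k(dot_offset: int) -> int:
    """Convert a '.+N' byte offset into the word offset k encoded in brxx."""
    if dot_offset % 2:
        # AVR instruction words are 2 bytes, so a branch target is always even.
        raise ValueError("branch offsets must be a multiple of 2 bytes")
    k = dot_offset // 2
    if not -64 <= k <= 63:
        raise ValueError("target out of range for a conditional branch")
    return k


def branch_target(insn_addr: int, dot_offset: int) -> int:
    """Byte address reached by a branch located at byte address insn_addr."""
    return insn_addr + 2 + dot_offset


print(branch_k(4))                   # 2 words are skipped
print(hex(branch_target(0x74, 0)))   # 0x76, as in 'breq .+0 ; 0x76'
```

This also shows why an odd offset cannot be encoded: there is no whole number of instruction words that corresponds to it.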

An 8-bit MCU does not mean the assembly instructions are encoded as 8-bit opcodes. From the ATmega16 specification: "Most AVR instructions have a single 16-bit word format." On the contrary, even though the ATmega are 8-bit MCUs, their instructions are encoded as 16-bit opcodes. Look at the "AVR Instruction Set". This is the reason the program counter (PC) behaves as it does (only 16-bit/2-byte aligned addresses can be assigned to it). If it could be set to an 8-bit/1-byte aligned address, it would try to execute an invalid opcode! Here's a thing for you to do: compile your example above to an object file, then disassemble the file (use objdump -D) and look at the generated disassembly. The offsets of the instructions should be 16-bit aligned.

Then how am I going to do such condition branch?
Just define a label and branch to it. The assembler will calculate the offset for you!
brne some_label2
; code1
some_label2:
; code2
In the case when the branch target is out of reach, do a jumpity-jump on the reversed condition:
breq some_label1
[r]jmp some_label2
some_label1:
; code1
some_label2:
; code2
The GNU assembler also supports a special kind of label which is just a number, and you can use the same label more than once. The jump target is the first one found in the forward or backward direction, respectively:
1:
; code 1
brne 1b ; jump to label 1 above (backwards)
brcc 1f ; jump to label 1 below (forwards)
; code 2
1:
This might be useful when you are writing assembly macros that contain local labels.
Specifically for use in assembly macros, there is also the pseudo-variable \@, which is increased with every macro use and thus can also be used to declare labels without conflicts:
.macro loop reg
.Lloop\@:
dec \reg
brne .Lloop\@
.endm
loop r16
loop r16
How to access the PC Pointer
If you really need the value of the program counter for some obscure reason, you can
rcall .
#ifdef __AVR_3_BYTE_PC__
pop r18
#endif
pop r17
pop r16
and you have the word-address of the code location right after the rcall. Symbol . is the assembler's "current location".
Depending on the situation, it might be easier to just define a label and take the address of it:
main:
ldi r16,lo8(main) ; Byte-address, low byte
ldi r17,hi8(main) ; Byte-address, high byte
ldi r18,hh8(main) ; Byte-address, highest byte
ldi r19,pm_lo8(main) ; Word-address, low byte
ldi r20,pm_hi8(main) ; Word-address, high byte
ldi r21,pm_hh8(main) ; Word-address, highest byte
ldi r22,lo8(gs(main)) ; Word-address where the linker will
ldi r23,hi8(gs(main)) ; generate a stub as needed.
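What the byte-extraction operators above compute can be sketched in Python. The address used for the label is hypothetical, and gs() is left out because it depends on the linker deciding whether a stub is needed.

```python
def lo8(addr: int) -> int:
    """Bits 0..7 of the byte address."""
    return addr & 0xFF


def hi8(addr: int) -> int:
    """Bits 8..15 of the byte address."""
    return (addr >> 8) & 0xFF


def hh8(addr: int) -> int:
    """Bits 16..23 of the byte address."""
    return (addr >> 16) & 0xFF


def pm_lo8(addr: int) -> int:
    """Low byte of the word address (byte address divided by 2)."""
    return lo8(addr >> 1)


def pm_hi8(addr: int) -> int:
    """High byte of the word address."""
    return hi8(addr >> 1)


def pm_hh8(addr: int) -> int:
    """Highest byte of the word address."""
    return hh8(addr >> 1)


main_addr = 0x012346  # hypothetical byte address of the label
print([hex(f(main_addr)) for f in (lo8, hi8, hh8)])           # byte address
print([hex(f(main_addr)) for f in (pm_lo8, pm_hi8, pm_hh8)])  # word address
```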

How to write and execute PURE machine code manually without containers like EXE or ELF?

I just need a hello world demo to see how machine code actually works.
Though Windows' EXE and Linux's ELF are near machine code, they're not PURE.
How can I write/execute PURE machine code?

You can write in PURE machine code manually WITHOUT ASSEMBLY
Linux/ELF: https://github.com/XlogicX/m2elf. This is still a work in progress, I just started working on this yesterday.
Source file for "Hello World" would look like this:
b8 21 0a 00 00 #moving "!\n" into eax
a3 0c 10 00 06 #moving eax into first memory location
b8 6f 72 6c 64 #moving "orld" into eax
a3 08 10 00 06 #moving eax into next memory location
b8 6f 2c 20 57 #moving "o, W" into eax
a3 04 10 00 06 #moving eax into next memory location
b8 48 65 6c 6c #moving "Hell" into eax
a3 00 10 00 06 #moving eax into next memory location
b9 00 10 00 06 #moving pointer to start of memory location into ecx
ba 10 00 00 00 #moving string size into edx
bb 01 00 00 00 #moving "stdout" number to ebx
b8 04 00 00 00 #moving "print out" syscall number to eax
cd 80 #calling the linux kernel to execute our print to stdout
b8 01 00 00 00 #moving "sys_exit" call number to eax
cd 80 #executing it via linux sys_call
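Turning a commented hex listing like the one above into raw bytes takes only a few lines. The following Python is a minimal sketch of that one step, not m2elf's actual implementation; it does not add the ELF header that the tool wraps around the code, and the function name is made up for illustration.

```python
def hex_source_to_bytes(text: str) -> bytes:
    """Strip '#' comments and convert the remaining hex pairs to bytes."""
    out = bytearray()
    for line in text.splitlines():
        code = line.split("#", 1)[0]   # keep only what precedes the comment
        out += bytes.fromhex(code)     # whitespace between pairs is ignored
    return bytes(out)


listing = """\
b8 01 00 00 00 #moving "sys_exit" call number to eax
cd 80 #executing it via linux sys_call
"""
print(hex_source_to_bytes(listing).hex(" "))  # b8 01 00 00 00 cd 80
```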
WIN/MZ/PE:
shellcode2exe.py (takes asciihex shellcode and creates a legit MZ PE exe file) script location:
https://web.archive.org/web/20140725045200/http://zeltser.com/reverse-malware/shellcode2exe.py.txt
dependency:
https://github.com/radare/toys/tree/master/InlineEgg
extract
python setup.py build
sudo python setup.py install

Real Machine Code
What you need to run the test: Linux x86 or x64 (in my case I am using Ubuntu x64)
Let's Start
This Assembly (x86) moves the value 666 into the eax register:
movl $666, %eax
ret
Let's make the binary representation of it:
Opcode movl (movl is a mov with operand size 32) in binary is = 1011
Instruction width in binary is = 1
Register eax in binary is = 000
Number 666 in signed 32 bits binary is = 00000000 00000000 00000010 10011010
666 converted to little endian is = 10011010 00000010 00000000 00000000
Instruction ret (return) in binary is = 11000011
So finally our pure binary instructions will look like this:
1011(movl)1(width)000(eax)10011010000000100000000000000000(666)
11000011(ret)
Putting it all together:
1011100010011010000000100000000000000000
11000011
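The same bit strings can be reproduced programmatically. This Python sketch follows the field layout given above (opcode 1011, width bit, 3-bit register number, then a little-endian 32-bit immediate); the function names are made up for illustration.

```python
import struct


def encode_mov_r32_imm32(reg: int, value: int) -> bytes:
    """Encode 'mov r32, imm32': bits 1011, width bit 1, then the register."""
    opcode = 0b1011_0000 | (1 << 3) | (reg & 0b111)
    return bytes([opcode]) + struct.pack("<I", value & 0xFFFFFFFF)


RET = bytes([0b1100_0011])  # the ret instruction
EAX = 0b000                 # register number of eax


def to_bits(code: bytes) -> str:
    """Render machine code as a string of ones and zeros."""
    return "".join(f"{byte:08b}" for byte in code)


print(to_bits(encode_mov_r32_imm32(EAX, 666)))
print(to_bits(RET))
```

The two printed lines are the strings that get typed into the loader program in the steps that follow.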
For executing it the binary code has to be placed in a memory page with execution privileges, we can do that using the following C code:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
/* Allocate size bytes of executable memory. */
unsigned char *alloc_exec_mem(size_t size)
{
void *ptr;
ptr = mmap(0, size, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANON, -1, 0);
if (ptr == MAP_FAILED) {
perror("mmap");
exit(1);
}
return ptr;
}
/* Read up to buffer_size bytes, encoded as 1's and 0's, into buffer. */
void read_ones_and_zeros(unsigned char *buffer, size_t buffer_size)
{
unsigned char byte = 0;
int bit_index = 0;
int c;
while ((c = getchar()) != EOF) {
if (isspace(c)) {
continue;
} else if (c != '0' && c != '1') {
fprintf(stderr, "error: expected 1 or 0!\n");
exit(1);
}
byte = (byte << 1) | (c == '1');
bit_index++;
if (bit_index == 8) {
if (buffer_size == 0) {
fprintf(stderr, "error: buffer full!\n");
exit(1);
}
*buffer++ = byte;
--buffer_size;
byte = 0;
bit_index = 0;
}
}
if (bit_index != 0) {
fprintf(stderr, "error: left-over bits!\n");
exit(1);
}
}
int main()
{
typedef int (*func_ptr_t)(void);
func_ptr_t func;
unsigned char *mem;
int x;
mem = alloc_exec_mem(1024);
func = (func_ptr_t) mem;
read_ones_and_zeros(mem, 1024);
x = (*func)();
printf("function returned %d\n", x);
return 0;
}
Source: https://www.hanshq.net/files/ones-and-zeros_42.c
We can compile it using:
gcc source.c -o binaryexec
To execute it:
./binaryexec
Then we pass the first sets of instructions:
1011100010011010000000100000000000000000
press enter
and pass the return instruction:
11000011
press enter
finally ctrl+d to end the program and get the output:
function returned 666

Everyone knows that the applications we usually write run on the operating system and are managed by it.
That means the operating system itself runs directly on the machine. So I think that is the PURE machine code you are talking about.
So, you need to study how an operating system works.
Here is some NASM assembly code for a boot sector which prints "hello world" in PURE machine code.
org 0x7c00 ; the BIOS loads the boot sector at 0000:7C00
xor ax, ax
mov ds, ax
mov si, msg
boot_loop:lodsb
or al, al
jz go_flag
mov ah, 0x0E
int 0x10
jmp boot_loop
go_flag:
jmp go_flag
msg db 'hello world', 13, 10, 0
times 510-($-$$) db 0
db 0x55
db 0xAA
And you can find more resources here: http://wiki.osdev.org/Main_Page.
END.
If you have nasm installed and have a floppy, you can run
nasm boot.asm -f bin -o boot.bin
dd if=boot.bin of=/dev/fd0
Then, you can boot from this floppy and you will see the message.
(NOTE: you should make the floppy the first boot device of your computer.)
In fact, I suggest you run that code in a full virtual machine such as bochs, virtualbox, etc.,
because it is hard to find a machine with a floppy drive.
So, the steps are:
First, install a full virtual machine.
Second, create a virtual floppy with the command bximage.
Third, write the bin file to that virtual floppy.
Last, start your virtual machine from that virtual floppy.
NOTE: https://wiki.osdev.org has some basic information about this topic.

It sounds like you're looking for the old 16-bit DOS .COM file format. The bytes of a .COM file are loaded at offset 100h in the program segment (limiting them to a maximum size of 64k - 256 bytes), and the CPU simply started executing at offset 100h. There are no headers or any required information of any kind, just raw CPU instructions.
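To make that concrete, here is a hedged Python sketch that builds the bytes of a tiny .COM image. The instruction encodings (mov ah, imm8 = B4; mov dx, imm16 = BA; int 21h = CD 21; ret = C3) and the DOS "print $-terminated string" service (INT 21h with AH=09h) are the assumptions here; the function name is made up, and the result is only meant to run under DOS or an emulator such as DOSBox.

```python
ORIGIN = 0x100                 # offset at which DOS loads a .COM file
MAX_SIZE = 0x10000 - 0x100     # 64k minus the 256 bytes below the image


def build_com(message: bytes) -> bytes:
    """Build a headerless .COM image that prints `message` and returns to DOS."""
    if b"$" in message:
        raise ValueError("'$' terminates the string for INT 21h / AH=09h")
    code_len = 8                            # size of the four instructions
    msg_addr = ORIGIN + code_len            # the string follows the code
    code = (
        bytes([0xB4, 0x09])                           # mov ah, 09h
        + bytes([0xBA]) + msg_addr.to_bytes(2, "little")  # mov dx, msg_addr
        + bytes([0xCD, 0x21])                         # int 21h
        + bytes([0xC3])                               # ret
    )
    image = code + message + b"$"
    if len(image) > MAX_SIZE:
        raise ValueError("image does not fit in one segment")
    return image


image = build_com(b"Hello, World!\r\n")
print(image.hex(" "))
# To try it: open("hello.com", "wb").write(image)
```

Every byte of the file is either an instruction or the string itself; there is no header to parse, which is the point of the format.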

The OS is not running the instructions, the CPU is (except if we're talking about a virtual machine OS, which do exist; I'm thinking about Forth or such things). The OS however does require some meta-information to know that a file does in fact contain executable code, and what it expects its environment to look like. ELF is not just near machine code. It is machine code, together with some information for the OS to know that it's supposed to make the CPU actually execute that thing.
If you want something simpler than ELF but still *nix, have a look at the a.out format, which is much simpler. Traditionally *nix C compilers do (still) write their executable to a file called a.out if no output name is specified.

The next program is a Hello World program I wrote in 16-bit machine code (Intel 8086). If you want to know machine code, I suggest that you learn assembly first, because every line of code in assembly is converted to a line of machine code. As far as I know, I am one of the few people in the world still programming in machine code instead of assembly.
BTW, to run it, save the file with a ".com" extension and run it in DOSBox!
So, this is an Hello World Program.

When targeting an embedded system you can make a binary image of the rom or ram that is strictly the instructions and associated data from the program. And often can write that binary into a flash/rom and run it.
Operating systems want to know more than that, and developers often want to leave more than that in their file so they can debug or do other things with it later (disassemble with some recognizable symbol names). Also, embedded or on an operating system you may need to separate .text from .data from .bss from .rodata, etc and file formats like .elf provide a mechanism for that, and the preferred use case is to load that elf with some sort of loader be it the operating system or something programming the rom and ram of a microcontroller.
.exe has some header info as well. As mentioned, .com didn't; it was loaded at offset 0x100 and branched there.
To create a raw binary from an executable, with a gcc-created ELF file for example, you can do something like
objcopy file.elf -O binary file.bin
If the program is segmented (.text, .data, etc) and those segments are not back to back the binary can get quite large. Again using embedded as an example if the rom is at 0x00000000 and data or bss is at 0x20000000 even if your program only has 4 bytes of data objcopy will create a 0x20000004 byte file filling in the gap between .text and .data (as it should because that is what you asked it to do).
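The size in that example is just the span from the lowest section start to the highest section end. A small Python sketch of the arithmetic follows; it is not objcopy's implementation, and the 0x100-byte .text size is a made-up value for illustration.

```python
def flat_binary_size(sections):
    """Size of a raw image covering every (address, size) section, gaps included."""
    start = min(addr for addr, _ in sections)
    end = max(addr + size for addr, size in sections)
    return end - start


sections = [
    (0x00000000, 0x100),  # .text in rom (size is hypothetical)
    (0x20000000, 4),      # 4 bytes of .data in ram
]
print(hex(flat_binary_size(sections)))  # 0x20000004
```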
What is it you are trying to do? Reading an ELF, Intel hex or srec file is quite trivial, and from that you can see all the bits and bytes of the binary. Disassembling the ELF or whatever will also show you that in a human-readable form. (objdump -D file.elf > file.list)
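As an illustration of how little there is to an srec file, here is a Python sketch that writes and reads a single S1 record (16-bit address). It assumes the usual Motorola S-record layout: a byte count, the address, the data, and a checksum that is the ones' complement of the low byte of the sum of those fields. The function names are made up, and the other record types (S0, S2, S3, S9 and so on) are not handled.

```python
def srec_checksum(body: bytes) -> int:
    """Ones' complement of the low byte of the sum of count, address and data."""
    return ~sum(body) & 0xFF


def make_s1(address: int, data: bytes) -> str:
    """Format one S1 record."""
    count = len(data) + 3  # two address bytes, the data, one checksum byte
    body = bytes([count]) + address.to_bytes(2, "big") + data
    return "S1" + (body + bytes([srec_checksum(body)])).hex().upper()


def parse_s1(line: str):
    """Return (address, data) from one S1 record, verifying count and checksum."""
    line = line.strip()
    if not line.startswith("S1"):
        raise ValueError("not an S1 record")
    raw = bytes.fromhex(line[2:])
    body, checksum = raw[:-1], raw[-1]
    if body[0] != len(raw) - 1:
        raise ValueError("byte count does not match record length")
    if srec_checksum(body) != checksum:
        raise ValueError("checksum mismatch")
    return int.from_bytes(body[1:3], "big"), body[3:]


record = make_s1(0x0000, bytes.fromhex("0000000011111111"))
print(record)
print(parse_s1(record))
```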

With pure machine code, you can use any language that has the ability to write files.
Even Visual Basic .NET can write 8, 16, 32 and 64 bit values, converting between the int types as it writes.
You can even set things up so that VB writes out machine code in a loop as needed,
for something like setpixel, where x,y changes and you have your ARGB colors.
Or create your VB.NET program regularly in Windows, and use NGEN.exe to make a native code file of your program. It creates pure machine code specific to IA-32 all in one shot, throwing the JIT debugger aside.

These are nice responses, but knowing why someone would want to do this might guide the answer better. I think the most important reason is to get full control of the machine, especially over its cache writing, for maximum performance, and to prevent any OS from sharing the processor or virtualizing your code (thus slowing it down) or, especially these days, snooping on your code as well. As far as I can tell, assembler doesn't handle these issues, and M$/Intel and other companies treat this like an infringement or as something "for hackers." This is very wrongheaded, however. If your assembler code is handed over to an OS or proprietary hardware, true optimization (potentially at GHz frequencies) will be out of reach. This is a very important issue for science and technology, as our computers cannot be used to their full potential without hardware optimization, and are often computing several orders of magnitude below it. There probably is some workaround or some open-source hardware that enables this, but I have yet to find it. Penny for anyone's thoughts.

On Windows--at least 32bit Windows--you can execute RAW INSTRUCTIONS using a .com file.
For instance, if you take this string and save it in notepad with a .com extension:
X5O!P%#AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*
It will print a string and set off your antivirus software.
